TSRM: make CG, EG, SCNG and AG compile-time offsets#22287
Conversation
saves an offset load every time they're accessed
| TSRM_ALIGNED_SIZE(sizeof(zend_compiler_globals)) + | ||
| TSRM_ALIGNED_SIZE(sizeof(zend_executor_globals)) + | ||
| TSRM_ALIGNED_SIZE(sizeof(zend_php_scanner_globals)) + | ||
| TSRM_ALIGNED_SIZE(zend_mm_globals_size())); // AG size, exposed through function call |
There was a problem hiding this comment.
really unhappy about this, same issue as below:
| #if defined(__cplusplus) || defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 202311L) | ||
| # define TSRM_STATIC_ASSERT(c, m) static_assert((c), m) | ||
| #elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11 */ | ||
| # define TSRM_STATIC_ASSERT(c, m) _Static_assert((c), m) | ||
| #else | ||
| # define TSRM_STATIC_ASSERT(c, m) | ||
| #endif | ||
| TSRM_STATIC_ASSERT(TSRM_ALIGNED_SIZE(sizeof(tsrm_tls_entry)) == TSRM_FAST_RESERVED_BASE, | ||
| "TSRM_FAST_RESERVED_BASE must equal the tsrm_tls_entry header size"); |
There was a problem hiding this comment.
so this is a two-fold problem:
- duplicated ZEND_STATIC_ASSERT copy - tsrm can't use zend_portability, because zend_portability depends on tsrm.h
- tsrm_ls_entry size is only known in this file, not in tsrm.h consumers.
- AG size can be consumed at runtime, so it gets the last slot, but it's structurally the same problem
In C++ this would be a simple consteval function in the header, but I have no idea how to make it work in C. So I kept the definition private, but force it to the specific size it has.
There was a problem hiding this comment.
Possibly, the cache could be set to the end of tsrm_tls_entry so that code outside of TSRM.c do not have to be aware of the size of tsrm_tls_entry. Alternatively, set _tsrm_ls_cache to the start of tsrm_tls_entry, but allocate the front elements before it and use negative offsets.
There was a problem hiding this comment.
I don't think moving it to the end makes sense, it would just move the implementation size leak from the fast path to the slow TSRMG_BULK path.
The alternative might work, I'm not sure yet. Let me explore it.
There was a problem hiding this comment.
It's possible to get rid of the size leak, but the concept is harder for me to wrap my head around. I'll push the commit, but I'm not sure it's a good idea.
There was a problem hiding this comment.
I don't think moving it to the end makes sense, it would just move the implementation size leak from the fast path to the slow TSRMG_BULK path.
Indeed
It's possible to get rid of the size leak, but the concept is harder for me to wrap my head around. I'll push the commit, but I'm not sure it's a good idea.
This seems fine to me, but I let you chose what you prefer.
| #define ZEND_CGG_DIAGNOSTIC_IGNORED_END ZEND_DIAGNOSTIC_IGNORED_END | ||
|
|
||
| #if defined(__cplusplus) | ||
| #if defined(__cplusplus) || defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 202311L) |
There was a problem hiding this comment.
technically unrelated, but I didn't want to copy an incomplete definition into tsrm.c
There was a problem hiding this comment.
I'd be in favour, getting php-src and extensions to compile in extremely old environments is a pain anyway (I do rhel-7 builds for frankenphp). Should I replace this then?
There was a problem hiding this comment.
Should I replace this then?
Not yet, probably. See: https://news-web.php.net/php.internals/130714
| { | ||
| bool ret = tsrm_startup(expected_threads, 1, 0, NULL); | ||
| php_reserve_tsrm_memory(); | ||
| /* Must reserve exactly the prefix laid out by ZEND_*_OFFSET (zend_globals.h). */ |
There was a problem hiding this comment.
Nit: This comment is not necessarily true? Only the total size is critical here?
| TSRM_API ts_rsrc_id ts_allocate_fast_id_at(ts_rsrc_id *rsrc_id, size_t *offset, ptrdiff_t fixed_offset, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor) | ||
| {/*{{{*/ | ||
| tsrm_fixed_offset = fixed_offset; | ||
| return ts_allocate_fast_id(rsrc_id, offset, size, ctor, dtor); |
There was a problem hiding this comment.
This could be inverted to avoid indirectly passing the fixed offset via a global: ts_allocate_fast_id() should compute the offset and then call ts_allocate_fast_id_at().
| #if defined(__cplusplus) || defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 202311L) | ||
| # define TSRM_STATIC_ASSERT(c, m) static_assert((c), m) | ||
| #elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11 */ | ||
| # define TSRM_STATIC_ASSERT(c, m) _Static_assert((c), m) | ||
| #else | ||
| # define TSRM_STATIC_ASSERT(c, m) | ||
| #endif | ||
| TSRM_STATIC_ASSERT(TSRM_ALIGNED_SIZE(sizeof(tsrm_tls_entry)) == TSRM_FAST_RESERVED_BASE, | ||
| "TSRM_FAST_RESERVED_BASE must equal the tsrm_tls_entry header size"); |
There was a problem hiding this comment.
I don't think moving it to the end makes sense, it would just move the implementation size leak from the fast path to the slow TSRMG_BULK path.
Indeed
It's possible to get rid of the size leak, but the concept is harder for me to wrap my head around. I'll push the commit, but I'm not sure it's a good idea.
This seems fine to me, but I let you chose what you prefer.
…ocating version of ts_allocate_fast_id_at)
saves an offset load every time they're accessed
fifth split from #22231
a bit of a different approach (keep everything on the existing heap allocations, just reordered) so we don't need to work around the apache module tls surplus problem and don't need to fall back to global-exec
it's not the entire gain of the other PR, but still something. measurements without the local-exec or -mtls-size change (which will benefit on top, together they restore around half the ZTS penalty over NTS):