gh-145685: per-type method cache implementation#150160
Conversation
|
🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit 728021c 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F150160%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
colesbury
left a comment
There was a problem hiding this comment.
A few more comments below (mostly found by Claude). Also:
- Have you run single-threaded benchmarks to ensure that we haven't introduced any regressions with this?
- Does the
type_lookupbenchmark scale as well as the other scaling benchmarks? In other words, should we consider this a partial or full fix for the type cache scaling issues?
Documentation build overview
47 files changed ·
|
| } | ||
| } | ||
| new_cache->version_tag = type->tp_version_tag; | ||
| cache_set(type, new_cache); |
There was a problem hiding this comment.
The racy cache_set here can mask that two or more threads have concurrently resized and set the new cache, possibly leaking one or more allocations.
Personally, I think resizing here should not be lock-free. I don't see any good reason why it should.
Also, taking a quick glance at the tests, I don't see any that exercise the resizing path.
There was a problem hiding this comment.
How? cache_resize is called under TYPE_LOCK.
There was a problem hiding this comment.
FYI, the atomic in cache_set is because there can be concurrent readers (but not other writers).
There was a problem hiding this comment.
Ah, sorry I missed that comment. I don't suppose an assert(type_lock is held) would be easy to add?
There was a problem hiding this comment.
Also, the comment at the top "Lock-free per type method cache implementation" is misleading in light of this.
Benchmark results show that it is 1% slower on gil enabled build https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20260616-3.16.0a0-4fbec93/bm-20260616-vultr-x86_64-kumaraditya303-mrocache-3.16.0a0-4fbec93-vs-base.md I think it it is within the error of benchmark machines and not significant. |
|
I'm very wary of dismissing 1% slower as "within the noise" without some other evidence that there is no slowdown for a change like this. It also looks like you've missed several of the stats. How does this change memory use? |
|
Results from running locally on my machine (via Claude, GIL-enabled, clang-20, optimized, lto=thin) look good: https://gist.github.com/colesbury/1e5c1eb4ccc1e626696b568d18d8bbc4. Cache misses drop substantially, but the overall effect is small (possibly within the noise margin) because:
A few more suggestions:
|
I've added stats for resizes and invalidations and removed collisions. |
_PyType_Lookup/_PyType_LookupStackRefAndVersion#145685