[GR-35746] Decrease aligned chunk size to 512 KB. #6115
Conversation
Did you measure any effect on GC activity? Aligned chunks are used for TLABs, right? They are not resized, so reducing the size of a TLAB would impact GC activity.
We did a good amount of benchmarking for this, and overall the improvements in memory usage and image size outweigh regressions in individual cases. Aligned chunks are indeed used for TLABs and are not resized, but decreasing their size does not automatically mean that more garbage collections will happen. It means the memory is divided into more chunks, and that threads might need to get a new TLAB more often. It also means that threads with a low allocation rate hoard less memory in their TLABs, and that we might do fewer GCs, because that decision is based on the total size of all allocated chunks, not on the bytes actually allocated in them. Smaller aligned chunks also mean less waste from chunk alignment in images (particularly small ones), and that pinning an object keeps fewer other objects alive (because a pinned object currently keeps its entire chunk alive). This overall change in behavior in turn affects the decisions of the GC policy about the size of spaces and whether to do incremental or full collections, which also has a major impact.
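To make the hoarding point concrete, here is a minimal back-of-the-envelope sketch (not GraalVM code; the thread count and the worst-case assumption that each thread holds one mostly unused chunk are mine) comparing the memory bound for per-thread TLABs under the former 1 MB default and the new 512 KB default:

```java
public class TlabHoardingExample {
    public static void main(String[] args) {
        int threads = 64;                  // assumed number of mostly idle threads
        long oldChunk = 1024L * 1024;      // former default aligned chunk size: 1 MB
        long newChunk = 512L * 1024;       // new default aligned chunk size: 512 KB

        // Each thread keeps one aligned chunk as its TLAB regardless of how much
        // it actually allocates, and the GC policy accounts for whole chunks.
        System.out.printf("old: up to %d KB held in TLABs%n", threads * oldChunk / 1024);
        System.out.printf("new: up to %d KB held in TLABs%n", threads * newChunk / 1024);
    }
}
```

Under these assumptions, the worst-case bound drops from 64 MB to 32 MB of chunk-granularity accounting for the same 64 threads.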
Thank you for your explanation, it's insightful.
How did you measure it? I have a very large application running on aarch64 and could do some tests as well. Our allocation rate is quite high.
I was reading https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/ and might have concluded something wrong. If an allocation cannot be done in the TLAB, will it be done in a new TLAB or in the heap? On OpenJDK I remember seeing slow-path allocations in sampled stacks. So bigger TLABs would increase throughput, because slow allocations would happen less often, at least on OpenJDK.
If I were to change the aligned chunk size to 512k, would it behave just like this branch, or do I need to test with this branch? Thank you for your time and insight.
We have CI infrastructure which uses
That article specifically describes HotSpot and the Epsilon GC. Native Image always allocates small objects in a TLAB (aligned chunk) and large arrays that exceed a certain threshold in unaligned chunks (with near-exact size). The threshold is 1/8 of the aligned chunk size and has therefore become smaller as part of this change (128 KB -> 64 KB), but a larger threshold did not fare better. When a large-ish object that is still below the threshold does not fit into the current TLAB, we retire the TLAB and get a new one to allocate the object, which, in the very worst case, leads to nearly 12.5% waste. Indeed, it would be preferable to dynamically size TLABs instead of using fixed-size aligned chunks for every workload.
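For illustration, here is a minimal sketch of the allocation policy described above; the class, field, and method names are simplifications of mine, not the actual SubstrateVM code:

```java
// Simplified sketch of the described policy: large arrays above 1/8 of the
// aligned chunk size go into unaligned chunks, everything else is bump-allocated
// in the current TLAB, retiring it when the object does not fit.
final class AllocationSketch {
    static final long ALIGNED_CHUNK_SIZE = 512 * 1024;                // 512 KB after this change
    static final long LARGE_ARRAY_THRESHOLD = ALIGNED_CHUNK_SIZE / 8; // 64 KB (was 128 KB with 1 MB chunks)

    private long tlabRemaining = ALIGNED_CHUNK_SIZE;

    byte[] allocate(int size) {
        if (size >= LARGE_ARRAY_THRESHOLD) {
            // Large arrays get their own unaligned chunk with a near-exact size.
            return allocateUnalignedChunk(size);
        }
        if (size > tlabRemaining) {
            // Retire the current TLAB (aligned chunk) and request a fresh one.
            // Worst case: an object just below the threshold does not fit,
            // leaving almost 1/8 of the old chunk (~12.5%) unused.
            tlabRemaining = ALIGNED_CHUNK_SIZE;
        }
        tlabRemaining -= size;
        return new byte[size]; // stands in for bump-pointer allocation in the TLAB
    }

    private byte[] allocateUnalignedChunk(int size) {
        return new byte[size]; // stands in for allocating an unaligned chunk
    }
}
```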
Yes, pinned objects can cause temporary leaks, but they are commonly used only briefly, to enable native code to access memory on the Java heap. See
I did some testing on aarch64 with 512k and 2MiB AlignedHeapChunkSize, and it seems allocation-heavy workloads are not impacted, at least not in this case. Tested on the latest 23.1 dev version (aarch64).
512k
2MiB
Note: the GC profiler seems not to work properly on native-image with JMH.
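As a rough sketch of how such a comparison can be run on a released build, assuming the chunk size is exposed as the hosted option AlignedHeapChunkSize mentioned above and that the only relevant difference from this branch is the default value (jar and image names below are hypothetical; sizes are given in bytes to avoid relying on suffix parsing):

```sh
# Hypothetical invocation: build the same application twice, once per chunk size,
# then run the allocation-heavy benchmark against each image.
native-image -H:AlignedHeapChunkSize=524288 -jar myapp.jar -o myapp-512k   # 512 KB chunks
native-image -H:AlignedHeapChunkSize=2097152 -jar myapp.jar -o myapp-2m    # 2 MiB chunks
```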
Thanks for sharing @SergejIsbrecht, so how does that compare to the former default of 1 MB?