Improve perf of omrmem_allocate_memory32 #7190
Comments
@ThanHenderson Opened this issue to document our earlier discussion. All future updates can also be posted here. fyi @tajila |
Here's an update. (TL;DR: This doesn't appear to be an issue currently, and I think this can be closed.) I'll answer some of the questions from above out of order:
I haven't been able to observe this running hundreds of iterations of JDK21 builds on
Though our sub-4GB allocator isn't as efficient as just a call to
I've embedded (without
Since I haven't been able to reproduce the Skynet regression, I also reached for the
Conclusion
In light of the above, I conclude that no action should be taken, since there is no observable problem with the current implementation of the sub-4GB allocator (at least on our tested workloads). Without positive performance data to back up the embedding, the cost of supporting, shipping, and maintaining a custom third-party allocator is not justified. That being said, I will document, in a comment below, how I embedded
Aside
I was thinking we could also explore using
[1] https://github.com/jemalloc/jemalloc |
For future interest/reference, here are the patches to embed a third-party allocator (well
It is a little funky to get the
To keep things internal to the project, I used
Regardless of the allocator used, OMR just expects that
Everything should be smooth sailing after that. |
Thanks for your analysis @ThanHenderson. So it sounds like swapping our current allocator for another one is not the answer. However, perhaps there is a way to tune our allocator to be more optimal for vthread workloads? Have you looked into toggling the initial region size of the heaps, or the amount that is initially committed? Perhaps there is a size that is better suited to vthread-heavy workloads. |
@tajila I have not, but it seems like there is an opportunity for a limit study here. Other than region size, are there any other parameters that you think I should look into? |
There is also
Overall, I think we should consider increasing the region size (static increase), but also look at a dynamic policy where the region size grows as we allocate more (we do something similar for class memory segments). |
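To make the static-versus-dynamic comparison concrete, here is a minimal sketch of a doubling growth policy. It is not OMR code: the function names, the 256 MiB cap, and the doubling rule are all assumptions for illustration; only the 8 MiB starting size comes from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy parameters. The 8 MiB start mirrors the hardcoded
 * value discussed in this thread; the 256 MiB cap is an assumption. */
#define INITIAL_REGION_BYTES ((size_t)8 * 1024 * 1024)
#define MAX_REGION_BYTES     ((size_t)256 * 1024 * 1024)

/* Size of the next region to reserve: double the previous region,
 * clamp to the cap, and never return less than the pending request. */
size_t
next_region_size(size_t previous_region_bytes, size_t requested_bytes)
{
	size_t next = INITIAL_REGION_BYTES;
	if (previous_region_bytes > 0) {
		if (previous_region_bytes >= MAX_REGION_BYTES / 2) {
			next = MAX_REGION_BYTES;
		} else {
			next = previous_region_bytes * 2;
		}
	}
	if (next < requested_bytes) {
		next = requested_bytes;
	}
	return next;
}

/* Count the region reservations needed to cover total_bytes, using
 * either the fixed initial size (dynamic == 0) or the doubling policy. */
unsigned
regions_needed(size_t total_bytes, int dynamic)
{
	size_t covered = 0;
	size_t region = 0;
	unsigned count = 0;
	while (covered < total_bytes) {
		region = dynamic ? next_region_size(region, 0) : INITIAL_REGION_BYTES;
		covered += region;
		count++;
	}
	return count;
}
```

Under these assumed parameters, covering 1 GiB takes 128 reservations with the fixed 8 MiB size and 9 with doubling. Whether that trade is worthwhile depends on how much low memory the policy ends up holding unused.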
@tajila is there a collection of vthread heavy workloads somewhere? I've only been running the Skynet stress benchmark, and am not observing any difference from tweaking the |
You can use Helidon Nima: https://github.com/tomas-langer/helidon-nima-example. I would configure JMeter to hit the endpoints (you can toggle the amount of load). The relevant endpoint in this case would be
I noticed pretty significant deltas between compressedrefs and non-compressedrefs with this example. |
I am noticing differences for
I was testing by spawning a new server
I iterated over
On this workload, a static increase does show some benefit. It would be beneficial, though, to test this on a set of workloads that show observable improvements, which may motivate a dynamic solution. |
@dmitripivkine @amicic |
This is more of a VM question than a GC question. The suballocator area located below the 4G bar is used for artifacts that must be 32-bit addressable for compressed refs (j9class, j9vmthread, etc.). The only GC aspect is not to prevent the most performant 0-shift run, where the entire heap is located below the 4G bar as well. So, by taking more memory for the suballocator initially, you can compromise 0-shift runs. |
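For readers unfamiliar with the 0-shift case, a simplified model of shifted compressed references may help. This is a sketch under the assumption that a reference is just the address shifted right by a fixed amount; the function names are hypothetical and the real VM encoding may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a 64-bit address as a 32-bit compressed reference.
 * Simplified model: the reference is the address shifted right. */
uint32_t
compress_ref(uint64_t address, unsigned shift)
{
	/* The address must be aligned to (1 << shift) and must fit in
	 * 32 bits once shifted, otherwise information would be lost. */
	assert((address & ((UINT64_C(1) << shift) - 1)) == 0);
	assert((address >> shift) <= UINT32_MAX);
	return (uint32_t)(address >> shift);
}

/* Decode a compressed reference back to a full address. */
uint64_t
decompress_ref(uint32_t ref, unsigned shift)
{
	return (uint64_t)ref << shift;
}

/* Smallest shift that lets every address below heap_top fit in 32 bits.
 * A result of 0 means references are plain truncated pointers. */
unsigned
required_shift(uint64_t heap_top)
{
	unsigned shift = 0;
	assert(heap_top > 0);
	while (((heap_top - 1) >> shift) > UINT32_MAX) {
		shift++;
	}
	return shift;
}
```

In this model, a heap that ends at or below 4 GiB needs no shift, so encode and decode are free. If the suballocator's initial reservation pushes the heap above that bar, every reference access pays for a shift, which is the compromise described above.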
A larger initial size reduces the number of subsequent sub-4G allocations, which can be costly because sub-4G memory is reserved with a brute-force looping strategy. |
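The cost argument can be illustrated with a small simulation. This is only a model, not the port library code: the low 4 GiB is represented as 256 hypothetical slots, and each probe stands in for one reservation attempt against the OS. All names here are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 256 /* hypothetical: 256 slots of 16 MiB model the low 4 GiB */

typedef struct {
	bool occupied[SLOT_COUNT];
	unsigned long attempts; /* simulated reservation attempts (stand-ins for system calls) */
} LowMemoryModel;

/* Simulate asking the OS for `slots` contiguous slots starting at `start`.
 * Every call counts as one attempt, whether or not it succeeds. */
bool
try_reserve_at(LowMemoryModel *mem, size_t start, size_t slots)
{
	size_t i;
	mem->attempts++;
	if ((0 == slots) || (start + slots > SLOT_COUNT)) {
		return false;
	}
	for (i = start; i < start + slots; i++) {
		if (mem->occupied[i]) {
			return false;
		}
	}
	for (i = start; i < start + slots; i++) {
		mem->occupied[i] = true;
	}
	return true;
}

/* Brute force: probe every candidate start address from the bottom up.
 * Returns the first slot index of the reservation, or -1 if none fits. */
long
reserve_brute_force(LowMemoryModel *mem, size_t slots)
{
	size_t start;
	assert(NULL != mem);
	for (start = 0; start + slots <= SLOT_COUNT; start++) {
		if (try_reserve_at(mem, start, slots)) {
			return (long)start;
		}
	}
	return -1;
}
```

In this model, four separate one-slot reservations cost 10 probes in total, because each scan restarts from the bottom and walks over what is already taken, while a single four-slot reservation costs 1 probe. The real cost per probe and the real scan granularity are not modeled.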
Your comment is not very practical, I'm afraid. The question is where the bottleneck is. We looked at the code with @tajila, and it seems the logic is:
Would you please try changing this hardcoded value of 8m to a larger one (50m or more) and measure performance again? |
If increasing Note: On AIX, |
There is another possibility you can try (please do it separately from increasing the 8m size, for a clean result). In the same line of code I mentioned before, the last parameter in |
I added an Baseline with The following table has the results from the
|
There is also this reference here: https://github.com/eclipse/omr/blob/b5ef5eda4680b6b5cf0c2f954362f9f47353ce04/port/common/omrmem32helpers.c#L50 Is |
@pshipton might have a local backup. |
See https://ibm.ent.box.com/file/1073877430132 for the VMDesign DB html. I've copied some of 1761 below.
Introduction
Currently, the J9Class referenced in the object header is allocated with the port library call j9mem_allocate_memory. On a 64-bit platform, this call may return memory outside of the low 4GB region, the region that is addressable using only the least significant 32 bits. The goal is to have J9Class reside in the low 4GB so that we can compress the class pointer in the object header.
High Level Design
Currently, we have port library calls j9mem_allocate_memory32 and j9mem_free_memory32 that do what we want, but in a pretty inefficient way. The function j9mem_allocate_memory32 uses a brute-force algorithm that linearly scans the 4GB region, attempting to allocate memory at locations starting from the beginning, until it either gets a successful allocation or reaches the end, indicating we are out of virtual address space in the low 4GB. Obviously, these calls are not suitable for frequent use, as in the case of RAM class segment allocation. They will be rewritten to use a sub-allocator mechanism through the port library heap functions (implemented in
Define the low-32 heap structure as
where J9HeapWrapper is defined as:
J9SubAllocateHeap32 will be stored in J9PortPlatformGlobals and ifdef'ed by defined(J9VM_ENV_DATA64) && !defined(J9ZOS390).
Port library startup
We only initialize fields in J9SubAllocateHeap32; no backing storage is allocated (firstHeapWrapper = NULL).
Allocate memory from the heap
Note: for z/OS, the OS already provides an API to allocate in the low 4GB region (malloc31 and free). The following design for allocate_memory32 and free_memory32 is for non-z/OS platforms.
There are two cases where an allocation cannot be satisfied:
1. After traversing the list of heaps, we cannot find a free block large enough. We then allocate a new regular-sized heap, and the requested block will be sub-allocated from there. The newly allocated heap is prepended at the beginning of the heap list, and totalSize, occupiedSize and firstHeapWrapper will be refreshed.
2. The requested size is larger than the heap size. In this case, we just allocate the requested size using the existing brute-force algorithm described previously and don't bother initializing it as a heap. Its J9HeapWrapper struct will have its heap field set to NULL, indicating it's not a valid J9Heap and therefore will be skipped when walking the list.
Free memory from the heap
By calling void j9mem_free_memory32(struct J9PortLibrary *portLibrary, void *memPointer)
When freeing a block, we first determine its containing heap by traversing the linked list of heaps. We assume that there would normally be only a few heaps along the chain, so this work should not introduce much overhead.
Port library shutdown
We iterate through the list of heaps and free them by calling vmem_free_memory on each one.
Risks
RAS Considerations |
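The allocate and free paths in the design excerpt can be sketched roughly as follows. This is an illustration only, with several loudly stated assumptions: the struct and function names are hypothetical (they are not the J9 definitions, which are elided above), malloc stands in for the sub-4GB reservation so the sketch runs anywhere, the heap size is tiny, and a bump pointer replaces the real heap's free-block management.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_BYTES 4096 /* hypothetical regular heap size, tiny for illustration */

typedef struct HeapWrapper {
	struct HeapWrapper *next;
	unsigned char *base; /* start of backing storage */
	size_t size;         /* bytes of backing storage */
	size_t used;         /* bump offset; no alignment or reuse handling */
	int isHeap;          /* 0 marks an oversized block, skipped when sub-allocating */
} HeapWrapper;

typedef struct {
	HeapWrapper *firstHeapWrapper;
	size_t totalSize;
} SubAllocateHeap;

/* Reserve backing storage and prepend a wrapper to the list.
 * malloc is a stand-in for the sub-4GB reservation. */
static HeapWrapper *
new_wrapper(SubAllocateHeap *sub, size_t size, int isHeap)
{
	HeapWrapper *w = malloc(sizeof(*w));
	if (NULL == w) {
		return NULL;
	}
	w->base = malloc(size);
	if (NULL == w->base) {
		free(w);
		return NULL;
	}
	w->size = size;
	w->used = 0;
	w->isHeap = isHeap;
	w->next = sub->firstHeapWrapper;
	sub->firstHeapWrapper = w;
	sub->totalSize += size;
	return w;
}

void *
sub_allocate(SubAllocateHeap *sub, size_t bytes)
{
	HeapWrapper *w;
	void *result;
	if (bytes > HEAP_BYTES) {
		/* Oversized request: dedicated block, never treated as a heap. */
		w = new_wrapper(sub, bytes, 0);
		return (NULL == w) ? NULL : (void *)w->base;
	}
	/* Walk the list for a heap with enough room left. */
	for (w = sub->firstHeapWrapper; NULL != w; w = w->next) {
		if (w->isHeap && (w->size - w->used >= bytes)) {
			break;
		}
	}
	if (NULL == w) {
		/* No heap can satisfy the request: add a regular-sized heap. */
		w = new_wrapper(sub, HEAP_BYTES, 1);
		if (NULL == w) {
			return NULL;
		}
	}
	result = w->base + w->used;
	w->used += bytes;
	return result;
}

/* First step of a free: find the wrapper whose range contains ptr. */
HeapWrapper *
find_containing(SubAllocateHeap *sub, const void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;
	HeapWrapper *w;
	for (w = sub->firstHeapWrapper; NULL != w; w = w->next) {
		uintptr_t start = (uintptr_t)w->base;
		if ((p >= start) && (p < start + w->size)) {
			return w;
		}
	}
	return NULL;
}
```

The sketch stops at locating the containing wrapper and does not release memory. It is meant to show why the list walk is cheap when there are only a few heaps, and why a larger heap size means fewer trips into the expensive reservation path.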
There is a link to design 1754 which I didn't copy here. Let me know if you have problems accessing the VMDesign DB html. |
I added an Here are the results:
Using
As Dmitri mentioned, it is currently only implemented on Linux, so we would need to update code in the
If enabled by default, we could maintain the
Separately, I think it would be good to also keep the cmdline option that I have for controlling |
1. On Linux, use VMEM_ALLOC_QUICK by default for allocateRegion in allocate_memory32
2. On Linux, adds -Xgc:suballocatorQuickAllocDisable option that disables the default VMEM_ALLOC_QUICK
3. Adds -Xgc:suballocatorIncrementSize option that replaces the HEAP_SIZE_BYTES macro and controls the heap increment size
4. Adds sanity.functional tests
Addresses: eclipse-omr/omr#7190
Signed-off-by: Nathan Henderson <nathan.henderson@ibm.com>

1. On Linux, use VMEM_ALLOC_QUICK by default for allocateRegion in allocate_memory32
2. On Linux, adds -Xgc:suballocatorQuickAllocDisable option that disables the default VMEM_ALLOC_QUICK
3. Adds -Xgc:suballocatorIncrementSize option that replaces the HEAP_SIZE_BYTES macro and controls the heap increment size
4. Adds an omrport_copy_suballocator_globals procedure that correctly initializes the PPG suballocator globals for memCheckPortLib when -Xcheck is provided
5. Updates related documentation
Addresses: eclipse-omr#7190
Signed-off-by: Nathan Henderson <nathan.henderson@ibm.com>
1. Use VMEM_ALLOC_QUICK by default for allocateRegion in allocate_memory32
2. Adds -Xgc:suballocatorQuickAllocDisable option that disables the default VMEM_ALLOC_QUICK
3. Adds -Xgc:suballocatorIncrementSize option that replaces the HEAP_SIZE_BYTES macro and controls the heap increment size
4. Adds sanity.functional tests
Addresses: eclipse-omr/omr#7190
Signed-off-by: Nathan Henderson <nathan.henderson@ibm.com>

1. Use VMEM_ALLOC_QUICK by default for allocateRegion in allocate_memory32
2. Adds -Xgc:suballocatorQuickAllocDisable option that disables the default VMEM_ALLOC_QUICK
3. Adds -Xgc:suballocatorIncrementSize option that replaces the HEAP_SIZE_BYTES macro and controls the heap increment size
4. Adds an omrport_copy_suballocator_globals procedure that correctly initializes the PPG suballocator globals for memCheckPortLib when -Xcheck is provided
5. Updates related documentation
Addresses: eclipse-omr#7190
Signed-off-by: Nathan Henderson <nathan.henderson@ibm.com>
1. Use VMEM_ALLOC_QUICK by default for allocateRegion in allocate_memory32
2. Adds -Xgc:suballocatorQuickAllocDisable option that disables the default VMEM_ALLOC_QUICK
3. Adds -Xgc:suballocatorIncrementSize option that replaces the HEAP_SIZE_BYTES macro and controls the heap increment size
Addresses: eclipse-omr#7190
Signed-off-by: Nathan Henderson <nathan.henderson@ibm.com>
Problem
omrmem_allocate_memory32 allocates native memory below 4G. While running a Java virtual thread benchmark named Skynet, poor performance is seen in omrmem_allocate_memory32 as the benchmark exhausts the native memory below 4G. This results in either a timeout or an OutOfMemoryError.
Potential Issues
Two Potential Solutions
omrmem_allocate_memory32 implementation; or
Approach 2 is preferred since there are existing memory allocator implementations that address the above perf issues.
Examples of Existing Memory Allocators
Verify Feasibility of Approach 2