8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations #40

mychris · 2020-09-07T07:27:05Z

The patch introduces a new function to probe for the highest valid bit in the virtual address space for userspace programs on Linux.

I guarded the whole implementation to only probe on Linux, other platforms will remain unaffected. Possibly, it would be nicer to move the probing code into an OS+ARCH specific source file. But since this is only a single function, I thought it would be better to put it right next to the caller and guard it with an #ifdef LINUX.

The probing mechanism uses a combination of msync + mmap, to first check if the address is valid using msync (if msync succeeds, the address was valid). If msync fails, mmap is used to check if msync failed because the memory wasn't mapped, or if it failed because the address is invalid. Due to some undefined behavior (documented in the msync man page), I also use a single mmap at the end, if the msync approach failed before.

I tested msync with different combinations of mappings, and also with sbrk, and it always succeeded, or failed with ENOMEM. I never got back any other error code.

The specified minimum value has been chosen "randomly". The JVM terminates (unable to allocate heap), if this minimum value is smaller than the requested Java Heap size, so it might be better to make the minimum dependent on the MaxHeapSize and not a compile time constant? I didn't want to make the minimum too big, since for aarch64 on Linux, the documented minimum would be 38 (see [1]).

I avoided MAP_FIXED_NOREPLACE, because according to the man page, it has been added in Linux 4.17. There are still plenty of stable kernel versions around which will not have that feature, which means we need to implement a workaround for it. Some of my test devices also have a kernel version lower than that.

I executed the HotSpot tier1 JTreg tests on two different aarch64 devices. One with 4KB pages and 3 page levels and the other with 4KB pages and 4 page levels. Tests passed on both devices.

[1] https://www.kernel.org/doc/Documentation/arm64/memory.txt

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations

Reviewers

Stefan Karlsson (@stefank - Reviewer) ⚠️ Review applies to 70efdd8
Erik Österlund (@fisk - Reviewer) ⚠️ Review applies to 70efdd8
Per Lidén (@pliden - Reviewer)

Download

$ git fetch https://git.openjdk.java.net/jdk pull/40/head:pull/40
$ git checkout pull/40

…rnel configurations

bridgekeeper · 2020-09-07T07:29:05Z

👋 Welcome back cgo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2020-09-07T07:29:51Z

@mychris The following label will be automatically applied to this pull request: hotspot.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

mlbridge · 2020-09-07T07:33:36Z

Webrevs

mychris · 2020-09-07T07:34:14Z

/label add hotspot-gc

openjdk · 2020-09-07T07:34:29Z

@mychris
The hotspot-gc label was successfully added.

mychris · 2020-09-07T07:34:54Z

/label remove hotspot

openjdk · 2020-09-07T07:34:58Z

@mychris
The hotspot label was successfully removed.

mychris · 2020-09-07T07:43:33Z

/cc hotspot-gc

openjdk · 2020-09-07T07:43:53Z

@mychris The hotspot-gc label was already applied.

stefank · 2020-09-07T08:46:10Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+#ifdef ASSERT
+      fatal("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));
+#else // ASSERT
+      log_warning(gc)("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));


If we change this line to log_warning_p, then we get this error message in our hs_err files.

stefank · 2020-09-07T08:47:06Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+// Default value if probing is not implemented for a certain platform: 128TB
+#define DEFAULT_MAX_ADDRESS_BIT 47
+// Minimum value returned, if probing fails: 64GB
+#define MINIMUM_MAX_ADDRESS_BIT 36


I think this could/should be typed static constants instead of defines.

stefank · 2020-09-07T09:11:28Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+  for (int i = DEFAULT_MAX_ADDRESS_BIT; i > MINIMUM_MAX_ADDRESS_BIT && max_address_bit == 0; --i) {
+    const uintptr_t base_addr = ((uintptr_t) 1U) << i;
+    if (msync((void*)base_addr, page_size, MS_ASYNC) == 0) {
+      // msync suceeded, the address is valid, and maybe even already mapped.
+      max_address_bit = i;
+      break;
+    }


Is there a one-off error here? Taking i == 47 as an example. This means that you test the base_address == '10000000 00000000 00000000 00000000 00000000 00000000' (in bits). That is, you test that the address range with the 48th bit set (128T-256T) is usable. However, when 47 then is returned to the caller, it interprets it as if the 64T-128T range is usable.

Good catch. I verified your assumption by doing the following:
I returned the result of probe_valid_max_address_bit() + 1 from ZPlatformAddressOffsetBits(). I then executed a VM on a platform with only 3 pagetable levels, which has less than DEFAULT_MAX_ADDRESS_BIT bits available in the address space. The heap allocation succeeded, which means the current patch has a one off error.

However, I would fix this in ZPlatformAddressOffsetBits(). because I interpret the result of probe_valid_max_address_bit() as to be the highest bit index of an address which can be allocated. Which means that mmap the result of 1 << probe_valid_max_address_bit() should succeed (if the page is not already mapped). I think ZPlatformAddressOffsetBits() should add 1 to the result of probe_valid_max_address_bit().

Sorry, the explanation of my verification was wrong. I did the following:

size_t ZPlatformAddressOffsetBits() {
const static size_t valid_max_address_bit = probe_valid_max_address_bit();
const size_t max_address_offset_bits = valid_max_address_bit - 3;
return max_address_offset_bits + 1;
}

and allocation of the Java heap succeeded.

stefank · 2020-09-07T09:13:38Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+    if ((uintptr_t) result_addr == base_addr) {
+      // address is valid
+      max_address_bit = i;
+    }
+    if (result_addr != MAP_FAILED) {
+      munmap(result_addr, page_size);
+    }
+  }


If you swap the order of these if-statements you could add a break after 172.

This way the exit of the loop would look like the one you have after msync. You would also be able to get rid of the ' && max_address_bit == 0' check in the for statement.

stefank · 2020-09-07T09:29:47Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+    uintptr_t high_addr = ((uintptr_t) 1U) << DEFAULT_MAX_ADDRESS_BIT;
+    void* result_addr = mmap((void*) high_addr, page_size, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
+    if (result_addr != MAP_FAILED) {
+      max_address_bit = BitsPerSize_t - count_leading_zeros((size_t) result_addr) - 1;


It's not obvious to me that the '- 1' is correct here.

Let's assume the probing failed (for whatever reason) and we are actually able to allocate DEFAULT_MAX_ADDRESS_BIT:
00000000_00000000_10000000_00000000_00000000_00000000_00000000_00000000 = 1U << DEFAULT_MAX_ADDRESS_BIT
Mmap succeeds and returns exactly that address. Now the line would evaluate to the following:
64 - 16 - 1 = 47 <=> DEFAULT_MAX_ADDRESS_BIT

I think this looks OK now that you moved the +1 to ZPlatformAddressOffsetBits().

stefank · 2020-09-07T09:32:13Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+  size_t max_address_bit = 0;
+  const size_t page_size = os::vm_page_size();
+  for (int i = DEFAULT_MAX_ADDRESS_BIT; i > MINIMUM_MAX_ADDRESS_BIT && max_address_bit == 0; --i) {


There's a bit of a mismatch between the type of 'i' and 'max_address_bit'. See also 'BitsPerSize_t - count_leading_zeros((size_t) result_addr) - 1;'. I think that changing max_address_bit to be an int, and getting rid of the size_t's here will fix that.

I did it the other way around, and made 'i' of type size_t, since 'i' is the only int variable left and all the others are of type size_t.

Changed the log_warning to a log_warning_p, so it is reported in hs_err file. mummap the result of mmap directly after the mmap call to be able to break out of the loop, instead of check the max_address_bit == 0 precondition. Fixed an 'off by one' mistake.

mychris · 2020-09-07T10:15:06Z

Thanks for your review, Stefan. I integrated your suggestions and will execute JTreg again, before integration, since it takes very long on my test devices and we might need to make more adjustments to this PR.

stefank · 2020-09-07T10:29:31Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

 #include "gc/z/zGlobals.hpp"
 #include "runtime/globals.hpp"
 #include "utilities/globalDefinitions.hpp"
 #include "utilities/powerOfTwo.hpp"

+#include <sys/mman.h>


Should this also be guarded by #ifdef LINUX?

stefank

Two small nits, then I think this is good from my point-of-view. It would be good to get an extra pair of eyes to go through and verify that the bit calculation parts are correct.

stefank · 2020-09-07T10:48:08Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+      munmap(result_addr, page_size);
+    }
+  }
+  log_info_p(gc, init)("Probing address space for the highest valid bit: %zu", max_address_bit);


I mentioned this during our pre-review, but I see that it wasn't updated: %zu should be using SIZE_FORMAT.

stefank · 2020-09-07T10:48:46Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+#ifdef ASSERT
+      fatal("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));
+#else // ASSERT
+      log_warning_p(gc)("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));


Should explicitly include runtime/os.hpp.

… of %zu.

mychris · 2020-09-07T11:05:17Z

Thanks for your in depth review Stefan. Do you know if I can mark a PR to required more than one reviewer, or do I just wait?

stefank · 2020-09-07T11:44:57Z

I couldn't find a way to add Reviewers. Maybe pinging them? @pliden @fisk @stooart-mon

fisk · 2020-09-07T12:11:46Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp


 static size_t probe_valid_max_address_bit() {
 #ifdef LINUX
  size_t max_address_bit = 0;
  const size_t page_size = os::vm_page_size();
-  for (int i = DEFAULT_MAX_ADDRESS_BIT; i > MINIMUM_MAX_ADDRESS_BIT && max_address_bit == 0; --i) {
+  for (size_t i = DEFAULT_MAX_ADDRESS_BIT; i > MINIMUM_MAX_ADDRESS_BIT; --i) {


Should the loop condition be MINIMUM_MAX_ADDRESS_BIT inclusive instead of exclusive? Seems weird that it is exclusive, when we later check if max_address_bit < MINIMUM_MAX_ADDRESS_BIT return MINIMUM_MAX_ADDRESS_BIT, which seems like exactly what will happen if the loop would have advanced to MINIMUM_MAX_ADDRESS_BIT (and then finish the loop).

Not sure how to comment a few lines below where you conditionally return MINIMUM_MAX_ADDRESS_BIT in this new cool GUI. Anyway, you could use MIN2 to return MIN2(MINIMUM_MAX_ADDRESS_BIT, max_address_bit) to make the intention more clear.

We are not processing MINIMUM_MAX_ADDRESS_BIT in the loop, because this is the absolute minimum that should be returned. So, regardless of the probing result, we are going to return that value. Therefor, it is a waste of time to actually perform the probe.

The check max_address_bit < MINIMUM_MAX_ADDRESS_BIT is done, because after the probing (if it failed and we didn't find any valid page), right after the for loop, we do a single mmap and use the returned address to calculate the max_address_bit. So max_address_bit could then be lower than MINIMUM_MAX_ADDRESS_BIT.

For the return part, you are right, we could simplify it by using MAX2, since we are interested in the highest bit, and MINIMUM_MAX_ADDRESS_BIT is the absolute minimum we want to return.

I will push an update for the return part.

I see. Incremental looks good.

…if to make the intention more clear.

openjdk · 2020-09-07T13:26:38Z

@mychris This change now passes all automated pre-integration checks. In addition to the automated checks, the change must also fulfill all project specific requirements

After integration, the commit message will be:

8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations

Reviewed-by: stefank, eosterlund, pliden

If you would like to add a summary, use the /summary command.
To credit additional contributors, use the /contributor command.
To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 12 commits pushed to the master branch:

5dd1ead: 8252767: URLConnection.setRequestProperty throws IllegalAccessError
2cceeed: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java
188b0bc: 8252868: Clean up unused function from G1MMUTracker
891886b: 8252887: Zero VM is broken after JDK-8252661
7686e87: 8250968: Symlinks attributes not preserved when using jarsigner on zip files
8d6d43c: 8251193: bin/idea.sh is generating wrong folder definitions for JVMCI modules
70d5cac: 8251152: ARM32: jtreg c2 Test8202414 test crash
e0d5b5f: 8252627: Make it safe for JFR thread to read threadObj
e29c3f6: 8252661: Change SafepointMechanism terminology to talk less about "blocking"
e0c8d44: 8252844: Update check configuration to Skara format
... and 2 more: https://git.openjdk.java.net/jdk/compare/5f76deb2e064ca8a48fb8a638c23aad34bf27f9c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge master into your branch, and then specify the current head hash when integrating, like this: /integrate 5dd1eaded7c0aaea199df462b3e2e3b326e1d3b7.

As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@stefank, @fisk, @pliden) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

pliden · 2020-09-08T07:37:43Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+      fatal("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));
+#else // ASSERT
+      log_warning_p(gc)("Received %s while probing the address space for the highest valid bit", os::errno_name(errno));


Please turn the two %s into '%s', otherwise the output will look strange.

pliden · 2020-09-08T07:38:43Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+    }
+    // Since msync failed with ENOMEM, the page might not be mapped.
+    // Try to map it, to see if the address is valid.
+    void* result_addr = mmap((void*) base_addr, page_size, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);


Please make this "void* const result_addr = ...".

pliden · 2020-09-08T07:39:08Z

src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp

+    uintptr_t high_addr = ((uintptr_t) 1U) << DEFAULT_MAX_ADDRESS_BIT;
+    void* result_addr = mmap((void*) high_addr, page_size, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);


Same here, please make these two locals const.

pliden

Looks good!

mychris · 2020-09-08T09:16:28Z

I executed JTreg again last night with both kernel configurations. The last commit was obviously not included, but I think it is trivial enough to not require another test run.
Thanks for all your input.

mychris · 2020-09-08T09:16:33Z

/integrate

openjdk · 2020-09-08T09:17:11Z

@mychris
Your change (at version a89cdcf) is now ready to be sponsored by a Committer.

pliden · 2020-09-08T09:34:03Z

/sponsor

openjdk · 2020-09-08T09:34:54Z

@pliden @mychris Since your change was applied there have been 12 commits pushed to the master branch:

5dd1ead: 8252767: URLConnection.setRequestProperty throws IllegalAccessError
2cceeed: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java
188b0bc: 8252868: Clean up unused function from G1MMUTracker
891886b: 8252887: Zero VM is broken after JDK-8252661
7686e87: 8250968: Symlinks attributes not preserved when using jarsigner on zip files
8d6d43c: 8251193: bin/idea.sh is generating wrong folder definitions for JVMCI modules
70d5cac: 8251152: ARM32: jtreg c2 Test8202414 test crash
e0d5b5f: 8252627: Make it safe for JFR thread to read threadObj
e29c3f6: 8252661: Change SafepointMechanism terminology to talk less about "blocking"
e0c8d44: 8252844: Update check configuration to Skara format
... and 2 more: https://git.openjdk.java.net/jdk/compare/5f76deb2e064ca8a48fb8a638c23aad34bf27f9c...master

Your commit was automatically rebased without conflicts.

Pushed as commit 73ba3ae.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

This patch optimizes the backend implementation of VectorMaskToLong for AArch64, given a more efficient approach to mov value bits from predicate register to general purpose register as x86 PMOVMSK[1] does, by using BEXT[2] which is available in SVE2. With this patch, the final code (input mask is byte type with SPECIESE_512, generated on an SVE vector reg size of 512-bit QEMU emulator) changes as below: Before: mov z16.b, p0/z, #1 fmov x0, d16 orr x0, x0, x0, lsr openjdk#7 orr x0, x0, x0, lsr openjdk#14 orr x0, x0, x0, lsr openjdk#28 and x0, x0, #0xff fmov x8, v16.d[1] orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#8 orr x8, xzr, #0x2 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#16 orr x8, xzr, #0x3 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#24 orr x8, xzr, #0x4 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#32 mov x8, #0x5 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#40 orr x8, xzr, #0x6 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#48 orr x8, xzr, #0x7 whilele p1.d, xzr, x8 lastb x8, p1, z16.d orr x8, x8, x8, lsr openjdk#7 orr x8, x8, x8, lsr openjdk#14 orr x8, x8, x8, lsr openjdk#28 and x8, x8, #0xff orr x0, x0, x8, lsl openjdk#56 After: mov z16.b, p0/z, #1 mov z17.b, #1 bext z16.d, z16.d, z17.d mov z17.d, #0 uzp1 z16.s, z16.s, z17.s uzp1 z16.h, z16.h, z17.h uzp1 z16.b, z16.b, z17.b mov x0, v16.d[0] [1] https://www.felixcloutier.com/x86/pmovmskb [2] https://developer.arm.com/documentation/ddi0602/2020-12/SVE-Instructions/BEXT--Gather-lower-bits-from-positions-selected-by-bitmask- Change-Id: Ia983a20c89f76403e557ac21328f2f2e05dd08e0

After JDK-8283091, the loop below can be vectorized partially. Statement 1 can be vectorized but statement 2 can't. ``` // int[] iArr; long[] lArrFld; int i1,i2; for (i1 = 6; i1 < 227; i1++) { iArr[i1] += lArrFld[i1]++; // statement 1 iArr[i1 + 1] -= (i2++); // statement 2 } ``` But we got incorrect results because the vector packs of iArr are scheduled incorrectly like: ``` ... load_vector XMM1,[R8 + openjdk#16 + R11 << openjdk#2] movl RDI, [R8 + openjdk#20 + R11 << openjdk#2] # int load_vector XMM2,[R9 + openjdk#8 + R11 << openjdk#3] subl RDI, R11 # int vpaddq XMM3,XMM2,XMM0 ! add packedL store_vector [R9 + openjdk#8 + R11 << openjdk#3],XMM3 vector_cast_l2x XMM2,XMM2 ! vpaddd XMM1,XMM2,XMM1 ! add packedI addl RDI, openjdk#228 # int movl [R8 + openjdk#20 + R11 << openjdk#2], RDI # int movl RBX, [R8 + openjdk#24 + R11 << openjdk#2] # int subl RBX, R11 # int addl RBX, openjdk#227 # int movl [R8 + openjdk#24 + R11 << openjdk#2], RBX # int ... movl RBX, [R8 + openjdk#40 + R11 << openjdk#2] # int subl RBX, R11 # int addl RBX, openjdk#223 # int movl [R8 + openjdk#40 + R11 << openjdk#2], RBX # int movl RDI, [R8 + openjdk#44 + R11 << openjdk#2] # int subl RDI, R11 # int addl RDI, openjdk#222 # int movl [R8 + openjdk#44 + R11 << openjdk#2], RDI # int store_vector [R8 + openjdk#16 + R11 << openjdk#2],XMM1 ... ``` simplified as: ``` load_vector iArr in statement 1 unvectorized loads/stores in statement 2 store_vector iArr in statement 1 ``` We cannot pick the memory state from the first load for LoadI pack here, as the LoadI vector operation must load the new values in memory after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2). We must take the memory state of the last load where we have assigned new values ('iArr[i1 + 1] - (i2++)') to the iArr array. In JDK-8240281, we picked the memory state of the first load. Different from the scenario in JDK-8240281, the store, which is dependent on an earlier load here, is in a pack to be scheduled and the LoadI pack depends on the last_mem. As designed[2], to schedule the StoreI pack, all memory operations in another single pack should be moved in the same direction. We know that the store in the pack depends on one of loads in the LoadI pack, so the LoadI pack should be scheduled before the StoreI pack. And the LoadI pack depends on the last_mem, so the last_mem must be scheduled before the LoadI pack and also before the store pack. Therefore, we need to take the memory state of the last load for the LoadI pack here. To fix it, the pack adds additional checks while picking the memory state of the first load. When the store locates in a pack and the load pack relies on the last_mem, we shouldn't choose the memory state of the first load but choose the memory state of the last load. [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 Jira: ENTLLT-5482 Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8 CustomizedGitHooks: yes

Co-authored-by: Xin Liu <xxinliu@amazon.com>

…dk#40)

…ng into ldp/stp on AArch64 Macro-assembler on aarch64 can merge adjacent loads or stores into ldp/stp[1]. For example, it can merge: ``` str w20, [sp, openjdk#16] str w10, [sp, openjdk#20] ``` into ``` stp w20, w10, [sp, openjdk#16] ``` But C2 may generate a sequence like: ``` str x21, [sp, openjdk#8] str w20, [sp, openjdk#16] str x19, [sp, openjdk#24] <--- str w10, [sp, openjdk#20] <--- Before sorting str x11, [sp, openjdk#40] str w13, [sp, openjdk#48] str x16, [sp, openjdk#56] ``` We can't do any merging for non-adjacent loads or stores. The patch is to sort the spilling or unspilling sequence in the order of offset during instruction scheduling and bundling phase. After that, we can get a new sequence: ``` str x21, [sp, openjdk#8] str w20, [sp, openjdk#16] str w10, [sp, openjdk#20] <--- str x19, [sp, openjdk#24] <--- After sorting str x11, [sp, openjdk#40] str w13, [sp, openjdk#48] str x16, [sp, openjdk#56] ``` Then macro-assembler can do ld/st merging: ``` str x21, [sp, openjdk#8] stp w20, w10, [sp, openjdk#16] <--- Merged str x19, [sp, openjdk#24] str x11, [sp, openjdk#40] str w13, [sp, openjdk#48] str x16, [sp, openjdk#56] ``` To justify the patch, we run `HelloWorld.java` ``` public class HelloWorld { public static void main(String [] args) { System.out.println("Hello World!"); } } ``` with `java -Xcomp -XX:-TieredCompilation HelloWorld`. Before the patch, macro-assembler can do ld/st merging for 3688 times. After the patch, the number of ld/st merging increases to 3871 times, by ~5 %. Tested tier1~3 on x86 and AArch64. [1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079

8252500: ZGC on aarch64: Unable to allocate heap for certain Linux ke…

5bc71d4

…rnel configurations

openjdk bot added the rfr Pull request is ready for review label Sep 7, 2020

openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 7, 2020

openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 7, 2020

openjdk bot removed the hotspot hotspot-dev@openjdk.org label Sep 7, 2020

stefank reviewed Sep 7, 2020

View reviewed changes

Guard linux specific include.

e665618

stefank reviewed Sep 7, 2020

View reviewed changes

Include runtime/os.hpp for os::errno_name and use SIZE_FORMAT instead…

c5e11ca

… of %zu.

fisk reviewed Sep 7, 2020

View reviewed changes

return MAX2(max_address_bit, MINIMUM_MAX_ADDRESS_BIT) instead of the …

70efdd8

…if to make the intention more clear.

fisk approved these changes Sep 7, 2020

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Sep 7, 2020

stefank approved these changes Sep 7, 2020

View reviewed changes

pliden suggested changes Sep 8, 2020

View reviewed changes

Added missing const to locals + made log message more clear.

a89cdcf

pliden approved these changes Sep 8, 2020

View reviewed changes

openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 8, 2020

openjdk bot closed this Sep 8, 2020

openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 8, 2020

mychris deleted the 8252500 branch September 8, 2020 09:40

vchrisb mentioned this pull request Jun 1, 2021

Is it possible to detect kAddressBits at runtime? google/tcmalloc#82

Open

weixlu mentioned this pull request Sep 29, 2021

[GC] Backport 8252500 for ZGC heap allocation on AArch64 dragonwell-project/dragonwell11#166

Merged

caojoshua pushed a commit to caojoshua/jdk that referenced this pull request May 24, 2023

Move PEAVerbose to the destructor of Parser. (openjdk#40)

003ca3b

Co-authored-by: Xin Liu <xxinliu@amazon.com>

robehn pushed a commit to robehn/jdk that referenced this pull request Aug 15, 2023

Add UseZtso to optimize memory fences on TSO-enabled platforms (openj…

54f5aa1

…dk#40)

		uintptr_t high_addr = ((uintptr_t) 1U) << DEFAULT_MAX_ADDRESS_BIT;
		void* result_addr = mmap((void*) high_addr, page_size, PROT_NONE, MAP_PRIVATE\|MAP_ANONYMOUS\|MAP_NORESERVE, -1, 0);

8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations #40

8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations #40

Conversation

mychris commented Sep 7, 2020 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Download

bridgekeeper bot commented Sep 7, 2020

openjdk bot commented Sep 7, 2020

mlbridge bot commented Sep 7, 2020 • edited Loading

Webrevs

mychris commented Sep 7, 2020

openjdk bot commented Sep 7, 2020

mychris commented Sep 7, 2020

openjdk bot commented Sep 7, 2020

mychris commented Sep 7, 2020

openjdk bot commented Sep 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mychris Sep 7, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mychris commented Sep 7, 2020

Choose a reason for hiding this comment

stefank left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mychris commented Sep 7, 2020

stefank commented Sep 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openjdk bot commented Sep 7, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pliden Sep 8, 2020 • edited Loading

Choose a reason for hiding this comment

pliden left a comment

Choose a reason for hiding this comment

mychris commented Sep 8, 2020

mychris commented Sep 8, 2020

openjdk bot commented Sep 8, 2020

pliden commented Sep 8, 2020

openjdk bot commented Sep 8, 2020

mychris commented Sep 7, 2020 •

edited by openjdk bot

Loading

mlbridge bot commented Sep 7, 2020 •

edited

Loading

mychris Sep 7, 2020 •

edited

Loading

openjdk bot commented Sep 7, 2020 •

edited

Loading

pliden Sep 8, 2020 •

edited

Loading