JvmErgonomicsTests.testExtractValidHeapSizeNoOptionPresent fails on 7.4 #47384

Closed
dakrone opened this issue Oct 1, 2019 · 13 comments
Labels
:Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI

Comments

@dakrone
Member

dakrone commented Oct 1, 2019

It fails with:

org.elasticsearch.tools.launchers.JvmErgonomicsTests > testExtractValidHeapSizeNoOptionPresent FAILED
    java.lang.AssertionError: 
    Expected: a value greater than <0L>
         but: <0L> was equal to <0L>
        at __randomizedtesting.SeedInfo.seed([22C99F519C4644E0:535C63ED1A8DEF9E]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.tools.launchers.JvmErgonomicsTests.testExtractValidHeapSizeNoOptionPresent(JvmErgonomicsTests.java:59)

What's strange is that this did not print any reproduction line, so I wasn't able to try the exact test.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.4+multijob-windows-compatibility/os=windows-2019/64/consoleFull
https://gradle-enterprise.elastic.co/s/mm3dqwj56d4aa/console-log

@dakrone dakrone added :Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI labels Oct 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@droberts195
Contributor

Every single Windows 2012r2 build on the 7.4 branch this month has failed with this same error: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.4+multijob-windows-compatibility/os=windows-2012-r2/

@droberts195
Contributor

Every single Windows 2012r2 build on the 7.4 branch

The fact that this hasn't been failing in 7.x doesn't mean it's not a problem in 7.x. The 7.x builds all fail because of the minio problem - see #42829 - and the test report shows that JvmErgonomicsTests has not run at all because of this.

There are no differences in either JvmErgonomics.java or JvmErgonomicsTests.java between 7.4 and 7.x so the chances are this is still a problem in 7.x too.

It's working in master though: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/os=windows-2012-r2/203/testReport/org.elasticsearch.tools.launchers/JvmErgonomicsTests/

@williamrandolph williamrandolph self-assigned this Oct 17, 2019
@alpar-t
Contributor

alpar-t commented Oct 17, 2019

As predicted, here's the failure on 7.x: https://gradle-enterprise.elastic.co/s/k6qhteq27zplc

@williamrandolph
Contributor

It looks like this is a JDK 8 issue. The default value for MaxHeapSize is physical RAM / 4. On large machines, this number can be larger than what fits into a 32-bit integer, but a 32-bit value is what JDK 8 tries to print, so max heap sizes get truncated. Some common values get truncated to 0, unfortunately. You can test this out by supplying such values as arguments to the -Xmx parameter.

Using PowerShell, the results come out like this:

PS C:\Users\william_brafford\elasticsearch> C:\Users\jenkins\.java\java8\bin\java.exe -XX:+PrintFlagsFinal -Xmx3g -version | select-string MaxHeapSize

    uintx MaxHeapSize                              := 3221225472                          {product}

That is the correct value, i.e., 3g => 3221225472 bytes. But:


PS C:\Users\william_brafford\elasticsearch> C:\Users\jenkins\.java\java8\bin\java.exe -XX:+PrintFlagsFinal -Xmx8g -version | select-string MaxHeapSize

    uintx MaxHeapSize                              := 0                                   {product}

That's not right! Or rather, it is the true value modulo 2^32: (8 * 2^30) % (2^32) = 0. Any multiple of 4 GB will have this problem, since 4 GB = 4 * 2^30 B = 2^32 B.
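
The arithmetic above can be reproduced with a minimal Java sketch. It assumes, as the PowerShell output suggests, that the affected JDK 8 builds print only the low 32 bits of the value; the class and method names are hypothetical and are not part of Elasticsearch or the JDK.

```java
final class UintxTruncationDemo {
    // Assumption: the printed number is the real value reduced modulo 2^32,
    // i.e. only the low 32 bits survive.
    static long printedAs32Bit(long bytes) {
        return bytes & 0xFFFF_FFFFL;
    }

    public static void main(String[] args) {
        final long gib = 1L << 30;
        // 3g fits in 32 bits; 4g, 8g and 24g are multiples of 2^32 and wrap to 0.
        for (long g : new long[] {3, 4, 8, 24}) {
            System.out.println("-Xmx" + g + "g -> " + printedAs32Bit(g * gib));
        }
    }
}
```

Under that assumption, 3g stays at 3221225472, while 4g, 8g and 24g all come out as 0, matching the two runs shown above.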

Unfortunately, our Windows tests are executing on instances with 96 GB RAM, meaning the 24 GB max heap size is getting truncated to 0 in the output that we're parsing.
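
For illustration, here is a rough sketch of how a line of -XX:+PrintFlagsFinal output like the ones above could be parsed. The regular expression and the names FlagsFinalLineParser and parseFlag are assumptions made for this example; this is not the actual JvmErgonomics code.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagsFinalLineParser {
    // Hypothetical pattern: type, flag name, "=" or ":=", a decimal value, then "{kind}".
    private static final Pattern FLAG_LINE =
        Pattern.compile("^\\s*\\S+\\s+(\\S+)\\s+:?=\\s+(\\d+)\\s+\\{[^}]+\\}\\s*$");

    // Returns the numeric value if the line describes the requested flag.
    static OptionalLong parseFlag(String line, String flagName) {
        Matcher m = FLAG_LINE.matcher(line);
        if (m.matches() && m.group(1).equals(flagName)) {
            return OptionalLong.of(Long.parseLong(m.group(2)));
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        String ok = "    uintx MaxHeapSize                              := 3221225472                          {product}";
        String truncated = "    uintx MaxHeapSize                              := 0                                   {product}";
        System.out.println(parseFlag(ok, "MaxHeapSize"));
        System.out.println(parseFlag(truncated, "MaxHeapSize"));
    }
}
```

A parser like this has no way to tell a truncated 0 from a real one, which is why an assertion that the heap size is greater than 0 fails on the affected machines.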

JDK 9 and beyond don't have this problem, and the master branch uses JDK 11 for the runtime, so that's why we aren't seeing this issue on master.

As for the fix — is there a way to make this test meaningful, given that there's a bug in the JDK 8 output? Also, is this a potential problem on Linux, or is it Windows-specific? I'll keep digging on this.

@williamrandolph
Contributor

The behavior turns out to be system-dependent — "uintx" holds the correct values on my Mac, but not on my Windows test machines.

The JDK bug report flagging this issue is here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8074459

Since the problem behavior is in the JDK and not in our code, I think I will put out a PR that skips this test only on affected systems.

@williamrandolph
Contributor

On second thought, this bug might have some consequences for our ergonomic settings on affected machines. I'll figure out what the worst case would be and loop back for discussion.

@williamrandolph
Contributor

Note that we closed a related ticket by muting the tests on Windows: #44669 (comment)

@williamrandolph
Contributor

williamrandolph commented Oct 21, 2019

Here's what I've found. Two properties depend on the parsed value of MaxHeapSize:

  • When io.netty.allocator.type is unset, JvmErgonomics sets it to unpooled when MaxHeapSize is less than or equal to 1g and to pooled when MaxHeapSize is greater than 1g. Thus, when this bug is present and the value is not defined, we'll get the wrong io.netty.allocator.type setting when physical memory is between N * 16g and (N * 16g) + 4g, for some positive integer N.
  • When MaxDirectMemorySize is unset, JvmErgonomics sets it to MaxHeapSize / 2. On affected systems, this means we will get an incorrect setting for MaxDirectMemorySize when the value is not defined and physical memory is greater than 16g.

In both cases, the workaround is for the user to define the value or upgrade to a JDK more recent than 8. Since we have workarounds, my main question is how to put this information in some place where support or our users can find it.
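
The two affected ranges described in the list above can be checked with a small sketch. It assumes the default max heap is physical RAM / 4 and that the reported value is the true value modulo 2^32; the class and its methods are hypothetical and only mirror the rules as stated in this comment, not the real JvmErgonomics implementation.

```java
final class HeapTruncationImpact {
    static final long GIB = 1L << 30;

    // Assumptions: default heap = physical RAM / 4, reported modulo 2^32.
    static long reportedHeap(long physicalBytes) {
        return (physicalBytes / 4) & 0xFFFF_FFFFL;
    }

    // Rule as described above: unpooled at <= 1g, pooled above 1g.
    static String allocatorType(long heapBytes) {
        return heapBytes <= GIB ? "unpooled" : "pooled";
    }

    // True when the truncated value leads to a different allocator type.
    static boolean allocatorTypeWrong(long physicalBytes) {
        return !allocatorType(physicalBytes / 4)
            .equals(allocatorType(reportedHeap(physicalBytes)));
    }

    // True when MaxHeapSize / 2 differs between the real and reported heap.
    static boolean directMemoryWrong(long physicalBytes) {
        return (physicalBytes / 4) / 2 != reportedHeap(physicalBytes) / 2;
    }

    public static void main(String[] args) {
        for (long g : new long[] {8, 16, 20, 24, 96}) {
            long physical = g * GIB;
            System.out.println(g + "g RAM: allocator wrong=" + allocatorTypeWrong(physical)
                + ", direct memory wrong=" + directMemoryWrong(physical));
        }
    }
}
```

With these assumptions, 16g, 20g and 96g of RAM give the wrong allocator type, 24g gives the right allocator type but a wrong MaxDirectMemorySize, and 8g is unaffected.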

@alpar-t
Contributor

alpar-t commented Oct 22, 2019

Would it be too harsh to bail out, telling the user about the bug and advising a heap size that doesn't fit the problematic pattern? If my understanding is correct, just adding a byte to the heap would fix this. That way users and support wouldn't have to go looking for this information. It would be a slight inconvenience when everything lines up for the bug to reproduce, but it would also save some trouble.

@williamrandolph
Contributor

A PR has been merged that removes the ergonomic io.netty.allocator.type setting: #48310

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 23, 2019
Just like elastic#48329 (and using the changes) in that PR
we can run into a concurrent repo modification that we
will throw on and must retry until consistent handling of
this situation is implemented.

Closes elastic#47384
@williamrandolph
Contributor

This issue has been fixed by #48365 and #48657.


5 participants