Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception when running a Quarkus native image with JFR enabled #270

Closed
vic-ma opened this issue Jul 14, 2021 · 11 comments
Closed

Exception when running a Quarkus native image with JFR enabled #270

vic-ma opened this issue Jul 14, 2021 · 11 comments
Assignees
Labels
affects/21.2 bug Something isn't working pending upstream Issue or PR pending action on upstream repository (oracle/graal)

Comments

@vic-ma
Copy link

vic-ma commented Jul 14, 2021

Description

I am trying to build and run a native image of a simple Quarkus program, which has some JFR code in it [0]. The program is essentially just the Quarkus getting-started program with a few bits of JFR code added.

The native image builds without any issues, and when I run it without enabling JFR, the program runs as expected. However, when I enable JFR (regardless of starting a recording or not), it will cause an exception [1].

This exception is a little unpredictable though, because I've had a case where I rebuilt the native image, without making any changes to the source code, and the resulting native image did not give an exception when running it with JFR enabled. And when I started a recording on this native image, the resulting recording had the expected events in it [2].

Another interesting thing is that the likelihood of a build working seems to be proportional to the amount of JFR code. When I removed all of the code I added, the build seems to always work; when I changed it to have only one event instead of three, it seems to work about 50% of the time; and with the original code, a working native image is very rare.

Additionally, I've found that the issue is present when using upstream Graal as the native image builder, as well.

How To Reproduce

The commands for the first three steps are from a Mandrel blog post [3], with some modifications.

  1. Download OpenJDK.
$ curl -sL https://github.com/AdoptOpenJDK/openjdk11-upstream-binaries/releases/download/jdk-11.0.10%2B9/OpenJDK11U-jdk_x64_linux_11.0.10_9.tar.gz
-o jdk.tar.gz
$ curl -sL https://github.com/AdoptOpenJDK/openjdk11-upstream-binaries/releases/download/jdk-11.0.10%2B9/OpenJDK11U-static-libs_x64_linux_11.0.10_9.tar.gz
-o jdk-static-libs.tar.gz
$ export JAVA_HOME=$(pwd)/openjdk-11
$ mkdir ${JAVA_HOME}
$ tar xf jdk.tar.gz -C ${JAVA_HOME} --strip-components=1
$ tar xf jdk-static-libs.tar.gz -C ${JAVA_HOME} --strip-components=1
  1. Clone mx and Mandrel and Mandrel Packaging.

I've edited these commands to use the appropriate branches.

$ git clone --depth 1 https://github.com/graalvm/mx
$ git clone --depth 1 --branch mandrel/21.2 https://github.com/graalvm/mandrel
$ git clone --depth 1 --branch 21.2 https://github.com/graalvm/mandrel-packaging
  1. Build Mandrel.

I've added the --mandrel-repo flag to this command. I think the blog post might have left it out by accident.

$ $JAVA_HOME/bin/java -ea ./mandrel-packaging/build.java --mx-home $(pwd)/mx --mandrel-repo $(pwd)/mandrel
$ export GRAALVM_HOME=$(pwd)/mandrel-java11-21.2.0.0-devaf87d84
  1. Clone Quarkus project.
$ git clone https://github.com/vic-ma/getting-started.git
  1. Build the native image.
$ cd getting-started
$ ./mvnw clean && ./mvnw package -Pnative -Dquarkus.native.additional-build-args="-H:+AllowVMInspection"
  1. Run the native image with JFR enabled.
$ ./target/getting-started-1.0.0-SNAPSHOT-runner -XX:+FlightRecorder

Configuration:

  • OS: Fedora 32
  • Architecture: x86_64
  • Mandrel version: mandrel/21.2 af87d84
  • JDK version: OpenJDK 11.0.10+9

Expected behavior

The program should start, and not give an exception [2].

Additional context

[0] https://github.com/vic-ma/getting-started/blob/master/src/main/java/org/acme/getting/started/GreetingResource.java
[1] https://gist.github.com/vic-ma/c3773fbabf91c2126ae4795d2837ec77
[2] https://gist.github.com/vic-ma/cf34e1af9e8db28be6fb2867f0f613bd
[3] https://developers.redhat.com/blog/2021/04/14/mandrel-a-specialized-distribution-of-graalvm-for-quarkus#building_mandrel

@vic-ma vic-ma added the bug Something isn't working label Jul 14, 2021
@jiekang jiekang self-assigned this Jul 15, 2021
@jerboaa
Copy link
Collaborator

jerboaa commented Jul 15, 2021

@vic-ma Please also add the "Actual Behaviour". I.e. the exception you see. Thanks!

@jerboaa
Copy link
Collaborator

jerboaa commented Jul 15, 2021

NVM. it's in the gist. Fails with:

[vma@victor-work ~/TEMP/getting-started]$ ./target/getting-started-1.0.0-SNAPSHOT-runner -XX:+FlightRecorder
Fatal error: java.lang.InternalError: Could not access event handler
    at jdk.jfr.internal.Utils.getHandler(Utils.java:323)
    at jdk.jfr.internal.MetadataRepository.getEventHandlers(MetadataRepository.java:227)
    at jdk.jfr.internal.MetadataRepository.getEventControls(MetadataRepository.java:212)
    at jdk.jfr.internal.SettingsManager.setSettings(SettingsManager.java:136)
    at jdk.jfr.internal.MetadataRepository.setSettings(MetadataRepository.java:200)
    at jdk.jfr.internal.PlatformRecorder.updateSettingsButIgnoreRecording(PlatformRecorder.java:338)
    at jdk.jfr.internal.PlatformRecorder.updateSettings(PlatformRecorder.java:327)
    at jdk.jfr.internal.PlatformRecorder.start(PlatformRecorder.java:237)
    at jdk.jfr.internal.PlatformRecording.start(PlatformRecording.java:114)
    at jdk.jfr.Recording.start(Recording.java:169)
    at jdk.jfr.internal.dcmd.DCmdStart.execute(DCmdStart.java:199)
    at com.oracle.svm.jfr.JfrManager.initRecording(JfrManager.java:113)
    at com.oracle.svm.jfr.JfrManager.setup(JfrManager.java:72)
    at com.oracle.svm.core.jdk.RuntimeSupport.executeHooks(RuntimeSupport.java:125)
    at com.oracle.svm.core.jdk.RuntimeSupport.executeStartupHooks(RuntimeSupport.java:75)
 
JavaFrameAnchor dump:
 
  No anchors
 
TopFrame info:
 
  TotalFrameSize in CodeInfoTable 48
 
VMThreads info:
 
  VMThread 00007f8898000b80  STATUS_IN_NATIVE  java.util.TimerThread@0x7f88a8001550
  VMThread 00007f889c000b80  STATUS_IN_NATIVE  java.lang.Thread@0x7f88aa0159b8
  VMThread 00007f88a4000b80  STATUS_IN_NATIVE  com.oracle.svm.jfr.JfrRecorderThread@0x7f88aaae1898
  VMThread 00000000045c0f00  STATUS_IN_JAVA (safepoints disabled)  java.lang.Thread@0x7f88aaae2790
 
VM Thread State for current thread 00000000045c0f00:
 
  0 (8 bytes): com.oracle.svm.jni.JNIThreadLocalEnvironment.jniFunctions = (bytes)
    00000000045c0f00: 00007f88aa747010
 
  8 (8 bytes): com.oracle.svm.core.graal.snippets.StackOverflowCheckImpl.stackBoundaryTL = (Word) 1  0000000000000001
  16 (4 bytes): com.oracle.svm.core.thread.Safepoint.safepointRequested = (int) 2144744209  7fd63311
  20 (4 bytes): com.oracle.svm.core.thread.VMThreads$StatusSupport.statusTL = (int) 1  00000001
  24 (32 bytes): com.oracle.svm.core.genscavenge.ThreadLocalAllocation.regularTLAB = (bytes)
    00000000045c0f18: 00007f88a2000000 00007f88a2100000
    00000000045c0f28: 00007f88a2007688 0000000000000000
   
 
  56 (8 bytes): com.oracle.svm.core.thread.JavaThreads.currentThread = (Object) java.lang.Thread  00007f88aaae2790
  64 (8 bytes): com.oracle.svm.core.stack.JavaFrameAnchors.lastAnchor = (Word) 0  0000000000000000
  72 (8 bytes): com.oracle.svm.core.heap.NoAllocationVerifier.openVerifiers = (Object) null
  80 (8 bytes): com.oracle.svm.core.identityhashcode.IdentityHashCodeSupport.hashCodeGeneratorTL = (Object) java.util.SplittableRandom  00007f88aa0169d8
  88 (8 bytes): com.oracle.svm.core.snippets.ExceptionUnwind.currentException = (Object) null
  96 (8 bytes): com.oracle.svm.core.thread.ThreadingSupportImpl.activeTimer = (Object) null
  104 (8 bytes): com.oracle.svm.graal.isolated.IsolatedCompileClient.currentClient = (Object) null
  112 (8 bytes): com.oracle.svm.graal.isolated.IsolatedCompileContext.currentContext = (Object) null
  120 (8 bytes): com.oracle.svm.jfr.JfrThreadLocal.javaEventWriter = (Object) null
  128 (8 bytes): com.oracle.svm.jni.JNIObjectHandles.handles = (Object) com.oracle.svm.core.handles.ThreadLocalHandles  00007f88aa001548
  136 (8 bytes): com.oracle.svm.jni.JNIThreadLocalPendingException.pendingException = (Object) null
  144 (8 bytes): com.oracle.svm.jni.JNIThreadLocalPinnedObjects.pinnedObjectsListHead = (Object) null
  152 (8 bytes): com.oracle.svm.jni.JNIThreadOwnedMonitors.ownedMonitors = (Object) null
  160 (8 bytes): com.oracle.svm.core.SubstrateDiagnostics.threadOnlyAttachedForCrashHandler = (bytes)
    00000000045c0fa0: 0000000000000000
 
  168 (8 bytes): com.oracle.svm.core.thread.VMThreads.IsolateTL = (Word) 140224946503680  00007f88aa200000
  176 (8 bytes): com.oracle.svm.core.thread.VMThreads.OSThreadHandleTL = (Word) 140224968039680  00007f88ab689d00
  184 (8 bytes): com.oracle.svm.core.thread.VMThreads.OSThreadIdTL = (Word) 140224968039680  00007f88ab689d00
  192 (8 bytes): com.oracle.svm.core.thread.VMThreads.nextTL = (Word) 0  0000000000000000
  200 (8 bytes): com.oracle.svm.core.thread.VMThreads.unalignedIsolateThreadMemoryTL = (Word) 73142016  00000000045c0f00
  208 (8 bytes): com.oracle.svm.jfr.JfrThreadLocal.dataLost = (Word) 0  0000000000000000
  216 (8 bytes): com.oracle.svm.jfr.JfrThreadLocal.javaBuffer = (Word) 0  0000000000000000
  224 (8 bytes): com.oracle.svm.jfr.JfrThreadLocal.nativeBuffer = (Word) 0  0000000000000000
  232 (8 bytes): com.oracle.svm.jfr.JfrThreadLocal.traceId = (long) 1  0000000000000001
  240 (4 bytes): com.oracle.svm.core.graal.snippets.StackOverflowCheckImpl.yellowZoneStateTL = (int) -16843010  fefefefe
  244 (4 bytes): com.oracle.svm.core.snippets.ImplicitExceptions.implicitExceptionsAreFatal = (int) 0  00000000
  248 (4 bytes): com.oracle.svm.core.thread.ThreadingSupportImpl.currentPauseDepth = (int) 0  00000000
  252 (4 bytes): com.oracle.svm.core.thread.VMThreads$ActionOnExitSafepointSupport.actionTL = (int) 0  00000000
  256 (4 bytes): com.oracle.svm.core.thread.VMThreads$ActionOnTransitionToJavaSupport.actionTL = (int) 0  00000000
  260 (4 bytes): com.oracle.svm.core.thread.VMThreads$StatusSupport.safepointsDisabledTL = (int) 1  00000001
  264 (4 bytes): io.quarkus.runtime.configuration.Substitutions.notExpanding = (int) 0  00000000
 
VMOperation dump:
 
  No VMOperation in progress
 
  The 10 most recent VM operation status changes (oldest first):
   
 
Dump Counters:
 
 
Raw Stacktrace:
 
  00007ffe2a9f3110: 00007f88aa200000 00007f88aa200000
  00007ffe2a9f3120: 00007f88aa200000 00007f88a2005eb8
  00007ffe2a9f3130: 00000000004a9eee 00000000004a66e4
  00007ffe2a9f3140: 00000001aa006c08 00000000004a9eee
  00007ffe2a9f3150: 00000031aabf7600 00007f88aa200000
  00007ffe2a9f3160: 00007f88aa200000 00007f88aa200000
  00007ffe2a9f3170: 00007f88aa200000 00007f88aa200000
  00007ffe2a9f3180: 00007f88aa006c68 00007f88aa200000
 
Stacktrace Stage 0:
 
  SP 00007ffe2a9f3110 IP 0000000000460d40  FrameSize 48
  SP 00007ffe2a9f3140 IP 00000000004a66e4  FrameSize 16
  SP 00007ffe2a9f3150 IP 00000000004a9eee  FrameSize 144
  SP 00007ffe2a9f31e0 IP 00000000004ad867  FrameSize 32
  SP 00007ffe2a9f3200 IP 00000000004a90d4  FrameSize 16
  SP 00007ffe2a9f3210 IP 000000000045d502  FrameSize 48
  SP 00007ffe2a9f3240 IP 000000000043304a  FrameSize 112
 
Stacktrace Stage 1:
 
  SP 00007ffe2a9f3110 IP 0000000000460d40  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f3140 IP 00000000004a66e4  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f3150 IP 00000000004a9eee  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f31e0 IP 00000000004ad867  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f3200 IP 00000000004a90d4  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f3210 IP 000000000045d502  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
  SP 00007ffe2a9f3240 IP 000000000043304a  com.oracle.svm.core.code.CodeInfo@0x7f88aaa346c0 name = image code
 
Stacktrace Stage 2:
 
  SP 00007ffe2a9f3110 IP 0000000000460d40  [image code] com.oracle.svm.core.jdk.VMErrorSubstitutions.shutdown(VMErrorSubstitutions.java:112)
  SP 00007ffe2a9f3110 IP 0000000000460d40  [image code] com.oracle.svm.core.jdk.VMErrorSubstitutions.shouldNotReachHere(VMErrorSubstitutions.java:105)
  SP 00007ffe2a9f3140 IP 00000000004a66e4  [image code] com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:71)
  SP 00007ffe2a9f3150 IP 00000000004a9eee  [image code] com.oracle.svm.jfr.JfrManager.initRecording(JfrManager.java:117)
  SP 00007ffe2a9f31e0 IP 00000000004ad867  [image code] com.oracle.svm.jfr.JfrManager.setup(JfrManager.java:72)
  SP 00007ffe2a9f3200 IP 00000000004a90d4  [image code] com.oracle.svm.jfr.JfrFeature$$Lambda$7255c737b2f090767dfb5f41366f236b7c4037d0.run(Unknown Source)
  SP 00007ffe2a9f3210 IP 000000000045d502  [image code] com.oracle.svm.core.jdk.RuntimeSupport.executeHooks(RuntimeSupport.java:125)
  SP 00007ffe2a9f3240 IP 000000000043304a  [image code] com.oracle.svm.core.jdk.RuntimeSupport.executeStartupHooks(RuntimeSupport.java:75)
  SP 00007ffe2a9f3240 IP 000000000043304a  [image code] com.oracle.svm.core.JavaMainWrapper.runCore(JavaMainWrapper.java:137)
  SP 00007ffe2a9f3240 IP 000000000043304a  [image code] com.oracle.svm.core.JavaMainWrapper.run(JavaMainWrapper.java:182)
  SP 00007ffe2a9f3240 IP 000000000043304a  [image code] com.oracle.svm.core.code.IsolateEnterStub.JavaMainWrapper_run_5087f5482cc9a6abc971913ece43acb471d2631b(IsolateEnterStub.java:0)
 
[Heap settings and statistics:
  Supports isolates: true
  Object reference size: 8
  Incremental collections: 0
  Complete collections: 0]
 
[Native image heap boundaries:
  ReadOnly Primitives: 0x7f88aa201028 .. 0x7f88aa4b8708
  ReadOnly References: 0x7f88aa4b8718 .. 0x7f88aa746d88
  ReadOnly Relocatables: 0x7f88aa747000 .. 0x7f88aa937e98
  Writable Primitives: 0x7f88aa938000 .. 0x7f88aaa70200
  Writable References: 0x7f88aaa70210 .. 0x7f88aafdf808
  Writable Huge: 0x0 .. 0x0
  ReadOnly Huge: 0x7f88ab000030 .. 0x7f88ab5e9f20]
 
 
[Heap:
  [Young generation:
    [Eden:
      [edenSpace:
        aligned: 1044440/1 unaligned: 0/0
        aligned chunks:
          0x7f88a8000000 (0x7f88a8001028-0x7f88a8001878)]]
    [Survivors:
      ]]
  [Old generation:
    [oldFromSpace:
      aligned: 0/0 unaligned: 0/0]
    [oldToSpace:
      aligned: 0/0 unaligned: 0/0]
    ]
  [Unused:
    aligned: 0/0]]
 
Fatal error: java.lang.InternalError: Could not access event handler

@vic-ma
Copy link
Author

vic-ma commented Jul 19, 2021

Here are some logs from an initial debugging attempt I did:

Original program, Broken (1)
Original program, Broken (2)
Only Alpha event/method, Broken
Only Alpha event/method, Working

A little hard to read because I didn't think of sharing this at first, but essentially I just added some print statements in the last two methods of the stack trace. Here is the diff. I did this in LabsJDK/Graal because I had trouble before using Mandrel with the OpenJDK I built from source.

With the original program, the first log shows that the problematic event is Alpha, whereas the second log—a different native image build—shows that it is Beta. So I think this implies that the reason why the original program is much more likely to fail is because each custom event has some chance of failing, and obviously having three events increases the overall chance of having a broken build a lot.

The exception is java.lang.NoSuchFieldException: eventHandler, so somehow the broken event sometimes doesn't have the eventHandler field, and other times it does, and this is determiend at image build time, since the behaviour doesn't change after an image is built.

@jiekang
Copy link
Collaborator

jiekang commented Jul 19, 2021

The code in Graal that should be retransforming the event class to add the eventHandler field can be found here:
https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.jfr/src/com/oracle/svm/jfr/JfrEventSubstitution.java

I think at this point you can re-create this issue in the oracle/graal upstream repo with the details.

@vic-ma
Copy link
Author

vic-ma commented Jul 19, 2021

oracle#3608

@jerboaa jerboaa added the pending upstream Issue or PR pending action on upstream repository (oracle/graal) label Jul 20, 2021
@jerboaa
Copy link
Collaborator

jerboaa commented Jul 20, 2021

As the issue is now tracked upstream we want the fix to land there first.

@jerboaa
Copy link
Collaborator

jerboaa commented Aug 16, 2021

Upstream fix. We should backport it to the 21.2 branch. oracle@bdaf8c2

@zakkak Thoughts?

@zakkak
Copy link
Collaborator

zakkak commented Aug 16, 2021

@zakkak Thoughts?

Sure. I would also suggest asking upstream to backport this for a potential pico release (e.g. 21.2.0.1).

@jerboaa
Copy link
Collaborator

jerboaa commented Aug 16, 2021

@zakkak Thoughts?

Sure. I would also suggest asking upstream to backport this for a potential pico release (e.g. 21.2.0.1).

Do they update non-LTS releases that way? It would be news to me. I thought 21.3 only.

@zakkak
Copy link
Collaborator

zakkak commented Aug 16, 2021

They do provide bug fixes in some cases
See https://github.com/graalvm/graalvm-ce-builds/releases/tag/vm-21.0.0.2 for instance.

$ git log --oneline --no-merges vm-21.0.0..vm-21.0.0.2 
f4a5093b6f8 (tag: vm-21.0.0.2, upstream/release/graal-vm/21.0) [GR-29243] Release GraalVM 21.0.0.2.
ea42ad94d4a Fix installable name if it only contain native libraries
f1a048b01f7 [GR-29049] Transfer license acceptance transiently during upgrade.
62247f65bb1 [GR-29049] Disable license storage.
6646b56b0ff [GR-29050] Include existing components into different edition.
9f5bf03274b Fix unclosed collected contexts are leaking a reference from explicit engines.
fb7217549b2 New dev cycle (21.0.0.2)
73790faa19f [GR-29005] Fixed transfer of GDS settings on Mac after upgrade.
2e98738f44b [GR-28970] Fixed CRLF/LF handling.
6ea39b447d6 [GR-28970] Case-insentive globbing - same results on UNIX/Windows.
3f49a663f29 [GR-28970] Recogine URLs as remote paths.
c3c3635ef75 [GR-28880] Release 21.0.0.1
707edfb6c51 Update imports, integrate JVMCI 8u281+09-jvmci-21.0-b07.
ef37eb9a669 Next dev cycle

@jerboaa
Copy link
Collaborator

jerboaa commented Aug 18, 2021

Closing this issue as it's on the 21.2 branch.

@jerboaa jerboaa closed this as completed Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/21.2 bug Something isn't working pending upstream Issue or PR pending action on upstream repository (oracle/graal)
Projects
None yet
Development

No branches or pull requests

4 participants