Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-35118] debuginfotest fails on JDK17; assertion error on aarch64. #4018

Closed
fniephaus opened this issue Nov 15, 2021 · 33 comments · Fixed by #4121
Closed

[GR-35118] debuginfotest fails on JDK17; assertion error on aarch64. #4018

fniephaus opened this issue Nov 15, 2021 · 33 comments · Fixed by #4121
Assignees
Milestone

Comments

@fniephaus
Copy link
Member

After switching the vm-native gates from JDK 8 to JDK 17, the debuginfotest started to fail on JDK17. To debug the issue, I tried to reproduce it on an aarch64 machine. However, image generation crashed due to an AssertionError (see below).

@zakkak is that expected/a known issue? Is debuginfo supported on aarch64?

$ mx --primary-suite-path substratevm --J @-Xmx2g debuginfotest
parent=/home/foo/dev/graal/substratevm
sourcepath=/home/foo/dev/graal/substratevm/src/com.oracle.svm.test/src
sourcecache=/home/foo/dev/graal/substratevm/svmbuild/sources
native_image ['--native-image-info', '-H:Path=/home/foo/dev/graal/substratevm/svmbuild', '-H:+VerifyNamingConventions', '-cp', '/home/opc/.mx/cache/HAMCREST_42a25dc3219429f0e5d060061f71acb49bf010a0/hamcrest.jar:/home/opc/.mx/cache/JUNIT_2973d150c0dc1fefe998f834810d68f278ea58ec/junit.jar:/home/opc/dev/mx/mxbuild/jdk17/dists/jdk9/junit-tool.jar:/home/foo/dev/graal/sdk/mxbuild/jdk17/dists/jdk17/graal-sdk.jar:/home/foo/dev/graal/truffle/mxbuild/jdk17/dists/jdk11/truffle-api.jar:/home/foo/dev/graal/compiler/mxbuild/jdk17/dists/jdk16/graal.jar:/home/foo/dev/graal/substratevm/mxbuild/jdk17/dists/jdk17/objectfile.jar:/home/foo/dev/graal/substratevm/mxbuild/jdk17/dists/jdk11/native-image-base.jar:/home/foo/dev/graal/substratevm/mxbuild/jdk17/dists/jdk17/pointsto.jar:/home/foo/dev/graal/substratevm/mxbuild/jdk17/dists/jdk17/svm.jar:/home/foo/dev/graal/substratevm/mxbuild/jdk17/com.oracle.svm.test/bin', '-Dgraal.LogFile=graal.log', '-g', '-H:-OmitInlinedMethodDebugLineInfo', '-H:DebugInfoSourceSearchPath=/home/foo/dev/graal/substratevm/src/com.oracle.svm.test/src', '-H:DebugInfoSourceCacheRoot=/home/foo/dev/graal/substratevm/svmbuild/sources', 'hello.Hello', '-Dllvm.bin.dir=/home/foo/dev/graal/sdk/mxbuild/linux-aarch64/LLVM_TOOLCHAIN/bin/', '-H:+SpawnIsolates']
[hello.hello:3052527]    classlist:   2,259.37 ms,  0.96 GB
[hello.hello:3052527]        (cap):     706.75 ms,  0.96 GB
[hello.hello:3052527]        setup:   2,931.43 ms,  0.96 GB
# Printing compilation-target information to: /home/foo/dev/graal/substratevm/svmbuild/reports/target_info_20211115_093129.txt
[hello.hello:3052527]     (clinit):     288.59 ms,  3.22 GB
# Printing native-library information to: /home/foo/dev/graal/substratevm/svmbuild/reports/native_library_info_20211115_093206.txt
[hello.hello:3052527]   (typeflow):   8,675.47 ms,  3.22 GB
[hello.hello:3052527]    (objects):  24,616.65 ms,  3.22 GB
[hello.hello:3052527]   (features):   1,871.33 ms,  3.22 GB
[hello.hello:3052527]     analysis:  36,385.95 ms,  3.22 GB
[hello.hello:3052527]     universe:   2,258.39 ms,  3.22 GB
[hello.hello:3052527]      (parse):   2,846.47 ms,  3.23 GB
[hello.hello:3052527]     (inline):   2,395.20 ms,  3.23 GB
[hello.hello:3052527]    (compile):  26,180.48 ms,  5.27 GB
[hello.hello:3052527]      compile:  35,526.60 ms,  5.27 GB
[hello.hello:3052527]      dbginfo:   1,188.45 ms,  5.27 GB
[hello.hello:3052527]        image:   2,725.78 ms,  5.27 GB
Fatal error: com.oracle.svm.core.util.VMError$HostedError: java.lang.AssertionError: Class_arrayType_adf989c78c59c6f411d53216c542e93dcf201093
	at com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:72)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:646)
	at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:492)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:420)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:578)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:127)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner$JDK9Plus.main(NativeImageGeneratorRunner.java:608)
Caused by: java.lang.AssertionError: Class_arrayType_adf989c78c59c6f411d53216c542e93dcf201093
	at com.oracle.objectfile.debugentry.ClassEntry.indexMethodEntry(ClassEntry.java:181)
	at com.oracle.objectfile.debugentry.ClassEntry.processMethod(ClassEntry.java:310)
	at com.oracle.objectfile.debugentry.ClassEntry.lambda$addDebugInfo$2(ClassEntry.java:144)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at com.oracle.objectfile.debugentry.ClassEntry.addDebugInfo(ClassEntry.java:144)
	at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$2(DebugInfoBase.java:233)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugTypeInfo.debugContext(NativeImageDebugInfoProvider.java:331)
	at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$3(DebugInfoBase.java:227)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:310)
	at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
	at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
	at com.oracle.objectfile.debugentry.DebugInfoBase.installDebugInfo(DebugInfoBase.java:227)
	at com.oracle.objectfile.elf.ELFObjectFile.installDebugInfo(ELFObjectFile.java:1203)
	at com.oracle.svm.hosted.image.NativeImage.build(NativeImage.java:460)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:635)
	... 5 more
[hello.hello:3052527]      [total]:  82,358.68 ms,  5.27 GB
# Printing build artifacts to: /home/foo/dev/graal/substratevm/svmbuild/hello.hello.build_artifacts.txt
Error: Image build request failed with exit status 1
@fniephaus
Copy link
Member Author

It seems that java.io.PrintStream::println(java.lang.String *) cannot be found. Changing

rexp = r"%svoid java.io.PrintStream::println\(java\.lang\.String \*\)"%maybe_spaces_pattern

to

rexp = r"%svoid java.io.PrintStream::println\(java\.lang\.Object \*\)"%maybe_spaces_pattern 

let's the test pass again. Not sure if that's the right thing to do, maybe the wrong println was pulled into the image?

@adinn
Copy link
Collaborator

adinn commented Nov 15, 2021

Yes, debug info is currently supported on AArch64.

The assert and the gdb test failure are both a bit peculiar. Perhaps the two failures may well be related but I am not sure.

  1. The assertion on AArch64 indicates that a method with symbol name Class_arrayType_adf989c78c59c6f411d53216c542e93dcf201093 has turned up twice in the list of compiled methods for which debug info is to be generated. The debug info generator currently assumes that each separate compiled method has a unique symbol name. I am not clear how this reuse of a symbol name could happen as a result of the switch to jdk17. So, I will need to try to reproduce this and see what the generator is doing. Is it reliably reproduced or intermittent?

  2. The patch you made to the python code does not look like it is actually going to guarantee that the problem goes away. Indeed, the fact that it allow execution to continue indicates a deeper problem. The check in question is on the output of gdb command:

    info func java.io.PrintStream::println

The expected output is described in the comment text:

# expect "All functions matching regular expression "java.io.PrintStream.println":"
# expect ""
# expect "File java.base/java/io/PrintStream.java:"
# expect "      void java.io.PrintStream::println(java.lang.Object *);"
# expect "      void java.io.PrintStream::println(java.lang.String *);"

The check call accepts a regexp for the call with arg type java.lang.String allowing failed matches to be skipped:

rexp = r"%svoid java.io.PrintStream::println\(java\.lang\.String \*\)"%maybe_spaces_pattern
checker = Checker("info func java.io.PrintStream::println", rexp)
checker.check(exec_string)

n.b. the check call accepts an argument skip_fails which defaults to True

So, if the check passes when you change the regexp from java.lang.String to java.lang.Object then that indicates that there is no debug info to record the compiled method taking a String argument only info for the method taking an Object argument. That's maybe a consequence of different inlining decisions taken when running on jdk17. However, I do find it surprising that only the Object variant would get compiled. This also will require investigation of what he generator is doing.

Can you confirm that this is can be reproduced simply by using jdk17 with the current head?

@fniephaus
Copy link
Member Author

Yes, debug info is currently supported on AArch64.

Thanks for confirming.

The patch you made to the python code does not look like it is actually going to guarantee that the problem goes away.

That's why I initially created the public issue against @zakkak. If you could help with a proper fix, that'd be great.

Can you confirm that this is can be reproduced simply by using jdk17 with the current head?

Yes, I build against e9bddb1 from a checkout I pulled today.

@adinn
Copy link
Collaborator

adinn commented Nov 15, 2021

Hi Fabio,

I found out why the assert is happening. When building up debug info for a HostedType that is included in the hosted universe (i.e. the ones listed by NativeImageHeap.getUniverse().getTypes()) the debuginfo code generator calls method HostedType::getAllDeclaredMethods to identify all the methods of the type. The case that is blowing up is type DynamicHub which, as you know, is a substitution for java.lang.Class.

The difference in jdk17 that ends up causing the problem relates to the presence of a new method of java.lang.Class called arrayType(). In class DynamicHub this is substituted as follows

@Substitute
@TargetElement(onlyWith = JDK17OrLater.class)
@Override
public DynamicHub arrayType() {
    return arrayHub;
}

When HostedType::getAllDeclaredMethods is called the list of methods includes one for DynamicHub::arrayType and one for Class::arrayType -- well strictly there is a separate method wrapper chain HostedMethod -> AnalysisMethod -> HotspotMethod for each of these two methods. However, the two HostedMethod instances are labelled with the same symbol name.

Actually, the problem is actually a bit worse than that described above. The list returned by HostedType::getAllDeclaredMethods actually contains another entry for arrayType. This entry is for a wrapper chain HostedMethod -> AnalysisMethod -> SubstitutionMethod -> HotspotMethod/HotspotMethod where the last linked pair in the chain has Class::arrayType as the original and DynamicHub::arrayType as the annotated field of the SubstitutionMethod. This 3rd version of the declared method has a different symbol name to the other two.

As far as I am aware, this repeated listing of the original and substituted methods (never mind the substitution method) never happened with jdk11u. For example, method getSimpleBinaryName0 is substituted in jdk11 and we never saw this problem. Certainly the assert that the method id is unique is not failing with jdk11u. So, I am not clear why it is happening now. I am actually rather surprised that method Class::arrayType is not getting deleted. There is no @keepOriginal annotation on the substitution. I am wondering if this might be a bug (or, perhaps, a 'new feature' :-) in the substitution code?

I have not yet been able to reproduce the problem with java.io.printStream.print(). I will let you know details if/when I can do so.

@adinn
Copy link
Collaborator

adinn commented Nov 15, 2021

@olpaw Do you have any idea why this is failing with jdk17 and whether or not it is legitimate. I could ignore the repeated use of the symbol. However, I think it would be wrong to list all 3 variants of this method rather than list the target version that is actually going to be used. My feeling is that HostedType::getAllDeclaredMethods ought only to report the target method. If there is a good reason for it to report all variants then I think I am may need some help to establish a criterion for discounting some methods and including others when notifying them to the debug info generator.

@adinn
Copy link
Collaborator

adinn commented Nov 16, 2021

Ok, I believe I have found out what is special about this case and it probably explains why processing of the substitution for Class::arrayType differs from processing for Class::getSimpleName0.

In jdk17 class Class implements an interface OfField<F>. The method arrayType() that is causing the problem here is declared by that interface. When I debugged the substitution processor I noticed that a a substitution happens twice for a method with name arrayType().

In the first case the call to AnnotationSubstitutionProcessor::handleAnnotatedMethodInSubstitutionClass registers a SubstitutionMethod mapping Class::arrayType() -> DynamicHub::arrayType() i.e. via the path at line 661. This SubstitutionMethod in response to the @Substitute annotation on method DynamicHub::arrayType().

In the second case the call to handleAnnotatedMethodInSubstitutionClass registers a SubstitutionMethod mapping Target_java_lang_invoke_TypeDescriptor_OfField.arrayType() -> TypeDescriptor$OfField.arrayType(). This goes via the path at line 663. There is no SusbtitutionMethod for this case because the annotation that drives the substitution is at class level on interface Target_java_lang_invoke_TypeDescriptor_OfField.

What I suspect is happening is that subsequent analysis is dragging all implementations of the original interface TypeDescriptor$OfField into the reachable methods set without correctly checking for and/or applying any necessary transformation to reduce the target set to the only implementation method we actually need i.e. the substitution method. Instead, we end up with the declared methods list containing the substitution method, the original method of Class and the annotated method of DynamicHub. By contrast, with the substitution Class::getSimpleName0 --> DymaicHub:getSimpleName0 we only have a SubstitutionMethod.

n.b. something in the analysis is detecting the equivalence of the original and annotated methods because they both end up with the same symbol -- a different one to that of the substitution method, note.

I can look further into this but it might be easier for someone who understands the substitution process better to look into why processing os interface substitutions might be dragging in unsubstituted implementations. @olpaw or @christianwimmer perhaps?

@olpaw olpaw self-assigned this Nov 16, 2021
@olpaw
Copy link
Member

olpaw commented Nov 16, 2021

I agree with @adinn that this is a bug on our side.

Looking at a few interesting entries of HostedType::getAllDeclaredMethods for Ljava/lang/Class; gives us:

HostedMethod<Class.arrayType -> AnalysisMethod<java.lang.Class.arrayType -> SubstitutionMethod<definition HotSpotMethod<Class.arrayType()>, implementation HotSpotMethod<DynamicHub.arrayType()>>>>
HostedMethod<Class.arrayType -> AnalysisMethod<java.lang.Class.arrayType -> HotSpotMethod<DynamicHub.arrayType()>>>
HostedMethod<Class.arrayType -> AnalysisMethod<java.lang.Class.arrayType -> HotSpotMethod<Class.arrayType()>>>
HostedMethod<Class.descriptorString -> AnalysisMethod<java.lang.Class.descriptorString -> HotSpotMethod<Class.descriptorString()>>>
HostedMethod<Class.elementType -> AnalysisMethod<java.lang.Class.elementType -> HotSpotMethod<Class.elementType()>>>
HostedMethod<Class.getComponentType -> AnalysisMethod<java.lang.Class.getComponentType -> SubstitutionMethod<definition HotSpotMethod<Class.getComponentType()>, implementation HotSpotMethod<DynamicHub.getComponentType()>>>>

As suspected for method arrayType() we do have 3 entries. This makes no sense. Something goes terribly wrong when we process the substitution:

    @Substitute
    @TargetElement(onlyWith = JDK17OrLater.class)
    @Override
    public DynamicHub arrayType() {
        return arrayHub;
    }

I will investigate further and provide a fix. cc @christianwimmer

@olpaw olpaw added this to the 22.0 milestone Nov 16, 2021
@fniephaus fniephaus changed the title [native image] debuginfotest fails on JDK17; assertion error on aarch64. [GR-35118] debuginfotest fails on JDK17; assertion error on aarch64. Nov 17, 2021
fniephaus added a commit to fniephaus/graal that referenced this issue Nov 17, 2021
Temporarily disabled until oracle#4018 / GR-35118 is fixed.
fniephaus added a commit to fniephaus/graal that referenced this issue Nov 17, 2021
Temporarily disabled until oracle#4018 / GR-35118 is fixed.
fniephaus added a commit to fniephaus/graal that referenced this issue Nov 17, 2021
Temporarily marked as expected failure until oracle#4018 / GR-35118 is fixed to unblock the public gates.
@zakkak
Copy link
Collaborator

zakkak commented Nov 19, 2021

Regarding the info func part of the issue. The tests where passing on my branch just before the merge of #3981 so the issue appears to got introduced in the following changes d0c7394...15cbb20 (left branch of 4550c69 which was the merge commit of #3981 that started failing) and specifically by ee638aa. Unfortunately I don't see how this change (ee638aa) could result in println(String) (or at least it's debug info) being affected.

Regarding the assertion error it looks like it first appeared with 7b5f831 and it's reproducible in both JDK 11 and JDK 17.

HTH

@adinn
Copy link
Collaborator

adinn commented Nov 19, 2021

I can certainly see how insertion of the IterativeConditionalEliminationPhase might mean that there is no longer an explicit call to println(String) that causes the method to be compiled. Conditional elimination can remove branches which include non-inlined calls. It can also change inlining decisions, causing calls to be inlined. This might be enough to ensure that all calls only exists inlined i.e. the method does not get compiled independently as a top level method. That would explain the observed behaviour. gbd only lists top-level compiled methods under the info command even though it knows that the method exists and contributes to inline code (i.e. to be more precise info lists methods with declaration and definition but not those with just a declaration).

What that probably means is that we cannot rely on being able to list println(String) using the info command. We should still be able to break it though and see /either/ an inline call or a /real/ call.

I am not surprised the culprit for the introduction of the repeated getDeclaredPublciMethods entries is to do with the reflection patch. I think the code that is adding method implementations probably needs to include a substitution mapping that has been omitted. I have my suspicions where this is going wrong but I'll leave it to @christianwimmer or @olpaw to fix as I am sure they can probably work out what is going wrong now that you have pin-pointed the specific change set that produces this result -- unless, of course, you want to have a go, Foivos :-)

@zakkak
Copy link
Collaborator

zakkak commented Nov 19, 2021

... gbd only lists top-level compiled methods under the info command even though it knows that the method exists and contributes to inline code ...

Uh right, that's what I was missing here (since we invoke println unconditionally in our HelloWorld) :)

What that probably means is that we cannot rely on being able to list println(String) using the info command. We should still be able to break it though and see /either/ an inline call or a /real/ call.

Unfortunately I can't b java.io.PrintStream::println(java.lang.String *) but if I do b java.io.PrintStream::println I can see it breaks in java.io.PrintStream::println(java.lang.String *)

I think I will move this part in a different issue!

@adinn
Copy link
Collaborator

adinn commented Nov 23, 2021

The assertion failure because of repeated methods with the same symbol is not just restricted to jdk17. I just observed the same problem on my master branch when the github CI ran debuginfotest for jdk11.

Saleint details excerpted from https://github.com/adinn/graal/runs/4289658938?check_suite_focus=true

[hello.hello:9148]        image:   1,820.01 ms,  2.47 GB
Fatal error: com.oracle.svm.core.util.VMError$HostedError: java.lang.AssertionError: JNIJavaCallWrappers_arrayJavaCallTrampoline_e88f80fdb12bb00579a2147f43f98ad32e12f77c
	at com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:72)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:646)
	at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:492)
	...
Caused by: java.lang.AssertionError:    JNIJavaCallWrappers_arrayJavaCallTrampoline_e88f80fdb12bb00579a2147f43f98ad32e12f77c
at com.oracle.objectfile.debugentry.ClassEntry.indexMethodEntry(ClassEntry.java:181)
	...

Of course, this is happening with a method of a different class (jdk11 does not include interface TypeDescriptor$OfField). In this case there are no substitutions at play. The relevant details are

hostedType.toJavaName() => "com.oracle.svm.jni.JNIJavaCallWrappers"

hostedType.getAllDeclaredMethods()[0]) => HostedMethod<JNIJavaCallWrappers.arrayJavaCallTrampoline -> AnalysisMethod<com.oracle.svm.jni.JNIJavaCallWrappers.arrayJavaCallTrampoline -> com.oracle.svm.jni.hosted.JNICallTrampolineMethod@4c8d45cf>>

hostedType.getAllDeclaredMethods()[1]) => HostedMethod<JNIJavaCallWrappers.arrayJavaCallTrampoline -> AnalysisMethod<com.oracle.svm.jni.JNIJavaCallWrappers.arrayJavaCallTrampoline -> com.oracle.svm.jni.hosted.JNINativeCallWrapperMethod@1b3ea96c>>


hostedType.getAllDeclaredMethods()[0]).getName() => "arrayJavaCallTrampoline"

hostedType.getAllDeclaredMethods()[1]).getName() => "arrayJavaCallTrampoline"

hostedType.getAllDeclaredMethods()[0]).getSignature().toMethodDescriptor() ==> "()V"

hostedType.getAllDeclaredMethods()[1]).getSignature().toMethodDescriptor() ==> "()V"

hostedType.getAllDeclaredMethods()[0]).getUniqueShortName() ==> "JNIJavaCallWrappers_arrayJavaCallTrampoline_e88f80fdb12bb00579a2147f43f98ad32e12f77c"

hostedType.getAllDeclaredMethods()[1]).getUniqueShortName() ==> "JNIJavaCallWrappers_arrayJavaCallTrampoline_e88f80fdb12bb00579a2147f43f98ad32e12f77c"

So, we have two different HostedMethod objects listed as being declared by this class, a JNICallTrampolineMethod and a JNINativeCallWrapperMethod, both of which have the same method name and signature and the same unique short name.

I am not sure why this ambiguous naming scheme needs to be adopted. It is certainly a challenge as regards generating debug info. These names need to be used by the debug info generator during generation to distinguish elements of the compiled code base and their associated debug info. They are also used at by the debugger to distinguish elements of the generated info.

@adinn
Copy link
Collaborator

adinn commented Nov 26, 2021

I looked into the duplication that occurs for this case where we have a JNICallTrampolineMethod and a JNINativeCallWrapperMethod. This seems to be quite a weird situation.

The trampoline method exists to provide a target address for a foreign call into a Java method. It merely jumps to the corresponding Jav amethod code entry.

The native method is the usual wrapper that is the target for a Java call which does any required argument shuffling plus providing of the JNIEnv and calls the foreign implementation method.

So, in this case we must have a trampoline for foreign code to call a Java method which is actually native and will call out to foreign code.

In any case the problem is that the trampoline method is using the same original method name and unique short name as the method it is jumping to. One way of fixing this is to add an override for getName() in class JNICallTrampolineMethod:

@Override
public String getName() {
    return "__jni__" + original.getName();
}

This avoids the failure on jdk11.

I'm not clear whether this is a correct solution but it looks ok to me. Of course, it means the trampoline method that caused the error now has name __jni__arrayJavaCallTrampoline() and a short name derived from that name. I don't know if that is going to make nay difference as far as expectations about JNI linkage are concerned.

@olpaw @christianwimmer if you think my suggested solution to this part of the current problem is ok I'll be happy to raise a separate issue for this and post a PR.

@olpaw
Copy link
Member

olpaw commented Nov 26, 2021

Of course, it means the trampoline method that caused the error now has name __jni__arrayJavaCallTrampoline() and a short name derived from that name.

@peter-hofer are you aware of any place were we might depend on the current names of JNICallTrampolineMethods ?

@christianwimmer
Copy link

@adinn in general, you need to be prepared that you have multiple methods with the same fully qualified name.

We have a completely unnecessary restriction right now to disallow loading the two classes with the same name (in different class loaders of course). This restriction will go away soon. And even now there are corner cases (like with synthetic methods as you observed, or method substitutions) where you have multiple methods with the same fully qualified name.

I will look at the Class.arrayType problem this week, sorry for the delay on this.

@christianwimmer
Copy link

I looked into the duplication that occurs for this case where we have a JNICallTrampolineMethod and a JNINativeCallWrapperMethod. This seems to be quite a weird situation.

That is just a method that is declared native on the Java side, and also registered for invocation via JNI. It's probably not the most useful Java method to call from C (because it immediately calls back to C), but it is certainly allowed.

graalvmbot pushed a commit that referenced this issue Dec 7, 2021
Temporarily marked as expected failure until #4018 / GR-35118 is fixed to unblock the gates.
@adinn
Copy link
Collaborator

adinn commented Dec 7, 2021

@christianwimmer Thanks -- that has removed the one of the problems with the non-unique method names that is causing the debuginfo generator to fail.

@fniephaus Of course, the trampoline/native method name clash is still present. Unfortunately, I am on PTO right now and will not be back at work until early January. So, fixing the naming issue is not going to happen for some time. If we were to apply the patch I suggested earlier (append jni prefix to the generated trampoline method names) then this should allow the debug info generator to work properly and enable the debuginfotest gate to be reinstated. It will also allow Simon's Windows debug info patch to proceed. Can we try that as a temporary measure? I am sure @zakkak can prepare a patch.

@fniephaus
Copy link
Member Author

Can we try that as a temporary measure?

I will leave this to @christianwimmer and @olpaw to decide.

I only created #4097 to buy us some time to prepare and review a fix for this issue. If this happens soon, we don't need #4097. Since this issue has been causing the GitHub gates to fail for more than three weeks, it has unfortunately masked other issues that have popped up in the meantime. I'd like to avoid that.

@olpaw
Copy link
Member

olpaw commented Dec 9, 2021

@fniephaus I see no problem with appending _jni to the trampolines. If @zakkak provides a PR feel free to merge that to master and undo GR-35118 (Mark debuginfotest as expected failure). CC @peter-hofer

@zakkak
Copy link
Collaborator

zakkak commented Dec 9, 2021

👍 adding this to my to-do list :)

@peter-hofer
Copy link
Member

@olpaw @fniephaus @zakkak @adinn the best fix would be to handle these trampoline methods in JNINativeCallWrapperSubstitutionProcessor instead, please try that first.

@zakkak
Copy link
Collaborator

zakkak commented Dec 10, 2021

the best fix would be to handle these trampoline methods in JNINativeCallWrapperSubstitutionProcessor instead, please try that first.

@peter-hofer do I understand correctly that we would essentially like JNINativeCallWrapperSubstitutionProcessor to not create wrapper calls for trampolines?

i.e.:

diff --git a/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java b/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
index ff6f342c431..545bd1943e8 100644
--- a/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
+++ b/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
@@ -41,6 +41,9 @@ class JNINativeCallWrapperSubstitutionProcessor extends SubstitutionProcessor {
     @Override
     public ResolvedJavaMethod lookup(ResolvedJavaMethod method) {
         assert method.isNative() : "Must have been registered as a native substitution processor";
+        if (method instanceof JNICallTrampolineMethod) {
+            return method;
+        }
         return callWrappers.computeIfAbsent(method, JNINativeCallWrapperMethod::new);
     }

Update: Our issue seems to be related to JNIJavaCallWrapperMethods being created and not JNINativeCallWrapperMethods. So it's not clear to me how we could handle the issue in JNINativeCallWrapperSubstitutionProcessor. Thanks in advance.

zakkak added a commit to zakkak/mandrel that referenced this issue Dec 10, 2021
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
zakkak added a commit to zakkak/mandrel that referenced this issue Dec 10, 2021
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
@peter-hofer
Copy link
Member

@zakkak the JNICallTrampolineMethod instances are registered for compilation independently, you shouldn't see them being passed to lookup. Instead you'll get some implementation of ResolvedJavaMethod from JVMCI and will need to check if the passed method is declared in JNIJavaCallWrappers and has one of the names as generated by JNIJavaCallWrappers.lookupJavaCallTrampoline. Rather than hardcoding these names, you should either enumerate them somewhere or move the trampolines to a different class so you can simply excluded all methods of that class.

Of course, that will mean that there will be no "implementation" for those "methods" which are reachable somehow (presumably registered due to some bug), but then you should see how they become reachable in the first place and can fix that problem.

Update: Our issue seems to be related to JNIJavaCallWrapperMethods being created and not JNINativeCallWrapperMethods. So it's not clear to me how we could handle the issue in JNINativeCallWrapperSubstitutionProcessor. Thanks in advance.

I don't think so, because they are assigned different names and not the name of the method they replace, see JNIJavaCallWrapperMethod.createName.

@peter-hofer
Copy link
Member

Of course, that will mean that there will be no "implementation" for those "methods" which are reachable somehow (presumably registered due to some bug), but then you should see how they become reachable in the first place and can fix that problem.

I just realized that after recent changes, we might consider these "methods" reachable because there are pointers reachable for them so this would not be a bug -- but also not correct because what is reachable are the trampolines and not the faux native methods we have for them, so there must be some unwrapping and lookups happening.
In that case, it shouldn't be necessary to create and register JNICallTrampolineMethod objects in JNIAccessFeature.createJavaCallTrampoline and should be possible to create the objects in the JNINativeCallWrapperSubstitutionProcessor instead.

zakkak added a commit to zakkak/mandrel that referenced this issue Dec 13, 2021
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
zakkak added a commit to zakkak/mandrel that referenced this issue Dec 13, 2021
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
@zakkak
Copy link
Collaborator

zakkak commented Dec 14, 2021

I just realized that after recent changes, we might consider these "methods" reachable because there are pointers reachable for them so this would not be a bug -- but also not correct because what is reachable are the trampolines and not the faux native methods we have for them, so there must be some unwrapping and lookups happening.

It looks like they are not always marked as reachable! In my experiments I need to do a couple of runs to replicate the issue.

In that case, it shouldn't be necessary to create and register JNICallTrampolineMethod objects in JNIAccessFeature.createJavaCallTrampoline and should be possible to create the objects in the JNINativeCallWrapperSubstitutionProcessor instead.

@peter-hofer I gave this a try and I am getting:

...
[3/7] Building universe...                                                                               (1.7s @ 1.43GB)
Fatal error: com.oracle.graal.pointsto.constraints.UnsupportedFeatureException: Unresolved method found: null. Probably there are some compilation or classpath problems. 
	at com.oracle.svm.hosted.meta.HostedUniverse.lookup(HostedUniverse.java:166)
	at com.oracle.svm.jni.functions.JNIFunctionTablesFeature.prepareCallTrampoline(JNIFunctionTablesFeature.java:192)
	at com.oracle.svm.jni.functions.JNIFunctionTablesFeature.fillJNIFunctionsTable(JNIFunctionTablesFeature.java:242)
	at com.oracle.svm.jni.functions.JNIFunctionTablesFeature.beforeCompilation(JNIFunctionTablesFeature.java:186)
	at com.oracle.svm.hosted.NativeImageGenerator.lambda$doRun$4(NativeImageGenerator.java:602)
	at com.oracle.svm.hosted.FeatureHandler.forEachFeature(FeatureHandler.java:74)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:602)
	at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:495)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:424)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:580)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:127)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner$JDK9Plus.main(NativeImageGeneratorRunner.java:610)

To my understanding the trampolines need to be present in every application, no matter if their corresponding Java methods appear to be reachable (and thus substituted) or not. By moving the creation of the JNICallTrampolineMethods in JNINativeCallWrapperSubstitutionProcessor we essentially only create them when they are reachable (which is not always the case to my experience).

@zakkak
Copy link
Collaborator

zakkak commented Dec 14, 2021

Although I don't understand how it is transient and not consistent, the generation of the redundant JNINativeCallWrapperMethods for trampolines is the result of the following stack trace (line numbers might be off since this is from my branch):

lookup:51, JNINativeCallWrapperSubstitutionProcessor (com.oracle.svm.jni.hosted)
lookup:81, NativeMethodSubstitutionProcessor (com.oracle.svm.hosted.code)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookup:140, SubstitutionProcessor$ChainedSubstitutionProcessor (com.oracle.graal.pointsto.infrastructure)
lookupAllowUnresolved:419, AnalysisUniverse (com.oracle.graal.pointsto.meta)
lookup:398, AnalysisUniverse (com.oracle.graal.pointsto.meta)
lookup:443, AnalysisUniverse (com.oracle.graal.pointsto.meta)
getDeclaredMethods:1035, AnalysisType (com.oracle.graal.pointsto.meta)
findMethod:1040, AnalysisType (com.oracle.graal.pointsto.meta)
registerHidingSubTypeMethods:329, ReflectionDataBuilder (com.oracle.svm.reflect.hosted)
registerHidingSubTypeMethods:343, ReflectionDataBuilder (com.oracle.svm.reflect.hosted)
processMethodMetadata:295, ReflectionDataBuilder (com.oracle.svm.reflect.hosted)
duringAnalysis:205, ReflectionDataBuilder (com.oracle.svm.reflect.hosted)
duringAnalysis:203, ReflectionFeature (com.oracle.svm.reflect.hosted)
lambda$runPointsToAnalysis$10:704, NativeImageGenerator (com.oracle.svm.hosted)
accept:-1, 717490079 (com.oracle.svm.hosted.NativeImageGenerator$$Lambda$870)
forEachFeature:74, FeatureHandler (com.oracle.svm.hosted)
lambda$runPointsToAnalysis$11:704, NativeImageGenerator (com.oracle.svm.hosted)
apply:-1, 1137013089 (com.oracle.svm.hosted.NativeImageGenerator$$Lambda$509)
runAnalysis:755, PointsToAnalysis (com.oracle.graal.pointsto)
runPointsToAnalysis:702, NativeImageGenerator (com.oracle.svm.hosted)
doRun:537, NativeImageGenerator (com.oracle.svm.hosted)
run:495, NativeImageGenerator (com.oracle.svm.hosted)
buildImage:424, NativeImageGeneratorRunner (com.oracle.svm.hosted)
build:580, NativeImageGeneratorRunner (com.oracle.svm.hosted)
main:127, NativeImageGeneratorRunner (com.oracle.svm.hosted)
main:610, NativeImageGeneratorRunner$JDK9Plus (com.oracle.svm.hosted)

Note the registerHidingSubTypeMethods invocations (added in 7b5f831). In the first invocation the arguments passed to the method are:

  1. AnalysisMethod<java.lang.Object.<init> -> HotSpotMethod<Object.<init>()>> and
  2. AnalysisType<java.lang.Object, allocated: true, inHeap: true, reachable: true>,

in the second invocation they are:

  1. AnalysisMethod<java.lang.Object.<init> -> HotSpotMethod<Object.<init>()>> and
  2. AnalysisType<com.oracle.svm.jni.JNIJavaCallTrampolines, allocated: false, inHeap: false, reachable: true>

notice how in the second call the AnalysisType is JNIJavaCallTrampolines (my new class for keeping trampolines out of JNIJavaCallWrappers). This happens because JNIJavaCallTrampolines is a subtype of Object and it shouldn't be an issue, but registerHidingSubTypeMethods invokes findMethod which internally invokes getDeclaredMethods() which in turn invokes lookup and JNINativeCallWrapperSubstitutionProcessor#lookup for all the methods in JNIJavaCallTrampolines (or JNIJavaCallWrappers on master). This results in generating a JNINativeCallWrapperMethod for all methods in JNIJavaCallTrampolines (or JNIJavaCallWrappers on master).

The reason that JNIJavaCallTrampolines (or JNIJavaCallWrappers on master) is present in the universe appears to be the invocation to access.getUniverse().lookup(trampoline) in

access.registerAsCompiled(access.getUniverse().lookup(trampoline));

@peter-hofer
Copy link
Member

peter-hofer commented Dec 20, 2021

@zakkak I thought the trampolines become reachable through the MethodPointers to them that are created as part of the JNI data structures we prepare during the image build. Apparently that's not the case, so we need to keep registering the trampolines for compilation explicitly. We should still create their proper representation as JNICallTrampolineMethod via JNINativeCallWrapperSubstitutionProcessor.
Thank you for your looking into how the trampoline methods become reachable. I don't think registerHidingSubTypeMethods should apply to constructors, since constructors from a supertype are never returned for a subtype anyway, unlike static methods from a supertype which may be returned by getMethod. However, simply creating a JNINativeCallWrapperMethod does not mean that that method will also become reachable and will be AOT-compiled, unless they end up getting registered for reflection this way. @loicottet @vjovanov

graalvmbot pushed a commit that referenced this issue Jan 3, 2022
Temporarily marked as expected failure until #4018 / GR-35118 is fixed to unblock the gates.
@peter-hofer
Copy link
Member

@loicottet @vjovanov I had a closer look at this and registerHidingSubTypeMethods / findMethod / getDeclaredMethods do not operate on constructors, so no trouble there. The meta access lookups on the trampoline methods will still be done, which is not a bug.

zakkak added a commit to zakkak/mandrel that referenced this issue Jan 14, 2022
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
zakkak added a commit to zakkak/mandrel that referenced this issue Jan 24, 2022
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
zakkak added a commit to zakkak/mandrel that referenced this issue Jan 24, 2022
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
zakkak added a commit to graalvm/mandrel that referenced this issue Jan 25, 2022
`JNICallTrampolineMethod`s can not be invoked from Java, so creating
`JNINativeCallWrapperMethod`s for them is not needed.

Closes: oracle#4018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants