-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GR-35118] debuginfotest fails on JDK17; assertion error on aarch64. #4018
Comments
It seems that graal/substratevm/mx.substratevm/testhello.py Line 308 in e9bddb1
to rexp = r"%svoid java.io.PrintStream::println\(java\.lang\.Object \*\)"%maybe_spaces_pattern let's the test pass again. Not sure if that's the right thing to do, maybe the wrong |
Yes, debug info is currently supported on AArch64. The assert and the gdb test failure are both a bit peculiar. Perhaps the two failures may well be related but I am not sure.
The expected output is described in the comment text:
The check call accepts a regexp for the call with arg type
n.b. the check call accepts an argument So, if the check passes when you change the regexp from Can you confirm that this is can be reproduced simply by using jdk17 with the current head? |
Thanks for confirming.
That's why I initially created the public issue against @zakkak. If you could help with a proper fix, that'd be great.
Yes, I build against e9bddb1 from a checkout I pulled today. |
Hi Fabio, I found out why the assert is happening. When building up debug info for a HostedType that is included in the hosted universe (i.e. the ones listed by The difference in jdk17 that ends up causing the problem relates to the presence of a new method of
When Actually, the problem is actually a bit worse than that described above. The list returned by As far as I am aware, this repeated listing of the original and substituted methods (never mind the substitution method) never happened with jdk11u. For example, method I have not yet been able to reproduce the problem with |
@olpaw Do you have any idea why this is failing with jdk17 and whether or not it is legitimate. I could ignore the repeated use of the symbol. However, I think it would be wrong to list all 3 variants of this method rather than list the target version that is actually going to be used. My feeling is that |
Ok, I believe I have found out what is special about this case and it probably explains why processing of the substitution for In jdk17 class In the first case the call to In the second case the call to handleAnnotatedMethodInSubstitutionClass registers a SubstitutionMethod mapping What I suspect is happening is that subsequent analysis is dragging all implementations of the original interface n.b. something in the analysis is detecting the equivalence of the original and annotated methods because they both end up with the same symbol -- a different one to that of the substitution method, note. I can look further into this but it might be easier for someone who understands the substitution process better to look into why processing os interface substitutions might be dragging in unsubstituted implementations. @olpaw or @christianwimmer perhaps? |
I agree with @adinn that this is a bug on our side. Looking at a few interesting entries of
As suspected for method @Substitute
@TargetElement(onlyWith = JDK17OrLater.class)
@Override
public DynamicHub arrayType() {
return arrayHub;
} I will investigate further and provide a fix. cc @christianwimmer |
Temporarily disabled until oracle#4018 / GR-35118 is fixed.
Temporarily disabled until oracle#4018 / GR-35118 is fixed.
Temporarily marked as expected failure until oracle#4018 / GR-35118 is fixed to unblock the public gates.
Regarding the Regarding the assertion error it looks like it first appeared with 7b5f831 and it's reproducible in both JDK 11 and JDK 17. HTH |
I can certainly see how insertion of the IterativeConditionalEliminationPhase might mean that there is no longer an explicit call to println(String) that causes the method to be compiled. Conditional elimination can remove branches which include non-inlined calls. It can also change inlining decisions, causing calls to be inlined. This might be enough to ensure that all calls only exists inlined i.e. the method does not get compiled independently as a top level method. That would explain the observed behaviour. gbd only lists top-level compiled methods under the info command even though it knows that the method exists and contributes to inline code (i.e. to be more precise info lists methods with declaration and definition but not those with just a declaration). What that probably means is that we cannot rely on being able to list println(String) using the info command. We should still be able to break it though and see /either/ an inline call or a /real/ call. I am not surprised the culprit for the introduction of the repeated getDeclaredPublciMethods entries is to do with the reflection patch. I think the code that is adding method implementations probably needs to include a substitution mapping that has been omitted. I have my suspicions where this is going wrong but I'll leave it to @christianwimmer or @olpaw to fix as I am sure they can probably work out what is going wrong now that you have pin-pointed the specific change set that produces this result -- unless, of course, you want to have a go, Foivos :-) |
Uh right, that's what I was missing here (since we invoke println unconditionally in our HelloWorld) :)
Unfortunately I can't I think I will move this part in a different issue! |
The assertion failure because of repeated methods with the same symbol is not just restricted to jdk17. I just observed the same problem on my master branch when the github CI ran debuginfotest for jdk11. Saleint details excerpted from https://github.com/adinn/graal/runs/4289658938?check_suite_focus=true
Of course, this is happening with a method of a different class (jdk11 does not include interface
So, we have two different HostedMethod objects listed as being declared by this class, a JNICallTrampolineMethod and a JNINativeCallWrapperMethod, both of which have the same method name and signature and the same unique short name. I am not sure why this ambiguous naming scheme needs to be adopted. It is certainly a challenge as regards generating debug info. These names need to be used by the debug info generator during generation to distinguish elements of the compiled code base and their associated debug info. They are also used at by the debugger to distinguish elements of the generated info. |
I looked into the duplication that occurs for this case where we have a JNICallTrampolineMethod and a JNINativeCallWrapperMethod. This seems to be quite a weird situation. The trampoline method exists to provide a target address for a foreign call into a Java method. It merely jumps to the corresponding Jav amethod code entry. The native method is the usual wrapper that is the target for a Java call which does any required argument shuffling plus providing of the JNIEnv and calls the foreign implementation method. So, in this case we must have a trampoline for foreign code to call a Java method which is actually native and will call out to foreign code. In any case the problem is that the trampoline method is using the same original method name and unique short name as the method it is jumping to. One way of fixing this is to add an override for getName() in class JNICallTrampolineMethod:
This avoids the failure on jdk11. I'm not clear whether this is a correct solution but it looks ok to me. Of course, it means the trampoline method that caused the error now has name __jni__arrayJavaCallTrampoline() and a short name derived from that name. I don't know if that is going to make nay difference as far as expectations about JNI linkage are concerned. @olpaw @christianwimmer if you think my suggested solution to this part of the current problem is ok I'll be happy to raise a separate issue for this and post a PR. |
@peter-hofer are you aware of any place were we might depend on the current names of |
@adinn in general, you need to be prepared that you have multiple methods with the same fully qualified name. We have a completely unnecessary restriction right now to disallow loading the two classes with the same name (in different class loaders of course). This restriction will go away soon. And even now there are corner cases (like with synthetic methods as you observed, or method substitutions) where you have multiple methods with the same fully qualified name. I will look at the |
That is just a method that is declared |
Temporarily marked as expected failure until #4018 / GR-35118 is fixed to unblock the gates.
@christianwimmer Thanks -- that has removed the one of the problems with the non-unique method names that is causing the debuginfo generator to fail. @fniephaus Of course, the trampoline/native method name clash is still present. Unfortunately, I am on PTO right now and will not be back at work until early January. So, fixing the naming issue is not going to happen for some time. If we were to apply the patch I suggested earlier (append jni prefix to the generated trampoline method names) then this should allow the debug info generator to work properly and enable the debuginfotest gate to be reinstated. It will also allow Simon's Windows debug info patch to proceed. Can we try that as a temporary measure? I am sure @zakkak can prepare a patch. |
I will leave this to @christianwimmer and @olpaw to decide. I only created #4097 to buy us some time to prepare and review a fix for this issue. If this happens soon, we don't need #4097. Since this issue has been causing the GitHub gates to fail for more than three weeks, it has unfortunately masked other issues that have popped up in the meantime. I'd like to avoid that. |
@fniephaus I see no problem with appending |
👍 adding this to my to-do list :) |
@olpaw @fniephaus @zakkak @adinn the best fix would be to handle these trampoline methods in |
@peter-hofer do I understand correctly that we would essentially like i.e.: diff --git a/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java b/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
index ff6f342c431..545bd1943e8 100644
--- a/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
+++ b/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/hosted/JNINativeCallWrapperSubstitutionProcessor.java
@@ -41,6 +41,9 @@ class JNINativeCallWrapperSubstitutionProcessor extends SubstitutionProcessor {
@Override
public ResolvedJavaMethod lookup(ResolvedJavaMethod method) {
assert method.isNative() : "Must have been registered as a native substitution processor";
+ if (method instanceof JNICallTrampolineMethod) {
+ return method;
+ }
return callWrappers.computeIfAbsent(method, JNINativeCallWrapperMethod::new);
} Update: Our issue seems to be related to |
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
@zakkak the Of course, that will mean that there will be no "implementation" for those "methods" which are reachable somehow (presumably registered due to some bug), but then you should see how they become reachable in the first place and can fix that problem.
I don't think so, because they are assigned different names and not the name of the method they replace, see |
I just realized that after recent changes, we might consider these "methods" reachable because there are pointers reachable for them so this would not be a bug -- but also not correct because what is reachable are the trampolines and not the faux |
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
It looks like they are not always marked as reachable! In my experiments I need to do a couple of runs to replicate the issue.
@peter-hofer I gave this a try and I am getting:
To my understanding the trampolines need to be present in every application, no matter if their corresponding Java methods appear to be reachable (and thus substituted) or not. By moving the creation of the |
Although I don't understand how it is transient and not consistent, the generation of the redundant
Note the
in the second invocation they are:
notice how in the second call the AnalysisType is The reason that graal/substratevm/src/com.oracle.svm.jni/src/com/oracle/svm/jni/access/JNIAccessFeature.java Line 194 in 229aea5
|
@zakkak I thought the trampolines become reachable through the |
Temporarily marked as expected failure until #4018 / GR-35118 is fixed to unblock the gates.
@loicottet @vjovanov I had a closer look at this and |
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
`JNICallTrampolineMethod`s can not be invoked from Java, so creating `JNINativeCallWrapperMethod`s for them is not needed. Closes: oracle#4018
After switching the vm-native gates from JDK 8 to JDK 17, the debuginfotest started to fail on JDK17. To debug the issue, I tried to reproduce it on an aarch64 machine. However, image generation crashed due to an
AssertionError
(see below).@zakkak is that expected/a known issue? Is debuginfo supported on aarch64?
The text was updated successfully, but these errors were encountered: