-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jtreg/FFI] Crash with TestUpcallStack.java on AArch64 macOS #16336
Comments
Stack backtrace in the debugger:
|
Correction: |
Please ignore the assertion failure with |
Hi @knn-k, your dump already indicates the test case
Based on the trace above, the dispatcher was not yet invoked by the thunk code; otherwise, the trace should be something like:
against the java trace at (which I reproduced on my side)
So it might be helpful to double-check the thunk code against the test case to see what happened in there. |
@ChengJin01 |
Output from a build with PR #16332 from https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_mac_Personal/123/.
|
My understanding is that the broken |
This failure have been there at least for these two weeks. I tried the following nightly builds, and they all crashed at
|
It might exist from the very beginning when the thunk generation code is merged via #15744 as all FFI related Jtreg test suites are still disabled for now via https://github.com/adoptium/aqa-tests (till all issues are resolved), which is why we are manually verifying these tests so as to fix all detected issues after all thunk generation code is implemented. |
Usually the return value from |
How is the upcall thunk address used after Setting a breakpoint at
|
The upcall thunk address is just returned to applications after wrapping it up as a MemorySegment at openj9/runtime/vm/BytecodeInterpreter.hpp Line 5145 in f21a361
So please check the last argument of openj9/runtime/vm/BytecodeInterpreter.hpp Line 5111 in f21a361
cb (the last one) is a valid address or not. If so, the issue might come from libffi/Aarch64 (which means cb in this test case is not correctly handled even with the latest version via #16252); otherwise, there was problem with the passed-in arguments themselves.
|
The upcall thunk address is unchanged and correct in |
If it is proved to be a liffi/Aarch64 specific issue, we need to raise an issue at https://github.com/libffi/libffi and send a bug report via https://sourceware.org/mailman/listinfo/libffi-discuss/ to request them to fix it up on Aarch64. In addition, I am wondering this is the only bug detected on macOS/Aarch64. Might need to double-check the rest of the tests in |
I had to disable 36 testcases in TestUpcallStack.java for skipping the broken
AArch64 Linux passes all the tests in TestUpcallStack.java when #16332 is applied. |
Could you list all your disabled test cases? coz the libffi developers will need to ensure their fix works for all these cases (probably with different combinations of arguments plus the function pointer) if they get started to investigate the issue. |
The list of testcases that have the broken sf0_V_FS_FFI, sf0_V_SF_FII, sf3_V_FSI_II, sf3_V_FSI_IIF, sf3_V_FSF_IF, sf3_V_FSD_FFI, sf3_V_FSP_IFI, sf3_V_FSS_IFF, sf4_V_DFS_FII, sf5_V_PSI_IFI, sf6_V_PSF_IFF, sf7_V_SFD_FII, sf7_V_SFP_III, sf7_V_SFP_FIF, sf7_V_SFS_IIF, sf9_V_SSD_I, sf10_V_SSS_FII, sf11_S_SF_FFI, sf12_I_ISD_I, sf12_I_ISS_FII, sf13_F_FSI_FI, sf13_F_FSI_IFF, sf14_D_DFS_FFI, sf14_D_DSF_FII, sf15_P_PIS_FII, sf17_S_SIP_FII, sf17_S_SIS_III, sf17_S_SIS_FIF, sf17_S_SFD_FFI, sf17_S_SFP_IFI, sf17_S_SFS_IFF, sf19_S_SSI_FII, sf19_S_SSF_III, sf19_S_SSF_FIF, sf19_S_SSD_IIF, sf20_S_SSS_FFI |
Based on the debugging results on my side, the test failures detected at #16336 (comment) belongs to the same issue: all arguments after the first struct argument got messed up, whether they are float/double/int or struct.
Even at #16336 (comment) with the broken
So they are fundamentally the same issue with different cases (as long as the 1st struct argument shows up in the argument list, |
Will need to put everything together to sent bug report to https://sourceware.org/mailman/listinfo/libffi-discuss/ for fix given it has nothing to do with our code. For now, |
Maybe we need to create a simple, stand-alone testcase that does not depend on Java for reporting this issue to the libffi community? |
I wrote a program that demonstrates the failure with Running it on AArch64 macOS with libffi 3.4.4 generates the following output:
Running the same program on AArch64 Linux looks like this, which is as expected:
|
I ran the following commands for building and running my sample program on AArch64 macOS:
|
https://sourceware.org/libffi/ does not list AArch64 macOS as a supported platform. |
Using the latest source tree from https://github.com/libffi/libffi instead of v3.4.4 also gives a wrong result on AArch64 macOS. |
That's what they usually expect us to do but they should be able to look at the issue as everything is in the open.
That looks weird to me but we can check with them to confirm whether the code on
You don't have to check |
I just opened an issue at libffi/libffi#750 and also contacted one of the project maintainers over Slack & by e-mail to see whether they can come back to us with solutions to the problem. |
The condition for some of the failures seems to be:
|
Here is another example. sample2.c.txt It calls
The callee function expects |
My previous two comments can explain the reasons of the failures in #16336 (comment). They are related to padding and alignment of struct arguments in the stack on macOS.
|
I opened a draft PR #16362 as a fix for this problem. |
Failure from my local testing.
One of FFI upcall tests, java/foreign/TestUpcallStack.java, crashes on AArch64 macOS using the runtime and the test image from https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_mac_Nightly/123/ as shown below. 100% reproducible.
A broken value (
NULL
in the case above) is passed ascb
to the functionsf0_V_FS_FFI
in test/jdk/java/foreign/libTestUpcallStack.c, and it causes the crash.This failure is not observed on AArch64 Linux.
The text was updated successfully, but these errors were encountered: