[Jtreg/FFI]Crash in getArgPointer() in upcall on Aarch64 #16237

Closed
ChengJin01 opened this issue Nov 1, 2022 · 17 comments · Fixed by #16332
Assignees
Labels
arch:aarch64 comp:jit jdk19 project:panama Used to track Project Panama related work
Milestone

Comments

@ChengJin01

ChengJin01 commented Nov 1, 2022

The dump indicates the crash occurred in getArgPointer() in upcall on both Linux/Aarch64 and macOS/Aarch64 when running https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/openj9/test/jdk/java/foreign/TestUpcallAsync.java

test TestUpcallAsync.testUpcallsAsync(425, "f0_V_PS_PII", VOID, [POINTER, STRUCT], [POINTER, INT, INT]): success
test TestUpcallAsync.testUpcallsAsync(442, "f0_V_SI_F", VOID, [STRUCT, INT], [FLOAT]): success
test TestUpcallAsync.testUpcallsAsync(459, "f0_V_SI_PD", VOID, [STRUCT, INT], [POINTER, DOUBLE]): success
----------System.err:(11/1659)----------
JVMDUMP039I Processing dump event "abort", detail "" at 2022/10/31 18:36:14 - please wait.
...
#13 <signal handler called>
#14 0x0000ffffaaa10c1c in native2InterpJavaUpcallImpl (data=0xffff1c20bed0, argsListPointer=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/UpcallVMHelpers.cpp:292
#15 0x0000ffffaae3d5a4 in ?? ()

#14 0x0000ffffaaa10c1c in native2InterpJavaUpcallImpl (data=0xffff1c20bed0, argsListPointer=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/UpcallVMHelpers.cpp:292
292                                     I_64 argValue = *(I_64*)getArgPointer(nativeSig, argsListPointer, argIndex);

(gdb) list
286                             case J9_FFI_UPCALL_SIG_TYPE_INT32:
287                             case J9_FFI_UPCALL_SIG_TYPE_FLOAT:
288                             {
289                                     /* Convert the argument value to 64 bits prior to the 32-bit conversion to get the actual value
290                                      * in the case of boolean/byte/char/short/int regardless of the endianness on platforms.
291                                      */
292                                     I_64 argValue = *(I_64*)getArgPointer(nativeSig, argsListPointer, argIndex); <-------
293     #if !defined(J9VM_ENV_LITTLE_ENDIAN)

Judging by the results from other platforms, the failing test is probably TestUpcallAsync.testUpcallsAsync(476, "f0_V_SI_IPP", VOID, [STRUCT, INT], [INT, POINTER, POINTER]), the one that follows the last reported success.

Note:
All upcall specific test suites with the same subtests at https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/openj9/test/jdk/java/foreign crashed at the same place, including:

TestUpcallAsync.java
TestUpcallScope.java
TestUpcallStack.java
TestUpcallHighArity.java
@ChengJin01 ChengJin01 added comp:jit project:panama Used to track Project Panama related work arch:aarch64 jdk19 labels Nov 1, 2022
@ChengJin01
Author

FYI: @knn-k, @0xdaryl, @tajila, @pshipton

@ChengJin01
Author

ChengJin01 commented Nov 1, 2022

Hi @knn-k, could you help determine what happened to getArgPointer() in upcall on Aarch64, given that it is part of the thunk generation code? Thanks.

@ChengJin01 ChengJin01 changed the title [Jtreg/FFI] Crash in getArgPointer() in upcall on Aarch64 [Jtreg/FFI]Crash in getArgPointer() in upcall on Aarch64 Nov 1, 2022
@pshipton pshipton added this to the Java 19 milestone Nov 1, 2022
@knn-k
Contributor

knn-k commented Nov 2, 2022

I tried to run the test on macOS, but I got an UnsatisfiedLinkError with loadLibrary() as shown below.

% JDK_CUSTOM_TARGET=java/foreign/TestUpcallAsync.java make _jdk_custom_1
        (... snip ...)
org.testng.TestNGException:
An error occurred while instantiating class TestUpcallAsync: Can't load TestUpcall
        at org.testng.internal.InstanceCreator.createInstanceUsingObjectFactory(InstanceCreator.java:123)
        at org.testng.internal.InstanceCreator.createInstance(InstanceCreator.java:79)
        at org.testng.internal.ClassImpl.getDefaultInstance(ClassImpl.java:109)
        at org.testng.internal.ClassImpl.getInstances(ClassImpl.java:167)
        at org.testng.TestClass.getInstances(TestClass.java:102)
        (... snip ...)
Caused by: java.lang.UnsatisfiedLinkError: Can't load TestUpcall
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:1825)
        at java.base/java.lang.System.loadLibrary(System.java:783)
        at TestUpcallAsync.<clinit>(TestUpcallAsync.java:56)
        (... snip ...)

Where should the TestUpcall shared library be? How is it built?
I searched for it, but I couldn't find it in my environment.

@pshipton
Member

pshipton commented Nov 2, 2022

@ChengJin01
Author

ChengJin01 commented Nov 2, 2022

Hi @knn-k,

you can either download the test image from the nightly build link that Peter mentioned above, or compile a build locally. A local build generates all native libraries intended for jtreg tests at openj9-openjdk-jdk19/build/macosx-aarch64-server-release/images/test/jdk/jtreg/native (in the case of macOS/Aarch64):

$ ls  openj9-openjdk-jdk19/build/macosx-aarch64-server-release/images/test/jdk/jtreg/native
BasicSleep                              libNativeThread.dylib
CallerAccessTest                        libSafeAccess.dylib
JliLaunchTest                           libStackWalk.dylib
JniInvocationTest                       libTestDowncall.dylib
LibraryCache                            libTestDowncallStack.dylib
NullCallerTest                          libTestDynamicStore.dylib
libAsyncInvokers.dylib                  libTestMainKeyWindow.dylib
libAsyncStackWalk.dylib                 libTestUpcall.dylib <--------- the native library in your case
libBasicJNI.dylib                       libTestUpcallHighArity.dylib
libDirectIO.dylib                       libTestUpcallStack.dylib
libExplicitAttach.dylib                 libTestUpcallStructScope.dylib
...

All you need to do next is move or copy build/macosx-aarch64-server-release/images/test to aqa-tests/openjdkbinary/openjdk-test-image so that the specified test suite can locate and load the native library.

@knn-k
Contributor

knn-k commented Nov 7, 2022

I reproduced the crash locally, and found that getArgPointer() returned a wrong value for a certain isPointerToStruct case.

@knn-k
Contributor

knn-k commented Nov 7, 2022

I opened PR #16268 as a (partial) fix. TestUpcallStack.java still seems to fail.

@ChengJin01
Author

ChengJin01 commented Nov 7, 2022

Hi @knn-k, please make sure you rebase your branch onto the merged changes for libffi/Aarch64 at #16252.

I verified with your PR at #16268 and all upcall specific test suites passed on Aarch64, including

TestUpcallAsync.java
TestUpcallScope.java
TestUpcallStack.java

except TestUpcallHighArity.java which always fails with the exception as follows:

test TestUpcallHighArity.testUpcall(java.lang.invoke.BoundMethodHandle$Species_LLLLLLLLLLL
@770aa6bd, java.lang.invoke.MethodType@cadb5379, java.lang.foreign.FunctionDescriptor@d2b128c9):
failure java.lang.AssertionError: For index 11
expected [MemoryAddress{ offset=0x12eb0aa60 }] 
but found [MemoryAddress{ offset=0x4038000000000000 }] <-----------
        at org.testng.Assert.fail(Assert.java:99)
        at org.testng.Assert.failNotEquals(Assert.java:1037)
        at org.testng.Assert.assertEqualsImpl(Assert.java:140)
        at org.testng.Assert.assertEquals(Assert.java:122)
        at TestUpcallHighArity.testUpcall(TestUpcallHighArity.java:120)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
...

According to the test suite (confirmed in debugging), 0x4038000000000000 is a double value specified in the test helper code at https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/110e1e7808efdb0cdd180dfb12a22843777f3d3c/test/jdk/java/foreign/CallGeneratorHelper.java#L418. So the test expected the argument at index 11 to be an address, but it ended up with a double value. This is probably similar to the zLinux failure at #16214, where an address was incorrectly decoded as a double in upcall.
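To make the bit pattern easier to read, here is a minimal sketch that reinterprets the reported value as an IEEE 754 double. The class and method names are made up for illustration and are not part of OpenJ9 or the test suite; only `Double.longBitsToDouble` and `Double.doubleToRawLongBits` are real JDK calls.

```java
final class DoubleBitsSketch {
    /** Reinterprets a raw 64-bit argument slot as an IEEE 754 double. */
    static double asDouble(long rawSlot) {
        return Double.longBitsToDouble(rawSlot);
    }

    /** Returns the raw 64-bit pattern a double occupies in an argument slot. */
    static long asRawBits(double value) {
        return Double.doubleToRawLongBits(value);
    }

    public static void main(String[] args) {
        long found = 0x4038000000000000L; // value reported for index 11 in the assertion
        System.out.println(asDouble(found));                   // prints 24.0
        System.out.println(Long.toHexString(asRawBits(24.0))); // prints 4038000000000000
    }
}
```

The pattern decodes to 24.0, which is consistent with a double argument having been read from the slot where the pointer was expected.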

@ChengJin01
Author

ChengJin01 commented Nov 7, 2022

Debugging shows the problem still came from getArgPointer() in the case of J9_FFI_UPCALL_SIG_TYPE_POINTER:

Thread 27 "MainThread" hit Breakpoint 4, do_upcall (cb=0xfffff7fee028, 
a0=..., a1=42, a2=24, a3=0xffff6016b610, a4=..., a5=42, a6=24, 
a7=0xffff6016c540, a8=..., a9=42, a10=24, a11=0xffff6016c600, <---------
a12=..., a13=42, a14=24, a15=0xffff6016c6c0) at test/jdk/java/foreign/libTestUpcallHighArity.c:37
37          cb(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15);
(gdb) p  a11
$8 = (void *) 0xffff6016c600 <------ it means the passed-in pointer in downcall was correct.

Thread 27 "MainThread" hit Breakpoint 3, native2InterpJavaUpcallImpl (data=0xffff6005d340, 
argsListPointer=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/
openj9/runtime/vm/UpcallVMHelpers.cpp:311
309			case J9_FFI_UPCALL_SIG_TYPE_POINTER:
310			{
311                                     I_64  offset = *(I_64*)getArgPointer(nativeSig, argsListPointer, argIndex); <-------
(gdb) p  argIndex
$5 = 11
(gdb) n
312                                     j9object_t memAddrObject = createMemAddressObject(data, offset);
(gdb) p/x offset
$7 = 0x4038000000000000  <---- a double value returned for the pointer type "void* a11"

@tajila
Contributor

tajila commented Nov 10, 2022

@knn-k Do you think this will be resolved within 2 weeks?

@knn-k
Contributor

knn-k commented Nov 10, 2022

Yes, I believe so.

@knn-k
Contributor

knn-k commented Nov 13, 2022

createUpcallThunk() generates the code to save register d3 (for a14) right after d2 (for a10) in the extended stack. The caller passes the arguments a11, a12, and a13 in the original stack because GPRs x0 - x7 have been used by arguments a0 - a1, a3 - a5, and a7 - a9.
On the other hand, getArgPointer() thinks the slot for a11 is located after the slot for a10, where the thunk saved a14. This is the reason for the failure.
getArgPointer() needs to take this mix of GPR, FPR, and stack arguments into account.
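The register assignment described above can be reproduced with a small model. This is only a sketch under simplifying assumptions, not the OpenJ9 thunk or getArgPointer() code: each struct argument is modelled as taking one GPR (as the comment implies for a0, a4, and a8), the signature is assumed to be (struct, int, double, pointer) repeated four times as the gdb output suggests, and alignment, small structs, and macOS stack packing are ignored. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Aarch64ArgLayoutSketch {
    /** Register class an argument competes for in this simplified model. */
    enum Kind { GPR, FPR }

    static final int MAX_GPR = 8; // x0 - x7
    static final int MAX_FPR = 8; // d0 - d7

    /** Returns a location such as "x2", "d0" or "stack[1]" for each argument, in order. */
    static List<String> assign(List<Kind> args) {
        List<String> locations = new ArrayList<>();
        int gpr = 0;
        int fpr = 0;
        int stack = 0;
        for (Kind kind : args) {
            if (kind == Kind.GPR && gpr < MAX_GPR) {
                locations.add("x" + gpr++);
            } else if (kind == Kind.FPR && fpr < MAX_FPR) {
                locations.add("d" + fpr++);
            } else {
                // The register class is exhausted, so the caller passes it on the stack.
                locations.add("stack[" + stack++ + "]");
            }
        }
        return locations;
    }

    /** Assumed shape of the failing call: (struct, int, double, pointer) repeated four times. */
    static List<Kind> highAritySignature() {
        List<Kind> args = new ArrayList<>();
        for (int group = 0; group < 4; group++) {
            args.add(Kind.GPR); // struct, modelled as one GPR (assumption)
            args.add(Kind.GPR); // int
            args.add(Kind.FPR); // double
            args.add(Kind.GPR); // pointer
        }
        return args;
    }

    public static void main(String[] unused) {
        List<String> locations = assign(highAritySignature());
        for (int i = 0; i < locations.size(); i++) {
            System.out.println("a" + i + " -> " + locations.get(i));
        }
    }
}
```

In this model a10 lands in d2, a11 through a13 go to the caller's stack, and a14 lands in d3. Since d2 and d3 are saved next to each other, stepping one slot past a10 yields the double for a14 rather than the pointer a11.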

@knn-k
Contributor

knn-k commented Nov 16, 2022

PR #16332 fixes the failure with TestUpcallHighArity.java.
I ran the following tests with the fix on Linux and macOS.

TestUpcallAsync.java
TestUpcallHighArity.java
TestUpcallScope.java
TestUpcallStack.java
Jep424Tests_testLinkerFfi_UpCall (in functional test)

TestUpcallStack.java on macOS failed, but that is not caused by the fix. All other tests passed.

@ChengJin01
Author

ChengJin01 commented Nov 16, 2022

Hi @knn-k, I'd like to know which tests fail in TestUpcallStack.java, because I already verified it as mentioned at #16237 (comment) before the fix at #16332.

@knn-k
Contributor

knn-k commented Nov 16, 2022

@ChengJin01
I can reproduce the failures in TestUpcallStack.java using the latest nightly build for macOS from https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_mac_Nightly/123/ as shown below.

test TestUpcallStack.testUpcallsStack(0, "f0_V__", VOID, [], []): success
test TestUpcallStack.testUpcallsStack(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
test TestUpcallStack.testUpcallsStack(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
test TestUpcallStack.testUpcallsStack(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
test TestUpcallStack.testUpcallsStack(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
test TestUpcallStack.testUpcallsStack(102, "f0_V_IS_FF", VOID, [INT, STRUCT], [FLOAT, FLOAT]): failure
java.lang.AssertionError: expected [12.0] but found [0.0]
        at org.testng.Assert.fail(Assert.java:99)
        (... snip ...)
test TestUpcallStack.testUpcallsStack(204, "f0_V_FS_IIP", VOID, [FLOAT, STRUCT], [INT, INT, POINTER]): success
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=0000000102A32C50 Handler2=0000000102C061B0 InaccessibleAddress=0000000000000000
(... snip ...)

#16332 fixes the assertion failure with "f0_V_IS_FF".
The crash after "f0_V_FS_IIP" happens when "sf0_V_FS_FFI" calls the callback function. It is reproducible on macOS with or without #16332.

I would like #16332 to be reviewed anyway.

@knn-k
Contributor

knn-k commented Nov 16, 2022

TestUpcallStack.java passes on Linux, with or without #16332.

@knn-k
Contributor

knn-k commented Nov 17, 2022

I opened a separate issue, #16336, for the crash with TestUpcallStack.java on macOS, because it is not related to getArgPointer().
