[jtreg/FFI] Crash with TestUpcallStack.java on AArch64 macOS #16336

Closed
knn-k opened this issue Nov 17, 2022 · 31 comments · Fixed by #16362
Labels
arch:aarch64 os:macos project:panama Used to track Project Panama related work test failure

Comments

@knn-k
Contributor

knn-k commented Nov 17, 2022

Failure from my local testing.
One of FFI upcall tests, java/foreign/TestUpcallStack.java, crashes on AArch64 macOS using the runtime and the test image from https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_mac_Nightly/123/ as shown below. 100% reproducible.

test TestUpcallStack.testUpcallsStack(0, "f0_V__", VOID, [], []): success
test TestUpcallStack.testUpcallsStack(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
test TestUpcallStack.testUpcallsStack(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
test TestUpcallStack.testUpcallsStack(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
test TestUpcallStack.testUpcallsStack(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
test TestUpcallStack.testUpcallsStack(102, "f0_V_IS_FF", VOID, [INT, STRUCT], [FLOAT, FLOAT]): failure
java.lang.AssertionError: expected [12.0] but found [0.0]
        at org.testng.Assert.fail(Assert.java:99)
        (... snip ...)
test TestUpcallStack.testUpcallsStack(204, "f0_V_FS_IIP", VOID, [FLOAT, STRUCT], [INT, INT, POINTER]): success
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=0000000102A32C50 Handler2=0000000102C061B0 InaccessibleAddress=0000000000000000
x0=000000000000002A x1=000000000000002A x2=000000000000002A x3=000000000000002A
x4=000000000000002A x5=000000000000002A x6=000000000000002A x7=000000000000002A
x8=0000000000000000 x9=0000000002CA0068 x10=0000002A41400000 x11=0000000102B379DC
x12=0000000000000190 x13=00000000000000A0 x14=000000011361B170 x15=0000000000000000
x16=000000018A122940 x17=0000000113891940 x18=0000000000000000 x19=0000000000000080
x20=0000000113868618 x21=0000000000000008 x22=0000000000000008 x23=000000011182D340
x24=0000000000000000 x25=00000001342A3100 x26=0000000000000080 x27=0000000000000013
x28=00000001342A3270 x29(FP)=000000016D992460 x30(LR)=000000011280D99C x31(SP)=000000016D992390
PC=0000000000000000 SP=000000016D992390
(... snip ...)

A broken value (NULL in the case above) is passed as cb to the function sf0_V_FS_FFI in test/jdk/java/foreign/libTestUpcallStack.c, and it causes the crash.
This failure is not observed on AArch64 Linux.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

Stack backtrace in the debugger:

  * frame #0: 0x0000000000000000
    frame #1: 0x000000015500d99c libTestUpcallStack.dylib`sf0_V_FS_FFI + 240
    frame #2: 0x000000010063807c libj9vm29.dylib`ffi_call_SYSV + 76
    frame #3: 0x000000010063798c libj9vm29.dylib`ffi_call_int + 1324
    frame #4: 0x0000000100593c58 libj9vm29.dylib`VM_BytecodeInterpreterFull::run(J9VMThread*) + 97956
    frame #5: 0x000000010057bda8 libj9vm29.dylib`bytecodeLoopFull + 92

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

This is independent from createUpcallThunk() and getArgPointer(). They are not called for sf0_V_FS_FFI.

Correction: getArgPointer() is not called, but createUpcallThunk() is called.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

Please ignore the assertion failure with f0_V_IS_FF in the output above. PR #16332 will fix it.

@ChengJin01

ChengJin01 commented Nov 17, 2022

Hi @knn-k, your dump already indicates that the test case sf0_V_FS_FFI was invoked in native code, but the crash appears to have been triggered somewhere in the thunk code. We detected something similar in one of the TestUpcallStack test cases on Windows at #16235, where the crash occurred in the thunk rather than in getArgPointer.

 * frame #0: 0x0000000000000000 <-------- somewhere in the thunk code
    frame #1: 0x000000015500d99c libTestUpcallStack.dylib`sf0_V_FS_FFI + 240 <-----
    frame #2: 0x000000010063807c libj9vm29.dylib`ffi_call_SYSV + 76
    frame #3: 0x000000010063798c libj9vm29.dylib`ffi_call_int + 1324
    frame #4: 0x0000000100593c58 libj9vm29.dylib`VM_BytecodeInterpreterFull::run(J9VMThread*) + 97956
    frame #5: 0x000000010057bda8 libj9vm29.dylib`bytecodeLoopFull + 92

Based on the trace above, the dispatcher was not yet invoked by the thunk code; otherwise, the trace should be something like:

* frame #0: ............
    frame #1: 0x0000000xxxxxxxxxx  native2InterpJavaUpcallImpl
    frame #2: 0x0000000yyyyyyyyyy native2InterpJavaUpcall0
    frame #3: 0x0000000zzzzzzzzzz  thunk code <-----------
    frame #4: 0x000000015500d99c libTestUpcallStack.dylib`sf0_V_FS_FFI + 240 <----- test case
    frame #5: 0x000000010063807c libj9vm29.dylib`ffi_call_SYSV + 76
    frame #6: 0x000000010063798c libj9vm29.dylib`ffi_call_int + 1324
    frame #7: 0x0000000100593c58 libj9vm29.dylib`VM_BytecodeInterpreterFull::run(J9VMThread*) + 97956
    frame #8: 0x000000010057bda8 libj9vm29.dylib`bytecodeLoopFull + 92

Compare this with the Java stack trace (which I reproduced on my side):

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at openj9/internal/foreign/abi/InternalDowncallHandler.invokeNative(Native Method)
4XESTACKTRACE                at openj9/internal/foreign/abi/InternalDowncallHandler.runNativeMethod(InternalDowncallHandler.java:538)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$DMH/0x00000000478e7c20.invokeSpecial(LambdaForm$DMH)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x0000000047885a20.invoke(LambdaForm$MH)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x0000000048283420.invoke(LambdaForm$MH)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x000000004816e820.invokeExact_MT(LambdaForm$MH)
4XESTACKTRACE                at java/lang/invoke/MethodHandle.invokeWithArguments(MethodHandle.java:733)
4XESTACKTRACE                at TestUpcallStack.testUpcallsStack(TestUpcallStack.java:62)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$DMH/0x0000000027867220.invokeVirtual(LambdaForm$DMH)

So it might be helpful to double-check the thunk code against the test case to see what happened in there.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

@ChengJin01
No, it is the cb passed to sf0_V_FS_FFI() that is broken.
It was a NULL in the example above, and the function call cb(pf0, pf1, pf2, pf3, pf4, pf5, pf6, pf7, pf8, pf9, pf10, pf11, pf12, pf13, pf14, pf15, p0,p1); in sf0_V_FS_FFI() failed because the destination address was 0. So the debugger shows frame #0: 0x0000000000000000, and the register dump prints PC=0000000000000000.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

Output from a build with PR #16332 from https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_mac_Personal/123/.

test TestUpcallStack.testUpcallsStack(0, "f0_V__", VOID, [], []): success
test TestUpcallStack.testUpcallsStack(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
test TestUpcallStack.testUpcallsStack(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
test TestUpcallStack.testUpcallsStack(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
test TestUpcallStack.testUpcallsStack(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
test TestUpcallStack.testUpcallsStack(102, "f0_V_IS_FF", VOID, [INT, STRUCT], [FLOAT, FLOAT]): success
test TestUpcallStack.testUpcallsStack(119, "f0_V_IS_IFD", VOID, [INT, STRUCT], [INT, FLOAT, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(136, "f0_V_IS_FFP", VOID, [INT, STRUCT], [FLOAT, FLOAT, POINTER]): success
test TestUpcallStack.testUpcallsStack(153, "f0_V_IS_DDI", VOID, [INT, STRUCT], [DOUBLE, DOUBLE, INT]): success
test TestUpcallStack.testUpcallsStack(170, "f0_V_IS_PDF", VOID, [INT, STRUCT], [POINTER, DOUBLE, FLOAT]): success
test TestUpcallStack.testUpcallsStack(187, "f0_V_FS_ID", VOID, [FLOAT, STRUCT], [INT, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(204, "f0_V_FS_IIP", VOID, [FLOAT, STRUCT], [INT, INT, POINTER]): success
Unhandled exception
Type=Bus error vmState=0x00000000
J9Generic_Signal_Number=00000028 Signal_Number=0000000a Error_Value=00000000 Signal_Code=00000001
Handler1=000000010049E8D0 Handler2=000000010481E150 InaccessibleAddress=00000001403965D8
x0=000000000000002A x1=000000000000002A x2=000000000000002A x3=000000000000002A
x4=000000000000002A x5=000000000000002A x6=000000000000002A x7=000000000000002A
x8=00000001403965D8 x9=0000000027440068 x10=0000002A41400000 x11=00000001005A3854
x12=0000000000000190 x13=00000000000000A0 x14=000000012977B090 x15=0000000000000000
x16=000000018A122940 x17=006FBF01006FBF00 x18=0000000125D5A168 x19=0000000000000080
x20=0000000126829818 x21=0000000000000008 x22=0000000000000008 x23=000000014B8A9540
x24=0000000000000000 x25=000000014A120470 x26=0000000000000080 x27=0000000000000013
x28=000000014A1205E0 x29(FP)=000000029342E460 x30(LR)=00000001277B199C x31(SP)=000000029342E390
PC=00000001403965D8 SP=000000029342E390
(... snip ...)

@ChengJin01

ChengJin01 commented Nov 17, 2022

The broken cb value is not a NULL but 0x1403965D8 in this case as you can see in the register dump as PC=00000001403965D8.

My understanding is that the broken cb value was set in the thunk generation code (not the thunk code itself) as it knows which dispatcher should be invoked via cb. If so, the problem came from the thunk generation code which determines the type of dispatcher (based on the return type) for cb.

@ChengJin01

ChengJin01 commented Nov 17, 2022

It might have existed from the very beginning, when the thunk generation code was merged via #15744, as all FFI-related jtreg test suites are still disabled for now via https://github.com/adoptium/aqa-tests (until all issues are resolved). That is why we are manually verifying these tests, so as to fix all detected issues after all thunk generation code is implemented.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

Usually the return value from createUpcallThunk() in UpcallThunkGen.cpp is passed as cb to the functions in test/jdk/java/foreign/libTestUpcallStack.c.
For the sf0_V_FS_FFI() case on macOS, createUpcallThunk() returns the correct thunk address, and metaData->thunkAddress has the thunk address, but sf0_V_FS_FFI() receives a broken value as cb. I need to understand what is different with this test function.

@knn-k
Contributor Author

knn-k commented Nov 17, 2022

How is the upcall thunk address used after VM_OutOfLineINL_Helpers::returnDouble() stores it in the Java stack?
I would be glad if anyone can list VM functions for me to look at.

Setting a breakpoint at sf0_V_FS_FFI() in the debugger gives the following output. cb is 0x0:

    frame #0: 0x0000000153c199bc libTestUpcallStack.dylib`sf0_V_FS_FFI(pf0=42, pf1=42, pf2=42, pf3=42, pf4=42, pf5=42, pf6=42, pf7=42, pf8=24, pf9=24, pf10=24, pf11=24, pf12=24, pf13=24, pf14=24, pf15=24, p0=12, p1=(p0 = 12, p1 = 5.88545355E-44, p2 = 111968360), cb=0x0000000000000000) at libTestUpcallStack.c:244 [opt]

@ChengJin01

ChengJin01 commented Nov 17, 2022

How is the upcall thunk address used after VM_OutOfLineINL_Helpers::returnDouble() stores it in the Java stack? I would be glad if anyone can list VM functions for me to look at.

The upcall thunk address is just returned to applications after wrapping it up as a MemorySegment at
https://github.com/eclipse-openj9/openj9/blob/master/jcl/src/java.base/share/classes/openj9/internal/foreign/abi/InternalUpcallHandler.java#L170
and
https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/openj9/src/java.base/share/classes/jdk/internal/foreign/abi/UpcallStubs.java#L58
and passed from applications into inlInternalDowncallHandlerInvokeNative in the interpreter at

ffi_call(cif, FFI_FN(function), returnStorage, values);

So please check the last argument of ffiArgs at

pointerValues[i] = (U_64)ffiArgs[i];
to see whether cb (the last one) is a valid address or not. If so, the issue might come from libffi/AArch64 (which means cb in this test case is not correctly handled even with the latest version via #16252); otherwise, there was a problem with the passed-in arguments themselves.

@knn-k
Contributor Author

knn-k commented Nov 18, 2022

The upcall thunk address is unchanged and correct in inlInternalDowncallHandlerInvokeNative() when the crash occurs.
So it suggests it might be a libffi problem.

@ChengJin01

ChengJin01 commented Nov 18, 2022

If it proves to be a libffi/AArch64-specific issue, we need to raise an issue at https://github.com/libffi/libffi and send a bug report via https://sourceware.org/mailman/listinfo/libffi-discuss/ to request that they fix it on AArch64.

In addition, I am wondering whether this is the only bug detected on macOS/AArch64. We might need to double-check the rest of the tests in testUpcallsStack to see how many tests need to be reported to libffi and disabled for the moment.

@knn-k
Contributor Author

knn-k commented Nov 18, 2022

I had to disable 36 testcases in TestUpcallStack.java for skipping the broken cb problem.
I found there were 7 other failures below with TestUpcallStack.java on AArch64 macOS.

test TestUpcallStack.testUpcallsStack(2329, "f3_V_FSS_FI", VOID, [FLOAT, STRUCT, STRUCT], [FLOAT, INT]): failure
test TestUpcallStack.testUpcallsStack(2414, "f4_V_DIS_IF", VOID, [DOUBLE, INT, STRUCT], [INT, FLOAT]): failure
test TestUpcallStack.testUpcallsStack(4369, "f7_V_SFI_I", VOID, [STRUCT, FLOAT, INT], [INT]): failure
test TestUpcallStack.testUpcallsStack(4709, "f7_V_SFS_II", VOID, [STRUCT, FLOAT, STRUCT], [INT, INT]): failure
test TestUpcallStack.testUpcallsStack(10081, "f16_S_SIF_I", NON_VOID, [STRUCT, INT, FLOAT], [INT]): failure
test TestUpcallStack.testUpcallsStack(10438, "f17_S_SFI_IIF", NON_VOID, [STRUCT, FLOAT, INT], [INT, INT, FLOAT]): failure
test TestUpcallStack.testUpcallsStack(10761, "f17_S_SFS_FI", NON_VOID, [STRUCT, FLOAT, STRUCT], [FLOAT, INT]): failure

AArch64 Linux passes all the tests in TestUpcallStack.java when #16332 is applied.

@ChengJin01

I had to disable 36 testcases in TestUpcallStack.java for skipping the broken cb problem.

Could you list all your disabled test cases? The libffi developers will need to ensure their fix works for all these cases (probably with different combinations of arguments plus the function pointer) once they start investigating the issue.

@knn-k
Contributor Author

knn-k commented Nov 18, 2022

The list of testcases that have the broken cb problem:

sf0_V_FS_FFI, sf0_V_SF_FII, sf3_V_FSI_II, sf3_V_FSI_IIF, sf3_V_FSF_IF, sf3_V_FSD_FFI, sf3_V_FSP_IFI, sf3_V_FSS_IFF, sf4_V_DFS_FII, sf5_V_PSI_IFI, sf6_V_PSF_IFF, sf7_V_SFD_FII, sf7_V_SFP_III, sf7_V_SFP_FIF, sf7_V_SFS_IIF, sf9_V_SSD_I, sf10_V_SSS_FII, sf11_S_SF_FFI, sf12_I_ISD_I, sf12_I_ISS_FII, sf13_F_FSI_FI, sf13_F_FSI_IFF, sf14_D_DFS_FFI, sf14_D_DSF_FII, sf15_P_PIS_FII, sf17_S_SIP_FII, sf17_S_SIS_III, sf17_S_SIS_FIF, sf17_S_SFD_FFI, sf17_S_SFP_IFI, sf17_S_SFS_IFF, sf19_S_SSI_FII, sf19_S_SSF_III, sf19_S_SSF_FIF, sf19_S_SSD_IIF, sf20_S_SSS_FFI

@ChengJin01

ChengJin01 commented Nov 18, 2022

Based on the debugging results on my side, the test failures detected at #16336 (comment) belong to the same issue: all arguments after the first struct argument got messed up, whether they are float/double/int or struct.
e.g.
test TestUpcallStack.testUpcallsStack(2329, "f3_V_FSS_FI", VOID, [FLOAT, STRUCT, STRUCT], [FLOAT, INT]): failure

(lldb) bt
* thread #27, stop reason = breakpoint 2.1
  * frame #0: 0x000000013b07d928 libTestUpcallStack.dylib`sf3_V_FSS_FI(pf0=42, pf1=42, pf2=42, 
  * pf3=42, pf4=42, pf5=42, pf6=42, pf7=42, pf8=24, pf9=24, 
  * pf10=24, pf11=24, pf12=24, pf13=24, pf14=24, pf15=24, p0=12, 
  * p1=(p0 = 5.88545355E-44, p1 = 1094713344), <--------
  * p2=(p0 = 5.88545355E-44, p1 = 1),  <-------
  * cb=(0x00000001007fc068)) at libTestUpcallStack.c:2355:490 [opt]
    frame #1: 0x000000010053912c libj9vm29.dylib`ffi_call_SYSV + 76
    frame #2: 0x0000000100538a48 libj9vm29.dylib`ffi_call_int(cif=0x0000000100998ad0, fn=(libTestUpcallStack.dylib`sf3_V_FSS_FI at libTestUpcallStack.c:2355), orig_rvalue=0x0000000100932e18, avalue=0x00000001002cd340, closure=0x0000000000000000) at ffi.c:816:3 [opt]
    frame #3: 0x0000000100538518 libj9vm29.dylib`ffi_call(cif=<unavailable>, fn=<unavailable>, rvalue=<unavailable>, avalue=<unavailable>) at ffi.c:825:3 [opt] [artificial]
    frame #4: 0x0000000100494b30 libj9vm29.dylib`VM_BytecodeInterpreterFull::run(J9VMThread*) [inlined] VM_BytecodeInterpreterFull::inlInternalDowncallHandlerInvokeNative(this=0x0000000170b368c0, _sp=0x0000000170b36698, _pc=<unavailable>) at BytecodeInterpreter.hpp:5145:3 [opt]
    frame #5: 0x00000001004946c0 libj9vm29.dylib`VM_BytecodeInterpreterFull::run(this=0x0000000170b368c0, vmThread=<unavailable>) at BytecodeInterpreter.hpp:10739:3 [opt]
    frame #6: 0x000000010047cc80 libj9vm29.dylib`::bytecodeLoopFull(currentThread=<unavailable>) at BytecodeInterpreter.inc:112:21 [opt]
    frame #7: 0x00000001004cc81c libj9vm29.dylib`cInterpreter + 16
Target 0: (java) stopped.
(lldb) p pf0
(long long) $0 = 42
(lldb) p pf15
(double) $1 = 24
(lldb) p p0
(float) $2 = 12
(lldb) p  p1
(S_FI) $3 = (p0 = 5.88545355E-44, p1 = 1094713344) <----- wrong (p0 (float) should be 12 and p1 (int) 42, as specified in the test)
(lldb) p  p2
(S_FI) $4 = (p0 = 5.88545355E-44, p1 = 1)  <----- wrong (p0 (float) should be 12 and p1 (int) 42, as specified in the test)
(lldb) p/x cb
(void (*)(long long, long long, long long, long long, long long, long long, 
long long, long long, double, double, double, double, double, double, double, 
double, float, S_FI, S_FI)) $5 = 0x00000001007fc068 (0x00000001007fc068)

Even at #16336 (comment) with the broken cb, the struct argument was also messed up:

frame #0: 0x0000000153c199bc libTestUpcallStack.dylib`sf0_V_FS_FFI(pf0=42, pf1=42, 
pf2=42, pf3=42, pf4=42, pf5=42, pf6=42, pf7=42, pf8=24, pf9=24, 
pf10=24, pf11=24, pf12=24, pf13=24, pf14=24, pf15=24, p0=12, 
p1=(p0 = 12, p1 = 5.88545355E-44, p2 = 111968360), <------- p1 should be (12, 12, 42)
cb=0x0000000000000000) at libTestUpcallStack.c:244 [opt]

So they are fundamentally the same issue showing up in different cases (as soon as the first struct argument shows up in the argument list,
the remaining arguments are messed up).
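The two odd values in the lldb output above, 5.88545355E-44 and 1094713344, look like the test values 42 and 12.0 read through the wrong field type, which would fit fields being fetched from a position shifted by 4 bytes. The following is a minimal sketch of that check, assuming IEEE-754 single-precision floats; the helper names are illustrative and do not come from the test suite.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Raw 32-bit pattern of a float; memcpy avoids strict-aliasing problems. */
static uint32_t float_to_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Float whose raw 32-bit pattern is `bits`. */
static float bits_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: print each test value as seen through the other type. */
static void print_misread_fields(void) {
    /* The int 42 read through a float field gives a tiny subnormal number. */
    printf("int 42 read as float:   %.8E\n", (double)bits_to_float(42u));
    /* The float 12.0 read through an int field gives its bit pattern 0x41400000. */
    printf("float 12.0 read as int: %u\n", (unsigned)float_to_bits(12.0f));
}
```

Calling print_misread_fields() from a main function should print 5.88545355E-44 and 1094713344, the same values lldb reported for the struct members.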

@ChengJin01

ChengJin01 commented Nov 18, 2022

We will need to put everything together and send a bug report to https://sourceware.org/mailman/listinfo/libffi-discuss/ for a fix, given that it has nothing to do with our code. For now, TestUpcallStack.java needs to be disabled on macOS/AArch64 until the libffi-specific issue is resolved.

@ChengJin01

FYI: @pshipton, @tajila

@knn-k
Contributor Author

knn-k commented Nov 21, 2022

Maybe we need to create a simple, stand-alone testcase that does not depend on Java for reporting this issue to the libffi community?

@knn-k
Contributor Author

knn-k commented Nov 21, 2022

I wrote a program that demonstrates the failure with sf0_V_FS_FFI as attached.

Running it on AArch64 macOS with libffi 3.4.4 generates the following output:

% ./sample
d8=-8.000000, f1=12.300000, s1.m1=78.900002, s1.m2=0.000000, s1.m3=15319212, p1=0x0

f1 is correct, but all the following arguments (the struct members of s1 and the pointer p1) are wrong.

Running the same program on AArch64 Linux looks like this, which is as expected:

$ ./sample
d8=-8.000000, f1=12.300000, s1.m1=45.599998, s1.m2=78.900002, s1.m3=111, p1=0xaaaadf2a39d0

sample.c.txt

@knn-k
Contributor Author

knn-k commented Nov 21, 2022

I ran the following commands for building and running my sample program on AArch64 macOS:

% tar xvpf libffi-3.4.4.tar.gz
% cd libffi-3.4.4
% ./configure
% make
% cd ..
% clang -o sample -O -I libffi-3.4.4/aarch64-apple-darwin20.6.0/include sample.c libffi-3.4.4/aarch64-apple-darwin20.6.0/.libs/libffi.a
% ./sample
% clang --version
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: arm64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

@knn-k
Contributor Author

knn-k commented Nov 21, 2022

https://sourceware.org/libffi/ does not list AArch64 macOS as a supported platform.

@knn-k
Contributor Author

knn-k commented Nov 21, 2022

Using the latest source tree from https://github.com/libffi/libffi instead of v3.4.4 also gives a wrong result on AArch64 macOS.

@ChengJin01

ChengJin01 commented Nov 21, 2022

Maybe we need to create a simple, stand-alone testcase that does not depend on Java for reporting this issue to the libffi community?

That's what they usually expect us to do but they should be able to look at the issue as everything is in the open.

https://sourceware.org/libffi/ does not list AArch64 macOS as a supported platform.

That looks weird to me but we can check with them to confirm whether the code on iOS/AArch64 also applies to macOS/AArch64.

Architecture	Operating System
AArch64 (ARM64) 	iOS <---------

Using the latest source tree from https://github.com/libffi/libffi instead of v3.4.4 also gives a wrong result on AArch64 macOS.

You don't have to check v3.4.4 as the merged PR at #16252 is the latest code in https://github.com/libffi/libffi.

@ChengJin01

ChengJin01 commented Nov 21, 2022

I just opened an issue at libffi/libffi#750 and also contacted one of the project maintainers over Slack & by e-mail to see whether they can come back to us with solutions to the problem.

@knn-k
Contributor Author

knn-k commented Nov 22, 2022

The condition for some of the failures seems to be:

  • On AArch64 macOS only
  • The callee function has at least 10 arguments (8 arguments in registers + 2 arguments in the stack)
  • An argument smaller than 8 bytes (char/short/int/float/small struct) is passed in the stack
  • A struct is passed as an argument in the stack after the argument above
  • The size of the struct is 16 bytes or smaller (Larger structs are passed as a pointer)
  • There could be something in addition to the above

@knn-k
Contributor Author

knn-k commented Nov 22, 2022

Here is another example. sample2.c.txt
It is similar to the testcase sf7_V_SFP_III.

It calls testFunc() twice, without and with libffi. Those calls are expected to generate the same output, but the result on AArch64 macOS looks like this:

 % ./sample2
d8=-8.000000, s1.m1=201, s1.m2=202, s1.m3=203, f1=12.300000, p1=0x1040440ac <-- without libffi
d8=-8.000000, s1.m1=201, s1.m2=202, s1.m3=203, f1=0.000000, p1=0x1f50fdf68 <-- with libffi
  • A struct of 12 bytes long (s1) is passed as an argument to the function in the stack: Its value is correct
  • An argument of type float (f1) is passed in the stack after the struct: It is corrupted when the function is called by libffi

The callee function expects s1 to be at [sp, 0] and f1 at [sp, 16], but libffi seems to place f1 at [sp, 12]. The stack offset for p1 is also wrong.
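The offsets mentioned above can be reproduced with a small layout calculation. This is only a sketch of a simplified model inferred from the observations in this thread, not code taken from libffi or from OpenJ9: it assumes the callee expects small structs on the stack to be aligned and padded to 8 bytes while scalars use their natural alignment, and that the failing path placed every argument at its natural alignment. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* One stack-passed argument: size, natural alignment, and whether it is a struct. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;
    int is_struct;
} StackArg;

static size_t round_up(size_t value, size_t multiple) {
    return (value + multiple - 1) / multiple * multiple;
}

/* Assumed callee view: structs aligned and padded to 8 bytes, scalars natural. */
static void layout_expected(const StackArg *args, size_t count, size_t *offsets) {
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        size_t align = args[i].is_struct ? 8 : args[i].align;
        size_t size = args[i].is_struct ? round_up(args[i].size, 8) : args[i].size;
        cursor = round_up(cursor, align);
        offsets[i] = cursor;
        cursor += size;
    }
}

/* Assumed failing view: every argument, struct or not, at natural alignment. */
static void layout_packed(const StackArg *args, size_t count, size_t *offsets) {
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        cursor = round_up(cursor, args[i].align);
        offsets[i] = cursor;
        cursor += args[i].size;
    }
}

/* Print both layouts side by side for one argument list (at most 16 entries). */
static void print_layouts(const StackArg *args, size_t count) {
    size_t expected[16];
    size_t packed[16];
    if (count > 16) {
        return;
    }
    layout_expected(args, count, expected);
    layout_packed(args, count, packed);
    for (size_t i = 0; i < count; i++) {
        printf("%s: expected [sp, %zu], packed [sp, %zu]\n",
               args[i].name, expected[i], packed[i]);
    }
}
```

For the stack arguments of sample2 (a 12-byte struct s1 with 4-byte alignment, a float f1, and a pointer p1), the first model yields offsets 0, 16 and 24 and the second yields 0, 12 and 16, which matches the mismatch described for f1 and p1. For the earlier sample (f1, then a 12-byte struct s1, then p1) the models give 0, 8, 24 versus 0, 4, 16, which is consistent with s1.m1 showing the value intended for s1.m2.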

@knn-k
Contributor Author

knn-k commented Nov 23, 2022

My previous two comments can explain the reasons for the failures in #16336 (comment). They are related to padding and alignment of struct arguments in the stack on macOS.

  • A struct argument after a 4-byte argument passed in the stack
  • A 4-byte argument after a struct argument passed in the stack

@knn-k
Contributor Author

knn-k commented Nov 23, 2022

I opened a draft PR #16362 as a fix for this problem.
I ran the upcall and downcall tests in jtreg, and they were all successful on AArch64 macOS.
