[Jtreg/FFI] Invalid struct offset returned from getArgPointer() in upcall on zLinux #16214

Closed
ChengJin01 opened this issue Oct 27, 2022 · 5 comments · Fixed by #16326
Labels
comp:jit jdk19 project:panama Used to track Project Panama related work

ChengJin01 commented Oct 27, 2022

The crash occurred when running https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/openj9/test/jdk/java/foreign/TestUpcallStack.java (the corresponding native code is at https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/openj9/test/jdk/java/foreign/libTestUpcallStack.c):

...
test TestUpcallStack.testUpcallsStack(6035, "f10_V_SSS_PID", VOID, [STRUCT, STRUCT, STRUCT], [POINTER, INT, DOUBLE]): success
test TestUpcallStack.testUpcallsStack(6052, "f10_P_P_", NON_VOID, [POINTER], []): success
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=230bda70 Signal_Code=00000001
Handler1=000003FF88944310 Handler2=000003FF888316D0 InaccessibleAddress=4038000000000000
gpr0=0000007FDF897798 gpr1=0000000000000002 gpr2=000003FEFC4BBCD0 gpr3=4038000000000000
gpr4=0000000000000000 gpr5=0000000000000010 gpr6=0000000000000003 gpr7=0000000000000010
gpr8=4038000000000000 gpr9=000003FEFC4BBCD0 gpr10=0000000000000000 gpr11=0000000000244500
gpr12=000003FF62BDB550 gpr13=0000000000244500 gpr14=000003FF7C74AD80 gpr15=000003FF8837D368
psw=000003FF7C76575C mask=0705200180000000 fpc=0008fe00 bea=000003FF7C765596
fpr0 000003ff7d763ec8 (f: 2104901376.000000, d: 2.171842e-311)
fpr1 4038000000000000 (f: 0.000000, d: 2.400000e+01)
fpr2 4038000000000000 (f: 0.000000, d: 2.400000e+01)
fpr3 3caf26f055555555 (f: 1431655808.000000, d: 2.161611e-16)
fpr4 4038000000000000 (f: 0.000000, d: 2.400000e+01)
fpr5 3e92540b00000000 (f: 0.000000, d: 2.731128e-07)
fpr6 4038000000000000 (f: 0.000000, d: 2.400000e+01)
fpr7 3e3a44e100000000 (f: 0.000000, d: 6.116242e-09)
fpr8 4038000000000000 (f: 0.000000, d: 2.400000e+01)
fpr9 0000000000535de8 (f: 5463528.000000, d: 2.699341e-317)
fpr10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr12 0005eb66b0428603 (f: 2957149696.000000, d: 8.232128e-309)
fpr13 000003fee402a128 (f: 3825377536.000000, d: 2.170570e-311)
fpr14 0000000000535df0 (f: 5463536.000000, d: 2.699345e-317)
fpr15 000003fee4031d58 (f: 3825409280.000000, d: 2.170570e-311)
Module=/home/jenkins/jchau_ffi/jdk19_openj9_ffi_cmk_s390x_v09/lib/default/libjclse29.so
Module_base_address=000003FF7C700000
Target=2_90_20221017_000000 (Linux 3.10.0-1160.76.1.el7.s390x)
CPU=s390x (4 logical CPUs) (0x1ec5df000 RAM)
----------- Stack Backtrace -----------
alignedMemcpy+0x1e4 (0x000003FF7C76575C [libjclse29.so+0x6575c])
Java_sun_misc_Unsafe_copyMemory__Ljava_lang_Object_2JLjava_lang_Object_2JJ+0xb90 (0x000003FF7C74AD80 [libjclse29.so+0x4ad80])
 (0x000003FF62BDB4F6 [<unknown>+0x0])

It crashed in copyForwardU64() while copying from an invalid source pointer (offset) that originated from getArgPointer():

#12 <signal handler called>
#13 copyForwardU64 (count=<optimized out>, source=0x4038000000000008, dest=0x3ff6c21d6a8) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/util/alignedmemcpy.c:45
#14 alignedMemcpy (vmStruct=<optimized out>, dest=0x3ff6c21d6a0, source=0x4038000000000000, bytes=<optimized out>, alignment=3) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/util/alignedmemcpy.c:227
#15 0x000003ffa984ad80 in copyMemorySub (elementCount=2, destIndex=<optimized out>, sourceIndex=<optimized out>, logElementSize=3, actualSize=16, destOffset=4395565700768, destObject=0x0, sourceOffset=4627448617123184640, sourceObject=0x0, currentThread=0x20c600) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/jcl/common/sun_misc_Unsafe.cpp:548
#16 copyMemory (actualSize=16, destOffset=4395565700768, destObject=0x0, sourceOffset=4627448617123184640, sourceObject=0x0, currentThread=0x20c600) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/jcl/common/sun_misc_Unsafe.cpp:581
#17 Java_sun_misc_Unsafe_copyMemory__Ljava_lang_Object_2JLjava_lang_Object_2JJ (env=0x20c600, receiver=<optimized out>, srcBase=<optimized out>, srcOffset=4627448617123184640, dstBase=<optimized out>, dstOffset=4395565700768, size=16) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/jcl/common/sun_misc_Unsafe.cpp:643
#18 0x000003ffb4d16204 in ffi_call_SYSV () from /home/jenkins/jchau_ffi/jdk19_openj9_ffi_cmk_s390x_v09/lib/default/libj9vm29.so
#19 0x000003ffb4d15c5a in ffi_call (cif=<optimized out>, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/libffi/s390/ffi.c:532
#20 0x000003ffb4bbbd60 in cJNICallout (isStatic=<optimized out>, function=<optimized out>, returnStorage=<optimized out>, returnType=<optimized out>, javaArgs=<optimized out>, receiverAddress=0x2111f0, _pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/BytecodeInterpreter.hpp:2502
#21 callCFunction (returnType=<optimized out>, isStatic=<optimized out>, bp=<optimized out>, javaArgs=<optimized out>, receiverAddress=<optimized out>, jniMethodStartAddress=<optimized out>, _pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/BytecodeInterpreter.hpp:2320
#22 runJNINative (_pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/BytecodeInterpreter.hpp:2210
#23 VM_BytecodeInterpreterCompressed::run (this=0x3ffb45fd3d0, vmThread=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/BytecodeInterpreter.hpp:10745
#24 0x000003ffb4ba77ae in bytecodeLoopCompressed (currentThread=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/BytecodeInterpreter.inc:112
#25 0x000003ffb4c9023c in c_cInterpreter () at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/build/linux-s390x-server-release/vm/runtime/vm/zcinterp.s:278
#26 0x000003ffb4b93a46 in native2InterpJavaUpcallImpl (data=0x3ff6c3c5400, argsListPointer=<optimized out>) at /home/jenkins/jchau_ffi/openj9-openjdk-jdk19/openj9/runtime/vm/UpcallVMHelpers.cpp:360
#27 0x000003ffb562c0f0 in ?? ()

with the debugging output as follows:

Breakpoint 1, native2InterpJavaUpcallImpl (data=0x3ff704d88d0, argsListPointer=0x3fff077b5f0) at UpcallVMHelpers.cpp:329
329                                     j9object_t memSegmtObject = createMemSegmentObject(data, offset, sigArray[argIndex].sizeInByte, sessionOrScopeObject);
(gdb) p/x sigArray[argIndex].type
$11 = 0xca  <------ #define J9_FFI_UPCALL_SIG_TYPE_STRUCT_AGGREGATE_MISC    0XCA

Breakpoint 1, native2InterpJavaUpcallImpl (data=0x3ff704d88d0, argsListPointer=0x3fff077b5f0) at UpcallVMHelpers.cpp:329
329                                     j9object_t memSegmtObject = createMemSegmentObject(data, offset, sigArray[argIndex].sizeInByte, sessionOrScopeObject);

(gdb) p/x  offset
$12 = 0x4038000000000000 <---- a double rather than a valid struct address
(gdb) x/10x  offset
0x4038000000000000:     Cannot access memory at address 0x4038000000000000
(gdb)

(gdb) p/x   4627448617123184640 <------- sourceOffset or source
$13 = 0x4038000000000000

Program received signal SIGSEGV, Segmentation fault.
0x000003fff1117dfc in copyForwardU64 (dest=0x3ff704dc708, source=0x4038000000000008, count=2) at alignedmemcpy.c:45
45                      *dest++ = *source++; <------- invalid source pointer
(gdb) bt
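
The gdb values above can be cross-checked with a small C sketch. It is only an illustration: `bits_to_double` and `double_to_bits` are hypothetical helper names, not part of OpenJ9. It reinterprets the 64-bit pattern held in `offset` as an IEEE 754 double, which is how the value can be recognized as a floating-point argument rather than an address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as an IEEE 754 double.
 * memcpy is used so that no aliasing rules are violated. */
double bits_to_double(uint64_t bits)
{
    double value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: the reverse direction, double to raw bit pattern. */
uint64_t double_to_bits(double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Here `bits_to_double(0x4038000000000000ULL)` yields 24.0, matching the `d: 2.400000e+01` shown for several fpr registers in the crash dump, and the decimal `sourceOffset` 4627448617123184640 is the same bit pattern.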

So the JIT team needs to look into why the offset returned for the struct is invalid, given that getArgPointer() is part of the thunk generation code on zLinux.

Note:
Based on the results on other platforms, the failing test case is probably test TestUpcallStack.testUpcallsStack(6069, "f10_S_S_PI", NON_VOID, [STRUCT], [POINTER, INT]), in which case the struct argument passed to the upcall should have the layout [POINTER, INT].
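
For reference, a [POINTER, INT] struct can be sketched in C as below. The definition of `struct S_PI` and the helper names are assumptions inferred from the layout in the test name, not copied from libTestUpcallStack.c. On an LP64 platform such as s390x the struct occupies 16 bytes, i.e. two 8-byte elements, which would be consistent with `actualSize=16`, `elementCount=2` and `logElementSize=3` in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: a pointer followed by an int. */
struct S_PI {
    void *p0;
    int p1;
};

/* Size of the struct including trailing padding. */
size_t s_pi_size(void)
{
    return sizeof(struct S_PI);
}

/* Byte offset of the int member within the struct. */
size_t s_pi_int_offset(void)
{
    return offsetof(struct S_PI, p1);
}

/* Number of 8-byte elements a U64 copy loop would move for this struct.
 * Only meaningful where the size is a multiple of 8 (e.g. LP64). */
size_t s_pi_u64_elements(void)
{
    assert(sizeof(struct S_PI) % 8 == 0);
    return sizeof(struct S_PI) / 8;
}
```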

FYI: @dchopra001, @0xdaryl, @tajila, @pshipton

@ChengJin01 ChengJin01 added comp:jit jdk19 project:panama Used to track Project Panama related work labels Oct 27, 2022
@ChengJin01
Author

Hi @dchopra001, could you take a look at this issue? Let me know if you need any help.

ChengJin01 commented Nov 8, 2022

Hi @dchopra001, is there any progress on the issue?

@dchopra001
Contributor

This problem is exposed by a native function with the following parameter list:

long long pf0, long long pf1, long long pf2, long long pf3, long long pf4,
long long pf5, long long pf6, long long pf7,
double pf8, double pf9, double pf10, double pf11,
double pf12, double pf13, double pf14, double pf15,
struct S_PI p0

As per our design, parameters pf0 to pf4 should be in the newly allocated callee frame, and pf5 to pf7 should be in the original caller frame. Similarly, pf8 to pf11 should be in the newly allocated callee frame, and pf12 to pf15 should be in the original caller frame.

What I'm seeing at the moment is that the address where pf12 should be points to a long long instead of a double. Incrementing this argument pointer (and all subsequent argument pointers) by 8 resolves the issue. Based on that, I think there is an extra long long in the caller frame that we don't expect to see there.

I'm not sure why this is happening, since the original caller frame is populated by the caller before the glue routine is invoked. I'll investigate further and hope to have a resolution soon.

tajila commented Nov 10, 2022

@dchopra001 Do you think this will be resolved within 2 weeks?

@dchopra001
Contributor

Yes, I'll have a fix for this soon. If I run into any blockers I'll update here.

dchopra001 added a commit to dchopra001/openj9 that referenced this issue Nov 15, 2022
When getArgPointer is invoked while an upcall is performed
the gprIndex must be incremented if hidden parameters have
to be accounted for. This commit implements the suggested
change.

Fixes: eclipse-openj9#16214

Signed-off-by: Dhruv Chopra <Dhruv.C.Chopra@ibm.com>
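
The effect described in the commit message can be illustrated with a simplified model of s390x argument placement. This is a sketch, not the OpenJ9 thunk code: `overflow_slot_for_arg`, its parameters and the constants are hypothetical names. It assumes that five GPRs and four FPRs carry arguments, that the hidden parameter is a return-buffer pointer occupying the first argument GPR of a struct-returning function, and that struct S_PI is passed by reference as a pointer-sized integer argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of an argument for this model. */
enum arg_kind { ARG_INT, ARG_FP };

#define NUM_GPR_ARGS 5 /* assumed: r2..r6 carry integer arguments */
#define NUM_FPR_ARGS 4 /* assumed: f0, f2, f4, f6 carry FP arguments */

/*
 * Returns the index of the 8-byte overflow slot in the caller frame that
 * holds argument `target`, or -1 if that argument is passed in a register.
 * `hidden_gprs` is the number of GPRs consumed by hidden parameters before
 * the first declared argument.
 */
int overflow_slot_for_arg(const enum arg_kind *kinds, size_t count,
                          size_t target, int hidden_gprs)
{
    int gpr_index = hidden_gprs; /* the fix: start past hidden parameters */
    int fpr_index = 0;
    int stack_slot = 0;

    assert(target < count);
    for (size_t i = 0; i <= target; i++) {
        int in_register;
        if (kinds[i] == ARG_INT) {
            in_register = gpr_index < NUM_GPR_ARGS;
            if (in_register) {
                gpr_index++;
            }
        } else {
            in_register = fpr_index < NUM_FPR_ARGS;
            if (in_register) {
                fpr_index++;
            }
        }
        if (i == target) {
            return in_register ? -1 : stack_slot;
        }
        if (!in_register) {
            stack_slot++;
        }
    }
    return -1; /* not reached: the loop always returns at i == target */
}
```

For the parameter list quoted earlier in the thread (eight long long, eight double, then the struct), this model places p0 in slot 7 when no hidden parameter is counted and in slot 8 when one is. With one hidden parameter, slot 7 is where pf15 lands, so a thunk that ignores the hidden parameter would read a double where it expects the struct address, which is consistent with the value seen in gdb.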