Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APP CRASH, HANG running jdk8 apps with traces enabled #3586

Open
rkgithubs opened this issue May 1, 2019 · 24 comments
Open

APP CRASH, HANG running jdk8 apps with traces enabled #3586

rkgithubs opened this issue May 1, 2019 · 24 comments

Comments

@rkgithubs
Copy link

rkgithubs commented May 1, 2019

I'm seeing the following crashes on aarch64 while running a couple of java applications, this happens to on JDK8 and JDK9 both.
First one is SPECjvm08 benchmark, second one is a physics game test.
~/workloads/SPECjvm2008/SPECjvm2008_v101$ ~/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -- ~/tools/jdk1.9-b160/bin/java -jar SPECjvm2008.jar scimark.sor.small -ikv -wt 30 -it 60

A fatal error has been detected by the Java Runtime Environment:
SIGILL (0x4) at pc=0x00007f9541ca5383, pid=29922, tid=29923

JRE version: (9.0+160) (build )
Java VM: Java HotSpot(TM) 64-Bit Server VM (9-ea+160, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
Problematic frame:
v ~BufferBlob::Signature Handler Temp Buffer

No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/home/daenerys/workloads/SPECjvm2008/SPECjvm2008_v101/hs_err_pid29922.log
Loaded disassembler from /home/daenerys/tools/jdk1.9-b160/lib/server/hsdis-amd64.so

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp

~/workloads/SPECjvm2008/SPECjvm2008_v101$ ~/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -- ~/tools/jdk1.9-b160/bin/java -jar ~/workloads/IcyRocks-Java-Kernel-V1.0.2.jar

A fatal error has been detected by the Java Runtime Environment:

SIGILL (0x4) at pc=0x00007f1c11ca5383, pid=29991, tid=29992

JRE version: (9.0+160) (build )
Java VM: Java HotSpot(TM) 64-Bit Server VM (9-ea+160, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
Problematic frame:
v ~BufferBlob::Signature Handler Temp Buffer

No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/home/daenerys/workloads/SPECjvm2008/SPECjvm2008_v101/hs_err_pid29991.log
Loaded disassembler from /home/daenerys/tools/jdk1.9-b160/lib/server/hsdis-amd64.so

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp

Segmentation fault

@rkgithubs
Copy link
Author

I see #2506 from 2017 with similar issue on java and node apps.

@AssadHashmi
Copy link
Contributor

AssadHashmi commented May 2, 2019

This crash looks different from the one in #2506 in that this is a SIGILL.
Perhaps I'm not interpreting the crash information correctly but it looks like these are x86 binaries?

~/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun . . .
. . .
Java VM: Java HotSpot(TM) 64-Bit Server VM (9-ea+160, mixed mode, tiered, compressed oops
, g1 gc, linux-amd64)
. . .
Loaded disassembler from /home/daenerys/tools/jdk1.9-b160/lib/server/hsdis-amd64.so

@rkgithubs
Copy link
Author

Following is another crash seen in the java app:
This app runs successfully using this JDK without DR. Also, attached the loglevel 2 info.
/root/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -debug -loglevel 2 -- /root/jdk1.9-b160/bin/java -jar /root/IcyRocks-Java-Kernel-V1.0.2.jar

<Starting application /root/jdk1.9-b160/bin/java (139682)>
<Initial options = -no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/root/DynamoRIO-x86_64-Linux-7.90.18019-0/lib64/debug/libdynamorio.so' 0x00007f1e221bc580

<(1+x) Handling our fault in a TRY at 0x00007f1e22405d7c>
<spurious rep/repne prefix @0x00007f1e21f79f96 (f2 41 ff e3): >

<Application /root/jdk1.9-b160/bin/java (139682). Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/translate.c:931 false
(Error occurred @18897 frags)
version 7.90.18019, custom build
-no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x00007f1e022a0b40 0x00007f1e2225700e
0x00007f1e022a0d90 0x00007f1e2236d68d
0x00007f1e022a0e90 0x00007f1e2236f21c
0x00007f1e022a1920 0x00007f1e2236f79a
0x00007f1e022a1960 0x00007f1e223483ae
0x00007f1e022a19c0 0x00007f1e2243c52f
0x00007f1e022a1ec0 0x00007f1e22441c1e
0x00007f1e022a21d0 0x00007f1e22444e38
0x00007f1e022a2370 0x00007f1e2240589d
0x00007f1e013251c0 0x49656e696c65746e>

java.0.139682.txt

@rkgithubs
Copy link
Author

simple java -version failed.
/root/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -debug -loglevel 2 -- /root/jdk1.9-b160/bin/java -version

<Starting application /root/jdk1.9-b160/bin/java (139870)>
<Initial options = -no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/root/DynamoRIO-x86_64-Linux-7.90.18019-0/lib64/debug/libdynamorio.so' 0x00007f8f1f430580

<(1+x) Handling our fault in a TRY at 0x00007f8f1f679d7c>
<spurious rep/repne prefix @0x00007f8f1f1edf96 (f2 41 ff e3): >

<Application /root/jdk1.9-b160/bin/java (139870). Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/translate.c:931 false
(Error occurred @18379 frags)
version 7.90.18019, custom build
-no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x00007f8eff4d7b40 0x00007f8f1f4cb00e
0x00007f8eff4d7d90 0x00007f8f1f5e168d
0x00007f8eff4d7e90 0x00007f8f1f5e321c
0x00007f8eff4d8920 0x00007f8f1f5e379a
0x00007f8eff4d8960 0x00007f8f1f5bc3ae
0x00007f8eff4d89c0 0x00007f8f1f6b052f
0x00007f8eff4d8ec0 0x00007f8f1f6b5c1e
0x00007f8eff4d91d0 0x00007f8f1f6b8e38
0x00007f8eff4d9370 0x00007f8f1f67989d
0x00007f8efe5991c0 0x49656e696c65746e>

@AssadHashmi
Copy link
Contributor

AssadHashmi commented May 15, 2019

From the java.0.139682.txt log file supplied:

. . .
SYSLOG_WARNING: spurious rep/repne prefix @0x00007f1e21f79f96 (f2 41 ff e3):
. . .
SYSLOG_WARNING: Invalid opcode encountered                                       
SYSLOG_ERROR: Application /root/jdk1.9-b160/bin/java (139682).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/translate.c:931 false
(Error occurred @18897 frags)                                                    
version 7.90.18019, custom build
. . .

Which is consistent with the SIGILL.
Can you confirm that the title of this issue is inaccurate in that you're not running AArch64?

Did you build the /root/jdk1.9-b160/bin/java binary from source or is it a standard pre-built download?

The same error appears in issue #3610

@derekbruening
Copy link
Contributor

Please look in the thread log file (you've only supplied the global log file) to see what the invalid opcode is. Is it AVX-512 (#1312) whose support is in-progress?

@rkgithubs
Copy link
Author

#java -version shouldn't execute avx512 opcodes. This is jvm built from the source, also system is x86_64.

@hgreving2304
Copy link

hgreving2304 commented May 16, 2019 via email

@rkgithubs
Copy link
Author

Is there a recommended version of JDK on x64. Currently all java program crash with DR for JDK9+

@rkgithubs
Copy link
Author

with jdk8 debug level 3; java app is stuck <proactive reset @ fragments>>. I have tried running SPEC jvm2008 and some multi-threaded tests- applications don't run/start as expected. Please suggest if any more info is needed. This is x86_64 and jdk8 built from source.

~/rahul/java_tests# /root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -debug -loglevel 3 -- /root/rahul/jdk1.8.0_201/bin/java TestThread

<Starting application /root/rahul/jdk1.8.0_201/bin/java (186088)>
<Initial options = -no_dynamic_options -loglevel 3 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/lib64/debug/libdynamorio.so' 0x00007f106bec3580

<curiosity: rex.w on OPSZ_6_irex10_short4!>
<(1+x) Handling our fault in a TRY at 0x00007f106c10cd7c>
<spurious rep/repne prefix @0x00007f106bc80f96 (f2 41 ff e3): >

<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
os says: 0x00007f0fbc000000-0x00007f0fe8000000 prot=0x00000000
cache says: 0x00007f0fbc000000-0x00007f0fc8000000 prot=0x00000000

<proactive reset @ 18507 fragments>
<proactive reset @ 41493 fragments>
<proactive reset @ 53141 fragments>
<proactive reset @ 60504 fragments>
<proactive reset @ 81809 fragments>

@rkgithubs
Copy link
Author

rkgithubs commented May 22, 2019

top of stack trace

A fatal error has been detected by the Java Runtime Environment:

Internal Error (sharedRuntime.cpp:549), pid=186643, tid=0x00007f78d61f3700
guarantee(cb != NULL && cb->is_nmethod()) failed: safepoint polling: pc must refer to an nmethod

JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp

--------------- T H R E A D ---------------

Current thread (0x00007f79f8173000): JavaThread "3" [_thread_in_Java, id=186719, stack(0x00007f78d60f3000,0x00007f78d61f4000)]

Stack: [0x00007f78d60f3000,0x00007f78d61f4000], sp=0x00007f78d61ef020, free space=1008k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xad3525] VMError::report_and_die()+0x2e5
V [libjvm.so+0x4e0368] report_vm_error(char const*, int, char const*, char const*)+0x58
V [libjvm.so+0x9c071f] SharedRuntime::get_poll_stub(unsigned char*)+0x6f
V [libjvm.so+0x9190fc] JVM_handle_linux_signal+0x8ac
V [libjvm.so+0x90b8b8] signalHandler(int, siginfo*, void*)+0x38
C [libpthread.so.0+0x11390]
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+46
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30
j MTDemo.fib(Ljava/math/BigInteger;)Ljava/math/BigInteger;+30

@derekbruening
Copy link
Contributor

We are still waiting for info about the invalid opcode...note that Status-NeedInfo label.

@rkgithubs rkgithubs changed the title crash due profiling java application (s) on aarch64 crash due profiling java (jdk 8) application (s) on aarch64 May 22, 2019
@rkgithubs
Copy link
Author

The program crashes on jdk9 due to invalid opcode. log attached

SYSLOG_WARNING: Invalid opcode encountered
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1
ERROR: recreate_app : looking for 0x0000000044a2db9f in frag @ 0x0000000044a2db90 (tag 0x00007f54d97fb4a5)
TAG 0x0000000000000000
+0 L4 @0x00007f54fb7c1ea0 e9 a0 d7 14 de jmp $0x00007f54d97fb4a5
END 0x0000000000000000
log.1.191280.txt

@rkgithubs
Copy link
Author

rkgithubs commented May 22, 2019

although there is different crash with jdk 8. attached thread level log for this crash.
log.1.191850.txt

SYSLOG_ERROR: Application /root/rahul/jdk1.8.0_201/bin/java (191849). Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/dispatch.c:757 wherewasi == DR_WHERE_FCACHE || wherewasi == DR_WHERE_TRAMPOLINE || wherewasi == DR_WHERE_APP || (dcontext->go_native && wherewasi == DR_WHERE_DISPATCH)
(Error occurred @94436 frags)
version 7.90.18019, custom build

@derekbruening
Copy link
Contributor

The program crashes on jdk9 due to invalid opcode. log attached

SYSLOG_WARNING: Invalid opcode encountered
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1
Error decoding 0x00007f54d97fb4a5 == 0x62 0xf1 0x7d 0x8 0x6e 0xc1

That is AVX-512.

@derekbruening
Copy link
Contributor

The wherewasi assert is #3175. It's not serious -- maybe re-run with -ignore_assert_list '*' to let it continue to see if more significant things show up?

@derekbruening
Copy link
Contributor

<proactive reset @ 18507 fragments>
<proactive reset @ 41493 fragments>
<proactive reset @ 53141 fragments>
<proactive reset @ 60504 fragments>
<proactive reset @ 81809 fragments>

These seem surprisingly early and frequent: that much memory shouldn't be used so quickly. I would try with -vm_size 1G -no_enable_reset -disable_traces. Maybe an experimental run which adds -no_hw_cache_consistency -no_sandbox_writes.

@rkgithubs
Copy link
Author

application can be run via DR after adding -disable_traces. Wouldn't it disable the trace collection completely though. Our goal is to run drcachesim tool to collect the memory access instructions.

/rahul/java_tests# /root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -s 60 -vm_size 1G -no_enable_reset -disable_traces -- /root/rahul/jdk1.8.0_201/bin/java TestThread
Creating 1
Starting 1
Creating 2
Starting 2
Running1
Running2
Thread: 2,125
Thread: 1,125

@derekbruening
Copy link
Contributor

The DR core's traces have no relation to drcachesim's memory access traces. See the -disable_traces docs http://dynamorio.org/docs/using.html#sec_options, see the DR tutorial slides, other DR documentation.

@derekbruening derekbruening changed the title crash due profiling java (jdk 8) application (s) on aarch64 APP CRASH, HANG running jdk8 apps with traces enabled May 23, 2019
@rkgithubs
Copy link
Author

rkgithubs commented May 24, 2019

once we tried to collect traces -t drachesim -offline trace collection failed with the following:
hs_err_pid198432.log indicates the following exception thrown. There is no directory inside logs for this PID.
[hs_err_pid198432.log]

Internal exceptions (2 events):
Event: 2.043 Thread 0x00007f1b2800a800 Exception <a 'java/lang/NoSuchMethodError': Method sun.misc.Unsafe.defineClass(Ljava/lang/String;[BII)Ljava/lang/Class; name or signature does not match> (0x00007f1899107cb0) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u201/12317/hotspot/src/sh
Event: 2.046 Thread 0x00007f1b2800a800 Exception <a 'java/lang/NoSuchMethodError': Method sun.misc.Unsafe.prefetchRead(Ljava/lang/Object;J)V name or signature does not match> (0x00007f1899107f98) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u201/12317/hotspot/src/share/vm/prims/jni.c

[stdout]

/root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -vm_size
o_enable_reset -disable_traces -t drcachesim -offline -trace_after_instrs 5000M -exit_after_tracing 
-- /root/rahul/jdk1.8.0_201/bin/java TestThread

A fatal error has been detected by the Java Runtime Environment:

  SIGSEGV (0xb) at pc=0x00007f1b30370481, pid=198432, tid=0x00007f1246d30700

 JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
 Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
 Problematic frame:
 V  [libjvm.so+0x2dd481]  Assembler::jmp_literal(unsigned char*, RelocationHolder const&)+0x61

 Core dump written. Default location: /root/rahul/java_tests/core or core.198432

 An error report file with more information is saved as:
 /root/rahul/java_tests/hs_err_pid198432.log

[error occurred during error reporting , id 0xb]

@derekbruening
Copy link
Contributor

(Triple backtics delimit code blocks to make it easier to read -- I added some)

@derekbruening
Copy link
Contributor

So the status is that jdk8 on even small Java apps seems to hit an app crash when run under plain DR with traces enabled, or when run under drcachesim -offline? Likely both are related to cache consistency: probably the same or similar to #2989.

@rkgithubs
Copy link
Author

rkgithubs commented May 24, 2019 via email

@derekbruening
Copy link
Contributor

crash is seen with drcachesim -offline on small apps even

That is surprising given regression tests and usage elsewhere. This sounds like a separate issue from this issue here about JDK8. Please file a separate issue and provide the app where this happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants