Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation error vmState=0x0005ffff in internal special.system tests with JITServer #15601

Open
cjjdespres opened this issue Jul 21, 2022 · 21 comments
Assignees
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project

Comments

@cjjdespres
Copy link
Contributor

Appears in the tests

  • ClassLoadingTest_special_5m_18
  • ClassLoadingTest_special_5m_19
  • ClassLoadingTest_special_5m_19
  • ClassLoadingTest_special_5m_24
  • DaaLoadTest_all_special_5m_18
  • DaaLoadTest_all_special_5m_19
  • DaaLoadTest_all_special_5m_24
  • DaaLoadTest_daa1_special_5m_18
  • DaaLoadTest_daa1_special_5m_19
  • DaaLoadTest_daa1_special_5m_24
  • DaaLoadTest_daa2_special_5m_18
  • DaaLoadTest_daa2_special_5m_19
  • DaaLoadTest_daa2_special_5m_23
  • DaaLoadTest_daa2_special_5m_24
  • DaaLoadTest_daa3_special_5m_24
  • LambdaLoadTest_special_J9_5m_18
  • LambdaLoadTest_special_J9_5m_24
  • MathLoadTest_all_special_5m_18
  • MathLoadTest_all_special_5m_19
  • MathLoadTest_all_special_5m_24
  • MathLoadTest_autosimd_special_5m_18
  • MathLoadTest_autosimd_special_5m_19
  • MathLoadTest_bigdecimal_special_5m_18
  • MathLoadTest_bigdecimal_special_5m_19
  • MathLoadTest_bigdecimal_special_5m_24
  • ParallelStreamsLoadTest_special_J9_18

in at least the jobs

  • /job/Test_openjdk11_j9_special.system_x86-64_linux_jit_Personal/411
  • /job/Test_openjdk17_j9_special.system_ppc64le_linux_jit_Personal/318
  • /job/Test_openjdk11_j9_special.system_ppc64le_linux_jit_Personal/401

Console log for one of the ClassLoadingTest_special_5m_18 failures:

DLT stderr Type=Segmentation error vmState=0x0005ffff
DLT stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
DLT stderr Handler1=00007FFFB36DEAD0 Handler2=00007FFFB3608440
DLT stderr R0=00007FFFB1824124 R1=00007FFF8E33A130 R2=00007FFFB25CF000 R3=0000000000000000
DLT stderr R4=00007FFF8E33ACF0 R5=00007FFFAC1C7500 R6=00007FFEC7E06006 R7=0000000000000000
DLT stderr R8=0000000000000000 R9=00007FFF8FA5BAB8 R10=0000000000080005 R11=000000000000003B
DLT stderr R12=00007FFFB417A650 R13=00007FFF8E3468E0 R14=00007FFFB41B0000 R15=00007FFEC7E015C8
DLT stderr R16=0000000000000001 R17=0000000000000001 R18=00007FFF8E33AA88 R19=00007FFEC7E00188
DLT stderr R20=00007FFEC7E012F0 R21=0000000000000000 R22=00007FFEC7E01470 R23=00007FFEC7E01450
DLT stderr R24=00007FFEC7E01430 R25=0000000000000000 R26=00007FFF8E33AAE8 R27=00007FFF8E33ACF0
DLT stderr R28=00007FFFB1CDB1D0 R29=00007FFFB25F4660 R30=00007FFF8CDF5260 R31=00007FFEC7E0C0E0
DLT stderr NIP=00007FFFB1CDB210 MSR=800000000280F033 ORIG_GPR3=00007FFFB1C7BB38 CTR=00007FFFB417A650
DLT stderr LINK=00007FFFB1CDB204 XER=0000000000000000 CCR=0000000024884434 SOFTE=0000000000000001
DLT stderr TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
DLT stderr FPR0 ffffffffffffffff (f: 4294967296.000000, d: -nan)
DLT stderr FPR1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR6 00007fff800377c0 (f: 2147710976.000000, d: 6.953250e-310)
DLT stderr FPR7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR9 0000000500000004 (f: 4.000000, d: 1.060998e-313)
DLT stderr FPR10 0000f85400000004 (f: 4.000000, d: 1.348995e-309)
DLT stderr FPR11 0000000b0000000a (f: 10.000000, d: 2.334195e-313)
DLT stderr FPR12 0000ffff00000023 (f: 35.000000, d: 1.390650e-309)
DLT stderr FPR13 0000001100000000 (f: 0.000000, d: 3.607393e-313)
DLT stderr FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr Module=/home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_jit_Personal_testList_4/openjdkbinary/j2sdk-image/lib/default/libj9jit29.so
DLT stderr Module_base_address=00007FFFB1690000
DLT stderr 
DLT stderr Method_being_compiled=org/junit/validator/AnnotationsValidator$FieldValidator.getAnnotatablesForTestClass(Lorg/junit/runners/model/TestClass;)Ljava/lang/Iterable;
DLT stderr Target=2_90_20220721_342 (Linux 4.18.0-372.13.1.el8_6.ppc64le)
DLT stderr CPU=ppc64le (32 logical CPUs) (0x1da850000 RAM)
DLT stderr ----------- Stack Backtrace -----------
DLT stderr _ZN19TR_PPCRecompilation21getExistingMethodInfoEP17TR_ResolvedMethod+0x40 (0x00007FFFB1CDB210 [libj9jit29.so+0x64b210])
DLT stderr _ZN19TR_PPCRecompilationC1EPN2TR11CompilationE+0x48 (0x00007FFFB1CDB9D8 [libj9jit29.so+0x64b9d8])
DLT stderr _ZN19TR_PPCRecompilation8allocateEPN2TR11CompilationE+0x4c (0x00007FFFB1CDBA4C [libj9jit29.so+0x64ba4c])
DLT stderr _ZN2J95Power13CodeGenerator25allocateRecompilationInfoEv+0x1c (0x00007FFFB1C887DC [libj9jit29.so+0x5f87dc])
DLT stderr _ZN3OMR11CompilationC1EiP12OMR_VMThreadP11TR_FrontEndP17TR_ResolvedMethodRN2TR12IlGenRequestERNS7_7OptionsERNS7_6RegionEP9TR_MemoryP19TR_OptimizationPlanPNS7_11EnvironmentE+0x974 (0x00007FFFB1D4AF74 [libj9jit29.so+0x6baf74])
DLT stderr _ZN2J911CompilationC2EiP10J9VMThreadP11TR_FrontEndP17TR_ResolvedMethodRN2TR12IlGenRequestERNS7_7OptionsERNS7_6RegionEP9TR_MemoryP19TR_OptimizationPlanP20TR_RelocationRuntimePNS7_11EnvironmentE+0x94 (0x00007FFFB17C5FD4 [libj9jit29.so+0x135fd4])
DLT stderr _ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x95c (0x00007FFFB180373C [libj9jit29.so+0x17373c])
DLT stderr omrsig_protect+0x3f4 (0x00007FFFB3609914 [libj9prt29.so+0x39914])
DLT stderr _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x36c (0x00007FFFB180039C [libj9jit29.so+0x17039c])
DLT stderr _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x170 (0x00007FFFB1800990 [libj9jit29.so+0x170990])
DLT stderr _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x3fc (0x00007FFFB17FF17C [libj9jit29.so+0x16f17c])
DLT stderr _ZN2TR24CompilationInfoPerThread3runEv+0xa8 (0x00007FFFB17FF7A8 [libj9jit29.so+0x16f7a8])
DLT stderr _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0xa0 (0x00007FFFB17FF860 [libj9jit29.so+0x16f860])
DLT stderr omrsig_protect+0x3f4 (0x00007FFFB3609914 [libj9prt29.so+0x39914])
DLT stderr _Z21compilationThreadProcPv+0x1a4 (0x00007FFFB17FFDC4 [libj9jit29.so+0x16fdc4])
DLT stderr thread_wrapper+0x190 (0x00007FFFB359CBC0 [libj9thr29.so+0xcbc0])
DLT stderr start_thread+0xf8 (0x00007FFFB4069678 [libpthread-2.28.so+0x9678])
DLT stderr clone+0x74 (0x00007FFFB3F57E58 [libc-2.28.so+0x137e58])

It reproduces with JITServer enabled (23/25 tests) and does not appear to reproduce with JITServer disabled (0/25 tests).

@cjjdespres
Copy link
Contributor Author

@mpirvu. I'll try to reproduce this locally. From the recent commits I noticed 1a92d68 that touches the JITServer network communication, but I haven't looked at it in detail.

@mpirvu
Copy link
Contributor

mpirvu commented Jul 21, 2022

The crash seems to be in the in the Compilation constructor

OMR::Compilation::Compilation(int, OMR_VMThread*, TR_FrontEnd*, TR_ResolvedMethod*, TR::IlGenRequest&, TR::Options&, TR::Region&, TR_Memory*, TR_OptimizationPlan*, TR::Environment*)

which at some point calls J9::Power::CodeGenerator::allocateRecompilationInfo()
and a few frames after TR_PPCRecompilation::getExistingMethodInfo(TR_ResolvedMethod*)

@gacholio gacholio added the comp:jitserver Artifacts related to JIT-as-a-Service project label Jul 22, 2022
@cjjdespres
Copy link
Contributor Author

I am able to reproduce this locally.

@cjjdespres
Copy link
Contributor Author

After bisection it appears that the bad commit is in fact d505324.

@mpirvu
Copy link
Contributor

mpirvu commented Jul 26, 2022

It's likely a problem with -XX:+JITServerLocalSyncCompiles

@cjjdespres
Copy link
Contributor Author

Setting -XX:-JITServerLocalSyncCompiles does seem to solve the issue.

@mpirvu
Copy link
Contributor

mpirvu commented Jul 27, 2022

Since you are able to reproduce this locally, could you please generate a verbose log at the client?
We may need to further instrument the code if I cannot figure out the problem from the code.

@cjjdespres
Copy link
Contributor Author

I can generate a client verbose log by setting EXTRA_OPTIONS, but the segfault does not occur when I do.

@mpirvu
Copy link
Contributor

mpirvu commented Jul 27, 2022

That's unfortunate. So it's also time dependent. is it possible that when adding EXTRA_OPTIONS, the JVM no longer runs in client mode?

@cjjdespres
Copy link
Contributor Author

It appears to run in client mode still:

#JITServer: JITServer version: 1.38.0
#JITServer: JITServer Client Mode. Server address: localhost port: 38400. Connection Timeout 30000ms
#JITServer: Identifier for current client JVM: 12627387422990972750

and compilation requests seem to be fulfilled.

@cjjdespres
Copy link
Contributor Author

If it's time dependent, would you like me to try to get a log on a perf machine or something with a bit more power?

@mpirvu
Copy link
Contributor

mpirvu commented Jul 27, 2022

Maybe targeted instrumentation will work better, but I have to look at the code first

@mpirvu
Copy link
Contributor

mpirvu commented Jul 28, 2022

Trying to decipher the code:
After the sync compilation has been performed locally, the application thread that was waiting for the compilation result is notified (in compileOnSeparateThread()) and executes this piece of code which is supposed to add another compilation to the queue:

#if defined(J9VM_OPT_JITSERVER)
      // If a remote sync compilation has been changed to a local sync compilation,
      // queue a new remote async compilation entry for this method.
      // - The second compilation is queued only when this is the last thread waiting
      //   for the compilation of the method.
      // - If for any reason, the current compilation request is changed to a sync
      //   compilation from an async compilation, it probably means we can’t do an
      //   async compilation. Don't queue the second async remote compilation.
      if (entry->hasChangedToLocalSyncComp() &&
          (entry->_numThreadsWaiting <= 1) &&
          !forcedSync &&
          (startPC && startPC != oldStartPC))
         {
         void *currOldStartPC = startPCIfAlreadyCompiled(vmThread, details, startPC);
         // If currOldStartPC is NULL, the method body defined by startPC has not been
         // recompiled. We will queue the second remote aync compilation here. Otherwise,
         // another thread is doing the recompilation and will take care of queueing
         // the second compilation.
         if (!currOldStartPC)
            {
            CompilationPriority compPriority = CP_ASYNC_NORMAL;
            bool queuedForRemote = false;
            bool isAsync = true;
            TR_OptimizationPlan *plan = TR_OptimizationPlan::alloc(entry->_origOptLevel);
            TR_YesNoMaybe mthInSharedCache = TR_no;

            if (plan)
               {
               TR_MethodToBeCompiled *entryRemoteCompReq = addMethodToBeCompiled(details, currOldStartPC, compPriority, isAsync,
                                                                                 plan, &queuedForRemote, mthInSharedCache);

               if (entryRemoteCompReq)
                  {
                  entryRemoteCompReq->_async = isAsync;

                  if (getMethodQueueSize() <= 1 ||
                     getNumCompThreadsJobless() > 0) // send notification if any thread is sleeping on comp monitor waiting for suitable work
                     {
                     getCompilationMonitor()->notifyAll();
                     }
                  if (TR::Options::getJITCmdLineOptions()->getVerboseOption(TR_VerboseCompileRequest))
                      TR_VerboseLog::writeLineLocked(TR_Vlog_CR,"%p   Queued a remote async compilation: entry=%p, j9method=%p",
                        vmThread, entryRemoteCompReq, entryRemoteCompReq->getMethodDetails().getMethod());
                  }
               if (!queuedForRemote)
                   TR_OptimizationPlan::freeOptimizationPlan(plan);
               }
            }
         }
#endif /* defined(J9VM_OPT_JITSERVER) */

In void *currOldStartPC = startPCIfAlreadyCompiled(vmThread, details, startPC); startPC corresponds to the sync compilation that has just finished. We know it's not NULL because of the test before. Assuming this is not a thunk, startPCIfAlreadyCompiled() should see this non-NULL value, then look at the linkage info to see whether a recompilation has been been attempted, and if not, we return NULL. Then, the main code proceeds to queue a recompilation with:

TR_MethodToBeCompiled *entryRemoteCompReq = addMethodToBeCompiled(details, currOldStartPC, compPriority, isAsync,
                                                                                 plan, &queuedForRemote, mthInSharedCache);

I think there is a bug here, because currOldStartPC is NULL and we want to pass startPC, that is the most recent startPC for the method in question.

@mpirvu
Copy link
Contributor

mpirvu commented Jul 28, 2022

This piece of code also deserves a second look:

         // If this is a remote sync compilation, change it to a local sync compilation.
         // After the local compilation is completed successfully, a remote async compilation
         // will be scheduled in compileOnSeparateThread().
         TR::IlGeneratorMethodDetails & details = entry->getMethodDetails();
         if (!doLocalCompilation &&
             !entry->_async &&
             !entry->isJNINative() &&
             entry->_optimizationPlan->getOptLevel() > cold &&
             !details.isNewInstanceThunk() &&
             !details.isMethodHandleThunk() &&
             (TR::Options::getCmdLineOptions()->getOption(TR_EnableJITServerHeuristics) ||
             _compInfo.getPersistentInfo()->isLocalSyncCompiles()) &&
             !TR::Options::getCmdLineOptions()->getOption(TR_DisableUpgradingColdCompilations) &&
             TR::Options::getCmdLineOptions()->allowRecompilation())
            {
            doLocalCompilation = true;
            entry->_origOptLevel = entry->_optimizationPlan->getOptLevel();
            entry->_optimizationPlan->setOptLevel(cold);
            entry->_optimizationPlan->setOptLevelDowngraded(true);

            if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch))
               TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Changed the remote sync compilation to a local sync cold compilation: j9method=%p",
               entry->getMethodDetails().getMethod());
            }

If I read this correctly, synchronous compilations at cold at below do not benefit from this change. Instead they are going to be performed remotely. If this true, I think we need to change the code to perform the compilation locally, without a forced remote retry.
Maybe changing preferLocalComp() to return true if isLocalSyncCompiles() && entry->_optimizationPlan->getOptLevel() <= cold && !entry->_async

cjjdespres added a commit to cjjdespres/openj9 that referenced this issue Jul 31, 2022
Fixes: eclipse-openj9#15601
Signed-off-by: Christian Despres <despresc@ibm.com>
@cjjdespres
Copy link
Contributor Author

cjjdespres commented Aug 12, 2022

I think that this may still be happening, but with different tests now. I see various failures on ppc64le only, though perhaps the x86_64 tests haven't finished yet. (EDIT: it's happening on s390x too).

  • cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_3
  • cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_4
  • jit_ra_0
  • cmdLineTester_jvmtitests_hcr_10
  • jit_recognizedMethod_3
  • jit_recognizedMethod_4

scattered across the jobs

  • /job/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal/429
  • /job/Test_openjdk11_j9_extended.functional_ppc64le_linux_jit_Personal/430
  • /job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_jit_Personal/994
  • /job/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal/1001

Example from cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_4

Testing: rc018
Test start time: 2022/08/12 01:57:57 Pacific Standard Time
Running command: "/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xcompressedrefs -Xgcpolicy:gencon -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation  -Xdump    -agentlib:jvmtitest=test:rc018 -cp "/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/lib/asm-all.jar" com.ibm.jvmti.tests.util.TestRunner
Time spent starting: 12 milliseconds
Time spent executing: 5244 milliseconds
Test result: FAILED
Output from test:
 [OUT] *** Testing [1/1]:	testReflectRedefineAtSameTime
 [OUT] starting reflect worker threads
 [OUT] starting to populate java heap
 [ERR] Unhandled exception
 [ERR] Type=Segmentation error vmState=0x00000000
 [ERR] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
 [ERR] Handler1=00003FFF99403650 Handler2=00003FFF9932D080
 [ERR] R0=00003FFF981E04CC R1=00003FFF999FB810 R2=00003FFF98F5F000 R3=0000000000000000
 [ERR] R4=00003FFF999FBD48 R5=0000000000000200 R6=0000000000000007 R7=0000000000000000
 [ERR] R8=0000000000000008 R9=0000000000000220 R10=0000000000000002 R11=0000000000000000
 [ERR] R12=00003FFF983741A0 R13=00003FFF99A06900 R14=0000000000152E00 R15=0000000000095700
 [ERR] R16=00003FFF7A640038 R17=00000000001D8F10 R18=0000000000000000 R19=0000000000000001
 [ERR] R20=00003FFF999FB96F R21=0000000000000000 R22=0000000000095700 R23=00003FFF7A4CD6D0
 [ERR] R24=0000000000000080 R25=00003FFF7BDC307C R26=00003FFF940BCA60 R27=00003FFF999FBD48
 [ERR] R28=0000000000000040 R29=00003FFF940BB8B0 R30=0000000000000000 R31=00003FFF18004450
 [ERR] NIP=00003FFF981E0504 MSR=800000010280F033 ORIG_GPR3=C0000000000027E0 CTR=00003FFF983741A0
 [ERR] LINK=00003FFF981E04F8 XER=0000000000000000 CCR=0000000024004424 SOFTE=0000000000000001
 [ERR] TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
 [ERR] FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR1 406a400000000000 (f: 0.000000, d: 2.100000e+02)
 [ERR] FPR2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR4 3fe3333340000000 (f: 1073741824.000000, d: 6.000000e-01)
 [ERR] FPR5 bfdb4b81c0000000 (f: 3221225472.000000, d: -4.264836e-01)
 [ERR] FPR6 3fe5555560000000 (f: 1610612736.000000, d: 6.666667e-01)
 [ERR] FPR7 3f2ede5680000000 (f: 2147483648.000000, d: 2.355080e-04)
 [ERR] FPR8 5c002300095b0008 (f: 156958720.000000, d: 1.466099e+135)
 [ERR] FPR9 ffffffffffffffff (f: 4294967296.000000, d: -nan)
 [ERR] FPR10 60002300095f0007 (f: 157220864.000000, d: 2.704475e+154)
 [ERR] FPR11 0009580023000957 (f: 587204928.000000, d: 1.299409e-308)
 [ERR] FPR12 0000000000000001 (f: 1.000000, d: 4.940656e-324)
 [ERR] FPR13 4008000000000000 (f: 0.000000, d: 3.000000e+00)
 [ERR] FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] Module=/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/jre/lib/ppc64le/default/libj9jit29.so
 [ERR] Module_base_address=00003FFF98070000
 [ERR] Target=2_90_20220812_1090 (Linux 3.10.0-1160.71.1.el7.ppc64le)
 [ERR] CPU=ppc64le (4 logical CPUs) (0xde750000 RAM)
 [ERR] ----------- Stack Backtrace -----------
 [ERR] protectedBacktrace+0x24 (0x00003FFF993270E4 [libj9prt29.so+0x370e4])
 [ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
 [ERR] omrintrospect_backtrace_thread_raw+0xe4 (0x00003FFF99327714 [libj9prt29.so+0x37714])
 [ERR] protectedIntrospectBacktraceThread+0x24 (0x00003FFF99326D04 [libj9prt29.so+0x36d04])
 [ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
 [ERR] omrintrospect_backtrace_thread+0x98 (0x00003FFF99326E18 [libj9prt29.so+0x36e18])
 [ERR] generateDiagnosticFiles+0x140 (0x00003FFF994034D0 [libj9vm29.so+0x434d0])
 [ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
 [ERR] structuredSignalHandler+0x1b4 (0x00003FFF99403804 [libj9vm29.so+0x43804])
 [ERR] mainSynchSignalHandler+0x260 (0x00003FFF9932D2E0 [libj9prt29.so+0x3d2e0])
 [ERR] __kernel_sigtramp_rt64+0x0 (0x00003FFF99CF0478)
 [ERR] _ZN2TR15CompilationInfo21addMethodToBeCompiledERNS_24IlGeneratorMethodDetailsEPv19CompilationPrioritybP19TR_OptimizationPlanPb13TR_YesNoMaybe+0x54c (0x00003FFF981E04CC [libj9jit29.so+0x1704cc])
 [ERR] _ZN2TR15CompilationInfo23compileOnSeparateThreadEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEPv13TR_YesNoMaybeP23TR_CompilationErrorCodePbP19TR_OptimizationPlan+0xe4c (0x00003FFF981E2DAC [libj9jit29.so+0x172dac])
 [ERR] _ZN2TR15CompilationInfo13compileMethodEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEPv13TR_YesNoMaybeP23TR_CompilationErrorCodePbP19TR_OptimizationPlan+0x314 (0x00003FFF981E3714 [libj9jit29.so+0x173714])
 [ERR] DLTLogic+0x74c (0x00003FFF98203B0C [libj9jit29.so+0x193b0c])
 [ERR] jitMethodSampleInterrupt+0x13c (0x00003FFF9820400C [libj9jit29.so+0x19400c])
 [ERR] dispatchAsyncEvents+0x124 (0x00003FFF99405874 [libj9vm29.so+0x45874])
 [ERR] javaCheckAsyncMessages+0xf8 (0x00003FFF993D22A8 [libj9vm29.so+0x122a8])
 [ERR] bytecodeLoopCompressed+0x1c4 (0x00003FFF9946B694 [libj9vm29.so+0xab694])
 [ERR]  (0x00003FFF9952A930 [libj9vm29.so+0x16a930])
 [ERR] sidecarInvokeReflectMethodImpl+0x3c0 (0x00003FFF993DDE10 [libj9vm29.so+0x1de10])
 [ERR] sidecarInvokeReflectMethod+0x38 (0x00003FFF993DF2A8 [libj9vm29.so+0x1f2a8])
 [ERR] JVM_InvokeMethod_Impl+0xb4 (0x00003FFF8D17C504 [libjclse29.so+0x7c504])
 [ERR] JVM_InvokeMethod+0x40 (0x00003FFF9991F620 [libjvm.so+0x1f620])
 [ERR] JVM_InvokeMethod+0x34 (0x00003FFF99A06564 [libjvm.so+0x6564])
 [ERR] Java_sun_reflect_NativeMethodAccessorImpl_invoke0+0x24 (0x00003FFF7A26C064 [libjava.so+0xc064])
 [ERR]  (0x00003FFF7BDFD724 [<unknown>+0x0])
 [ERR] runCallInMethod+0x270 (0x00003FFF993DD700 [libj9vm29.so+0x1d700])
 [ERR] gpProtectedRunCallInMethod+0x50 (0x00003FFF99408980 [libj9vm29.so+0x48980])
 [ERR] signalProtectAndRunGlue+0x28 (0x00003FFF9953A988 [libj9vm29.so+0x17a988])
 [ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
 [ERR] gpProtectAndRun+0xac (0x00003FFF9953AA5C [libj9vm29.so+0x17aa5c])
 [ERR] gpCheckCallin+0xc4 (0x00003FFF9940B234 [libj9vm29.so+0x4b234])
 [ERR] callStaticVoidMethod+0x48 (0x00003FFF99407F98 [libj9vm29.so+0x47f98])
 [ERR] JavaMain+0xff4 (0x00003FFF99C69C94 [libjli.so+0x9c94])
 [ERR] start_thread+0x104 (0x00003FFF99CA8CD4 [libpthread.so.0+0x8cd4])
 [ERR] clone+0xe4 (0x00003FFF99B67F14 [libc.so.6+0x127f14])

I'll grind these to get a sense of how frequent the segfault is now.

@mpirvu
Copy link
Contributor

mpirvu commented Aug 12, 2022

The goal is find a test that fails consistently so that we can debug faster

@cjjdespres
Copy link
Contributor Author

I can reproduce cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_3 readily now. The program is still crashing at the same place as before:

4964	         TR_PersistentMethodInfo* methodInfo = bodyInfo->getMethodInfo(); //TR::Recompilation::getMethodInfoFromPC(pc);

and bodyInfo is NULL.

@mpirvu
Copy link
Contributor

mpirvu commented Aug 15, 2022

I would like to know what kind of compilation was the one before this one.

TR::Recompilation::getJittedBodyInfoFromPC(pc); has the following definition:

TR_PersistentJittedBodyInfo *J9::Recompilation::getJittedBodyInfoFromPC(void *startPC)
   {
   // The body info pointer is stored in the pre-prologue of the method. The
   // location of the field depends upon the type of the method header.  Use the
   // header-type bits in the linkage info fields to determine what kind of header
   // this is.
   //
   J9::PrivateLinkage::LinkageInfo *linkageInfo = J9::PrivateLinkage::LinkageInfo::get(startPC);
   return linkageInfo->isRecompMethodBody() ?
      *(TR_PersistentJittedBodyInfo **)((uint8_t*)startPC + START_PC_TO_METHOD_INFO_ADDRESS) :
      NULL;
   }

LinkageInfo is a word at some offset before startPC. We ask linkageInfo if this is a body that can be recompiled and if so, we return the jittedBodyInfo which sits at some offset before startPC.
Example:

0x000007FFFDD8E730 ffffffe8 [0x000000006081AAE0] 66 66 90 66 90             nop (5 bytes)		; Alignment (boundary=8, margin=19)
0x000007FFFDD8E735 ffffffed [0x000000006081AB50] 48 3b                      dw	0xcccc		; DWImm2  Bytes that will be copied over jitEntryPoint if we fail recompilation
0x000007FFFDD8E737 ffffffef [0x000000006081AC30] e8 84 16 1e 00             call	__samplingRecompileMethod; CALLImm4 (0x00000000618DE830); CALLImm4
0x000007FFFDD8E73C fffffff4 [0x000000006081ACB0] b0 ff 59 62 00 00 00 00    dq	0x000000006259ffb0		; DQImm64  PersistentJittedBodyInfo
0x000007FFFDD8E744 fffffffc [0x000000006081AD20] 15 00 05 00                dd	0x00050015		; DDImm4   LinkageInfo This 5 means the offset between interpreter entry point and jit entry point.
0x000007FFFDD8E748 00000000 [0x000000006095B2D0] 48 8b 44 24 08             mov	rax, qword ptr [rsp+0x8]		; L8RegMem, SymRef #521[0000000000000000]+8 startPC
0x000007FFFDD8E74D 00000005 [0x000000006095B348]                            nop (0 bytes)		; Align patchable code @32 [0x0:2]   
0x000007FFFDD8E74D 00000005 [0x000000006081ADC8]                            ProcEntry                       ; PROCENTRY   jitEntryPoint
0x000007FFFDD8E74D 00000005 [0x000000006095B548] 48 3b 65 50                cmp	rsp, qword ptr [rbp+0x50]		; CMP8RegMem, SymRef #522[0000000000000000]+80 
0x000007FFFDD8E751 00000009 [0x000000006095B670]                            Label L0209:			; LABEL	; (Start of internal control flow)
0x000007FFFDD8E751 00000009 [0x000000006095B758] 0f 86 40 02 00 00          jbe	Snippet Label L0208		; JBE4	; (Helper Call Snippet with scratch-reg argument)
0x000007FFFDD8E757 0000000f [0x000000006095B8D0]                            Label L0210:			; LABEL	; (End of internal control flow)
0x000007FFFDD8E757 0000000f [0x000000006095B988] 48 83 ec 08                sub	rsp, 0x00000008	; SUB8RegImms

For methods that cannot be recompiled, we do not have a persistentJittedBodyInfo, so we return 0. This is why I would like to know what type of compilation was the synchronous one that is now getting recompiled. I know that scorching compilations do not get recompiled.

@mpirvu
Copy link
Contributor

mpirvu commented Aug 15, 2022

bool
J9::Recompilation::couldBeCompiledAgain()
   {
   return
      self()->shouldBeCompiledAgain() ||
      _compilation->usesPreexistence() ||
      _compilation->getOption(TR_EnableHCR);
   }
bool
J9::Recompilation::shouldBeCompiledAgain()
   {
   return TR::Options::canJITCompile() && !_doNotCompileAgain;
   }

_doNotCompileAgain is set in the constructor of Recompilation (when the compiler object is created)
_doNotCompileAgain(comp->getOption(TR_NoRecompile) || !comp->allowRecompilation())

So, we have to makes sure that TR_NoRecompile is not set and that allowRecompilation() is set.
We do test TR::Options::getCmdLineOptions()->allowRecompilation() when enabling the localSyncCompiles feature.
The TR_NoRecompile option is set only through command line options, which these tests don't do.

@mpirvu
Copy link
Contributor

mpirvu commented Aug 15, 2022

Recompilation object has the following constructor on x86

bool
J9::Compilation::isRecompilationEnabled()
   {

   if (!self()->cg()->getSupportsRecompilation())
      {
      return false;
      }

   if (self()->isDLT())
      {
      return false;
      }

   // Don't do recompilation on JNI virtual thunk methods
   //
   if (self()->getCurrentMethod()->isJNINative())
      return false;

   return self()->allowRecompilation();
   }

So we need to avoid DLT compilations, JNI (which we do).

@mpirvu
Copy link
Contributor

mpirvu commented Aug 15, 2022

Sampling is disabled when TR_MimicInterpreterFrameShape is enabled.

J9::Recompilation::Recompilation(TR::Compilation *comp) :
      OMR::RecompilationConnector(comp),
      _firstCompile(comp->getCurrentMethod()->isInterpreted()),
      _useSampling(TR::Options::getSamplingFrequency() != 0 && !comp->getOption(TR_MimicInterpreterFrameShape)),
      _doNotCompileAgain(comp->getOption(TR_NoRecompile) || !comp->allowRecompilation()),
      _methodInfo(0),
      _bodyInfo(0),
      _nextLevel(warm),
      _nextCounter(0)
   {
   _timer.initialize(0, comp->trMemory());
   }

Since TR_MimicInterpreterFrameShape is set at runtime, we need a runtime check that TR_MimicInterpreterFrameShape is not set when localSyncCompiles is used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
None yet
Development

No branches or pull requests

3 participants