Java SWIG wrapper for RoutingModel has a race condition during destruction #2466

Closed
Woodz opened this issue Mar 26, 2021 · 1 comment
Labels
Bug Lang: Java Java wrapper issue Solver: Routing Uses the Routing library and the original CP solver

Woodz commented Mar 26, 2021

What version of OR-Tools and what language are you using?
Version: 8.1
Language: Java

Which solver are you using (e.g. CP-SAT, Routing Solver, GLOP, BOP, Gurobi)
Routing Solver

What operating system (Linux, Windows, ...) and version?
Across multiple (Mac + Linux)

What did you do?
Leave RoutingModel to be garbage collected

What did you expect to see
No error

What did you see instead?
Occasionally SIGSEGV

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000139e19977, pid=55706, tid=15619
#
# JRE version: OpenJDK Runtime Environment Corretto-15.0.2.7.1 (15.0.2+7) (build 15.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Corretto-15.0.2.7.1 (15.0.2+7, mixed mode, sharing, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C  [libjniortools.dylib+0x51977]  _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x7
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-jdk/issues/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50786:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8 com.intellij.rt.junit.JUnitStarter -ideVersion5 -junit5 @w@/private/var/folders/13/rc5f8b_j4vddj_hyyp8s8wwm0000gp/T/idea_working_dirs_junit.tmp @/private/var/folders/13/rc5f8b_j4vddj_hyyp8s8wwm0000gp/T/idea_junit.tmp -socket50785

Host: MacBookPro15,4 x86_64 1400 MHz, 8 cores, 16G, Darwin 20.3.0
Time: Fri Mar 26 13:00:11 2021 +08 elapsed time: 384.783762 seconds (0d 0h 6m 24s)

---------------  T H R E A D  ---------------

Current thread (0x00007fc6ff00ae00):  JavaThread "Finalizer" daemon [_thread_in_native, id=15619, stack(0x000070000c710000,0x000070000c810000)]

Stack: [0x000070000c710000,0x000070000c810000],  sp=0x000070000c80f740,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjniortools.dylib+0x51977]  _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x7
C  [libjniortools.dylib+0x6f103]  _ZN14GlobalRefGuardD2Ev+0x13
C  [libjniortools.dylib+0x6f0e9]  _ZN14GlobalRefGuardD1Ev+0x9
C  [libjniortools.dylib+0x6ee05]  _ZNSt3__120__shared_ptr_emplaceI14GlobalRefGuardNS_9allocatorIS1_EEE16__on_zero_sharedEv+0x15
C  [libjniortools.dylib+0x6f2e5]  _ZNSt3__114__shared_count16__release_sharedEv+0x25
C  [libjniortools.dylib+0x6f29e]  _ZNSt3__119__shared_weak_count16__release_sharedEv+0xe
C  [libjniortools.dylib+0x6f282]  _ZNSt3__110shared_ptrI14GlobalRefGuardED2Ev+0x12
C  [libjniortools.dylib+0x15b29]  _ZNSt3__110shared_ptrI14GlobalRefGuardED1Ev+0x9
C  [libjniortools.dylib+0x5ee2d]  _ZZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackEN4$_93D2Ev+0xd
C  [libjniortools.dylib+0x4a829]  _ZZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackEN4$_93D1Ev+0x9
C  [libjniortools.dylib+0xc01f9]  _ZNSt3__122__compressed_pair_elemIZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackE4$_93Li0ELb0EED2Ev+0x9
C  [libjniortools.dylib+0xc0289]  _ZNSt3__117__compressed_pairIZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackE4$_93NS_9allocatorIS1_EEED2Ev+0x9
C  [libjniortools.dylib+0xc0279]  _ZNSt3__117__compressed_pairIZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackE4$_93NS_9allocatorIS1_EEED1Ev+0x9
C  [libjniortools.dylib+0xc0809]  _ZNSt3__110__function12__alloc_funcIZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackE4$_93NS_9allocatorIS2_EEFxxxEE7destroyEv+0x9
C  [libjniortools.dylib+0xbfef2]  _ZNSt3__110__function6__funcIZ94Java_com_google_ortools_constraintsolver_mainJNI_RoutingModel_1registerPositiveTransitCallbackE4$_93NS_9allocatorIS2_EEFxxxEE18destroy_deallocateEv+0x12
C  [libortools.dylib+0x5af9f3]  _ZN19operations_research12RoutingModelD2Ev+0x813
C  [libjniortools.dylib+0x49ed6]  Java_com_google_ortools_constraintsolver_mainJNI_delete_1RoutingModel+0x16
j  com.google.ortools.constraintsolver.mainJNI.delete_RoutingModel(J)V+0

Anything else we should know about your project / environment
This can be worked around by calling RoutingModel.delete() before GC
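
A minimal sketch of that workaround pattern, assuming the proxy is released in a `finally` block on the thread that created it. `NativeProxy` is a hypothetical stand-in for `RoutingModel` so the example runs without OR-Tools on the classpath; the idempotent `delete()` mirrors what SWIG-generated proxies typically do, which is an assumption here rather than something checked against 8.1.

```java
public class ExplicitDeleteDemo {
    /** Hypothetical stand-in for a SWIG proxy such as RoutingModel. */
    static final class NativeProxy {
        private final Thread creator = Thread.currentThread();
        private boolean deleted = false;
        private Thread deletedOn = null;

        /** Assumed to behave like a SWIG delete(): releases once, later calls do nothing. */
        synchronized void delete() {
            if (!deleted) {
                deleted = true;
                deletedOn = Thread.currentThread();
            }
        }

        /** True if delete() has run and ran on the thread that created this proxy. */
        synchronized boolean deletedOnCreatorThread() {
            return deleted && deletedOn == creator;
        }
    }

    /** Builds the model, uses it, and releases it before the GC can get involved. */
    static NativeProxy solveAndRelease() {
        NativeProxy model = new NativeProxy();
        try {
            // ... register callbacks, solve, read the solution ...
        } finally {
            // Explicit release on the creating thread, instead of leaving it
            // to the Finalizer thread.
            model.delete();
        }
        return model;
    }

    public static void main(String[] args) {
        NativeProxy model = solveAndRelease();
        System.out.println("released on creating thread: " + model.deletedOnCreatorThread());
    }
}
```

With the real library, the same shape would apply to the `RoutingModel` instance: call `delete()` once the solution has been read, and do not touch the model afterwards.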

This seems to be the underlying issue of #1468, #2178 and #2091

@Mizux Mizux self-assigned this Mar 26, 2021
@Mizux Mizux added Bug Lang: Java Java wrapper issue Solver: Routing - break Routing break related issue labels Mar 26, 2021
@Mizux Mizux added this to the v9.0 milestone Mar 26, 2021
@Mizux Mizux modified the milestones: v9.0, v9.1 Apr 13, 2021
@Mizux Mizux added Solver: Routing Uses the Routing library and the original CP solver and removed Solver: Routing - break Routing break related issue labels May 20, 2021
Collaborator

Mizux commented Sep 10, 2021

Our Java wrappers fail to correctly capture any std::function (e.g. routing evaluators).
The Java tests have been flaky for years.

My understanding is that the garbage collector sometimes does not run on the same thread as the one that registered the std::function<>, so the GlobalRefGuard class shouldn't capture the JNIEnv pointer.

Indeed, the JNIEnv pointer is thread-specific, so upon destruction of the GlobalRefGuard (used to delete the GlobalRef) we may trigger undefined behaviour, leading to crashes (and random dump traces).

This patch fixes it by capturing the JavaVM pointer instead, which is thread-agnostic.

"The JNI interface pointer is only valid in the current thread"

src: https://docs.oracle.com/en/java/javase/16/docs/specs/jni/design.html#jni-interface-functions-and-pointers
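
The thread-affinity problem described above can be illustrated with a pure-Java analogy. This is not the JNI code from the patch: `Env` is a hypothetical stand-in for `JNIEnv` (valid only on the thread that obtained it), `Vm` is a hypothetical stand-in for `JavaVM` (process-wide, hands out the calling thread's `Env`), and the helper thread plays the role of the Finalizer thread seen in the crash log. Where real JNI has undefined behaviour, the sketch throws so that the misuse is visible.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadAffinityDemo {
    /** Stand-in for JNIEnv: only valid on the thread that obtained it. */
    static final class Env {
        private final Thread owner = Thread.currentThread();

        void deleteGlobalRef(String ref) {
            if (Thread.currentThread() != owner) {
                // Real JNI would be undefined behaviour; we fail loudly instead.
                throw new IllegalStateException("Env used on wrong thread");
            }
        }
    }

    /** Stand-in for JavaVM: shared by all threads, yields a per-thread Env. */
    static final class Vm {
        private final ThreadLocal<Env> envs = ThreadLocal.withInitial(Env::new);

        Env envForCurrentThread() {
            return envs.get();
        }
    }

    /** Buggy guard: captures the Env of the registering thread. */
    static Runnable guardCapturingEnv(Env env, String ref) {
        return () -> env.deleteGlobalRef(ref);
    }

    /** Fixed guard: captures the Vm and looks up an Env at cleanup time. */
    static Runnable guardCapturingVm(Vm vm, String ref) {
        return () -> vm.envForCurrentThread().deleteGlobalRef(ref);
    }

    /** Runs cleanup on a separate thread and returns the failure, or null if none. */
    static Throwable runOnOtherThread(Runnable cleanup) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t = new Thread(() -> {
            try {
                cleanup.run();
            } catch (Throwable e) {
                failure.set(e);
            }
        }, "fake-finalizer");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Vm vm = new Vm();
        Env registeringEnv = vm.envForCurrentThread();
        System.out.println("captured Env: "
                + runOnOtherThread(guardCapturingEnv(registeringEnv, "callback")));
        System.out.println("captured Vm:  "
                + runOnOtherThread(guardCapturingVm(vm, "callback")));
    }
}
```

The guard that captured the `Env` fails when it is cleaned up from the other thread, while the guard that captured the `Vm` succeeds because it resolves the environment of whichever thread performs the cleanup. In real JNI the equivalent lookup goes through the `JavaVM` pointer (for example `GetEnv` or `AttachCurrentThread`).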
