Foreign threads: GC runs on cancelled thread, causes segfault #47590
vtjnash added a commit that referenced this issue Jan 11, 2023

Closes #47590 (pthread_cancel still forbidden though, since async mode will corrupt the process, and synchronously tested is just a slow implementation of a boolean)

Refs #47201 (only deals with thread exit, not other case where this is an issue, like cfunction exit and gc-safe-leave)

May help #46537, by blocking jl_wake_libuv before uv_library_shutdown, and other tweaks to GC mode. For example:

[4011824] signal (6.-6): Aborted
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__async_send at /workspace/srcdir/libuv/src/unix/async.c:198
uv_async_send at /workspace/srcdir/libuv/src/unix/async.c:73
jl_wake_libuv at /data/vtjnash/julia1/src/jl_uv.c:44 [inlined]
JL_UV_LOCK at /data/vtjnash/julia1/src/jl_uv.c:64 [inlined]
ijl_iolock_begin at /data/vtjnash/julia1/src/jl_uv.c:72
iolock_begin at ./libuv.jl:48 [inlined]
_trywait at ./asyncevent.jl:140
wait at ./asyncevent.jl:155 [inlined]
profile_printing_listener at /data/vtjnash/julia1/usr/share/julia/stdlib/v1.10/Profile/src/Profile.jl:39
jfptr_YY.3_58617 at /data/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /data/vtjnash/julia1/src/gf.c:2665 [inlined]
ijl_apply_generic at /data/vtjnash/julia1/src/gf.c:2866
jl_apply at /data/vtjnash/julia1/src/julia.h:1870 [inlined]
start_task at /data/vtjnash/julia1/src/task.c:1093
Aborted

Fixes #37400
vtjnash added a commit that referenced this issue Jan 13, 2023

Closes #47590 (pthread_cancel still forbidden though, since async mode will corrupt or deadlock the process, and synchronously tested with cancelation disabled whenever this is a lock is just a slow implementation of a boolean)

Refs #47201 (only deals with thread exit, not other case where this is an issue, like cfunction exit and gc-safe-leave)

May help #46537, by blocking jl_wake_libuv before uv_library_shutdown, and other tweaks to GC mode. For example, avoiding:

[4011824] signal (6.-6): Aborted
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__async_send at /workspace/srcdir/libuv/src/unix/async.c:198
uv_async_send at /workspace/srcdir/libuv/src/unix/async.c:73
jl_wake_libuv at /data/vtjnash/julia1/src/jl_uv.c:44 [inlined]
JL_UV_LOCK at /data/vtjnash/julia1/src/jl_uv.c:64 [inlined]
ijl_iolock_begin at /data/vtjnash/julia1/src/jl_uv.c:72
iolock_begin at ./libuv.jl:48 [inlined]
_trywait at ./asyncevent.jl:140
wait at ./asyncevent.jl:155 [inlined]
profile_printing_listener at /data/vtjnash/julia1/usr/share/julia/stdlib/v1.10/Profile/src/Profile.jl:39
jfptr_YY.3_58617 at /data/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /data/vtjnash/julia1/src/gf.c:2665 [inlined]
ijl_apply_generic at /data/vtjnash/julia1/src/gf.c:2866
jl_apply at /data/vtjnash/julia1/src/julia.h:1870 [inlined]
start_task at /data/vtjnash/julia1/src/task.c:1093
Aborted

Fixes #37400
I'm experimenting with the new foreign thread support, and encountered a case where GC seems to run on a pthread after cancellation. I realize that cancelling threads is Tricky Business, but I hope we can make our scheduler resilient to it (or improve my code to safely do so). Even if actively cancelling threads is rare, threads exiting after their work is done is much more common, and both are pretty much related AFAIK.
Anyway, a MWE:
It's a bit of code, so summarizing the steps:
pthread_join in order to clean up resources related to the thread

After these steps, if the GC runs, we get a segfault:
Note that this seems to indicate that the segfault happened during a GC run on thread 2, which is the pthread we just canceled!
Running this code from top level results in a different crash:
On Linux, the crash is reported as originating from [8905] signal (6.-6): Aborted; assuming the -6 should be a valid thread ID, this does seem like corruption of scheduler state.

The workaround for these crashes is to disable the GC around the call to pthread_testcancel. The issue looks related to #47185, but calling jl_gc_safe_enter/jl_gc_safe_leave around pthread_cancel doesn't seem to help.