Ignore SEGV during profiler unwind on Unix #28291
src/signals-unix.c
Outdated
bt_size_cur += rec_backtrace_ctx((uintptr_t*)bt_data_prof + bt_size_cur,
bt_size_max - bt_size_cur - 1, signal_context);
} else {
jl_safe_printf("WARNING: profiler attempt to access an invalid memory location\n");
Is there any risk of printing this message on each profile interval?
Just mimicking
Line 468 in 614d917
jl_safe_printf("WARNING: profiler attempt to access an invalid memory location\n");
Ok.
We already have a mechanism for this. Line 1023 in 614d917
OK, changed the approach. Will squash on merge.
src/signals-unix.c
Outdated
@@ -225,6 +225,18 @@ static void segv_handler(int sig, siginfo_t *info, void *context)
jl_ptls_t ptls = jl_get_ptls_states();
assert(sig == SIGSEGV || sig == SIGBUS);

// if we're profiling, this segfault is likely caused by the unwinder.
// ignore the signal and jump back to where we came from.
if (running && ptls->safe_restore) {
Why do you need this? Does it not work with the condition below?
Just being conservative, only affecting the case where the profiler is running. Do it unconditionally then?
I mean you shouldn't need any code here in the segfault handler. Have you tested that it doesn't work without this but with the condition a few lines below?
(Or in other words, it is meant to do this unconditionally.)
Ah OK, no, it doesn't work. Looking closer, it triggers a segfault in jl_call_in_ctx (via jl_throw_in_ctx..., jl_stackovf_exception, ...).
Inferring from the function names, doesn't that behave differently from the plain longjmp I do here? Would I need to catch an exception then somehow?
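For readers unfamiliar with the "plain longjmp" approach mentioned here, the following is a minimal standalone sketch of recovering from a fault by jumping out of the signal handler. It is not Julia's implementation: the global safe_restore is a hypothetical stand-in for ptls->safe_restore, and guarded_read and demo_fault_recovery are illustrative names. It assumes a single-threaded POSIX program.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for ptls->safe_restore: non-NULL only while a
   guarded region is active. */
static sigjmp_buf *volatile safe_restore = NULL;

static void sketch_segv_handler(int sig)
{
    if (safe_restore != NULL)
        siglongjmp(*safe_restore, 1);  /* jump back into the guarded caller */
    /* Not in a guarded region: fall back to the default action. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Returns 0 and stores *addr in *out, or -1 if the read faulted. */
int guarded_read(const volatile int *addr, int *out)
{
    assert(out != NULL);
    sigjmp_buf buf;
    struct sigaction sa, old_segv, old_bus;
    volatile int result = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* Second argument 1: save the signal mask, so that siglongjmp
       restores it and the faulting signal is no longer blocked. */
    if (sigsetjmp(buf, 1) == 0) {
        safe_restore = &buf;
        *out = *addr;                  /* may fault */
        result = 0;
    }
    safe_restore = NULL;
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return result;
}

/* Maps an inaccessible page and checks that reading it is survived.
   Returns 1 on recovery, 0 if the read unexpectedly succeeded,
   -1 if the mapping could not be created. */
int demo_fault_recovery(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int value = 0;
    int rc = guarded_read((const volatile int *)p, &value);
    munmap(p, page);
    return rc == -1;
}
```

Calling demo_fault_recovery() from main exercises the faulting path. Unlike this sketch, the path through jl_throw_in_ctx discussed in this thread raises a Julia exception rather than returning to a saved jump buffer, which is the behavioral difference the question is about.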
Hmm, segfault in jl_call_in_ctx? Did you get a NULL context?
Ah, I guess this thread doesn't have signal_stack allocated. I believe this should fix it (you can merge this with the fallback ifdef below if you want).
diff --git a/src/signals-unix.c b/src/signals-unix.c
index 0fafe121cd..8da89b5fc4 100644
--- a/src/signals-unix.c
+++ b/src/signals-unix.c
@@ -89,6 +89,14 @@ static void jl_call_in_ctx(jl_ptls_t ptls, void (*fptr)(void), int sig, void *_c
// checks that the syscall is made in the signal handler and that
// the ucontext address is valid. Hopefully the value of the ucontext
// will not be part of the validation...
+ if (!ptls->signal_stack) {
+ sigset_t sset;
+ sigemptyset(&sset);
+ sigaddset(&sset, sig);
+ sigprocmask(SIG_UNBLOCK, &sset, NULL);
+ fptr();
+ return;
+ }
uintptr_t rsp = (uintptr_t)ptls->signal_stack + sig_stack_size;
assert(rsp % 16 == 0);
#if defined(_OS_LINUX_) && defined(_CPU_X86_64_)
Yeah, that works. Thanks!
Why isn't this used for OSX btw? Mimicking profiler_segv_handler, which does thread_set_state, is what got me here in the first place.
We don't handle segfaults the same way on OSX. I don't really know if the two ways could be used together.
@yuyichao any further comments?
Unix equivalent of #4159. Unpolished, works for my use case (libcuda tripping up libunwind).