
Conversation


XrXr commented Dec 10, 2024


I got a SEGV core dump with the stack trace as follows:

    #0  __pthread_kill_implementation (no_tid=0, signo=14, threadid=0) at ./nptl/pthread_kill.c:50
    #1  __pthread_kill_internal (signo=14, threadid=0) at ./nptl/pthread_kill.c:78
    #2  __GI___pthread_kill (threadid=0, signo=14) at ./nptl/pthread_kill.c:89
    #3  <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
    #4  stackprof_start (argc=<optimized out>, argv=<optimized out>, self=<optimized out>) at stackprof.c:228
    #5  vm_call_cfunc_with_frame_
    ...

Notice that `threadid=0` in the top frame -- the SEGV comes from inside
libc as it tries to dereference `threadid`.

The signal comes from stackprof's signal handler:

    if (pthread_self() != _stackprof.target_thread) {
        pthread_kill(_stackprof.target_thread, sig);
        return;
    }

During stackprof_start(), `_stackprof.target_thread` is still 0, so the handler ends up calling `pthread_kill(0, sig)`.

You can recreate the stack trace in the crash with a program that does
`pthread_kill(0, SIGALRM)`:

    #include <signal.h>

    int
    main(void)
    {
        /* thread ID 0 is not a valid pthread_t; glibc dereferences it and faults */
        pthread_kill(0, SIGALRM);
    }

Only set `running` after `target_thread` is set, to avoid this crash in
case the timer expires after `setitimer()` but before `target_thread` is
assigned.

Also, since the ordering is important here, make `running` a
`volatile sig_atomic_t` to prevent the compiler from doing unwanted
reordering.
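
A rough sketch of the ordering the two changes aim for (simplified and illustrative only: the function names and timer values below are made up, the sigaction setup is elided, and the early return on `running` mirrors the check the handler already performs):

    #include <pthread.h>
    #include <signal.h>
    #include <sys/time.h>

    static struct {
        volatile sig_atomic_t running;   /* written last in start, checked first in the handler */
        pthread_t target_thread;
    } _stackprof;

    static void sketch_handler(int sig)
    {
        if (!_stackprof.running) return;   /* ignore signals that race with start-up */
        if (pthread_self() != _stackprof.target_thread) {
            pthread_kill(_stackprof.target_thread, sig);   /* target_thread is valid by now */
            return;
        }
        /* ... record a sample ... */
    }

    static void sketch_start(void)
    {
        struct itimerval timer = { { 0, 1000 }, { 0, 1000 } };

        _stackprof.target_thread = pthread_self();   /* 1. record the thread to profile */
        setitimer(ITIMER_REAL, &timer, NULL);        /* 2. the timer may fire any time after this */
        _stackprof.running = 1;                      /* 3. only now let the handler do work */
    }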

XrXr commented Dec 10, 2024

Just in case you're worried about the codegen consequences of switching to `volatile sig_atomic_t` (the standard thing to do for flags checked in signal handlers), it gets compiled down to regular memory operations on x64:

    (gdb) disassemble/s stackprof_start
    228         // Do this last since timer set above might expires
    229         _stackprof.running = 1;
        0x0000000000003824 <+260>:  mov    DWORD PTR [rip+0x4c92],0x1        # 0x84c0 <_stackprof>

It's mainly to stop the optimizer from going rogue.


XrXr commented Dec 11, 2024

Ah, this is fixing the same problem as #226. #226 seems more comprehensive, and should fix the problem I witnessed, too.

XrXr closed this Dec 18, 2024

froydnj commented Dec 18, 2024

Thank you for noticing #226! Anytime volatile is avoided, the world gets just a little bit better. 😄


XrXr commented Dec 18, 2024

In some distant future, everyone will be on C11 and everyone will have access to standard atomics and the standard memory model :D
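
For illustration, the C11 flavour of such a flag might look like the sketch below (not what the gem does today; the names here are made up):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool running;          /* lock-free on mainstream targets, so usable from a handler */
    static pthread_t target_thread;

    static void c11_start(void)
    {
        target_thread = pthread_self();
        /* release: everything written above becomes visible before `running` reads true */
        atomic_store_explicit(&running, true, memory_order_release);
    }

    static bool c11_handler_may_sample(void)
    {
        /* acquire: pairs with the release store in c11_start() */
        return atomic_load_explicit(&running, memory_order_acquire);
    }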

XrXr deleted the signal-during-init branch December 18, 2024 17:48