runtime: signal race condition #14571
Comments
It seems like the question here is whether os/signal.Stop is guaranteed to be an atomic operation. I.e., if we call Stop(c) and c was the last channel registered for a particular signal, do we guarantee that concurrently receiving that signal will always either send to c or trigger the default signal logic? My 2c is we should. If we call disableSignal in os/signal.Stop, then we should process any queued signals before deregistering c and returning.
After looking at the runtime and os/signal code for a few minutes I can't figure out what we could change. I think we are doing the right thing. Unix signals are inherently racy. Sending a signal to a process sets a bit. When the process checks that bit, it takes some action. Sending two signals to a process simultaneously just sets that bit twice. If the process doesn't check the bit between the two signal sends, then it will only receive one signal. I'm not completely sure but I suspect that is what is happening with this program.
@ianlancetaylor The race here isn't about receiving two signals and losing one of them. The problem is that if os/signal.Stop grabs handlers.Lock, and then we receive a SIGINT before os/signal.Stop is able to call disableSignal(SIGINT), then we've accepted that signal instead of letting the default OS behavior execute. But because we're holding handlers.Lock (and possibly have already deregistered c), we prevent the os/signal.loop goroutine from calling os/signal.process to dispatch the signal to c.
Put differently:
(Assuming there are no other relevant signal-impacting calls.) The problem is that the "change back to exiting" is not atomic. In particular, we logically deregister c from receiving SIGINT events before we notify the kernel to stop sending them to us, which @tassadar's test program demonstrates via its "signal missed" messages. Their test program concurrently sends itself SIGINT while calling signal.Stop(c). If SIGINT arrived first, we expect to see "got signal". If SIGINT arrived after, we expect the process to exit (due to SIGINT's default behavior). But occasionally their test program sees neither of these happen (i.e., "signal missed").
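To make the interleaving concrete, here is a toy model (my own illustration with made-up names such as runtimeHandler, process, and stop; it is not the real runtime or os/signal source) of the ordering just described: the signal is accepted inside Stop's critical section, but by the time it can be dispatched, the channel has already been deregistered and delivery has been disabled.

```go
// Toy model (not the real os/signal code) of the interleaving described
// above. stop() holds the handlers lock while it deregisters the channel and
// then disables delivery; a signal accepted by the runtime handler inside
// that window is queued, but by the time process() can take the lock and
// dispatch it, the channel is gone.
package main

import (
	"fmt"
	"sync"
)

var (
	handlersMu sync.Mutex
	registered = true                 // is c registered for SIGINT?
	enabled    = true                 // is the Go signal handler installed?
	queue      = make(chan string, 1) // signals accepted by the runtime
	c          = make(chan string, 1) // the user's signal.Notify channel
)

// runtimeHandler models the runtime accepting a signal: it does not need the
// handlers lock, it only queues the signal if the handler is still installed.
func runtimeHandler(sig string) {
	if enabled {
		queue <- sig
		return
	}
	fmt.Println("default action: process would exit")
}

// process models the dispatch step: it must hold the handlers lock before it
// can deliver a queued signal to a registered channel.
func process() {
	sig := <-queue
	handlersMu.Lock()
	defer handlersMu.Unlock()
	if registered {
		c <- sig
		return
	}
	fmt.Println("LOST:", sig, "was accepted earlier, but c is already deregistered")
}

// stop models the problematic ordering: deregister c first, disable the
// handler second, all while holding the lock that process() needs.
func stop() {
	handlersMu.Lock()
	defer handlersMu.Unlock()
	registered = false       // logically deregister c ...
	runtimeHandler("SIGINT") // a signal accepted in this window gets queued
	enabled = false          // ... and only then stop accepting new signals
}

func main() {
	stop()
	process()
	select {
	case sig := <-c:
		fmt.Println("got", sig)
	default:
		// nothing was delivered to c
	}
}
```

Running this toy prints the "LOST" case every time, because the forced interleaving puts the signal acceptance inside Stop's window.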
Thanks for the explanation. Given the current code, it sounds like
Ping @ianlancetaylor |
CL https://golang.org/cl/46003 mentions this issue. |
1. What version of Go are you using (go version)?
2. What operating system and processor architecture are you using (go env)?

go version go1.6 linux/amd64

3. What did you do?

I have a summoner process with a large number of workers. When a worker is disconnected from the server, it exits its main loop and calls signal.Stop() on the SIGINT channel it had previously set up with signal.Notify(). At the same time, the summoner attempts to kill it with proc.Signal(os.Interrupt).

Sometimes the interrupt signal gets lost: it neither arrives on the channel nor crashes the program. Below is an example I can replicate the race with; run it over and over and you will soon see some "signal missed" output:
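(The example program is not included in this excerpt. The following is a minimal sketch of a reproducer along the lines described above, a reconstruction rather than the original code: each iteration races a self-sent SIGINT against signal.Stop and reports "signal missed" when the signal neither reaches the channel nor terminates the process.)

```go
// Reconstruction sketch (not the original program from this report). Each
// iteration registers a channel for SIGINT, sends SIGINT to the process while
// concurrently calling signal.Stop, and then checks the outcome: either the
// signal reaches c ("got signal") or SIGINT's default behavior should kill
// the process. If neither happens, the signal was lost ("signal missed").
package main

import (
	"fmt"
	"os"
	"os/signal"
	"time"
)

func main() {
	proc, err := os.FindProcess(os.Getpid())
	if err != nil {
		panic(err)
	}

	for i := 0; ; i++ {
		c := make(chan os.Signal, 1)
		signal.Notify(c, os.Interrupt)

		sent := make(chan struct{})
		go func() {
			// Race the interrupt against the Stop call below.
			proc.Signal(os.Interrupt)
			close(sent)
		}()

		signal.Stop(c)
		<-sent

		select {
		case <-c:
			fmt.Println(i, "got signal")
		case <-time.After(100 * time.Millisecond):
			// Still running and nothing arrived on c: the signal vanished.
			fmt.Println(i, "signal missed")
		}
	}
}
```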
4. What did you expect to see?
I expected the signal not to get lost. The documentation only states that "When Stop returns, it is guaranteed that c will receive no more signals.", so I'm not sure whether that is the right expectation. If it isn't, then the fix is fairly easy: just keep the signal channel registered the whole time (a sketch follows at the end of this report).
5. What did you see instead?
Some of the signals get lost.
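For completeness, here is a hedged sketch of the workaround mentioned in point 4 (hypothetical code, not from the report): the worker never calls signal.Stop, so the channel stays registered and a concurrent interrupt cannot fall into the Stop race.

```go
// Hypothetical sketch of the workaround from point 4: never call signal.Stop.
// The channel stays registered for the life of the process, so an interrupt
// sent by the summoner is always delivered to c (or buffered in it) rather
// than being lost in the Stop race.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"time"
)

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)

	// ... the worker's main loop would run here ...

	// On shutdown we deliberately skip signal.Stop(c) and just drain the
	// channel with a short grace period instead of deregistering it.
	select {
	case sig := <-c:
		fmt.Println("got signal:", sig)
	case <-time.After(100 * time.Millisecond):
		fmt.Println("no signal pending; exiting anyway")
	}
}
```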