os/signal: notify should filter out runtime-generated SIGURG #37942
Comments
Right. It was an explicit decision to pass these through to the program, in case the program is using SIGURG. Unfortunately, the runtime can't tell if a signal is "runtime generated" and thus can't filter it out. Because the kernel combines signals, it's not even well-specified what it would mean for a received signal to be runtime-generated, because the kernel could have combined a runtime-generated signal with a signal from another source.

@changkun, I don't really understand your program. First, I don't understand the need for
@minux I don't see how it is possible even in principle for os/signal to filter out signals generated from the runtime package. Do you have any suggestions as to how that could be done?
That would not work if the kernel combines signals with different si_code values.
As far as I can tell, the Linux 5.3.12 kernel does combine them:

```c
if (legacy_queue(pending, sig))
	goto ret;
```

and

```c
static inline bool legacy_queue(struct sigpending *signals, int sig)
{
	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
```

where SIGRTMIN is the boundary below which an already-pending standard signal is coalesced rather than queued again.
Yeah, it's very difficult to filter out runtime-generated SIGURG without also occasionally removing a real SIGURG.

I guess a slightly better solution is to make sure the runtime only uses SIGURG when it's necessary. My program can tolerate an occasional spurious SIGURG, but in the current runtime implementation it occurs too frequently. (Just like the example program I posted, it contains enough preempt points that forced preemption shouldn't be necessary.)

Perhaps we can repurpose this issue to "runtime: only use forced preemption when necessary"? Thoughts?
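To make "preempt points" concrete, here is a minimal sketch (illustrative only, not code from this thread) of the kind of loop that forces the runtime to fall back to signal-based preemption, and of why code whose hot loops make function calls usually does not need it:

```go
// Illustrative sketch: a tight loop with no function calls has no cooperative
// preemption points, so before Go 1.14 another goroutine waiting for the only
// P could stall until the loop finished. On Go 1.14+ the runtime preempts it
// asynchronously by sending SIGURG to the thread running it. If the loop body
// made function calls (preemption checks at function entry), cooperative
// preemption would usually suffice and the SIGURG path would rarely fire.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func spinNoCalls(n int) int {
	s := 0
	for i := 0; i < n; i++ { // no calls: no cooperative preemption point
		s += i
	}
	return s
}

func main() {
	runtime.GOMAXPROCS(1)
	go spinNoCalls(1 << 30)

	start := time.Now()
	time.Sleep(10 * time.Millisecond)
	// With async preemption (Go 1.14+) this prints roughly 10-20ms; with
	// GODEBUG=asyncpreemptoff=1, or on Go <= 1.13, it can take as long as the
	// whole loop, because the runtime cannot interrupt spinNoCalls.
	fmt.Println("main goroutine resumed after", time.Since(start))
}
```

Code that is already full of function calls and channel operations gives the scheduler plenty of cooperative preemption points, which is the situation the comment above describes.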
Probably a Go 1.14 issue; see golang/go#37942
I found that a lot of SIGURGs will time out the runtime/regexp tests on mips64le. The original runtime test only takes 43s to finish; now it takes 600s.
I've considered delaying the first attempt to use signals until some time after we've tried cooperative preemption and it's failed. Most of the time cooperative preemption responds very quickly. I'm not sure what to set that threshold to, though, since every millisecond of delay there is a millisecond added to STW if there's anything not cooperating.
Could you also provide a build option to enable (or never enable) preemption? Runtime env flags would presumably override that...
That option already exists (GODEBUG=asyncpreemptoff=1).
Um, I meant a compiler flag to set the default scheduler, which the env var could override.
…14 in git env (go-gitea#11237)

As seen in troubleshooting go-gitea#11032, the new feature of Go 1.14 is causing several-second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG signals during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25 time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases, as it happens for each call of the gitea hook command. This is likely the cause of go-gitea#10661 as well and would start to affect users once we release 1.12, which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementation, as there have been a few very similar bugs fixed and reported: golang/go#37741, golang/go#37942. We should revisit this in the future and see if a newer version of Go has solved it, but for now, disable this option in the environment that the gitea hook runs in to avoid it.
When Docker/containerd binaries are compiled with Go 1.15, the containers generate many signal 23 (SIGURG) events which flood monitoring systems: kubernetes/kops#10388. The SIGURG signal does not kill the process but is generated by Go runtime scheduling: https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md. Because the Go runtime does not know whether the process expects external SIGURG signals, the signal is not filtered out but reported to the process: golang/go#37942. The process has to filter this signal out itself before forwarding it to, e.g., child processes or logs. This change was introduced with the Go 1.15 update (actually Go 1.14, but Flatcar skipped that for Stable). containerd has some workarounds in place, e.g., containerd/containerd#4532, but there are still areas where the signal is not handled correctly. Until it is, downgrade to the Go 1.13 compiler for Docker/containerd binaries. See flatcar/Flatcar#315
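The "filter this signal out itself before forwarding it" part can be sketched as follows; this is illustrative Go under assumed requirements, not containerd, Flatcar, or kops code, and the child command is a placeholder:

```go
// Sketch of a parent that relays signals to a child process but drops SIGURG,
// since on Go 1.14+ most SIGURG deliveries are the runtime's own preemption
// signals rather than anything the child (or the logs) should see.
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	cmd := exec.Command("sleep", "60") // placeholder child; real code would run its payload
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	sigs := make(chan os.Signal, 64)
	signal.Notify(sigs) // with no arguments, Notify relays all incoming signals

	go func() {
		for sig := range sigs {
			if sig == syscall.SIGURG {
				continue // runtime preemption noise: don't forward or log it
			}
			_ = cmd.Process.Signal(sig) // forward everything else to the child
		}
	}()

	_ = cmd.Wait()
}
```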
workaround for golang/go#37942 Signed-off-by: Donoban <donoban@riseup.net>
This patch fixes a problem that occurred when the watchdog process received a SIGURG signal from the Go runtime [1][2] and passed it to the forked process before the exec call. Fixed by adding a call to the Ignore function. Receiving this signal is unexpected, because tt doesn't work with sockets at all. [1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md [2] - golang/go#37942 Closes #325
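A minimal sketch of the shape of that fix, assuming a watchdog that only needs to stop reacting to SIGURG before it starts its child; the child path is hypothetical, and this is not the actual tt watchdog code:

```go
// Sketch: ignore SIGURG in the watchdog so that runtime-generated preemption
// signals are neither acted on nor relayed to the process it is about to run.
package main

import (
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// The watchdog does not use out-of-band socket data, so SIGURG carries no
	// meaning for it; drop it instead of passing it on to the child.
	signal.Ignore(syscall.SIGURG)

	cmd := exec.Command("/path/to/child") // hypothetical child binary
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	_ = cmd.Wait()
}
```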
Since Go 1.14, SIGURG is emitted internally by the Go runtime, see:
- golang/go#37942
- https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md#other-considerations ("Choosing a signal")

This causes two problems:
- It will interfere with processes that use SIGURG for a real purpose (though this is unlikely)
- If SIGURG is emitted after the child process has exited but before secrets-init, there will be a spurious error message (see doitintl#16)
Not sure why, but "ps" seems unable to handle the URG signal, returning an error status code. As per some comments on GitHub, it looks like the Go runtime might have sent the socket-related URG signal. See:
golang/go#37942
https://stackoverflow.com/questions/72568024/go-preempt-can-exit-out-of-a-loop

The ps program shows this output: [command.go: 72] stderr: Signal 23 (URG) caught by ps (3.3.15). Apparently, ps works well, but it just returns with an error status code due to the received SIGURG. A retry mechanism doesn't work well here: in my tests, I saw the ps command failing up to 5 times in a row, so I think it's better to trap the signal with the bash built-in command "trap". This is the new command sent to the node's shell to get all the processes: const command = "trap \"\" SIGURG ; ps -e -o pidns,pid,args"

Also, I refactored the way the pids from the container namespaces are retrieved, as the nsenter used by the previous code is not actually needed. Regular "lsns" and "ps" commands running in the node's debug container (with chroot /root) are good enough to get the final lists of pids from processes that are running in the same namespace as the container's. In fact, as per the stdout logs of the nsenter commands, the "ps" that was running with nsenter was showing all the processes running on that node.

After the refactor, the scheduling UT TestGetProcessCPUScheduling was failing because it was mocking a function that was not used inside the target function. I tried fixing it, but then I decided to remove it, as I realized it was not covering more code than the existing UT TestExistsSchedulingPolicyAndPriority.
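As a rough illustration of that workaround, a Go caller can wrap the command in a shell that ignores SIGURG via the trap built-in, so the exec'd ps inherits the ignored disposition; only the command string comes from the commit message above, the surrounding code is hypothetical:

```go
// Sketch: run ps under `trap "" SIGURG` so a stray SIGURG cannot make it
// exit with an error status code.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	const command = `trap "" SIGURG ; ps -e -o pidns,pid,args`
	out, err := exec.Command("sh", "-c", command).Output()
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```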
Also, we don't proxy SIGURG (Golang uses it internally for waking threads, so Go processes get it constantly; see [1] for more details). [1] golang/go#37942 Signed-off-by: Matt Heon <mheon@redhat.com>
This program, when running under Go 1.14, will show that a huge number of SIGURG signals are received even though none should be (no network IO, no explicit kills): https://play.golang.org/p/x7hFjPZnrg5

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md changes the runtime to use SIGURG to preempt goroutines, which is fine, but I think os/signal.Notify should filter out those runtime-generated SIGURGs.

The reason I found this is that I have a program that happens to use SIGURG as a custom signal like SIGUSR1, which works fine with previous Go releases. I understand that it should probably use SIGUSR1, but I still think os/signal should hide any signals generated by the runtime, as they are irrelevant for the user and an implementation detail.

/cc @aclements thoughts?
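A minimal sketch along the lines of the linked playground program (illustrative, not the exact code): it subscribes to SIGURG as an application-level signal, keeps one goroutine busy in a call-free loop, and counts deliveries. On Go 1.14+ the counter climbs because the runtime's own preemption SIGURGs also land on the Notify channel.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Treat SIGURG as a custom notification signal, as the report describes.
	c := make(chan os.Signal, 128)
	signal.Notify(c, syscall.SIGURG)

	// Busy goroutine with no calls in the loop body, so the scheduler has to
	// preempt it asynchronously (via SIGURG) on Go 1.14+.
	go func() {
		s := 0
		for {
			for i := 0; i < 100_000_000; i++ {
				s += i
			}
		}
	}()

	count := 0
	deadline := time.After(5 * time.Second)
	for {
		select {
		case <-c:
			count++ // includes runtime-generated preemption signals on Go 1.14+
		case <-deadline:
			fmt.Println("SIGURG deliveries in 5s:", count)
			return
		}
	}
}
```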