testing: test process hangs beyond -timeout
if a child process holds I/O streams open
#24050
Comments
It only hangs 60 seconds on my machine.
go/src/cmd/go/internal/test/test.go Line 1271 in f6c6781
And Line 275 in f6c6781
This
After the |
Yes in the example provided it only hangs for 60 seconds because the child process exits. In the test where I first found this issue the child process never exits so it hangs forever. |
I don't think this issue is a regression introduced in Go 1.10. This difference was introduced in bd95f88. If the The timeout doesn't work because it isn't handled by |
Change https://golang.org/cl/97497 mentions this issue: |
See also #23019. |
@gopherbot please open backport tracking issues. This might be a 1.10 regression, or also a 1.9 issue. |
Backport issue(s) opened: #25042 (for 1.10), #25043 (for 1.9). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
In go 1.10.1, these seem to make "go test" hang sometimes. golang/go#24050 No issue # Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>
Also, before anyone spends significant time on a fix, I think we could just fix #23019 instead. It would save far more time in the long run, because it's a change we likely want to do anyway and should fix other cascading bugs. But it's also a bit more controversial. |
I'm really not sure about #23019: the copying is a symptom (of child processes left running with open file handles), not the root cause here. I think the main problem in this case is that the child process is not being terminated when the test process is. I think However, I'm not sure what the most appropriate way to achieve that would be. I could imagine:
And I'm not even sure where to start on Windows, since I assume there is no |
I think you're right, we should terminate all child processes. It's a bit alarming that that's not the default behavior, and more so that there doesn't seem to be a portable way to do it. You could blame this on the written test, since technically I could wrap exec calls with a timeout from Shouldn't |
I don't think the |
But yes: I think it would be handy for the |
The problem with a general "signal process and its children" mechanism on Unix systems is that the only options are "send signal to process" and "send signal to thread" and "send signal to process group". And we definitely don't want to start every new process in a different process group, as that will have surprising effects on the use of That said we could in principle invoke the For testing specifically, it might be somewhat acceptable for |
Interesting, I wasn't thinking of the implications of doing this in general. I agree that the use case for testing is pretty narrow, especially because I don't think we should ever leak processes or run in the background. |
See previously #28039. |
Change https://go.dev/cl/400877 mentions this issue: |
There is an issue where 'go test' will hang after the tests complete if a test starts a sub-process that does not exit (see #24050).

However, go test only exhibits that behavior when a package name is explicitly passed as an argument. If 'go test' is invoked without any package arguments then the package in the working directory is assumed, however in that case (and only that case) os.Stdout is used as the test process's cmd.Stdout, which does *not* cause 'go test' to wait for the sub-process to exit (see #23019).

This change wraps os.Stdout in an io.Writer struct in this case, hiding the *os.File from the os/exec package, causing cmd.Wait to always wait for the full output from the test process and any of its sub-processes.

In other words, this makes 'go test' exhibit the same behavior as 'go test .' (or 'go test ./...' and so on).

Update #23019
Update #24050
Change-Id: Ica09bf156f3b017f9a31aad91ed0f16a7837195b
Reviewed-on: https://go-review.googlesource.com/c/go/+/400877
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Andrew Gerrand <adg@golang.org>
Auto-Submit: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
This just bit me as well. Ran through a fair few hours of GitHub CI time before I noticed something was wrong. (Tip: set

For those finding this thread and hoping for some code to copy/paste to work around the issue, here's the suggestion from @mvdan above, spelled out. Change:

    cmd := exec.Command("YOUR_PROGRAM")

to something along these lines:

    ctx := context.Background()
    deadline, ok := t.Deadline()
    if ok {
        // Give ourselves 100ms to kill the command before the actual deadline arrives.
        deadline = deadline.Add(-100 * time.Millisecond)
        var cancel func()
        ctx, cancel = context.WithDeadline(ctx, deadline)
        t.Cleanup(cancel)
    }
    cmd := exec.CommandContext(ctx, "YOUR_PROGRAM")

(@mvdan @bcmills @ianlancetaylor feel free to edit this comment directly if you'd suggest a different formulation) |
@josharian, if your subprocess is a Go program I suggest sending (My proposal #50436 aims to make that somewhat smoother, but the fact that |
Change https://go.dev/cl/456116 mentions this issue: |
I've been thinking about this some more. Notably, it is possible to reproduce this behavior even for a passing test, not just a failing one: the test may start and leak a subprocess, and then return from all of the I don't think tests should be in the habit of orphaning subprocesses, and perhaps we should fail (or terminate the subprocesses) of tests that do so. On the other hand, that's a much harder problem, and programs that start subprocesses that may run indefinitely should arrange for those processes to be terminated anyway — whether by using (Note that in the integration test, I used a periodic write to So, for now I think we should set a |
Change https://go.dev/cl/464555 mentions this issue: |
Prior to CL 456116 we had an arbitrary 5-second delay after a test times out before we kill the test. In CL 456116, I reused that arbitrary 5-second delay as the WaitDelay as well, but on slower builders it does not seem to be generous enough.

Instead of hard-coding the delay, for tests with a finite timeout we now use a hard-coded fraction of the overall timeout. That will probably give delays that are longer than strictly necessary for very long timeouts, but if the user is willing to wait for a very long timeout they can probably wait a little longer for I/O too.

Fixes #58230.
Updates #24050.
Change-Id: Ifbf3e576c034c721aa00cd17bf88563474b09955
Reviewed-on: https://go-review.googlesource.com/c/go/+/464555
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
What version of Go are you using (go version)?
go version go1.10 linux/amd64
What operating system and processor architecture are you using (go env)?
What did you do?
After upgrading to 1.10 we had one test that started to hang intermittently. The test in question starts a child process which it kills by canceling a context object at the end of the test method. It does not do an explicit cmd.Wait().
Here is a minimal test case that demonstrates the problem:
https://play.golang.org/p/8rq41A5Khsm
I can get this to hang consistently by running it in a bash while loop:
If I explicitly call cmd.Wait() the test does not hang. If I don't attach the child process' Stdout and Stderr to os.Std{out,err} the test does not hang.
On 1.9.4 the test does not hang.
It's also interesting that even though I specified -timeout 5s the test runner hangs forever.