testing: test process hangs beyond `-timeout` if a child process holds I/O streams open #24050

psanford · 2018-02-22T20:51:29Z

What version of Go are you using (`go version`)?

go version go1.10 linux/amd64

What operating system and processor architecture are you using (`go env`)?

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/psanford/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/psanford/projects/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build953969108=/tmp/go-build -gno-record-gcc-switches"

What did you do?

After upgrading to 1.10 we had one test that started to hang intermittently. The test in question starts a child process which it kills by canceling a context object at the end of the test method. It does not do an explicit cmd.Wait().

Here is a minimal test case that demonstrates the problem:

https://play.golang.org/p/8rq41A5Khsm

I can get this to hang consistently by running it in a bash while loop:

$ while true; do go test -timeout 5s -v -count 1 .; sleep 0.1; done
=== RUN   TestOSExecNoWait
start
done
--- PASS: TestOSExecNoWait (0.01s)
PASS
ok      _/tmp   0.012s
=== RUN   TestOSExecNoWait
start
done
--- PASS: TestOSExecNoWait (0.01s)
PASS
ok      _/tmp   0.012s
=== RUN   TestOSExecNoWait
start
done
--- PASS: TestOSExecNoWait (0.01s)
PASS
ok      _/tmp   0.012s
=== RUN   TestOSExecNoWait
start
done
--- PASS: TestOSExecNoWait (0.01s)
PASS

<hangs here indefinitely>

If I explicitly call cmd.Wait() the test does not hang. If I don't attach the child process' Stdout and Stderr to os.Std{out,err} the test does not hang.

On 1.9.4 the test does not hang.

Its also interesting that even though I specified -timeout 5s the test runner hangs forever.

The text was updated successfully, but these errors were encountered:

crvv · 2018-02-23T09:31:30Z

the test runner hangs forever.

It only hangs 60 seconds on my machine.

go test hangs at

go/src/cmd/go/internal/test/test.go

Line 1271 in f6c6781

done <- cmd.Wait()

And cmd.Wait() is waiting at

go/src/os/exec/exec.go

Line 275 in f6c6781

_, err := io.Copy(w, pr)

This io.Copy() is reading a pipe reader. It won't return until the pipe writer is closed.
The pipe writer will be closed when the sleep 60 exits. So it hangs 60 seconds.

func TestOSExecNoWait(t *testing.T) {
...
	cancel()
}

After the cancel() returns, the main function will also return.
The child process may or may not be killed, so it hangs intermittently.

psanford · 2018-02-23T18:10:16Z

It only hangs 60 seconds on my machine.

Yes in the example provided it only hangs for 60 seconds because the child process exits.

In the test where I first found this issue the child process never exits so it hangs forever.

crvv · 2018-02-24T08:36:18Z

I don't think this issue is a regression introduced in Go 1.10.
I can use Go 1.9 to reproduce it but with a different condition.

This difference was introduced in bd95f88.
bd95f88#diff-acaf53a9cd478507ebbcf85037940b4dL1080

If the go command needs a bytes.Buffer to save the output, os/exec will open a pipe.
And go will hang until the pipe is closed.

The timeout doesn't work because it isn't handled by go.
There is a testKillTimeout in go, but it also hangs at cmd.Wait().

gopherbot · 2018-02-28T03:04:20Z

Change https://golang.org/cl/97497 mentions this issue: cmd/go/internal/test: don't wait for pending I/O if child process has gone

ianlancetaylor · 2018-02-28T05:42:33Z

See also #23019.

FiloSottile · 2018-04-24T00:54:30Z

@gopherbot please open backport tracking issues. This might be a 1.10 regression, or also a 1.9 issue.

gopherbot · 2018-04-24T00:54:36Z

Backport issue(s) opened: #25042 (for 1.10), #25043 (for 1.9).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

In go 1.10.1, these seem to make "go test" hang sometimes. golang/go#24050 No issue # Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>

mvdan · 2020-09-09T12:11:18Z

Also, before anyone spends significant time into a fix, I think we could just fix #23019 instead. It would save far more time in the long run, because it's a change we likely want to do anyway and should fix other cascading bugs. But it's also a bit more controversial.

bcmills · 2020-09-09T13:19:31Z

I'm really not sure about #23019: the copying is a symptom (of child processes left running with open file handles), not the root cause here.

I think the main problem in this case is that the child process is not being terminated when the test process is. I think go test should make a best effort to terminate all subprocesses of the test process, unless they were explicitly started in a way that would suggest otherwise (such as by setting Setpgid in the SysProcAttr field).

However, I'm not sure what the most appropriate way to achieve that would be. I could imagine:

Hooking the os package from the testing package to have it set cmd.SysProcAttr.Pdeathsig implicitly where available (but I think “where available” is only Linux)?
Hooking the os package so that the testing.M.startAlarm callback can enumerate and signal all subprocesses started from the test?
Changing cmd/go to start each test in its own process group, forward signals to that process group explicitly, and signal the whole group on timeout?
Changing testing and/or cmd/go to enumerate child processes using procfs or similar?
Something else that I've missed?

And I'm not even sure where to start on Windows, since I assume there is no procfs there.

mvdan · 2020-09-09T19:21:57Z

I think you're right, we should terminate all children processes. It's a bit alarming that that's not the default behavior, and more so that there doesn't seem to be a portable way to do it.

You could blame this on the written test, since technically I could wrap exec calls with a timeout from testing.T.Deadline. Still, you'd be relying on the tests being written properly that way, which is not very realistic, particularly since that Deadline method is very new.

Shouldn't os/exec have some sort of option to apply this "kill all child processes" logic in the most portable way possible? It seems like such a common source of problems, and I've personally had to implement non-portable hacks like Pdeathsig before.

bcmills · 2020-09-09T19:43:01Z

I don't think the go test behavior should rely on folks using os/exec in any particular way. (I would rather we find a more transparent fix in cmd/go.)

bcmills · 2020-09-09T19:44:37Z

But yes: I think it would be handy for the os package to have some straightforward way to express “send this signal to this process and all of its children”. Maybe that's the process-group solution?

ianlancetaylor · 2020-09-09T20:58:06Z

The problem with a general "signal process and its children" mechanism on Unix systems is that the only options are "send signal to process" and "send signal to thread" and "send signal to process group". And we definitely don't want to start every new process in a different process group, as that will have surprising effects on the use of ^Z from the terminal. And of course we have no idea what processes a child process may start.

That said we could in principle invoke the ps program or (on GNU/Linux) read the /proc file system to look for all processes, build a process tree, and use that to identify the children of our child process. But that seems like a rather complex mechanism to implement a standard library function.

For testing specifically, it might be somewhat acceptable for go test to start the test binary in a separate process group, in which case it would be straightforward for the go command to kill the entire test binary process group.

mvdan · 2020-09-09T21:02:51Z

Interesting, I wasn't thinking of the implications of doing this in general. I agree that the use case for testing is pretty narrow, especially because I don't think we should ever leak processes or run in the background.

bcmills · 2020-11-19T15:37:38Z

See previously #28039.

gopherbot · 2022-04-19T05:55:24Z

Change https://go.dev/cl/400877 mentions this issue: cmd/go/internal/test: used lockedStdout instead of os.Stdout

There is an issue where 'go test' will hang after the tests complete if a test starts a sub-process that does not exit (see #24050). However, go test only exhibits that behavior when a package name is explicitly passed as an argument. If 'go test' is invoked without any package arguments then the package in the working directory is assumed, however in that case (and only that case) os.Stdout is used as the test process's cmd.Stdout, which does *not* cause 'go test' wait for the sub-process to exit (see #23019). This change wraps os.Stdout in an io.Writer struct in this case, hiding the *os.File from the os/exec package, causing cmd.Wait to always wait for the full output from the test process and any of its sub-processes. In other words, this makes 'go test' exhibit the same behavior as 'go test .' (or 'go test ./...' and so on). Update #23019 Update #24050 Change-Id: Ica09bf156f3b017f9a31aad91ed0f16a7837195b Reviewed-on: https://go-review.googlesource.com/c/go/+/400877 Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Andrew Gerrand <adg@golang.org> Auto-Submit: Andrew Gerrand <adg@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

josharian · 2022-05-25T18:22:42Z

This just bit me as well. Ran through a fair few hours of GitHub CI time before I noticed something was wrong. (Tip: set timeout-minutes in your Actions even if you don't need it.)

For those finding this thread and hoping for some code to copy/paste to work around the issue, here's the suggestion from @mvdan above, spelled out.

Change:

cmd := exec.Command("YOUR_PROGRAM")

to something along these lines:

ctx := context.Background()
deadline, ok := t.Deadline()
if ok {
	// Give ourselves 100ms to kill the command before the actual deadline arrives.
	deadline = deadline.Add(-100 * time.Millisecond)
	var cancel func()
	ctx, cancel = context.WithDeadline(ctx, deadline)
	t.Cleanup(cancel)
}
cmd := exec.CommandContext(ctx, "YOUR_PROGRAM")

(@mvdan @bcmills @ianlancetaylor feel free to edit this comment directly if you'd suggest a different formulation)

bcmills · 2022-05-25T18:25:07Z

@josharian, if your subprocess is a Go program I suggest sending SIGQUIT on timeout instead of SIGTERM (so that you get a goroutine dump), but that's currently a bit complicated.

(My proposal #50436 aims to make that somewhat smoother, but the fact that SIGQUIT is not defined on all platforms remains a bit of a rough edge.)

gopherbot · 2022-12-07T22:35:27Z

Change https://go.dev/cl/456116 mentions this issue: cmd/go: use Cancel and WaitDelay to terminate test subprocesses

bcmills · 2022-12-14T16:39:57Z

I've been thinking about this some more.
I've added a reproducer (and proposed fix) in https://go.dev/cl/456116.

Notably, it is possible to reproduce this behavior even for a passing test, not just a failing one: the test may start and leak a subprocess, and then return from all of the Test functions, print PASS, and exit with code 0 while the subprocess is still running (and still holding open the pipes).

I don't think tests should be in the habit of orphaning subprocesses, and perhaps we should fail (or terminate the subprocesses) of tests that do so. On the other hand, that's a much harder problem, and programs that start subprocesses that may run indefinitely should arrange for those processes to be terminated anyway — whether by using Pdeathsig, or having the subprocess periodically probe stderr or stdout, or having it exit when stdin is closed.

(Note that in the integration test, I used a periodic write to stderr to terminate the subprocess. It's not perfect — it can leave the orphan running for up to 100ms after everything else has returned — but it at least does eventually clear up on its own.)

So, for now I think we should set a WaitDelay on the test process, and mark the test as failed if that WaitDelay expires (even if it would have passed otherwise).

gopherbot · 2023-02-01T21:50:25Z

Change https://go.dev/cl/464555 mentions this issue: cmd/go/internal/test: scale the WaitDelay for test subprocesses

Prior to CL 456116 we had an arbitrary 5-second delay after a test times out before we kill the test. In CL 456116, I reused that arbitrary 5-second delay as the WaitDelay as well, but on slower builders it does not seem to be generous enough. Instead of hard-coding the delay, for tests with a finite timout we now use a hard-coded fraction of the overall timeout. That will probably give delays that are longer than strictly necessary for very long timeouts, but if the user is willing to wait for a very long timeout they can probably wait a little longer for I/O too. Fixes #58230. Updates #24050. Change-Id: Ifbf3e576c034c721aa00cd17bf88563474b09955 Reviewed-on: https://go-review.googlesource.com/c/go/+/464555 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com>

Prior to CL 456116 we had an arbitrary 5-second delay after a test times out before we kill the test. In CL 456116, I reused that arbitrary 5-second delay as the WaitDelay as well, but on slower builders it does not seem to be generous enough. Instead of hard-coding the delay, for tests with a finite timout we now use a hard-coded fraction of the overall timeout. That will probably give delays that are longer than strictly necessary for very long timeouts, but if the user is willing to wait for a very long timeout they can probably wait a little longer for I/O too. Fixes golang#58230. Updates golang#24050. Change-Id: Ifbf3e576c034c721aa00cd17bf88563474b09955 Reviewed-on: https://go-review.googlesource.com/c/go/+/464555 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com>

ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Feb 23, 2018

ianlancetaylor added this to the Go1.10.1 milestone Feb 23, 2018

ianlancetaylor mentioned this issue Feb 28, 2018

os/exec: consider changing Wait to stop copying goroutines rather than waiting for them #23019

Closed

siebenmann mentioned this issue Mar 7, 2018

os/exec: tests break testing package if testlog file is not file descriptor 3 #24285

Closed

andybons removed the release-blocker label Mar 26, 2018

andybons mentioned this issue Mar 27, 2018

all: Go 1.10.1 release status #24563

Closed

andybons modified the milestones: Go1.10.1, Go1.10.2 Mar 27, 2018

FiloSottile modified the milestones: Go1.10.2, Go1.11 Apr 24, 2018

This was referenced Apr 24, 2018

testing: test with child process sometimes hangs on 1.10; -timeout not respected [1.11 backport] #25042

Closed

testing: test with child process sometimes hangs on 1.10; -timeout not respected [1.9 backport] #25043

Closed

arvados-bot pushed a commit to arvados/arvados that referenced this issue Apr 26, 2018

Avoid dangling child procs in test suite.

a221be6

In go 1.10.1, these seem to make "go test" hang sometimes. golang/go#24050 No issue # Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>

ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 29, 2018

ianlancetaylor mentioned this issue Oct 23, 2018

proposal: context/v2: update context package #28342

Open

andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

mvdan mentioned this issue Sep 16, 2020

tests seem to hang on Mac on GitHub Actions ipld/go-ipld-prime#79

Closed

bcmills mentioned this issue Feb 16, 2021

testing: run Cleanup if SIGINT occurs #41891

Open

Kishan-Dhakan mentioned this issue Dec 20, 2021

Livestream tests 0chain/system_test#135

Merged

1 task

bcmills self-assigned this Dec 14, 2022

bcmills modified the milestones: Backlog, Go1.21 Dec 14, 2022

bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 14, 2022

gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 14, 2022

gopherbot closed this as completed in caf8fb1 Jan 19, 2023

bcmills changed the title ~~testing: test with child process sometimes hangs on 1.10; -timeout not respected~~ testing: test process hangs beyond -timeout if a child process holds I/O streams open Jan 20, 2023

bcmills mentioned this issue Feb 1, 2023

cmd/go: exec: WaitDelay expired before I/O complete #58230

Closed

golang locked and limited conversation to collaborators Feb 1, 2024

gopherbot added the FrozenDueToAge label Feb 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: test process hangs beyond `-timeout` if a child process holds I/O streams open #24050

testing: test process hangs beyond `-timeout` if a child process holds I/O streams open #24050

psanford commented Feb 22, 2018

crvv commented Feb 23, 2018

psanford commented Feb 23, 2018

crvv commented Feb 24, 2018

gopherbot commented Feb 28, 2018

ianlancetaylor commented Feb 28, 2018

FiloSottile commented Apr 24, 2018

gopherbot commented Apr 24, 2018

mvdan commented Sep 9, 2020

bcmills commented Sep 9, 2020

mvdan commented Sep 9, 2020

bcmills commented Sep 9, 2020 •

edited

Loading

bcmills commented Sep 9, 2020

ianlancetaylor commented Sep 9, 2020

mvdan commented Sep 9, 2020 •

edited

Loading

bcmills commented Nov 19, 2020

gopherbot commented Apr 19, 2022

josharian commented May 25, 2022

bcmills commented May 25, 2022

gopherbot commented Dec 7, 2022

bcmills commented Dec 14, 2022

gopherbot commented Feb 1, 2023

testing: test process hangs beyond -timeout if a child process holds I/O streams open #24050

testing: test process hangs beyond -timeout if a child process holds I/O streams open #24050

Comments

psanford commented Feb 22, 2018

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

What did you do?

crvv commented Feb 23, 2018

psanford commented Feb 23, 2018

crvv commented Feb 24, 2018

gopherbot commented Feb 28, 2018

ianlancetaylor commented Feb 28, 2018

FiloSottile commented Apr 24, 2018

gopherbot commented Apr 24, 2018

mvdan commented Sep 9, 2020

bcmills commented Sep 9, 2020

mvdan commented Sep 9, 2020

bcmills commented Sep 9, 2020 • edited Loading

bcmills commented Sep 9, 2020

ianlancetaylor commented Sep 9, 2020

mvdan commented Sep 9, 2020 • edited Loading

bcmills commented Nov 19, 2020

gopherbot commented Apr 19, 2022

josharian commented May 25, 2022

bcmills commented May 25, 2022

gopherbot commented Dec 7, 2022

bcmills commented Dec 14, 2022

gopherbot commented Feb 1, 2023

testing: test process hangs beyond `-timeout` if a child process holds I/O streams open #24050

testing: test process hangs beyond `-timeout` if a child process holds I/O streams open #24050

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

bcmills commented Sep 9, 2020 •

edited

Loading

mvdan commented Sep 9, 2020 •

edited

Loading