runtime: "program exceeds 50-thread limit" in bootstrap build for Go 1.9 RC2 on arm64 on Centriq 2400 #21559
From reading the test code, I think the error is showing up in
|
Note that all tests pass on the Cavium ThunderX, but this one test fails to pass on the Centriq system. Exploring further to see if I can get different results. |
Is this easily repeatable? If so, please run |
Odd. It's reproducible from the ./all.bash shell script, but when I make and run the |
When I download the binaries for go 1.9 rc2 they pass the |
Is there any difference in the output of |
Both should be plenty high for any reason. |
I haven't been able to think of any reason why this test would behave differently when invoked from all.bash. I can't recall seeing any other reports of this problem. The test is fairly simple, and is basically testing that the pipes are handled by the internal poller. If we could capture it standalone, then running the test under |
I'm also puzzled. The only thing I can think of is that this machine is running a unique kernel, and we're planning to update the kernel to something closer to mainline. I'll put this on hold for now and rerun it the next time there's a new kernel. |
@valyala In this case the calls should not be slow, because when using a pipe all I/O should be done using non-blocking calls. |
OK, adding to the puzzle:
just worked perfectly on the same machine, no errors; building Curious! |
I can easily reproduce this on a 56-threaded Intel Xeon E5-2660 v4 Linux. (Out of 100 consecutive runs of |
Here is a test panic from that machine, if it may be of any use: os.txt. |
In fact, this reproduces only when the CPU is under some load. Stressing the CPU makes it reproduce practically every time, whereas without any load the test may always succeed. I used the following script to bisect the Go history; c05b06a is the first bad commit.
#!/bin/bash
set -e
cd src
./make.bash
cd os
../../bin/go test -a -c os
stress-ng -t 60 -c 60 &
trap 'killall stress-ng || true' EXIT
for i in `seq 100`; do
./os.test --test.short
done |
Thanks for the bisect. That is the commit that introduced the failing test in the first place. |
Can you generate a failure with |
Change https://golang.org/cl/63650 mentions this issue: |
Sorry for missing that you attached that before. The stack trace suggests that the problem is not with the test itself, it is with the cleanup when the test is done. It looks like we can get a thundering herd as we stop all the goroutines. Can you see if https://golang.org/cl/63650 fixes the problem for you? Thanks. |
When I repeat the test under load (and it fails every time), the number of goroutines in the dump typically is either 8-10, or 107-108. Above was an example with 10 goroutines, and here is one with 108: I'll test your patch now. |
The patch fixes the issue. Thanks! |
Thanks for testing it. Marking issue as 1.9.1 to consider bringing the test fix into 1.9.1 to avoid spurious build failure reports. |
Reopening to consider a backport to 1.9.1. |
I'm confused. The test says "Test that reading from a pipe doesn't use up a thread." and it sounds like it was using up a thread on the breaking system. The rewrite of the test may just be making the test not as precise about finding problems, essentially papering over a real problem. Can you explain why the old test was wrong? I just don't see it. The old channel+waitgroup looks functionally equivalent to the new channel+channel, and I don't see why putting Close into the goroutine (instead of a later cleanup pass) would change anything, unless Close is happening to cause extra serialization that defeats the test. If you don't mind just disabling the test for Go 1.9.2 because you think it's a bad test for some reason, that's fine. But it seems like a worthwhile test and maybe pointing out a real problem. |
The test is intended to test that reading from a pipe when there is no data to read doesn't use up a thread. When data arrives, a thread is required to actually do the read. With the original test, all the goroutines would queue up to read, and all is well. Then we would write to all of them at once, and we would blow past the thread limit. But, yeah, my fix is not well written. I think it may work because of synchronization on the For 1.9 we should probably just disable the test. |
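The difference between waking every reader at once and waking them one at a time can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from either CL: the function name readPipesOneAtATime is hypothetical, the thread limit is deliberately left untouched, and resource cleanup on early error returns is omitted for brevity.

```go
package main

import (
	"fmt"
	"os"
)

// readPipesOneAtATime (hypothetical name) parks n goroutines in Read on n
// empty pipes, then wakes them one at a time so that at most one reader is
// completing its read at any moment. It returns how many readers finished.
func readPipesOneAtATime(n int) (int, error) {
	readers := make([]*os.File, n)
	writers := make([]*os.File, n)
	for i := range readers {
		r, w, err := os.Pipe()
		if err != nil {
			return 0, err
		}
		readers[i], writers[i] = r, w
	}

	parked := make(chan struct{}, n)
	done := make(chan error, n)
	for i := 0; i < n; i++ {
		go func(r *os.File) {
			defer r.Close()
			var b [1]byte
			parked <- struct{}{} // about to wait in Read
			_, err := r.Read(b[:])
			done <- err
		}(readers[i])
	}
	for i := 0; i < n; i++ {
		<-parked // every reader has started
	}

	finished := 0
	for i := 0; i < n; i++ {
		// Write to one pipe, then wait for that reader before touching
		// the next pipe; writing to all pipes first would wake all
		// readers together.
		if _, err := writers[i].Write([]byte{0}); err != nil {
			return finished, err
		}
		if err := writers[i].Close(); err != nil {
			return finished, err
		}
		if err := <-done; err != nil {
			return finished, err
		}
		finished++
	}
	return finished, nil
}

func main() {
	n, err := readPipesOneAtATime(100)
	fmt.Println("readers finished:", n, "error:", err)
}
```

Serializing the wakeups keeps the number of simultaneously active reads small, which is the property a lowered thread limit would otherwise punish during cleanup.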
OK. Disabling test OK for Go 1.9.2. |
Sent https://golang.org/cl/70771 for 1.9 branch. |
Change https://golang.org/cl/70771 mentions this issue: |
CL 70771 OK for Go 1.9.2. |
Updates #21559 Change-Id: I90fa8b4ef97c4251440270491ac4c833d76ee872 Reviewed-on: https://go-review.googlesource.com/70771 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
go1.9.2 has been packaged and includes: The release is posted at golang.org/dl. — golang.org/x/build/cmd/releasebot, Oct 26 21:09:08 UTC |
golang/go#19608 https://hydra.nixos.org/build/64329767/nixlog/1 https://hydra.nixos.org/build/64244716/nixlog/1 Remove the patch for golang/go#21559 because it is skipped as flaky since Go 1.9.2.
What version of Go are you using (go version)?
go version go1.9rc2 linux/arm64
bootstrap compiler is go version go1.6.2 linux/arm64
Machine is
Does this issue reproduce with the latest release?
This issue does not reproduce on the Cavium ThunderX hardware, which passes all tests without flaws.
What operating system and processor architecture are you using (go env)?
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
What did you expect to see?
All tests pass
What did you see instead?
Complete output at https://gist.github.com/017dbb7fa20d184d66a8d9ea4bf61225