-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: test failures introduced on ppc64le with change to improve fastrand #22047
Comments
Unfortunately I can't tell which runtime test is failing. The stack trace shows two goroutines whose stack is not reported because when the timeout occurs they are running on a different thread. It would be interesting to know if the failing test is |
I'm trying to get information from others who are building from upstream to see if anyone else has hit this problem. So far we've only seen it on the build page. I can't seem to find the information about the build machine, such as what distro is it and what would be the default GOMAXPROCS on that system? With that information I could see if it reproduces with the exact same environment. The test TestSelectFairness could be skipped on ppc64le at least temporarily to see if that prevents the failure from happening and then we would be pretty sure that test introduced the problem. |
Change https://golang.org/cl/67631 mentions this issue: |
Could it be due some field reordering in |
Yes, I agree it could be something else. If we skip this test and it continues to hang then we'll know it's not the test. |
A new testcase TestSelectFairness was recently added, and since then the ppc64le build tests have intermittently failed. This adds a change to skip this test on ppc64le using SkipFlaky to help determine if the problem is with the test or something else with that commit. Updates #22047 Change-Id: Idfef72ed791c5bd45c42ff180947fea3df280ea7 Reviewed-on: https://go-review.googlesource.com/67631 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Test has failed after skipping this test. |
What happens with such patch: diff --git a/src/runtime/sema.go b/src/runtime/sema.go
index 8715e07d7a..816e894ae5 100644
--- a/src/runtime/sema.go
+++ b/src/runtime/sema.go
@@ -275,7 +275,7 @@ func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
// on the ticket: s.ticket <= both s.prev.ticket and s.next.ticket.
// https://en.wikipedia.org/wiki/Treap
// http://faculty.washington.edu/aragon/pubs/rst89.pdf
- s.ticket = fastrand()
+ s.ticket = fastrand() | 1
s.parent = last
*pt = s ??? |
There may well be a problem in sema.go but why would it only show up on ppc64? |
Change https://golang.org/cl/68050 mentions this issue: |
Before CL 62530 fastrand always returned non-zero value, and one condition in sema.go depends on this behavior. fastrand is used to generate random weight for treap of sudog, and it is checked against zero to verify sudog were inserted into treap or wait queue. Since its precision is not very important for correctness, lets just always set its lowest bit in this place. Updates #22047 Updates #21806 Change-Id: Iba0b56d81054e6ef9c49ffd293fc5d92a6a31e9b Reviewed-on: https://go-review.googlesource.com/68050 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
ppc64le build tests continue to hang intermittently. |
Change https://golang.org/cl/68710 mentions this issue: |
Fastrand is still guaranteed to always return non-zero values, so surely the patch that went in can’t fix anything. I still wouldn’t revert it because I don’t think code should rely on this property, it’s confusing and fragile. |
I don't think fastrand is guaranteed to return nonzero any more. |
Yes, xorshift64 with 32bit result returns zero easily. |
This test is no longer failing. |
I wish we understood exactly what fixed this, but closing anyway since it's no longer failing. |
The test is no longer failing because it's now skipped on ppc64le. |
What version of Go are you using (
go version
)?tip
Does this issue reproduce with the latest release?
intermittently, only seen on the golang build page so far
What operating system and processor architecture are you using (
go env
)?ppc64le
What did you do?
When the fastrand change was merged e7e4a4f runtime failures started to happen intermittently for ppc64le on the build page. Another change was merged after that resulting in consistent failures on ppc64le that hid this one for a while, but that one has now been fixed.
What did you expect to see?
ok for the ppc64le build on the build page
What did you see instead?
fail for the ppc64le build on the build page intermittently
Here is an example of when it fails https://build.golang.org/log/8552765394dde8879c15b4e35b2f995606b8ed1b.
Looks like the tests are all parked waiting to run and it times out.
If I look at the change where this started happening, the only thing that looks like it could affect this is the code that was moved in mcommoninit to be under control of a lock where it wasn't before, but I can't say that is wrong, just different.
I have not been able to reproduce this anywhere else.
The text was updated successfully, but these errors were encountered: