os/signal: TestStop flaky on DragonFly #25092
findflakes says these failures are new.
Presumably the new failures are somehow due to https://golang.org/cl/108376, which added two new tests that run before TestStop.
You can probably make this fail 1 out of 10 times or so by building the test binary with go test -c and running the resulting signal.test repeatedly.
If that works for you--that is, if the test fails--could you attach the DragonFly equivalent of strace output from a failing run?
Possibly related to the race in #20748?
@ianlancetaylor I tried to run the test you suggested, but where is "signal.test"? Does something generate that?
@bcmills I forgot the -c option, thanks. Attached is the ktrace from a TestStop failure.
Interesting, I can't get this failure to occur if I run this test on a single CPU VM, but it readily happens on a dual CPU VM.
Adding a human-readable format of the ktrace.
The builder is running DragonFly release 5.2.0, but I've confirmed that the flakiness in this test also happens on the most recent development branch. I can't recall for sure, but the appearance of this failure might very well coincide with switching the builder from a single-CPU to a dual-CPU VM a month or so ago.
Thanks for the ktrace output. Unfortunately, I don't see anything unusual in it.
Thanks. If you can give me a description of what this test does, I can run it by the dev team.
I'm not sure that description will help, as most of the activity is in the Go signal handler. That said, the failing test is TestStop in os/signal.
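For readers who want a picture of what TestStop does, here is a simplified sketch of the pattern it exercises. This is illustrative only, not the actual code in os/signal's signal_test.go, and the function name and timeout values are assumptions:

```go
package sketch

import (
	"os"
	"os/signal"
	"syscall"
	"testing"
	"time"
)

// testStopSketch mimics the shape of TestStop: ask for SIGWINCH with Notify,
// raise the signal and wait for it, then Stop and check that delivery stops.
func testStopSketch(t *testing.T) {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGWINCH)

	// Raise SIGWINCH and wait for it to arrive on c. This is the step that
	// times out in the reported failure ("timeout waiting for window size changes").
	syscall.Kill(syscall.Getpid(), syscall.SIGWINCH)
	select {
	case <-c:
		// delivered as expected
	case <-time.After(2 * time.Second):
		t.Fatal("timeout waiting for SIGWINCH")
	}

	// Stop delivery and confirm that a later SIGWINCH is not received.
	signal.Stop(c)
	syscall.Kill(syscall.Getpid(), syscall.SIGWINCH)
	select {
	case s := <-c:
		t.Fatalf("unexpected signal %v after Stop", s)
	case <-time.After(100 * time.Millisecond):
		// no delivery after Stop, as expected
	}
}
```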
I played around some with signal_test.go and discovered that if I increase the timeout in waitSig to 1900 * time.Millisecond, then I can no longer get it to fail by repeatedly running signal.test.
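For reference, the waitSig helper mentioned here is essentially a select with a timeout. A minimal sketch of that shape, assuming a 1-second default timeout (the value this experiment raised toward 1.9 seconds):

```go
package sketch

import (
	"os"
	"testing"
	"time"
)

// waitSig waits for the expected signal on c and fails the test on timeout.
// Raising this timeout toward ~1.9s is what masked the flake, as described above.
func waitSig(t *testing.T, c <-chan os.Signal, sig os.Signal) {
	select {
	case s := <-c:
		if s != sig {
			t.Fatalf("signal was %v, want %v", s, sig)
		}
	case <-time.After(1 * time.Second):
		t.Fatalf("timeout waiting for %v", sig)
	}
}
```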
I know what's happening; see this commit. Note that the default cap on the timeout for umtx_sleep() is 2 seconds, and notice the time in the FAIL: TestStop message: 1.99s. If I increase the sysctl kern.umtx_timeout_max to 3 seconds, then the FAIL message time changes to 2.99s. So the question is what's causing umtx_sleep to hit its max timeout.
Thanks for the tip. I can recreate the problem. As far as I can tell, there is sometimes an unreasonably long period of time between a call to umtx_wakeup and the return of the corresponding umtx_sleep, which is consistent with umtx_sleep hitting its maximum timeout rather than being woken promptly. Both of the forks performed by the new tests appear to be involved.
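To make that timing window concrete: on DragonFly the Go runtime parks and wakes threads with the umtx_sleep and umtx_wakeup system calls, in the usual futex style. The following is only a rough sketch of that pattern, with umtxSleep and umtxWakeup as hypothetical wrappers around the raw system calls rather than the runtime's real code. If the kernel misses the wakeup in the window between the value change and the sleep, the sleeper returns only when the timeout cap (kern.umtx_timeout_max, 2 seconds by default) expires, which lines up with the ~1.99s failures:

```go
package sketch

import "sync/atomic"

// umtxSleep and umtxWakeup are hypothetical wrappers around DragonFly's
// umtx_sleep/umtx_wakeup system calls; they are placeholders for illustration.
func umtxSleep(addr *uint32, val uint32, timeoutMicros int) { /* syscall wrapper (assumed) */ }
func umtxWakeup(addr *uint32, count int)                    { /* syscall wrapper (assumed) */ }

// sleeper blocks while *addr still holds val. umtx_sleep returns immediately
// if *addr != val; otherwise it sleeps until a wakeup arrives or until the
// timeout (capped by kern.umtx_timeout_max) expires.
func sleeper(addr *uint32, val uint32) {
	for atomic.LoadUint32(addr) == val {
		// If the wakeup is lost in the kernel's timing window, this call
		// only returns once the ~2s cap is reached.
		umtxSleep(addr, val, 2000000)
	}
}

// waker publishes a new value and wakes any thread sleeping on addr.
func waker(addr *uint32, newVal uint32) {
	atomic.StoreUint32(addr, newVal)
	umtxWakeup(addr, 1)
}
```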
If either fork is executed in TestDetectNohup, then the test is flaky. When I remove both, it passes every time.
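For context on the forks mentioned here: TestDetectNohup re-executes the test binary as a child process to check whether SIGHUP is ignored, and it does so twice, once directly and once under nohup, which are the two forks in question. Below is a rough sketch of that re-exec pattern; the flag name and the child-side check are assumptions for illustration, not the test's exact code:

```go
package sketch

import (
	"flag"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"testing"
)

// checkSighupIgnored switches the child process into "report whether SIGHUP
// is ignored" mode (flag name assumed for illustration).
var checkSighupIgnored = flag.Bool("check_sighup_ignored", false, "child mode: report whether SIGHUP is ignored")

func testDetectNohupSketch(t *testing.T) {
	if *checkSighupIgnored {
		// Child mode: report whether SIGHUP is currently ignored, then return.
		if !signal.Ignored(syscall.SIGHUP) {
			t.Fatal("SIGHUP is not ignored")
		}
		return
	}
	// Parent mode: fork/exec the test binary again. The real test does this
	// twice, with and without nohup; each run is one of the forks that was
	// making TestStop flaky on DragonFly. Run directly (no nohup), the child
	// is expected to fail because SIGHUP is not ignored.
	out, err := exec.Command(os.Args[0], "-test.run=TestDetectNohup", "-check_sighup_ignored").CombinedOutput()
	if err == nil {
		t.Errorf("child unexpectedly saw SIGHUP as ignored:\n%s", out)
	}
}
```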
It sounds like you've demonstrated that this is a bug in the DragonFly kernel, so I guess we should just skip the test on DragonFly.
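For completeness, skipping the test would be the usual GOOS check, something like the hypothetical sketch below; as the following comments show, this ended up being unnecessary because the kernel was fixed:

```go
package sketch

import (
	"runtime"
	"testing"
)

// testStopSkipSketch shows how TestStop could be skipped on DragonFly; not
// what was ultimately done, since the kernel fix made it unnecessary.
func testStopSkipSketch(t *testing.T) {
	if runtime.GOOS == "dragonfly" {
		t.Skip("skipping on dragonfly; see golang.org/issue/25092")
	}
	// ... rest of the test ...
}
```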
I'm awaiting clarification from the dev team.
DragonFly has found the issue and will fix the timing window in the kernel. I'll be testing a patch soon.
Great, thanks!
The fix has been committed to the master branch, and I've cherry-picked it for the builder.
@tdfbsd Thanks!
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
1.10

Does this issue reproduce with the latest release?
Yes.

What operating system and processor architecture are you using (go env)?
DragonFly BSD amd64
The builder sporadically fails with this message:
```
ok net/url 0.021s
ok os 2.430s
ok os/exec 1.919s
--- FAIL: TestStop (1.98s)
    signal_test.go:32: timeout waiting for window size changes
FAIL
FAIL os/signal 10.420s
ok os/user 0.009s
ok path 0.008s
2018/04/25 07:54:09 Failed: exit status 1
```
If this is indicative of an OS bug, I need to know what this test is doing so I can report it to the OS team.