build-and-test jobs time out after 8 hours, possibly a deadlock? #1576
Comments
I haven't looked at this example at all, but another possibility might be #808. If these tests all use a saga that blows an assertion (or otherwise panics), I think you'd see this as well. This should be evident from the log files that the tests create. I've seen buildomat include them before, but I don't see them from this run. Maybe it doesn't save them when the test times out?
Given that all the "running for over 60 seconds" tests are disk-related, I think that makes sense.
I see some network-interface-related tests too, so they're not all disk-related. The commonality between those is the instance creation saga, I think, though there may be other shared code too.
I assume this is unrelated but in case it's not: I ran into a couple of runs that also timed out unexpectedly (https://github.com/oxidecomputer/omicron/pull/1123/checks?check_run_id=7796250815). In both of these, it appears to be a new test added in those PRs that's still running. But I've never seen that test take longer than 5 minutes locally or in the Helios or GitHub Ubuntu CI. I imagine this is unrelated to the above but I thought I'd mention it in case there's some other systemic issue going on.
The issues I linked above were due to a deadlock in Oso; I'm going to pull in the fix with #1123. It's conceivable that's what happened here, too, but it's hard to say. If we see this again and have either live state, a core file from the test process, or log files from the test process, that would help answer that question.
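For anyone who wants a suspected deadlock to fail fast rather than run into the 8-hour job limit, here is a minimal sketch of a per-test watchdog using only the Rust standard library. `run_with_timeout` is a hypothetical helper for illustration; it is not part of omicron, buildomat, or the test harness, and the time limit is whatever the caller picks.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (an assumption for illustration, not an omicron API).
/// Runs `f` on a separate thread and waits up to `limit` for its result.
/// Returns `Some(value)` if `f` finished in time, and `None` if it timed out
/// or panicked. The worker thread is not killed on timeout; it is left
/// detached, so this only reports the hang, it does not clean it up.
fn run_with_timeout<T, F>(limit: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if the caller gave up.
        let _ = tx.send(f());
    });
    // `recv_timeout` returns Err on timeout or if the sender was dropped
    // without sending (for example, because `f` panicked).
    rx.recv_timeout(limit).ok()
}
```

A test body wrapped this way can assert on the `Some` case and fail on `None`, which would end the run while the log files are still easy to find. It does not capture a core file or live state by itself.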
Haven't seen this issue in a while; #1123 probably fixed it.
https://github.com/oxidecomputer/omicron/pull/1564/checks?check_run_id=7729390898
While running the nexus test_all target, some tests seem to deadlock with each other sometimes?