runtime,cmd/compile: exit status 0xc0000374 (STATUS_HEAP_CORRUPTION) on windows-amd64-longtest #52647
Re-running the scan due to the possibility of failures masked by #52591:
That's two matching failures within the Go 1.19 cycle (and even within the past couple of weeks!) on (CC @golang/windows)
One more:
Three days of continuous testing on 25 Windows gomotes has gotten me zero of these failures, so I suspect I am missing some required component of the failure.
None on the dashboard for the past week or so, although that's somewhat to be expected with the CL rate decreasing from the freeze.
(0 matching logs) Note that this has only been observed in the Maybe it has something to do with the shape of the machine? IIRC the (Or maybe it's some sort of crosstalk between tests and builds somehow? But that seems even weirder.)
Still no new cases since 2022-05-10. I ran another set of 25 builders over the weekend, this time creating a new VM for each test run. Nothing. I am inclined to close this and reopen if anyone discovers new cases.
Since the freeze started 2022-05-07 and the rate of CLs (and thus dashboard test runs) is much lower during the freeze, it's not surprising and not necessarily meaningful to have fewer (or no) failures during that interval, and looking at the Running I would be more comfortable closing out this issue if we have a plausible (even if unconfirmed) theory for how it could have been fixed by a code or configuration change since the last failure.
Fair enough, reopened. However, beyond simply waiting for builders, I'm out of ideas for trying to reproduce this. Perhaps someone on @golang/windows has more context about this error and what may trigger it (I've been assuming memory corruption in the C allocator)?
Looking for common factors in the
That suggests that the
The staleness check for I believe that that function runs once per
The But I don't know how that line could possibly be executed as part of
One thing I haven't tried is testing at exactly one of the commits that previously failed. To that end, I'll test at f0c0e0f (commit from the 2022-04-27 failure). I've instrumented
I still can't figure out how the The only (https://cs.opensource.google/search?q=%22%5C%22tool%5C%22,%20%5C%22compile%5C%22%22)
I modified
Change https://go.dev/cl/412774 mentions this issue: |
To match the dashboard logs, I'm looking for an unindented
So far I'm not able to reproduce any exact match running any |
Ok, this is very weird. When
That leaves open several possibilities, but all of them are weird. Some that I can think of:
I gave this some more thought this evening.
That suggests to me that the failure mode has something to do with the way we distribute the built

Given that, I think it would be ok to slip this issue to Go 1.20 to collect more information about the failure rate during open development and to see whether we see the same failure mode under conditions that aren't so closely tied to buildlet sharding.
I agree with @bcmills that we have enough evidence at this point to drop release-blocker from this. Given that we're pretty sure we know what command is failing, is there anything we can do to gather more data from future failures? I'm worried we're just going to see logs like the ones we have, which I think we've tapped dry at this point. |
I tweaked the error message in CL 412954 to hopefully confirm Ian's diagnosis of which command is failing, and audited that log line to make sure we're also printing the command's

Beyond that I think it would help to get a core dump on failure, but I don't know of a way to dump core on the builders without overwhelming the Coordinator's output limits (compare #49165).

Coming from the opposite direction, would it make sense to have

Perhaps we should also audit the

Finally, I notice that the buildlet's
Confirmed that the failure is indeed during
An intriguing clue (2022-08-23T03:09:07-0a52d80/windows-amd64-longtest):
No failures since the last one @bcmills reported. |
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
Another failure during
(In a TryBot on https://go.dev/cl/518776.) |
Given the
@bcmills I think that failure is different. This failure is about
Fair enough. Filed as #62079. |
Thank you! |
According to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55, this exit code means STATUS_HEAP_CORRUPTION (a heap has been corrupted):
greplogs --dashboard -md -l -e \(\?ms\)\\Awindows-.\*0xc0000374
2022-04-27T14:23:28-f0c0e0f/windows-amd64-longtest
Since this has only been seen once, leaving on the backlog to see whether this is a recurring pattern or a one-off fluke.
(CC @golang/runtime)