runtime,cmd/compile: frequent memory corruption on NetBSD and OpenBSD since 2021-10-11 #49209
Comments
This appears to be a regression in Go 1.18. Since NetBSD is not a first-class port, this doesn't necessarily block the 1.18 release. However, if the regression remains at the time of the release, it at least needs a clear writeup in the release notes. (That part, at least, is a release blocker.)
@jeremyfaller, any idea who on the Go team might have insight into what broke on the 18th specifically?
I'm not sure this error is entirely a regression, as I've seen it with pre-1.18 versions of Go on NetBSD. But perhaps something is making it much more frequent.
Did a little digging on Oct 17-18 and didn't see much that looked suspicious. Pinging @mknyszek.
On it.
This is more difficult to reproduce than I anticipated...
My gomote swarm ran all night and I got nowhere, unfortunately.
Looking more carefully at the […] For example, the […] @golang/release, was there a change to the builder images on or around Oct. 11 that could explain these failures? (If so, is there an older image we can restore, or a platform expert who can, say, try bisecting the kernel?)
@mknyszek, the […]
Ah OK. I'll focus on other failure modes.
Got a core dump... for a go1.4 failure. I feel like I'm cursed.
Got 2 crashes in one go: a […]
OK, so I can't seem to find the core for the […] So, the check that's failing has code that looks like […]
I accessed the relevant […] When I looked at […] If all this makes sense, then I think that means what we have here is some kind of stack corruption. Specifically, the value of […]
Looks like this also affects OpenBSD:
2021-11-17T04:31:22-f384c70/openbsd-386-70
From #34988, it seems we can conclude that this is a kernel issue on NetBSD? Is there any conclusion for OpenBSD? Do we still need to keep this issue open? Thanks.
Change https://golang.org/cl/372355 mentions this issue: […]
From my perspective, we do still need to keep the issue open, to the extent that we need to: […]
Hmm. Actually, I think it suffices to mark this as a duplicate of #34988 and do the above things for that bug. 👍
Not yet, as the C reproducer doesn't currently work on OpenBSD. We probably need to go back to David's Go reproducer and see if that works, and if so try to get it working in C again. |
This is not a bug in Go. The failing builders will be annotated with a known issue until it is resolved. Because of this, it is no longer a release blocker.
Change https://golang.org/cl/377474 mentions this issue: […]
Remove freebsd 12.2, which is replaced by 12.3 with the XSAVE fix. Move freebsd 11.* to N2 machines, which are not affected. Remove openbsd and netbsd e2/n1/n2/n2d-specific configurations now that we have mostly understood the nature of that problem. Keep one around so that the runtime team can create gomotes. Move the "official" builder to the n2 cpu that works.

For golang/go#49967, golang/go#49209, golang/go#40561.

Fixes golang/go#50496.

Change-Id: If6989317f06cbec95d5addb19d9e968aecfa3f8a
Reviewed-on: https://go-review.googlesource.com/c/build/+/377474
Trust: Heschi Kreinick <heschi@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Alex Rakoczy <alex@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Knowing whether test failures are correlated with specific CPU models has proven useful on several issues. Log it prior to testing so it is always available.

internal/sysinfo provides the CPU model, but it is not available in the bootstrap toolchain, so we can't access this unconditionally in cmd/dist. Instead use a build-tagged file, as the final version of cmd/dist will use the final toolchain.

The addition of new data to the beginning of cmd/dist output will break x/build/cmd/coordinator's banner parsing, leaving extra lines in the log output, though information will not be lost. https://golang.org/cl/372538 fixes up the coordinator and should be submitted and deployed before this CL is submitted.

For #46272.
For #49209.
For #50146.

Change-Id: I515d2ec58e4c0034b76bf624ecaab38f16146074
Reviewed-on: https://go-review.googlesource.com/c/go/+/371474
Trust: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Change https://golang.org/cl/378589 mentions this issue: […]
If our conclusion is that this is not a bug in Go, we should consider moving this to the 1.19 milestone.
The issue is documented and the builders are configured to work around it. As far as I am concerned, it would be appropriate to close this issue.
SGTM. Thanks.
Knowing whether test failures are correlated with specific CPU models has proven useful on several issues. Log it prior to testing so it is always available.

internal/sysinfo provides the CPU model, but it is not available in the bootstrap toolchain, so we can't access this in cmd/dist. Instead use a separate binary which cmd/dist will only build once testing begins.

The addition of new data to the beginning of cmd/dist output will break x/build/cmd/coordinator's banner parsing, leaving extra lines in the log output, though information will not be lost. https://golang.org/cl/372538 fixes up the coordinator and should be submitted and deployed before this CL is submitted.

This is a redo of CL 371474. It switches back to the original approach of using a separate binary, as the bootstrap toolchain won't allow cmd/dist to import internal packages.

For #46272.
For #49209.
For #50146.

Change-Id: I906bbda987902a2120c5183290a4e89a2440de58
Reviewed-on: https://go-review.googlesource.com/c/go/+/378589
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
greplogs --dashboard -md -l -e 'freeIndex is not valid' --since=2021-05-01
2021-10-28T19:04:41-4e1c44d-18b9702/netbsd-386-9_0
2021-10-28T18:17:57-f229e70/netbsd-386-9_0
2021-10-28T18:01:34-03971e3-18b9702/netbsd-386-9_0
2021-10-28T01:15:26-103d89b-b2fe2eb/netbsd-386-9_0
2021-10-27T20:03:17-7b0b504-68bd512/netbsd-386-9_0
2021-10-27T16:39:27-94870a3-4f73fd0/netbsd-386-9_0
2021-10-27T13:12:49-d418f37-cfb5321/netbsd-386-9_0
2021-10-27T06:23:35-5786a54/netbsd-386-9_0
2021-10-27T05:33:58-ca5f65d/netbsd-386-9_0
2021-10-26T22:24:36-591e12a-80be4a4/netbsd-386-9_0
2021-10-26T22:05:53-80be4a4/netbsd-amd64-9_0
2021-10-26T18:40:06-9626607-11b64b4/netbsd-386-9_0
2021-10-26T15:46:18-c4ead46-1b2362b/netbsd-386-9_0
2021-10-19T07:45:46-98f6e03-ee92daa/netbsd-386-9_0
2021-10-18T21:52:05-98f6e03-425db64/netbsd-386-9_0
2021-10-18T21:52:05-425db64/netbsd-amd64-9_0
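The greplogs results above appear to follow the pattern `<timestamp>-<commit(s)>/<builder>`, where a line with two hashes apparently pairs a subrepository commit with a Go commit. That reading is an assumption inferred from the lines themselves, not something documented in this thread. A minimal Go sketch that tallies such lines per builder and reports the earliest one (useful for spotting when a failure mode first appeared) might look like this; the names `failure`, `parseFailure`, and `summarize` are hypothetical helpers, not part of greplogs or x/build.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// failure is one dashboard result line, assumed to have the form
// <timestamp>-<commit(s)>/<builder> (hypothetical type for illustration).
type failure struct {
	when    time.Time
	commits string
	builder string
}

// tsLayout matches the zone-less timestamp prefix, e.g. "2021-10-28T19:04:41".
const tsLayout = "2006-01-02T15:04:05"

// parseFailure splits a line such as
// "2021-10-28T19:04:41-4e1c44d-18b9702/netbsd-386-9_0".
// Assumption: the timestamp is always the first 19 bytes, followed by '-',
// and the builder name is everything after the last '/'.
func parseFailure(line string) (failure, error) {
	line = strings.TrimSpace(line)
	if len(line) < len(tsLayout)+1 || line[len(tsLayout)] != '-' {
		return failure{}, fmt.Errorf("malformed line %q", line)
	}
	when, err := time.Parse(tsLayout, line[:len(tsLayout)])
	if err != nil {
		return failure{}, err
	}
	rest := line[len(tsLayout)+1:]
	slash := strings.LastIndex(rest, "/")
	if slash < 0 {
		return failure{}, fmt.Errorf("no builder in %q", line)
	}
	return failure{when: when, commits: rest[:slash], builder: rest[slash+1:]}, nil
}

// summarize counts failures per builder and returns the earliest timestamp seen.
func summarize(lines []string) (map[string]int, time.Time, error) {
	counts := make(map[string]int)
	var first time.Time
	for _, l := range lines {
		f, err := parseFailure(l)
		if err != nil {
			return nil, time.Time{}, err
		}
		counts[f.builder]++
		if first.IsZero() || f.when.Before(first) {
			first = f.when
		}
	}
	return counts, first, nil
}

func main() {
	// A few lines copied from the greplogs output above.
	lines := []string{
		"2021-10-28T19:04:41-4e1c44d-18b9702/netbsd-386-9_0",
		"2021-10-26T22:05:53-80be4a4/netbsd-amd64-9_0",
		"2021-10-18T21:52:05-98f6e03-425db64/netbsd-386-9_0",
		"2021-10-18T21:52:05-425db64/netbsd-amd64-9_0",
	}
	counts, first, err := summarize(lines)
	if err != nil {
		panic(err)
	}
	// Print builders in a stable order, since map iteration order is random.
	builders := make([]string, 0, len(counts))
	for b := range counts {
		builders = append(builders, b)
	}
	sort.Strings(builders)
	for _, b := range builders {
		fmt.Printf("%s: %d\n", b, counts[b])
	}
	fmt.Println("first failure:", first.Format("2006-01-02"))
}
```

Run on those four sample lines, it prints `netbsd-386-9_0: 2`, `netbsd-amd64-9_0: 2`, and `first failure: 2021-10-18`. Feeding it the full list would show how heavily the 386 builder dominates and that none of the matched failures predate Oct. 18.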