runtime: possible memory corruption on FreeBSD #46272
For information, the CC @paulzhol, @dmgk, @cagedmantis.
If it's correlated with FreeBSD being updated, this may not be a release blocker. We should still probably figure out what's wrong, but I don't have any good ideas besides stress-testing It's also still possible that this is a Go issue that only shows up on FreeBSD 12.2. Between when the builders got updated (looks like... April 23rd) and when the first failure happened, there's about 2 weeks. Those two weeks also happened to include the last week before the freeze.
Running
109
I stand corrected! I do actually have a failure that looks promising. Again, in
Due to a bug in my script, I have lost the gomote state (and any potential core dump). Re-trying. At least I've found it's reproducible (kind of).
Ping -- I understand this is a tricky one, but it does still seem important to resolve in some way. Worst case we might need a prominent release note. Do we know if this is a regression in Go? That seems worth understanding.
I think a prominent release note is overkill. It's unclear where the regression is. Given the frequency of failure, it is still possible it's a FreeBSD 12.2 x Go 1.17 thing. I was trying to reproduce it in gomotes but haven't managed to since that one time. I'm going to check the logs again and update this. I'll also spin up the gomotes again.
I still haven't seen any such failure on the builders since those three I posted earlier.
#45887 looks like memory corruption too (on the
Thanks Bryan. I think those both are related. I'm still trying to reproduce. I'm now somewhat worried that by setting
This could also be related to #34988, in that they both involve memory corruption on BSD variants in programs that fork subprocesses.
I reproduced it! Except I think if
I think the theory that it's related to process startup makes sense. We don't have very many sample points, but including the two failures I was able to reproduce, they seem to happen in cmd/go tests, cmd/vet tests, and runtime tests. These tend to spawn a non-trivial number of subprocesses, so if the theory were true, it would make sense for the failures to cluster there. I'm refocusing my efforts on those packages' tests instead of on
Maybe the same root cause, maybe different, but here is a recent
Refocusing on those packages didn't help. I actually got the gomotes to run continuously all weekend on runtime tests and I got absolutely nothing. Back to
@bcmills That 11.2 failure is interesting, and suggests maybe it isn't just a 12.2 bug?
@bcmills there have been several fixes since I ported the initial fast gettimeofday code (stronger memory barriers) and a code re-org: https://cgit.freebsd.org/src/log/lib/libc/x86/sys/__vdso_gettc.c?h=stable/12. Meanwhile there's a
I would not expect this to be fixed in a stock FreeBSD 12.3 release image (but it should be fixed by an errata update available later today via freebsd-update). From the change referenced above (https://golang.org/cl/375695), it looks like Go has an entry host-freebsd-12_3 that uses the image freebsd-amd64-123-stable-20211230, which is a snapshot built from the stable/12 branch in late Dec 2021; it will indeed have the fix. The distinction is likely immaterial for Go CI, but anyone who encounters this issue on a production FreeBSD 12.3 deployment will want to make sure they've updated to at least 12.3-RELEASE-p1.
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
@emaste is correct, I built the images from snapshot, they're not exactly -p1.
I got through 912 iterations of all.bash on 12_3 and 898 on 12_2. I got two failures that appear to be memory corruption on 12_2 and zero on 12_3. That alone isn't enough to base particularly strong statistical conclusions on, I'm afraid, so I'm going to take a Bayesian out and say that in combination with the fact that we're already pretty sure the bug was fixed, we can close this issue as fixed. Thank you everyone!
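To put the 898-vs-912 comparison in numbers: if each all.bash run were an independent trial failing at the rate seen on 12_2 (2 of 898), the chance that 912 runs on 12_3 would show zero failures anyway is roughly 13%, which is why the counts alone are weak evidence. A minimal sketch of that arithmetic; the independence assumption and the helper name probZeroFailures are assumptions for illustration, not something from the thread.

```go
package main

import (
	"fmt"
	"math"
)

// probZeroFailures returns the probability of seeing zero failures in
// `runs` independent trials when each trial fails with probability p.
// Assumption: all.bash iterations behave like independent Bernoulli
// trials with a fixed failure rate, which is a simplification.
func probZeroFailures(p float64, runs int) float64 {
	return math.Pow(1-p, float64(runs))
}

func main() {
	// Failure rate observed on the 12_2 builder: 2 failures in 898 runs.
	p := 2.0 / 898.0
	// Chance that 912 runs on 12_3 show no failures even if 12_3 were
	// no better than 12_2.
	q := probZeroFailures(p, 912)
	fmt.Printf("estimated failure rate on 12_2: %.5f\n", p)
	fmt.Printf("P(0 failures in 912 runs at that rate): %.3f\n", q)
}
```

A result this large is consistent with "not enough for strong conclusions": the zero-failure run on 12_3 only becomes convincing when combined with the independent evidence that the FreeBSD bug was fixed.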
Thanks to the incredible analysis from @rsc in #46272 (comment) I'm quite confident this is indeed fixed.
Knowing whether test failures are correlated with specific CPU models on has proven useful on several issues. Log it for prior to testing so it is always available. internal/sysinfo provides the CPU model, but it is not available in the bootstrap toolchain, so we can't access this unconditionally in cmd/dist. Instead use a build-tagged file, as the final version of cmd/dist will use the final toolchain. The addition of new data to the beginning of cmd/dist output will break x/build/cmd/coordinator's banner parsing, leaving extra lines in the log output, though information will not be lost. https://golang.org/cl/372538 fixes up the coordinator and should be submitted and deployed before this CL is submitted.
For #46272.
For #49209.
For #50146.
Change-Id: I515d2ec58e4c0034b76bf624ecaab38f16146074
Reviewed-on: https://go-review.googlesource.com/c/go/+/371474
Trust: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Change https://golang.org/cl/378589 mentions this issue:
Sorry to ask for a TL;DR: is a patch available yet?
@tarkhil We believe this is fixed by a FreeBSD update. You want to be using at least version 12.3-RELEASE-p1.
13.0 fails. Okay, tomorrow I'll add a disk, so that's a chance to upgrade 13.0 to the current patch level.
For 13, the fix is in 13.0-RELEASE-p6.
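The fixed patch levels given in this thread (12.3-RELEASE-p1 and 13.0-RELEASE-p6) can be checked mechanically. A minimal sketch, assuming a version string of the form X.Y-RELEASE or X.Y-RELEASE-pN; the helper hasFix and its table are hypothetical, cover only the two releases named here, and make no claim about which FreeBSD command you should read the string from.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minFixedPatch maps a FreeBSD release to the first patch level this thread
// reports as containing the fix. Only these two releases are listed;
// anything else is reported as unknown.
var minFixedPatch = map[string]int{
	"12.3": 1, // 12.3-RELEASE-p1
	"13.0": 6, // 13.0-RELEASE-p6
}

// hasFix reports whether a version string of the form "X.Y-RELEASE" or
// "X.Y-RELEASE-pN" is at or above the patch level in minFixedPatch.
// known is false for releases missing from the table and for strings in
// any other form (for example -STABLE snapshots like the Go builder images).
func hasFix(version string) (fixed, known bool) {
	parts := strings.Split(version, "-")
	if len(parts) < 2 || len(parts) > 3 || parts[1] != "RELEASE" {
		return false, false
	}
	minPatch, ok := minFixedPatch[parts[0]]
	if !ok {
		return false, false
	}
	patch := 0 // a bare "X.Y-RELEASE" has no patches applied
	if len(parts) == 3 {
		if !strings.HasPrefix(parts[2], "p") {
			return false, false
		}
		n, err := strconv.Atoi(strings.TrimPrefix(parts[2], "p"))
		if err != nil {
			return false, false
		}
		patch = n
	}
	return patch >= minPatch, true
}

func main() {
	for _, v := range []string{"12.3-RELEASE", "12.3-RELEASE-p1", "13.0-RELEASE-p5", "13.0-RELEASE-p6", "12.3-STABLE"} {
		fixed, known := hasFix(v)
		fmt.Printf("%-16s fixed=%v known=%v\n", v, fixed, known)
	}
}
```

Returning "unknown" rather than guessing keeps the sketch honest: a -STABLE snapshot built after the fix (as described for the 12_3 builder image) may well contain it, but its version string alone does not say so.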
PR: golang/go#46272
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Knowing whether test failures are correlated with specific CPU models on has proven useful on several issues. Log it for prior to testing so it is always available. internal/sysinfo provides the CPU model, but it is not available in the bootstrap toolchain, so we can't access this in cmd/dist. Instead use a separate binary which cmd/dist will only build once testing begins. The addition of new data to the beginning of cmd/dist output will break x/build/cmd/coordinator's banner parsing, leaving extra lines in the log output, though information will not be lost. https://golang.org/cl/372538 fixes up the coordinator and should be submitted and deployed before this CL is submitted. This is a redo of CL 371474. It switches back to the original approach of using a separate binary, as the bootstrap toolchain won't allow cmd/dist to import internal packages.
For #46272.
For #49209.
For #50146.
Change-Id: I906bbda987902a2120c5183290a4e89a2440de58
Reviewed-on: https://go-review.googlesource.com/c/go/+/378589
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Several failures in the last month on FreeBSD builders have failure modes that are very difficult to explain, such as SIGSEGVs in hot runtime paths accessing values that don't quite make sense (e.g. spanOf sees a bad arena entry, fixalloc accesses an out-of-bounds slot, a broken return PC in a runtime stack frame). I suspect they share an underlying cause. Three issues have already been opened for these: #45977, #46103, #46182.
As far as I know, these all seem to be specific to FreeBSD, and even more specifically, the "race" and "12_2" builders.
The relevant logs are available below.
2021-05-04T20:50:35-d19e549/freebsd-amd64-race
2021-05-10T23:42:56-5c48951/freebsd-amd64-12_2
2021-05-14T16:42:01-3d324f1/freebsd-amd64-race