Description
Go version
go version go1.22.4 linux/amd64
What did you do?
- Introduce a data race on a map (repro: https://go.dev/play/p/rQ1QQ7PMZ86)
- Don't have sufficient test coverage that runs concurrently to catch it in CI
- Ship it to prod
- Have over a million goroutines
- Wait an hour....
What did you see happen?
... we hit the race on the map in prod after an hour and 💣
The Go runtime then did its fatal:
fatal error: concurrent map iteration and map write
But then we got 3 GB of GOTRACEBACK stacks from the Go runtime for 1M+ stacks, about 3 GB more than we needed to find the bug, overwhelming our logging system in the process, causing a secondary outage.
GOTRACEBACK docs say:
The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time.
And arguably a map race is internal to the run-time insofar as maps are implemented in the runtime.
But the Go runtime knows it's the user's fault; note the "but is used when user code is expected to be at fault for the failure" bit here:
// fatal triggers a fatal error that dumps a stack trace and exits.
//
// fatal is equivalent to throw, but is used when user code is expected to be
// at fault for the failure, such as racing map writes.
//
// fatal does not include runtime frames, system goroutines, or frame metadata
// (fp, sp, pc) in the stack trace unless GOTRACEBACK=system or higher.
//
//go:nosplit
func fatal(s string) {
	// Everything fatal does should be recursively nosplit so it
	// can be called even when it's unsafe to grow the stack.
	systemstack(func() {
		print("fatal error: ")
		printindented(s) // logically printpanicval(s), but avoids convTstring write barrier
		print("\n")
	})
	fatalthrow(throwTypeUser)
}
What did you expect to see?
I'd expect fatalthrow(throwTypeUser) to be treated as a normal panic with GOTRACEBACK=single respected.
Currently it's ignored:
bradfitz@book1pro ~ % GOTRACEBACK=single go run maprace.go 2>&1 | wc -l
618
bradfitz@book1pro ~ % GOTRACEBACK=none go run maprace.go 2>&1 | wc -l
2
bradfitz@book1pro ~ % GOTRACEBACK=system go run maprace.go 2>&1 | wc -l
1684
(At least none works, but that's too quiet, hiding the actual problem).