runtime: concurrent map iter+write fatal doesn't respect GOTRACEBACK #68019

Open
@bradfitz

Go version

go version go1.22.4 linux/amd64

What did you do?

  • Introduce a data race on a map (repro: https://go.dev/play/p/rQ1QQ7PMZ86)
  • Don't have sufficient test coverage that runs concurrently to catch it in CI
  • Ship it to prod
  • Have over a million goroutines
  • Wait an hour....

What did you see happen?

... we hit the race on the map in prod after an hour and 💣

The Go runtime then did its fatal:

fatal error: concurrent map iteration and map write

But then the Go runtime dumped 3 GB of GOTRACEBACK stack traces for our 1M+ goroutines, about 3 GB more than we needed to find the bug. That overwhelmed our logging system and caused a secondary outage.

GOTRACEBACK docs say:

The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time.

And arguably a map race is internal to the run-time insofar as maps are implemented in the runtime.

But the Go runtime knows it's the user's fault; note the "but is used when user code is expected to be at fault for the failure" bit here:

// fatal triggers a fatal error that dumps a stack trace and exits.
//
// fatal is equivalent to throw, but is used when user code is expected to be
// at fault for the failure, such as racing map writes.
//
// fatal does not include runtime frames, system goroutines, or frame metadata
// (fp, sp, pc) in the stack trace unless GOTRACEBACK=system or higher.
//
//go:nosplit
func fatal(s string) {
        // Everything fatal does should be recursively nosplit so it
        // can be called even when it's unsafe to grow the stack.
        systemstack(func() {
                print("fatal error: ")
                printindented(s) // logically printpanicval(s), but avoids convTstring write barrier
                print("\n")
        })

        fatalthrow(throwTypeUser)
}

What did you expect to see?

I'd expect fatalthrow(throwTypeUser) to be treated as a normal panic with GOTRACEBACK=single respected.

Currently it's ignored:

bradfitz@book1pro ~ % GOTRACEBACK=single go run maprace.go 2>&1 | wc -l
     618
bradfitz@book1pro ~ % GOTRACEBACK=none go run maprace.go 2>&1 | wc -l
       2
bradfitz@book1pro ~ % GOTRACEBACK=system go run maprace.go 2>&1 | wc -l
    1684

(At least none works, but that's too quiet, hiding the actual problem).

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
    compiler/runtime: Issues related to the Go compiler and/or runtime.
