Commit 08ecdf7
runtime: fix racy allgs access on weak memory architectures
Currently, markroot is very clever about accessing the allgs slice to find stack roots. Unfortunately, on weak memory architectures, it's a little too clever and can sometimes read a nil g, causing a fatal panic.

Specifically, gcMarkRootPrepare snapshots the length of allgs during STW and then markroot accesses allgs up to this length during concurrent marking. During concurrent marking, allgadd can append to allgs *without synchronizing with markroot*, but the argument is that the markroot access should be safe because allgs only grows monotonically and existing entries in allgs never change.

This reasoning is insufficient on weak memory architectures. Suppose thread 1 calls allgadd during concurrent marking and that allgs is already at capacity. On thread 1, append will allocate a new slice that initially consists of all nils, then copy the old backing store to the new slice (write A), then allgadd will publish the new slice to the allgs global (write B). Meanwhile, on thread 2, markroot reads the allgs slice base pointer (read A), computes an offset from that base pointer, and reads the value at that offset (read B). On a weak memory machine, thread 2 can observe write B *before* write A. If the order of events from thread 2's perspective is write B, read A, read B, write A, then markroot on thread 2 will read a nil g and then panic.

Fix this by taking a snapshot of the allgs slice header in gcMarkRootPrepare while the world is stopped and using that snapshot as the list of stack roots in markroot. This eliminates all read/write concurrency around the access in markroot.

Alternatively, we could make markroot use the atomicAllGs API to atomically access the allgs list, but in my opinion it's much less subtle to just eliminate all of the interesting concurrency around the allgs access.

Fixes #49686.
Fixes #48845.
Fixes #43824.

(These are all just different paths to the same ultimate issue.)
Change-Id: I472b4934a637bbe88c8a080a280aa30212acf984
Reviewed-on: https://go-review.googlesource.com/c/go/+/368134
Trust: Austin Clements <austin@google.com>
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
1 parent 8ebb8c9 commit 08ecdf7

File tree

3 files changed: +34 −8 lines changed

src/runtime/mgc.go (+14)

```diff
@@ -320,11 +320,20 @@ var work struct {
 	nwait uint32

 	// Number of roots of various root types. Set by gcMarkRootPrepare.
+	//
+	// nStackRoots == len(stackRoots), but we have nStackRoots for
+	// consistency.
 	nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int

 	// Base indexes of each root type. Set by gcMarkRootPrepare.
 	baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32

+	// stackRoots is a snapshot of all of the Gs that existed
+	// before the beginning of concurrent marking. The backing
+	// store of this must not be modified because it might be
+	// shared with allgs.
+	stackRoots []*g
+
 	// Each type of GC state transition is protected by a lock.
 	// Since multiple threads can simultaneously detect the state
 	// transition condition, any thread that detects a transition
@@ -1368,6 +1377,11 @@ func gcMark(startTime int64) {
 		throw("work.full != 0")
 	}

+	// Drop allg snapshot. allgs may have grown, in which case
+	// this is the only reference to the old backing store and
+	// there's no need to keep it around.
+	work.stackRoots = nil
+
 	// Clear out buffers and double-check that all gcWork caches
 	// are empty. This should be ensured by gcMarkDone before we
 	// enter mark termination.
```

src/runtime/mgcmark.go (+6, −8)

```diff
@@ -102,7 +102,8 @@ func gcMarkRootPrepare() {
 	// ignore them because they begin life without any roots, so
 	// there's nothing to scan, and any roots they create during
 	// the concurrent phase will be caught by the write barrier.
-	work.nStackRoots = int(atomic.Loaduintptr(&allglen))
+	work.stackRoots = allGsSnapshot()
+	work.nStackRoots = len(work.stackRoots)

 	work.markrootNext = 0
 	work.markrootJobs = uint32(fixedRootCount + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
@@ -194,15 +195,12 @@ func markroot(gcw *gcWork, i uint32, flushBgCredit bool) int64 {
 	default:
 		// the rest is scanning goroutine stacks
 		workCounter = &gcController.stackScanWork
-		var gp *g
-		if work.baseStacks <= i && i < work.baseEnd {
-			// N.B. Atomic read of allglen in gcMarkRootPrepare
-			// acts as a barrier to ensure that allgs must be large
-			// enough to contain all relevant Gs.
-			gp = allgs[i-work.baseStacks]
-		} else {
+		if i < work.baseStacks || work.baseEnd <= i {
+			printlock()
+			print("runtime: markroot index ", i, " not in stack roots range [", work.baseStacks, ", ", work.baseEnd, ")\n")
 			throw("markroot: bad index")
 		}
+		gp := work.stackRoots[i-work.baseStacks]

 		// remember when we've first observed the G blocked
 		// needed only to output in traceback
```

src/runtime/proc.go (+14)

```diff
@@ -547,6 +547,20 @@ func allgadd(gp *g) {
 	unlock(&allglock)
 }

+// allGsSnapshot returns a snapshot of the slice of all Gs.
+//
+// The world must be stopped or allglock must be held.
+func allGsSnapshot() []*g {
+	assertWorldStoppedOrLockHeld(&allglock)
+
+	// Because the world is stopped or allglock is held, allgadd
+	// cannot happen concurrently with this. allgs grows
+	// monotonically and existing entries never change, so we can
+	// simply return a copy of the slice header. For added safety,
+	// we trim everything past len because that can still change.
+	return allgs[:len(allgs):len(allgs)]
+}
+
 // atomicAllG returns &allgs[0] and len(allgs) for use with atomicAllGIndex.
 func atomicAllG() (**g, uintptr) {
 	length := atomic.Loaduintptr(&allglen)
```