
runtime: change mutex profile to scale contention by number of blocked goroutines #61015

Closed
rsc opened this issue Jun 26, 2023 · 11 comments

rsc (Contributor) commented Jun 26, 2023

The pprof mutex profile was meant to match the Google C++ (now Abseil) mutex profiler, originally designed and implemented by Mike Burrows. When we worked on the Go version, pjw and I missed that C++ counts the time each thread is blocked, even if multiple threads are blocked on a mutex. That is, if 100 threads are blocked on the same mutex for the same 10ms, that still counts as 1000ms of contention in C++. In Go, to date, /debug/pprof/mutex has counted that as only 10ms of contention. If 100 goroutines are blocked on one mutex and only 1 goroutine is blocked on another mutex, we probably do want to see the first mutex as being more contended, so the Abseil approach is the more useful one.

I propose we change the pprof mutex profile in Go 1.22 to scale contention by the number of blocked goroutines, matching Abseil's more useful behavior.

It is worth noting that if you have 100 goroutines contending for a mutex, and one of them holds it for 10ms at each acquisition while the others release it immediately, then the round-robin mutex granting combined with charging the Unlock stack that unblocks a goroutine means that only 1 of every 100 samples will be attributed to the 10ms goroutine. The others will be attributed to the previous goroutine in the chain. It is important to interpret such profiles as providing useful information about which locks are contended, but not always providing clear-cut evidence about the slow Unlock stack. (See my next comment below for a way to avoid this problem while still scaling reported mutex contention by the number of blocked goroutines.)
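As a concrete illustration of the proposed scaling (a minimal sketch, not part of the CL; the timings and counts are arbitrary), the program below parks 100 goroutines on one mutex for roughly 10ms and then dumps the mutex profile. Today the reported delay totals about 10ms; with contention scaled by the number of blocked goroutines it would total roughly 100 × 10ms = 1s, matching the Abseil behavior described above.

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	runtime.SetMutexProfileFraction(1) // record every contention event

	var mu sync.Mutex
	var wg sync.WaitGroup

	mu.Lock()
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // each of the 100 goroutines blocks here
			mu.Unlock()
		}()
	}
	time.Sleep(10 * time.Millisecond) // hold the lock while the others pile up
	mu.Unlock()
	wg.Wait()

	// Text form of the mutex profile: ~10ms of total delay today,
	// ~1s once contention is scaled by the number of blocked goroutines.
	pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
}
```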

rsc added this to the Proposal milestone Jun 26, 2023
gopherbot (Contributor) commented

Change https://go.dev/cl/506415 mentions this issue: runtime: change mutex profile to count every blocked goroutine

ianlancetaylor moved this to Incoming in Proposals Jun 27, 2023
rsc (Contributor, Author) commented Jun 27, 2023

Elaborating on my final paragraph above, consider this simple example with 6 threads:

T0: acquire M
T1: block on M
T2: block on M
T3: block on M
T4: block on M
T5: block on M
T0: work for 10ms
T0: unlock M, waking T1
T1: acquire M, immediately unlock M, waking T2
T2: acquire M, immediately unlock M, waking T3
T3: acquire M, immediately unlock M, waking T4
T4: acquire M, immediately unlock M, waking T5
T5: acquire M, immediately unlock M

Ignoring sampling, in the Google C++ implementation, this sequence results in 10ms attributed to T0's unlock (because of T1's wait) as well as 10ms attributed to T1-T4's unlocks (because of T2-T5's waits), even though those threads barely held the lock at all. So the overall 50ms in the profile may be useful for getting a sense of the scaled total contention on a given mutex, but the associated stacks do a bad job of explaining why the contention happened. The actual thread that caused the delay only shows up as 20% of the profile. If all the unlocks have the same stack, no harm done, but if a lock is used in multiple places in a program, the profile misdirects debugging effort toward innocent code.

As I remember it (vaguely), we did what we did in the Go mutex profiler - only counting the 10ms of contention during the T0 unlock - primarily to avoid pointing fingers at the innocent T1-T4 unlocks.

While I think everyone agrees that assigning more contention to more highly contended mutexes is good, I think we also agree that pointing fingers in the wrong direction is not good. I spoke to Mike Burrows, the original author of the C++ mutex profiling. He confirmed that the C++ implementation was mainly based on what was easy at the time and did not consider these subtle cases. He and I agree that, given Go's history, it would make sense to keep the highly accurate finger-pointing, even as we also scale up the amount of contention reported for highly contended locks. Patch set 2 of https://go.dev/cl/506415 is one way to do this. T0's unlock of M attributes the exact time T1 has spent waiting plus 4X the average of how long T1 and T5 have spent waiting, and then, as before, it avoids additional double-counting by resetting the "start of waiting" time associated with T2 and T5. (Future unlocks will reset T3 and T4 before they are consulted.)
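Read against the transcript above (a back-of-the-envelope check, assuming all five waiters have been waiting for roughly the full 10ms when T0 unlocks), patch set 2 charges T0's unlock stack approximately:

```
wait(T1) + 4 × avg(wait(T1), wait(T5)) ≈ 10ms + 4 × 10ms ≈ 50ms
```

which is the scaled total for the whole sequence, while the T1–T4 unlocks are charged almost nothing.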

So my amended proposal is to scale the contention up to match Google C++ / Abseil, but to do so while keeping the contentionz an accurate portrait of the stacks that caused the contention, keeping Go's improvement on C++ / Abseil in that regard.

It's also worth noting that the proposed change would bring the contention profile in line with the runtime/metrics metric /sync/mutex/wait/total:seconds, which already counts multiple waiting goroutines multiple times.
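For reference, that metric can be read with the existing runtime/metrics API (a small self-contained sketch):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// /sync/mutex/wait/total:seconds already sums the wait time of every
	// blocked goroutine, which is the accounting this proposal adopts for
	// the mutex profile's delay values.
	s := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() == metrics.KindFloat64 {
		fmt.Printf("total mutex wait: %.3fs\n", s[0].Value.Float64())
	}
}
```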

rsc moved this from Incoming to Active in Proposals Jun 28, 2023
rsc (Contributor, Author) commented Jun 28, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

felixge (Contributor) commented Jun 29, 2023

Thank you for raising this proposal and for the deep analysis. I was surprised to learn that the mutex profile behaves the way it does. I had incorrectly assumed that it operated more like your amended proposal 🙈.

About the specifics of this proposal: the mutex profile consists of two sample types, contentions/count and delay/nanoseconds. Are you proposing to change both, or just the latter? I'd argue for both, since that would let users manually calculate the average time the contended goroutines spent waiting.
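(For example, something along these lines could compute that average offline from a collected profile; a rough sketch using github.com/google/pprof/profile, with a hypothetical file name:)

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/google/pprof/profile"
)

func main() {
	f, err := os.Open("mutex.pb.gz") // hypothetical: a saved mutex profile
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal(err)
	}

	// Find the contentions/count and delay/nanoseconds sample types.
	ci, di := -1, -1
	for i, st := range p.SampleType {
		switch st.Type {
		case "contentions":
			ci = i
		case "delay":
			di = i
		}
	}
	if ci < 0 || di < 0 {
		log.Fatal("not a mutex profile")
	}

	var contentions, delay int64
	for _, s := range p.Sample {
		contentions += s.Value[ci]
		delay += s.Value[di]
	}
	if contentions > 0 {
		fmt.Println("average wait per recorded contention:",
			time.Duration(delay/contentions))
	}
}
```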

Other than this detail, I think this is a great proposal and I'd love to see it implemented 👍

rsc (Contributor, Author) commented Jun 30, 2023

In the example in my second comment, contentions/count already counts 5 today (one for each "waking" in the transcript). I think that's still the right number, not 5+4+3+2+1 = 15.

felixge (Contributor) commented Jun 30, 2023

Ah, I think I understand now. To double check: Can you confirm that Go will produce the following mutexevent calls for your example if your proposal is applied?

mutexevent(~50ms, ...)
mutexevent(~0ms, ...)
mutexevent(~0ms, ...)
mutexevent(~0ms, ...)
mutexevent(~0ms, ...)

If yes, then I fully support this proposal.

rsc (Contributor, Author) commented Jul 3, 2023

@felixge, yes.

rsc (Contributor, Author) commented Jul 5, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc moved this from Active to Likely Accept in Proposals Jul 5, 2023
rsc (Contributor, Author) commented Jul 12, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

rsc moved this from Likely Accept to Accepted in Proposals Jul 12, 2023
rsc changed the title from "proposal: runtime: change mutex profile to scale contention by number of blocked goroutines" to "runtime: change mutex profile to scale contention by number of blocked goroutines" Jul 12, 2023
rsc modified the milestones: Proposal, Backlog Jul 12, 2023
gopherbot (Contributor) commented

Change https://go.dev/cl/546475 mentions this issue: doc: add release notes for runtime/pprof changes

gopherbot pushed a commit that referenced this issue Dec 4, 2023
For #50891.
For #61015.
For #61422.

Change-Id: I30d580814ac02fe9f3fbd1a101b2cc05947a9aaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/546475
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot (Contributor) commented

Change https://go.dev/cl/547057 mentions this issue: runtime/pprof: document block and mutex profiles

gopherbot pushed a commit that referenced this issue Dec 6, 2023
Amazingly, we seem to have nearly no in-tree documentation on the
semantics of block and mutex profiles. Add brief summaries, including
the new behavior from CL 506415 and CL 544195.

For #14689.
For #44920.
For #57071.
For #61015.

Change-Id: I1a6edce7c434fcb43f17c83eb362b1f9d1a32df1
Reviewed-on: https://go-review.googlesource.com/c/go/+/547057
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
rsc removed this from Proposals Aug 29, 2024