runtime: change mutex profile to scale contention by number of blocked goroutines #61015
Change https://go.dev/cl/506415 mentions this issue:
Elaborating on my final paragraph above, consider this simple example with 6 threads:
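Roughly (reconstructing the sequence here from the analysis that follows, so the exact transcript wording is illustrative):

- T0 acquires mutex M.
- T1 through T5 each call Lock on M and block, all at about the same time.
- T0 holds M for 10ms, then unlocks it, waking T1.
- T1 acquires M and unlocks it almost immediately, waking T2.
- T2, T3, and T4 each do the same in turn, waking the next waiter.
- T5 acquires M and unlocks it with no one left waiting.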
Ignoring sampling, in the Google C++ implementation, this sequence results in 10ms attributed to T0's unlock (because of T1's wait) as well as 10ms attributed to T1-T4's unlocks (because of T2-T5's waits), even though those threads barely held the lock at all. So the overall 50ms in the profile may be useful for getting a sense of the scaled total contention on a given mutex, but the associated stacks do a bad job of explaining why the contention happened. The actual thread that caused the delay only shows up as 20% of the profile. If all the unlocks have the same stack, no harm done, but if a lock is used in multiple places in a program, the profile misdirects debugging effort toward innocent code.

As I remember it (vaguely), we did what we did in the Go mutex profiler - only counting the 10ms of contention during the T0 unlock - primarily to avoid pointing fingers at the innocent T1-T4 unlocks. While I think everyone agrees that assigning more contention to more highly contended mutexes is good, I think we also agree that pointing fingers in the wrong direction is not good.

I spoke to Mike Burrows, the original author of the C++ mutex profiling. He confirmed that the C++ implementation was mainly based on what was easy at the time and did not consider these subtle cases. He and I agree that given Go's history, it would make sense to keep the highly accurate finger-pointing, even as we also scale up the amount of contention reported for highly contended locks.

Patch set 2 of https://go.dev/cl/506415 is one way to do this. T0's unlock of M attributes the exact time T1 has spent waiting plus 4X the average of how long T1 and T5 have spent waiting, and then as before it avoids additional double-counting, by resetting the "start of waiting" time associated with T2 and T5. (Future unlocks will reset T3 and T4 before they are consulted.)

So my amended proposal is to scale the contention up to match Google C++ / Abseil, but to do so while keeping the contentionz an accurate portrait of the stacks that caused the contention, keeping Go's improvement on C++ / Abseil in that regard.

It's also worth noting that the proposed change would bring the contention profile in line with the runtime/metrics metric /sync/mutex/wait/total:seconds, which already counts multiple waiting goroutines multiple times.
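To make the arithmetic concrete (assuming, as in the sequence above, that all five waiters have been blocked for the full 10ms when T0 unlocks):

- Go today: only T0's unlock is charged, for the 10ms T1 waited, so the profile shows 10ms of delay in total.
- Google C++ / Abseil: T0's unlock is charged 10ms (T1's wait) and each of T1-T4's unlocks is charged roughly 10ms (T2-T5's waits), so the profile shows about 50ms, but 80% of it points at unlocks that held the lock for almost no time.
- Amended proposal: T0's unlock is charged T1's exact 10ms plus roughly 4 x 10ms estimated for the remaining waiters, about 50ms in total, and the subsequent unlocks are charged almost nothing because the waiters' "start of waiting" times were reset.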
This proposal has been added to the active column of the proposals project.
Thank you for raising this proposal and the deep analysis. I was surprised to learn that the mutex profile behaves the way it does. I had incorrectly assumed that it operates more like your amended proposal 🙈.

About the specifics of this proposal: the mutex profile consists of two sample types, contentions/count and delay/nanoseconds. Would the scaling apply to both, or only to delay/nanoseconds?

Other than this detail, I think this is a great proposal and I'd love to see it implemented 👍
In the example in my second comment, today contentions/count is already counting 5 (one for each "waking" in the transcript). I think that's still the right number, not 5+4+3+2+1 = 15.
Ah, I think I understand now. To double check: Can you confirm that Go will produce the following
If yes, then I fully support this proposal. |
@felixge, yes. |
Based on the discussion above, this proposal seems like a likely accept.
No change in consensus, so accepted. 🎉
Change https://go.dev/cl/546475 mentions this issue:
For #50891.
For #61015.
For #61422.

Change-Id: I30d580814ac02fe9f3fbd1a101b2cc05947a9aaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/546475
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change https://go.dev/cl/547057 mentions this issue:
Amazingly, we seem to have nearly no in-tree documentation on the semantics of block and mutex profiles. Add brief summaries, including the new behavior from CL 506415 and CL 544195.

For #14689.
For #44920.
For #57071.
For #61015.

Change-Id: I1a6edce7c434fcb43f17c83eb362b1f9d1a32df1
Reviewed-on: https://go-review.googlesource.com/c/go/+/547057
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The pprof mutex profile was meant to match the Google C++ (now Abseil) mutex profiler, originally designed and implemented by Mike Burrows. When we worked on the Go version, pjw and I missed that C++ counts the time each thread is blocked, even if multiple threads are blocked on a mutex. That is, if 100 threads are blocked on the same mutex for the same 10ms, that still counts as 1000ms of contention in C++. In Go, to date, /debug/pprof/mutex has counted that as only 10ms of contention. If 100 goroutines are blocked on one mutex and only 1 goroutine is blocked on another mutex, we probably do want to see the first mutex as being more contended, so the Abseil approach is the more useful one.
I propose we change the pprof mutex profile in Go 1.22 to scale contention by the number of blocked goroutines, matching Abseil's more useful behavior.
It is worth noting that if you have 100 goroutines contending a mutex and what happens is one of them holds it for 10ms at each acquisition while the others immediately drop it, the round-robin mutex granting combined with charging the Unlock stack that unblocks a goroutine will mean that only 1 of each 100 samples will be attributed to the 10ms goroutine. The others will be attributed to the previous goroutine in the chain. It is important to interpret such profiles as providing useful information about which locks are contended but not always providing clear-cut evidence about the slow Unlock stack. (See my next comment below for a way to avoid this problem while still scaling reported mutex contention by number of blocked goroutines.)
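For readers who want to see the effect directly, here is a minimal sketch (not taken from the proposal or the CLs) that recreates the example above with goroutines instead of threads and prints the mutex profile; the measured delays are approximate and will vary from run to run:

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	// Record every contention event. Real programs usually pass a larger
	// sampling rate here to reduce overhead.
	runtime.SetMutexProfileFraction(1)

	var mu sync.Mutex
	var wg sync.WaitGroup

	mu.Lock() // plays the role of T0 holding the lock
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // T1-T5 block here while the lock is held
			mu.Unlock()
		}()
	}
	time.Sleep(10 * time.Millisecond) // hold the lock while the other goroutines queue up
	mu.Unlock()
	wg.Wait()

	// With the old behavior this sequence shows up as roughly 10ms of
	// delay; with the scaled behavior it shows up as roughly 50ms
	// (5 waiters x 10ms), still attributed to the long-held Unlock.
	pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
}
```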