Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: runtime/pprof: document mutex profiler stack traces #44920

Open
prattmic opened this issue Mar 10, 2021 · 8 comments
Open

doc: runtime/pprof: document mutex profiler stack traces #44920

prattmic opened this issue Mar 10, 2021 · 8 comments
Labels
Documentation Issues describing a change to documentation. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

The mutex profiler records contention on the Unlock path of locks, so the stack traces in the profiles all appear as Unlock calls rather than the more intuitively expected Lock call [1].

This can be a source of confusion. Here's a Stack Overflow question, and I personally fielded a question about this today.

Our own runtime/pprof docs say nothing about this (in fact, the complete documentation we have for mutex profiles is "mutex - stack traces of holders of contended mutexes"). This should certainly be expanded.

There is a bit of documentation elsewhere online. e.g., @felixge's block profile documentation mentions this property.

Related to #14689.

[1] This also means it isn't possible to directly tell which Lock calls are hottest. In practice I doubt this is an issue as typical mutex use has a unique Unlock call for each unique Lock call. If this were a problem, we could move the recorded stack trace to Lock here (at the expensive of requiring a slot in the sudog to store the stack until we can record the event in Unlock).

@prattmic prattmic added Documentation Issues describing a change to documentation. help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Mar 10, 2021
@prattmic prattmic added this to the Backlog milestone Mar 10, 2021
@prattmic
Copy link
Member Author

cc @mknyszek

@felixge
Copy link
Contributor

felixge commented Mar 10, 2021

@prattmic +100. I think it's fair to say that all of the builtin profilers could use some additional documentation about the kind of data they produce, the sampling that's involved, etc.

I'd be happy to help with patches for this, but I'm currently still a bit unsure about the best place to document this. A lot of the profilers have various APIs and knobs exposed in different places, so there isn't always a natural place to put this info. Perhaps the Diagnostics doc would be the right place? I'm thinking each profiler could use a section there, and then all the API docs could be updated to point people in the right direction?

@prattmic
Copy link
Member Author

The inverse of footnote [1] above is that recording in Unlock potentially shows which lock holders are most problematic (i.e., they hold the lock for too long). This could be a very useful property. e.g., consider a fast "getter" method that holds the lock only long enough to read a single field vs an update method that holds the lock for a very long time. You may have a very high rate of access to the "getters" without causing significant contention, but a low rate of calls to "update" will block all the "getters". IMO, in this case it is likely more useful to see results on the Unlock in "update" rather than on Lock on the "getters".

@felixge
Copy link
Contributor

felixge commented Mar 10, 2021

@prattmic also great points about the tradeoffs between Unlock vs Lock. I found reporting Unlock very unnatural and confusing, but you make a great point about it being potentially better.

@prattmic
Copy link
Member Author

prattmic commented Mar 10, 2021

@felixge Great question about the best to put this kind of documentation since there are several relevant packages. I was initially thinking runtime/pprof would be most natural, but then I didn't even realize that the Diagnostics page existed (discoverability problem, anyone? 😄).

The main possibilities I see are:

  • Package runtime docs: contains some knobs for profiling, but isn't the primary interface, so I don't think this is great.
  • Package runtime/pprof docs: generally the primary interface for profiling.
  • Website Diagnostics page (or other new golang.org page): This feels a bit more suited to long-form exposition on how to interpret, visualize the profiles, etc, but isn't quite as discoverable.

I'd probably lean slightly in favor of golang.org right now. Wherever this ends up, I think we should:

  1. Have a full section about each profile type (at least a paragraph!). No more single sentence being the complete documentation.
  2. Link to the definitive / expanded documentation from the other package doc locations.

Others like @bcmills and @jayconrod may have more thoughts on the best location for this kind of documentation?

@jayconrod
Copy link
Contributor

I think it makes sense to put this somewhere under https://golang.org/doc/ and link to it from that Diagnostics page and from the runtime/pprof package documentation. It can be hosted in https://github.com/golang/website/tree/master/_content/doc.

@hyangah
Copy link
Contributor

hyangah commented Mar 11, 2021

cc @pjweinb

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/547057 mentions this issue: runtime/pprof: document block and mutex profiles

gopherbot pushed a commit that referenced this issue Dec 6, 2023
Amazingly, we seem to have nearly no in-tree documentation on the
semantics of block and mutex profiles. Add brief summaries, including
the new behavior from CL 506415 and CL 544195.

For #14689.
For #44920.
For #57071.
For #61015.

Change-Id: I1a6edce7c434fcb43f17c83eb362b1f9d1a32df1
Reviewed-on: https://go-review.googlesource.com/c/go/+/547057
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Amazingly, we seem to have nearly no in-tree documentation on the
semantics of block and mutex profiles. Add brief summaries, including
the new behavior from CL 506415 and CL 544195.

For golang#14689.
For golang#44920.
For golang#57071.
For golang#61015.

Change-Id: I1a6edce7c434fcb43f17c83eb362b1f9d1a32df1
Reviewed-on: https://go-review.googlesource.com/c/go/+/547057
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues describing a change to documentation. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants