Proposal Details
Summary
I'm proposing to implement a new profile type that breaks down goroutine stack space usage by frame.
- No new API. Added as a new sample type to the `goroutine` profile.
- The value for each stack trace is the sum of space for the leaf frame.
- Free stack space is indicated via a virtual `runtime._FreeStack` leaf node.
- The grand total should be equal (or close) to `/memory/classes/heap/stacks:bytes`.
- A rough prototype CL is available here: https://go-review.googlesource.com/c/go/+/574795 (200 LoC excluding tests)
Given the above, perhaps this is small and simple enough to skip the official proposal process. But since the design includes a few opinionated choices, it's probably best to have some discussions and consensus upfront.
Motivation
My main motivation for this came from debugging a case of dramatic stack space growth while deploying PGO to production (#65532), which I was able to root-cause using a hacky stack space profiler that I implemented in userland (repo).
Additionally, I imagine this profile type will be useful for other scenarios, e.g. high CPU usage in `morestack` (#18138).
Implementation
I'm proposing to implement this profile type by taking each stack trace in the goroutine profile and looking up its frame size (❶ shows this for a single stack trace). Then each stack trace is broken into one stack trace per prefix (from the root), and these stack traces are assigned the frame size of their leaf frames as values (❷). This will produce a flame graph where the "self value" of each frame corresponds to its frame size, and its total value corresponds to its frame size plus the frame sizes of its children (❸).
These values are then multiplied by the number of goroutines that were captured for the given stack trace, resulting in the sum of stack space usage.
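The per-prefix breakdown described above can be sketched as follows. This is a simplified model, not the prototype's actual code: `breakDown`, `sample`, and the frame-size map are hypothetical names invented for illustration.

```go
package main

import "fmt"

// sample models one entry of the proposed profile: a root-first stack
// trace prefix and the stack space attributed to its leaf frame,
// multiplied by the number of goroutines sharing the full stack trace.
type sample struct {
	stack []string // root-first stack trace prefix
	value int64    // leaf frame size * goroutine count
}

// breakDown expands one stack trace into one sample per prefix, so that
// each frame's "self value" in the flame graph is its own frame size.
func breakDown(stack []string, frameSize map[string]int64, goroutines int64) []sample {
	var out []sample
	for i := range stack {
		prefix := stack[:i+1]
		leaf := prefix[len(prefix)-1]
		out = append(out, sample{
			stack: append([]string(nil), prefix...),
			value: frameSize[leaf] * goroutines,
		})
	}
	return out
}

func main() {
	// Hypothetical frame sizes for a two-frame stack, shared by 2 goroutines.
	frameSize := map[string]int64{"main.main": 96, "main.work": 2048}
	for _, s := range breakDown([]string{"main.main", "main.work"}, frameSize, 2) {
		fmt.Println(s.stack, s.value)
	}
	// → [main.main] 192
	// → [main.main main.work] 4096
}
```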
Last but not least, a `runtime._FreeStack` leaf node is added to capture the delta between the stack space used by frames and the total size of the stack allocated for the goroutine. Additionally, a root-level `runtime._FreeStack` is used to show the amount of memory reserved for goroutine stacks that is currently not in use. These virtual frames are motivated by producing a profile that adds up to `/memory/classes/heap/stacks:bytes`, as well as giving the user the ability to reason about potential `morestack` issues.
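The two `runtime._FreeStack` values can be illustrated with some back-of-the-envelope arithmetic. All names and sizes here are assumptions for illustration, not the runtime's real accounting:

```go
package main

import "fmt"

// goroutineFreeStack models the per-goroutine (leaf) runtime._FreeStack
// value: the goroutine's allocated stack size minus the sum of its frame
// sizes.
func goroutineFreeStack(stackSize int64, frameSizes []int64) int64 {
	var used int64
	for _, fs := range frameSizes {
		used += fs
	}
	return stackSize - used
}

// rootFreeStack models the root-level runtime._FreeStack value:
// /memory/classes/heap/stacks:bytes minus the stack space currently
// handed out to goroutines.
func rootFreeStack(heapStacksBytes, allocatedToGoroutines int64) int64 {
	return heapStacksBytes - allocatedToGoroutines
}

func main() {
	// One goroutine with an 8 KiB stack and two frames of 96 and 2048 bytes.
	fmt.Println(goroutineFreeStack(8192, []int64{96, 2048})) // → 6048
	// 32 KiB reserved for stacks overall, 8 KiB assigned to goroutines.
	fmt.Println(rootFreeStack(32768, 8192)) // → 24576
}
```

With both virtual frames in place, the leaf values, the frame sizes, and the root-level free space add up to the reserved total, which is what makes the grand total match `/memory/classes/heap/stacks:bytes`.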
Prototype
I have uploaded a rough CL for a prototype here: https://go-review.googlesource.com/c/go/+/574795 (200 LoC excluding tests).
Using this prototype we can look at a real stack profile for a program with the following goroutines:
- 1 goroutine with a `oneThousand` byte frame
- 1 goroutine with a `twoThousand` byte frame
- 2 goroutines with a `threeThousand` byte frame
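Assuming the frames are exactly 1000, 2000, and 3000 bytes (in practice they will be somewhat larger due to spill slots, the return address, and alignment), the expected total frame space for these goroutines works out as follows; `totalFrameSpace` and `entry` are hypothetical names for this sketch:

```go
package main

import "fmt"

// entry pairs an assumed frame size with the number of goroutines
// parked on a stack containing that frame.
type entry struct {
	frameSize  int64
	goroutines int64
}

// totalFrameSpace sums frame size times goroutine count, mirroring how
// the proposed profile multiplies per-stack values by goroutine counts.
func totalFrameSpace(entries []entry) int64 {
	var total int64
	for _, e := range entries {
		total += e.frameSize * e.goroutines
	}
	return total
}

func main() {
	entries := []entry{{1000, 1}, {2000, 1}, {3000, 2}}
	fmt.Println(totalFrameSpace(entries)) // → 9000 bytes across 4 goroutines
}
```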
Code Snippet
```go
func launchGoroutinesWithKnownStacks() func() {
	c1 := make(chan struct{})
	c2 := make(chan struct{})
	c3 := make(chan struct{})
	c4 := make(chan struct{})
	go oneThousand(c1)
	go twoThousand(c2)
	go threeThousand(c3)
	go threeThousand(c4)
	<-c1
	<-c2
	<-c3
	<-c4
	// hacky way to ensure all goroutines reach the same <-ch statement
	// TODO(fg) make caller retry in the rare case this could go wrong
	time.Sleep(10 * time.Millisecond)
	return func() {
		c1 <- struct{}{}
		c2 <- struct{}{}
		c3 <- struct{}{}
		c4 <- struct{}{}
	}
}

//go:noinline
func oneThousand(ch chan struct{}) [1000]byte {
	var a [1000]byte
	ch <- struct{}{}
	<-ch
	return a
}

//go:noinline
func twoThousand(ch chan struct{}) [2000]byte {
	var a [2000]byte
	ch <- struct{}{}
	<-ch
	return a
}

//go:noinline
func threeThousand(ch chan struct{}) [3000]byte {
	var a [3000]byte
	ch <- struct{}{}
	<-ch
	return a
}
```
Note: The prototype doesn't implement the proposed root-level `runtime._FreeStack` frame yet.
Performance
I have not measured this yet, but I suspect all of this can be done with negligible impact on the overhead of the goroutine profile.
Next Steps
Please let me know what you think. cc @prattmic @mknyszek @nsrip-dd @rhysh (this was previously discussed in a recent runtime diagnostics sync, see notes).