proposal: runtime/pprof: add goroutine stack memory usage profile #66566

Open
@felixge

Proposal Details

Summary

I'm proposing to implement a new profile type that breaks down goroutine stack space usage by frame.

  • No new API. Added as a new sample type to the goroutine profile
  • The value for each stack trace is the total stack space used by its leaf frame
  • Free stack space is indicated via a virtual runtime._FreeStack leaf node
  • The grand total should be equal (or close) to /memory/classes/heap/stacks:bytes
  • A rough prototype CL is available here: https://go-review.googlesource.com/c/go/+/574795 (200 LoC excluding tests)
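
To compare a profile's grand total against /memory/classes/heap/stacks:bytes, the metric can be read through the existing runtime/metrics package. The following is a minimal sketch of that lookup only; the `stackBytes` helper name is hypothetical and is not part of the prototype CL.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// stackBytes (hypothetical helper) reads the runtime's accounting of memory
// reserved for goroutine stacks. It returns false if the running Go version
// does not support the metric.
func stackBytes() (uint64, bool) {
	s := []metrics.Sample{{Name: "/memory/classes/heap/stacks:bytes"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

func main() {
	if n, ok := stackBytes(); ok {
		fmt.Println("stack bytes:", n)
	}
}
```

The value changes as goroutines start, grow, and exit, so a comparison with the profile total is only meaningful if both are captured close together.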

Given the above, perhaps this is small and simple enough to skip the official proposal process. But since the design includes a few opinionated choices, it's probably best to have some discussions and consensus upfront.

Motivation

My main motivation for this came from debugging a case of dramatic stack space growth while deploying PGO to production (#65532), which I was able to root-cause using a hacky stack space profiler that I implemented in userland (repo).

Additionally, I imagine this profile type will be useful in other scenarios, e.g. high CPU usage in morestack (#18138).

Implementation

I'm proposing to implement this profile type by taking each stack trace in the goroutine profile and looking up the size of each of its frames (❶ shows this for a single stack trace). Each stack trace is then broken into one stack trace per prefix (starting from the root), and each of these stack traces is assigned the frame size of its leaf frame as its value (❷). This produces a flame graph where the "self value" of each frame corresponds to its frame size, and its total value corresponds to its frame size plus the frame sizes of its children (❸).

[Image: diagram illustrating steps ❶ to ❸]

These values are then multiplied by the number of goroutines that were captured for the given stack trace, resulting in the sum of stack space usage.
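
The prefix expansion and the multiplication by goroutine count can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the prototype CL: the `frame` and `sample` types, the `expandPrefixes` helper, and the frame sizes are all hypothetical.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for one function frame and its size in bytes.
type frame struct {
	fn   string
	size int64
}

// sample is one profile entry: a root-first stack and its value in bytes.
type sample struct {
	stack []string
	bytes int64
}

// expandPrefixes emits one sample per root-anchored prefix of frames.
// Each sample's value is its leaf frame's size times the number of
// goroutines that share the stack trace.
func expandPrefixes(frames []frame, goroutines int64) []sample {
	out := make([]sample, 0, len(frames))
	for i, f := range frames {
		stack := make([]string, 0, i+1)
		for _, p := range frames[:i+1] {
			stack = append(stack, p.fn)
		}
		out = append(out, sample{stack: stack, bytes: f.size * goroutines})
	}
	return out
}

func main() {
	// Frame sizes are made up for illustration.
	frames := []frame{{"main.a", 64}, {"main.b", 128}, {"main.c", 1024}}
	for _, s := range expandPrefixes(frames, 2) {
		fmt.Println(s.stack, s.bytes)
	}
}
```

Because every prefix becomes its own sample, a flame graph built from these samples shows each frame's own size as its self value, with the sizes of its callees stacked on top.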

Last but not least, a runtime._FreeStack leaf node is added to capture the delta between the stack space used by frames and the total size of the stack allocated for the goroutine. Additionally, a root-level runtime._FreeStack is used to show the amount of memory reserved for goroutine stacks that is currently not in use. These virtual frames are motivated by producing a profile that adds up to /memory/classes/heap/stacks:bytes, as well as giving the user the ability to reason about potential morestack issues.
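
The per-goroutine runtime._FreeStack value described above amounts to a subtraction, sketched below. The `freeStackBytes` helper and the numbers are hypothetical, and clamping at zero is an assumption of this sketch rather than something the proposal specifies.

```go
package main

import "fmt"

// freeStackBytes (hypothetical helper) returns the part of a goroutine's
// allocated stack that is not covered by the summed frame sizes.
// Clamping at zero is an assumption made here in case the inputs disagree.
func freeStackBytes(allocated int64, frameSizes []int64) int64 {
	var used int64
	for _, s := range frameSizes {
		used += s
	}
	if used > allocated {
		return 0
	}
	return allocated - used
}

func main() {
	// An 8 KiB stack holding three frames; sizes are illustrative only.
	free := freeStackBytes(8192, []int64{64, 128, 3000})
	fmt.Println("runtime._FreeStack:", free)
}
```

A small free value relative to the allocated size suggests the goroutine is close to its next stack growth, which is the kind of morestack reasoning the virtual frame is meant to support.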

Prototype

I have uploaded a rough CL for a prototype here: https://go-review.googlesource.com/c/go/+/574795 (200 LoC excluding tests).

Using this prototype we can look at a real stack profile for a program with the following goroutines:

  • 1 goroutine with a oneThousand byte frame
  • 1 goroutine with a twoThousand byte frame
  • 2 goroutines with a threeThousand byte frame

Code snippet:

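import "time"

// launchGoroutinesWithKnownStacks starts four goroutines with known frame
// sizes, waits until each has signaled that it is running, and returns a
// function that releases them.
func launchGoroutinesWithKnownStacks() func() {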
	c1 := make(chan struct{})
	c2 := make(chan struct{})
	c3 := make(chan struct{})
	c4 := make(chan struct{})

	go oneThousand(c1)
	go twoThousand(c2)
	go threeThousand(c3)
	go threeThousand(c4)
	<-c1
	<-c2
	<-c3
	<-c4
	// hacky way to ensure all goroutines reach the same <-ch statement
	// TODO(fg) make caller retry in the rare case this could go wrong
	time.Sleep(10 * time.Millisecond)
	return func() {
		c1 <- struct{}{}
		c2 <- struct{}{}
		c3 <- struct{}{}
		c4 <- struct{}{}
	}
}

//go:noinline
func oneThousand(ch chan struct{}) [1000]byte {
	var a [1000]byte
	ch <- struct{}{}
	<-ch
	return a
}

//go:noinline
func twoThousand(ch chan struct{}) [2000]byte {
	var a [2000]byte
	ch <- struct{}{}
	<-ch
	return a
}

//go:noinline
func threeThousand(ch chan struct{}) [3000]byte {
	var a [3000]byte
	ch <- struct{}{}
	<-ch
	return a
}

[Screenshot: pprof view of the stack space profile for the program above]

Note: The prototype doesn't implement the proposed root-level runtime._FreeStack frame yet.

Performance

I have not measured this yet, but I suspect all of this can be done with negligible impact on the overhead of the goroutine profile.

Next Steps

Please let me know what you think. cc @prattmic @mknyszek @nsrip-dd @rhysh (this was previously discussed in a recent runtime diagnostics sync, see notes).
