runtime/pprof: add debug=4 goroutine profile with labels and reduced STW
This adds two new goroutine profile modes, debug=3 and debug=4, that
emit in the same output format as debug=0 and debug=1, but contain one
entry per goroutine along with extended per-goroutine information such
as goroutine ID, creator ID, state, wait minutes, and creation location.
Previously the debug=2 mode was the only way to get this per-goroutine
information, but that mode has a significantly different underlying
implementation, which means it a) requires a potentially lengthy STW pause,
and b) does not include labels.
These new modes make the detailed, per-goroutine information
previously only available in debug=2 available in a new format that
also includes labels and does not incur the same duration of stop-the-world
pause during collection.
The difference in latency observed by running goroutines is demonstrated
by the included benchmark:
                      │    debug=2     │    debug=4
                      │ max_latency_ns │ max_latency_ns  vs base
goroutines=100x3-14     1013.17k ± 47%   84.06k ±   27%  -91.70% (p=0.002 n=6)
goroutines=100x10-14     769.23k ±  7%   80.29k ±   22%  -89.56% (p=0.002 n=6)
goroutines=100x50-14     2172.4k ±  9%   181.8k ±   46%  -91.63% (p=0.002 n=6)
goroutines=1000x3-14     7133.9k ±  3%   195.7k ±   42%  -97.26% (p=0.002 n=6)
goroutines=1000x10-14   11787.6k ± 48%   494.4k ±   77%  -95.81% (p=0.002 n=6)
goroutines=1000x50-14   20234.0k ± 87%   174.8k ±  137%  -99.14% (p=0.002 n=6)
goroutines=10000x3-14   68611.0k ± 49%   168.5k ± 2768%  -99.75% (p=0.002 n=6)
goroutines=10000x10-14   60.261M ± 95%   3.460M ±  166%  -94.26% (p=0.002 n=6)
goroutines=10000x50-14  284.144M ± 40%   4.672M ±   89%  -98.36% (p=0.002 n=6)
goroutines=25000x3-14   171.290M ± 48%   4.287M ±  394%  -97.50% (p=0.002 n=6)
goroutines=25000x10-14  150.827M ± 92%   6.424M ±  158%  -95.74% (p=0.002 n=6)
goroutines=25000x50-14  708.238M ± 34%   2.249M ±  410%  -99.68% (p=0.002 n=6)
geomean                   25.08M            624.2k          -97.51%
The new debug=4 format is added as a new format, rather than altering
debug=2 in place to use the lower-latency collection method, because the
concurrent collection method would require behavior changes to debug=2
that are likely too significant to be made in place. Chiefly, debug=2
includes argument values in its output, which are not collected by the
concurrent collection method and are not included in any other format,
and adding labels to the debug=2 format could break existing parsers.