-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/trace/api: add an endpoint to enable block profiling #3429
Conversation
} | ||
rate = n | ||
} | ||
runtime.SetBlockProfileRate(rate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of a performance impact does this actually have? Curious if it's negligible enough to just have enabled at all times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It seems nobody does. Neither the internet, nor our friends at Datadog. If you look at the standard library, it's clear that an extra code path is triggered which records blocking events. I'd say we keep it as a prevention. On top of that, it also gives us the freedom to fine-tune its value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the block profiler is poorly documented and there is no mention about its impact on production environments.
All I can say is that you probably don't want/need a default value of 1
. That means that every blocking event will be accounted for. As far as I understand from the documentation and the comments in the aforementioned issue, If you set it to, let's say, 100
, you will sample one blocking event for every 100ns
spent blocked, which I assume will have a lower overhead and will still be useful enough to track things like mutexes with high contention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case it might be reasonable to have it enabled at all times with a higher default (say 1000 or even much more)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I've done further digging around, and discovered that this function gets called every time a block profile entry is recorded. That function in turn generates a traceback on each call, which we know from previous findings that it can be costly when called repeatedly. Quite a lot of things are happening there.
For this reason, I updated the PR to default to the value 10000 (1 sample every 10 microseconds) and I'll suggest keeping this endpoint and not having it on by default.
This change adds a new endpoint `/debug/blockrate` which automatically calls `runtime.SetBlockProfileRate(v)` where `v` is an optional query string defaulting to 10000 (1 sample for every 10μs blocked). The block profile rate is reset to 0 once `/debug/pprof/block` is called.
This change adds a new endpoint
/debug/blockrate
which automaticallycalls
runtime.SetBlockProfileRate(v)
wherev
is an optional querystring defaulting to 10000 (1 sample for every 10μs blocked).
The block profile rate is reset to 0 once
/debug/pprof/block
iscalled.