
pkg/trace/api: add an endpoint to enable block profiling #3429

Merged
merged 1 commit into master from gbbr/blockrate on May 9, 2019

Conversation

gbbr
Contributor

@gbbr gbbr commented May 8, 2019

This change adds a new endpoint `/debug/blockrate` which automatically
calls `runtime.SetBlockProfileRate(v)`, where `v` is an optional query
string parameter defaulting to 10000 (1 sample for every 10μs blocked).

The block profile rate is reset to 0 once `/debug/pprof/block` is
called.
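
For illustration only, here is a minimal sketch of what such a handler could look like, assuming a query parameter named `v`, an arbitrary port, and a plain-text response (none of which are confirmed by the PR itself):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
	"strconv"
)

func main() {
	// Hypothetical /debug/blockrate-style handler; the real code in
	// pkg/trace/api may be wired differently.
	http.HandleFunc("/debug/blockrate", func(w http.ResponseWriter, r *http.Request) {
		rate := 10000 // default: roughly 1 sample for every 10μs spent blocked
		if v := r.URL.Query().Get("v"); v != "" { // "v" is an assumed parameter name
			n, err := strconv.Atoi(v)
			if err != nil {
				http.Error(w, "rate must be an integer", http.StatusBadRequest)
				return
			}
			rate = n
		}
		runtime.SetBlockProfileRate(rate)
		fmt.Fprintf(w, "block profile rate set to %d\n", rate)
	})
	log.Fatal(http.ListenAndServe("localhost:8080", nil)) // port chosen arbitrarily for the sketch
}
```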

@gbbr gbbr requested a review from a team as a code owner May 8, 2019 08:40
@gbbr gbbr added this to the 6.12.0 milestone May 8, 2019
@gbbr gbbr force-pushed the gbbr/blockrate branch from e622c7e to 31089a6 Compare May 8, 2019 08:41
```go
	}
	rate = n
}
runtime.SetBlockProfileRate(rate)
```
Contributor

How much of a performance impact does this actually have? Curious if it's negligible enough to just have it enabled at all times?

Contributor Author

I don't know, and it seems nobody does: neither the internet nor our friends at Datadog. If you look at the standard library, it's clear that an extra code path is triggered which records blocking events. I'd say we keep it off as a precaution. On top of that, it also gives us the freedom to fine-tune the value.

Member

Unfortunately the block profiler is poorly documented and there is no mention of its impact in production environments.

All I can say is that you probably don't want or need a default value of 1, which means that every blocking event will be accounted for. As far as I understand from the documentation and the comments in the aforementioned issue, if you set it to, say, 100, you will sample one blocking event for every 100ns spent blocked, which I assume will have a lower overhead and will still be useful enough to track things like mutexes with high contention.
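
For reference, the semantics from the runtime package documentation, shown as a tiny example:

```go
package main

import "runtime"

func main() {
	// The profiler aims to sample an average of one blocking event per
	// rate nanoseconds spent blocked: rate = 1 records every event,
	// rate <= 0 turns block profiling off entirely.
	runtime.SetBlockProfileRate(100) // ~1 sample per 100ns spent blocked
}
```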

Contributor Author

In that case it might be reasonable to have it enabled at all times with a higher default (say 1000, or even much more).

Contributor Author

@gbbr gbbr May 9, 2019

OK, so I've done further digging around and discovered that this function gets called every time a block profile entry is recorded. That function in turn generates a traceback on each call, which we know from previous findings can be costly when called repeatedly. Quite a lot of things are happening there.

For this reason, I updated the PR to default to 10000 (1 sample every 10 microseconds), and I suggest keeping this endpoint and not enabling block profiling by default.
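
As a rough sketch of the reset behaviour described above (the handler wiring here is illustrative, not the PR's actual code), the block profile handler can be wrapped so the rate drops back to 0 after the profile is served:

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
	"runtime"
)

func main() {
	// Serve the block profile, then disable block profiling again so the
	// runtime stops recording blocking events and paying the traceback cost.
	http.HandleFunc("/debug/pprof/block", func(w http.ResponseWriter, r *http.Request) {
		pprof.Handler("block").ServeHTTP(w, r)
		runtime.SetBlockProfileRate(0)
	})
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```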

@gbbr gbbr force-pushed the gbbr/blockrate branch from 31089a6 to d1f0028 Compare May 9, 2019 08:17
@gbbr gbbr requested a review from djova May 9, 2019 08:22
@gbbr gbbr merged commit 2d480a2 into master May 9, 2019
@gbbr gbbr deleted the gbbr/blockrate branch May 9, 2019 14:15