fix: histogram upper bound is too low #174
Open
(i believe this closes #151)
i was seeing panics happening when trying to print stats while load-testing a relatively inefficient, but not THAT inefficient site
did a bit of digging and realized the upper bound of the histogram implied an upper limit of 3.6s or 3600ms, which is obviously trivial to exceed when hitting any kind of relatively expensive API
it looks like the histogram initialization was just copy-pasta'd from the example in the docs:
however, in drill's case, it's actually measuring in microseconds, not milliseconds (and then dividing down in all the stats), as evidenced by the fact that the request duration (already in ms) is being multiplied by 1000 before appending to the histogram:
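to make the unit mismatch concrete, here's a minimal standalone sketch (not drill's actual code). the constant and function names are hypothetical, and the plain comparison only approximates how a bounded histogram decides whether a value is in range. the bound of 3,600,000 is simply what the 3.6s limit mentioned above works out to when values are in microseconds.

```rust
/// Upper bound as a raw count. The histogram has no notion of units,
/// so 3_600_000 means 3.6 s when values are recorded in microseconds.
/// (Hypothetical name, used only for this illustration.)
const UPPER_BOUND: u64 = 3_600_000;

/// Mirrors the conversion described above: a duration already in
/// milliseconds is multiplied by 1000 before being recorded.
fn recorded_value(duration_ms: f64) -> u64 {
    (duration_ms * 1000.0) as u64
}

/// Simplified range check; a real histogram may round its bound up slightly.
fn fits(duration_ms: f64) -> bool {
    recorded_value(duration_ms) <= UPPER_BOUND
}

fn main() {
    for ms in [250.0, 3600.0, 4000.0] {
        println!(
            "{} ms -> {} us, within bound: {}",
            ms,
            recorded_value(ms),
            fits(ms)
        );
    }
}
```

a 4000 ms request becomes 4,000,000 in the histogram's units, which is past the bound, so recording it fails.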
as such I made two changes:
if you're against the MAX_TIMEOUT_SECONDS constant that I introduced, I'm ok w dropping it, but in that case we either continue to risk panics or we have to find a smarter way to derive the histogram's upper bound. we could base it on the actual timeout option, but the config is initialized inside of benchmark::execute & not returned back out, so it would require a more structural change, which I wanted to avoid in this PR

do let me know if there's a preferable solution to the issue, but hopefully this is at least a quick and dirty solution so that 3.6s+ requests don't cause drill to panic :3
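for reference, a rough sketch of deriving the upper bound from a timeout expressed in seconds. the value of 120 and the helper name are assumptions for illustration only; the actual value of the constant in this PR isn't stated here.

```rust
/// Assumed value, for illustration only.
const MAX_TIMEOUT_SECONDS: u64 = 120;

/// Converts a timeout in seconds into the histogram's unit (microseconds).
/// Hypothetical helper, not part of drill.
fn upper_bound_micros(timeout_seconds: u64) -> u64 {
    timeout_seconds * 1_000 * 1_000
}

fn main() {
    let bound = upper_bound_micros(MAX_TIMEOUT_SECONDS);
    // A request that takes 4 s is 4_000_000 us, which fits under this bound.
    let four_seconds_us: u64 = 4 * 1_000 * 1_000;
    println!("upper bound: {} us", bound);
    println!("4 s request fits: {}", four_seconds_us <= bound);
}
```

the same helper would work with the real timeout option if the config were ever made available at the point where the histogram is created.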