The H2 log algorithm for the poll time histogram is too granular for Prometheus export #6960
Comments
I suppose a third alternative would be to allow arbitrary buckets that are determined outside tokio.
Rather than a label per bucket, you could re-export the histogram by computing summaries on the client and exporting those. That's how metric.rs treats histograms in any case. I'll explore whether there is a way to give the H2 histogram an equivalent setting to the old one or, if not, expose a non-deprecated way to get the old behavior.
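To make the client-side summary idea concrete, here is a minimal, hypothetical sketch (not metric.rs or tokio code) of deriving an approximate quantile from histogram bucket counts, which is roughly what re-exporting a summary instead of per-bucket labels amounts to:

```rust
/// Approximate the q-th quantile (0.0..=1.0) from histogram bucket counts.
/// `bounds[i]` is the (lower, upper) edge of bucket i in nanoseconds and
/// `counts[i]` is the number of samples recorded in that bucket.
fn approx_quantile(bounds: &[(u64, u64)], counts: &[u64], q: f64) -> Option<u64> {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return None;
    }
    // Rank of the sample we are looking for (at least 1).
    let target = ((q * total as f64).ceil() as u64).max(1);
    let mut seen = 0u64;
    for (i, &count) in counts.iter().enumerate() {
        seen += count;
        if seen >= target {
            // Linear interpolation inside the bucket, similar to what
            // Prometheus' histogram_quantile() does on the server side.
            let (lo, hi) = bounds[i];
            let into_bucket = (target - (seen - count)) as f64 / count as f64;
            return Some(lo + ((hi - lo) as f64 * into_bucket) as u64);
        }
    }
    bounds.last().map(|&(_, hi)| hi)
}

fn main() {
    // Toy data: three buckets with most samples in the middle one.
    let bounds: &[(u64, u64)] = &[(0, 100_000), (100_000, 200_000), (200_000, 400_000)];
    let counts: &[u64] = &[10, 80, 10];
    println!("p99 ≈ {:?} ns", approx_quantile(bounds, counts, 0.99));
}
```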
Based on feedback in tokio-rs#6960, customers want more flexibility in histogram configuration. The main change here is allowing precision for an H2 Histogram to go all the way down to zero; there is no fundamental reason this wasn't possible before, but typically H2 histograms stop at P=2. I've also added another optimization that truncates the number of buckets based on the requested max_value. This can save space by truncating in the middle of a bucket group. Along the way, while adding more tests, I found a couple of bugs which are fixed here:
1. The `max_value` in `LogHistogramBuilder` wasn't precisely respected.
2. The `max_value` computed by `LogHistogram` was actually incorrect for some n/p combinations. The new version precisely considers bucket layout instead.
That is a smart idea.
I'm working on #6963 which, among other things, will enable a configuration that produces buckets like these:
bucket 0: 0ns..16.384µs (size: 16.384µs)
bucket 1: 16.384µs..32.768µs (size: 16.384µs)
bucket 2: 32.768µs..65.536µs (size: 32.768µs)
bucket 3: 65.536µs..131.072µs (size: 65.536µs)
bucket 4: 131.072µs..262.144µs (size: 131.072µs)
bucket 5: 262.144µs..524.288µs (size: 262.144µs)
bucket 6: 524.288µs..1.048576ms (size: 524.288µs)
bucket 7: 1.048576ms..2.097152ms (size: 1.048576ms)
bucket 8: 2.097152ms..4.194304ms (size: 2.097152ms)
bucket 9: 4.194304ms..18446744073.709551615s (size: 18446744073.705357311s)
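For intuition, here is a rough, illustrative sketch (not tokio's actual `LogHistogram` implementation) of how a doubling bucket layout like the one above can be generated, including the mid-group truncation at a requested max_value described in #6963:

```rust
use std::time::Duration;

/// Illustrative log-bucket layout: the first group holds 2^(p+1) buckets of the
/// base width, and every later group holds 2^p buckets with the width doubled.
/// Bucket generation stops once the requested max_value is covered, which may
/// land in the middle of a group. This only mirrors the shape of the listing
/// above; it is not tokio's real bucketing code.
fn log_buckets(resolution: Duration, p: u32, max_value: Duration) -> Vec<(Duration, Duration)> {
    let res = resolution.as_nanos() as u64;
    let max = max_value.as_nanos() as u64;
    let mut buckets = Vec::new();
    let mut lo = 0u64;
    let mut width = res;

    // First group: 2^(p+1) buckets of the base width.
    for _ in 0..(1u64 << (p + 1)) {
        buckets.push((Duration::from_nanos(lo), Duration::from_nanos(lo + width)));
        lo += width;
        if lo >= max {
            return buckets;
        }
    }
    // Subsequent groups: 2^p buckets each, doubling the width per group.
    loop {
        width *= 2;
        for _ in 0..(1u64 << p) {
            buckets.push((Duration::from_nanos(lo), Duration::from_nanos(lo + width)));
            lo += width;
            if lo >= max {
                return buckets;
            }
        }
    }
}

fn main() {
    // p = 0 with a 16.384µs base width reproduces buckets 0..=8 of the listing above.
    for (i, (lo, hi)) in log_buckets(Duration::from_nanos(16_384), 0, Duration::from_millis(4))
        .iter()
        .enumerate()
    {
        println!("bucket {i}: {lo:?}..{hi:?}");
    }
}
```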
In #6896/#6897 the old log algorithm for recording poll time metrics was replaced by the H2 log algorithm. I have not been able to configure this histogram to produce a small number of buckets that still covers a reasonable range for export to Prometheus.
With the legacy algorithm I had this configuration:
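A legacy-style configuration of roughly this shape (the values below are illustrative placeholders rather than the exact original settings; these are tokio's unstable `metrics_poll_count_histogram_*` builder methods behind `--cfg tokio_unstable`):

```rust
use std::time::Duration;
use tokio::runtime::{Builder, HistogramScale};

fn main() -> std::io::Result<()> {
    // Placeholder values; build with RUSTFLAGS="--cfg tokio_unstable".
    let runtime = Builder::new_multi_thread()
        .enable_all()
        .enable_metrics_poll_count_histogram()
        // Legacy log scale: each bucket is twice as wide as the previous one.
        .metrics_poll_count_histogram_scale(HistogramScale::Log)
        // Width of the first bucket.
        .metrics_poll_count_histogram_resolution(Duration::from_micros(10))
        // Total number of buckets; the last one is effectively a catch-all.
        .metrics_poll_count_histogram_buckets(10)
        .build()?;
    runtime.block_on(async { /* application */ });
    Ok(())
}
```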
Which gave me these buckets when exported to Prometheus:
Distribution of poll times in these buckets was reasonable for the daily traffic pattern for our application.
With the new algorithm I'm unable to replicate the same range (~10µs–~4ms) with as few buckets. The best I've been able to configure is 17 buckets, which is too much cardinality for Prometheus when combined with the total number of Prometheus labels a full deployment of the application uses.
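For comparison, this is roughly the shape of the new-style configuration being attempted. The `LogHistogram` builder and `metrics_poll_time_histogram_configuration` are unstable APIs; the method names below are an assumption based on this thread and should be checked against the tokio docs, and `precision_exact(0)` in particular only becomes possible once #6963 lands:

```rust
use std::time::Duration;
use tokio::runtime::{Builder, HistogramConfiguration, LogHistogram};

fn main() -> std::io::Result<()> {
    // Assumed unstable API (requires --cfg tokio_unstable); verify names against the docs.
    let histogram = LogHistogram::builder()
        .min_value(Duration::from_micros(10))
        .max_value(Duration::from_millis(4))
        // Precision p controls buckets per doubling (2^p); lower p means fewer buckets.
        // p = 0 is only allowed once #6963 is merged.
        .precision_exact(0)
        .build();

    let runtime = Builder::new_multi_thread()
        .enable_all()
        .enable_metrics_poll_time_histogram()
        .metrics_poll_time_histogram_configuration(HistogramConfiguration::log(histogram))
        .build()?;
    runtime.block_on(async { /* application */ });
    Ok(())
}
```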
To fix this I'd like the legacy algorithm to be un-deprecated and renamed appropriately.
The alternatives are: