
The H2 log algorithm for the poll time histogram is too granular for Prometheus export #6960

Open
drbrain opened this issue Nov 7, 2024 · 5 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-metrics Module: tokio/runtime/metrics

Comments

drbrain commented Nov 7, 2024

In #6896/#6897 the old log algorithm for recording poll time metrics was replaced by the H2 log algorithm. I have not been able to configure the new histogram to produce a small number of buckets that still covers a reasonable range for export to Prometheus.

With the legacy algorithm I had this configuration:

rt.enable_metrics_poll_count_histogram()
    .metrics_poll_count_histogram_scale(tokio::runtime::HistogramScale::Log)
    .metrics_poll_count_histogram_resolution(Duration::from_micros(10))
    .metrics_poll_count_histogram_buckets(7);

This gave me these bucket bounds (in seconds) when exported to Prometheus:

0.000016384
0.000032768
0.000065536
0.000131072
0.000262144
0.000524288
0.002097152
0.004194304
+Inf

Distribution of poll times in these buckets was reasonable for the daily traffic pattern for our application.
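For reference, those bounds appear to come from rounding the requested 10µs resolution up to the next power of two (2^14 ns = 16.384µs) and doubling from there; a minimal sketch under that assumption:

```rust
fn main() {
    // Assumption: the legacy log scale rounds the requested resolution
    // (10µs = 10_000 ns) up to the next power of two, then doubles each bound.
    let first_bound_ns = 10_000u64.next_power_of_two();
    assert_eq!(first_bound_ns, 16_384); // 16.384µs, matching the first bucket

    let bounds_s: Vec<f64> = (0..8)
        .map(|i| (first_bound_ns << i) as f64 / 1e9)
        .collect();
    println!("{bounds_s:?}"); // starts 0.000016384, 0.000032768, ...
}
```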

With the new algorithm I'm unable to replicate the same range (roughly 10µs to 4ms) with as few buckets. The best configuration I've found uses 17 buckets, which is too much cardinality for Prometheus when combined with the total number of labels a full deployment of the application uses.

To fix this I'd like the legacy algorithm to be un-deprecated and renamed appropriately.

The alternatives are:

  • Use linear buckets. These would have a poor distribution for the poll times my application sees.
  • Don't record poll count metrics to Prometheus. That means I won't be able to see this data at all.
@drbrain drbrain added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Nov 7, 2024
drbrain (Author) commented Nov 7, 2024

I suppose a third alternative would be to allow arbitrary buckets that are determined outside tokio
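A purely hypothetical illustration of that idea (this is not the tokio API, just a standalone sketch of a histogram that accepts caller-supplied bucket bounds):

```rust
// Hypothetical sketch (not tokio's API): a histogram with arbitrary,
// caller-supplied bucket upper bounds in nanoseconds, plus an implicit
// overflow (+Inf) bucket at the end.
struct ArbitraryHistogram {
    bounds_ns: Vec<u64>, // sorted upper bounds
    counts: Vec<u64>,    // bounds_ns.len() + 1 buckets
}

impl ArbitraryHistogram {
    fn new(bounds_ns: Vec<u64>) -> Self {
        let n = bounds_ns.len() + 1;
        Self { bounds_ns, counts: vec![0; n] }
    }

    fn record(&mut self, value_ns: u64) {
        // First bucket whose upper bound is >= the value; past-the-end
        // index is the overflow bucket.
        let idx = self.bounds_ns.partition_point(|&b| b < value_ns);
        self.counts[idx] += 1;
    }
}

fn main() {
    let mut h = ArbitraryHistogram::new(vec![16_384, 32_768, 65_536]);
    h.record(20_000);    // falls in the 16.384µs..32.768µs bucket
    h.record(1_000_000); // falls in the overflow bucket
    println!("{:?}", h.counts); // [0, 1, 0, 1]
}
```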

@Darksonn Darksonn added the M-metrics Module: tokio/runtime/metrics label Nov 8, 2024
Darksonn (Contributor) commented Nov 8, 2024

cc @hds @rcoh

rcoh (Contributor) commented Nov 8, 2024

Rather than a label per bucket, you could re-export the histogram by computing summaries on the client and exporting those.

That's how metrics.rs treats histograms in any case.

I'll explore whether there is a way to give the H2 histogram a setting equivalent to the old one or, if not, expose a non-deprecated way to get the old behavior.
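The client-side summary approach could look roughly like this (a hedged sketch; `bounds` and `counts` are assumed names for the scraped bucket upper bounds and per-bucket counts):

```rust
// Sketch: approximate a quantile from histogram bucket counts on the
// client, instead of exporting one Prometheus series per bucket.
// `bounds` are bucket upper bounds in seconds; `counts` are per-bucket counts.
fn approx_quantile(bounds: &[f64], counts: &[u64], q: f64) -> f64 {
    let total: u64 = counts.iter().sum();
    let target = (q * total as f64).ceil() as u64;
    let mut seen = 0u64;
    for (i, &c) in counts.iter().enumerate() {
        seen += c;
        if seen >= target {
            return bounds[i]; // report the bucket's upper bound
        }
    }
    *bounds.last().unwrap()
}

fn main() {
    let bounds = [0.000016384, 0.000032768, 0.000065536, 0.000131072];
    let counts = [10, 50, 30, 10];
    // p90 ≈ 0.000065536 (upper bound of the third bucket)
    println!("p90 ≈ {}", approx_quantile(&bounds, &counts, 0.90));
}
```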

rcoh added a commit to rcoh/tokio that referenced this issue Nov 8, 2024
Based on feedback in tokio-rs#6960, customers want more flexibility in histogram configuration. The main change here is allowing precision for an H2 histogram to go all the way down to zero; there is no fundamental reason this wasn't possible before, but typically H2 histograms stop at p=2.

I've also added another optimization that truncates the number of buckets based on the requested max_value. This can save space by truncating in the middle of a bucket group.

Along the way and in the process of adding more tests, I found a couple of bugs which are fixed here:
1. The `max_value` in LogHistogramBuilder wasn't precisely respected
2. The `max_value` computed by `LogHistogram` was actually incorrect for some n/p combinations. The new version precisely considers bucket layout instead.
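The truncation idea might be sketched like this (an illustration, not the actual PR code: keep only the buckets whose lower bound is below the requested `max_value`, plus one overflow bucket, even if that cut lands in the middle of a bucket group):

```rust
// Hedged sketch of max_value-based truncation (not tokio's implementation).
fn truncated_bucket_count(lower_bounds: &[u64], max_value: u64) -> usize {
    // Keep buckets that start below max_value, plus one overflow bucket.
    lower_bounds.iter().take_while(|&&lo| lo < max_value).count() + 1
}

fn main() {
    // Doubling lower bounds in ns: 0, 16.384µs, 32.768µs, 65.536µs, ...
    let lower_bounds: Vec<u64> = std::iter::once(0)
        .chain((14..23).map(|p| 1u64 << p))
        .collect();
    // A 100µs max keeps the four buckets starting below it, plus overflow.
    println!("{}", truncated_bucket_count(&lower_bounds, 100_000)); // 5
}
```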
@wo401555661

That is a smart idea

rcoh (Contributor) commented Nov 9, 2024

I'm working on #6963 which, among other things, will enable setting p=0; there is no inherent reason to restrict it to p >= 2. Here's what the buckets will look like in that case:

bucket 0: 0ns..16.384µs (size: 16.384µs)
bucket 1: 16.384µs..32.768µs (size: 16.384µs)
bucket 2: 32.768µs..65.536µs (size: 32.768µs)
bucket 3: 65.536µs..131.072µs (size: 65.536µs)
bucket 4: 131.072µs..262.144µs (size: 131.072µs)
bucket 5: 262.144µs..524.288µs (size: 262.144µs)
bucket 6: 524.288µs..1.048576ms (size: 524.288µs)
bucket 7: 1.048576ms..2.097152ms (size: 1.048576ms)
bucket 8: 2.097152ms..4.194304ms (size: 2.097152ms)
bucket 9: 4.194304ms..18446744073.709551615s (size: 18446744073.705357311s)
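That layout can be reproduced with a small sketch (assuming the doubling structure shown above: bucket 0 covers [0, 2^14) ns, every later bucket doubles in width, and the last bucket runs to u64::MAX ns):

```rust
fn main() {
    // p=0 layout sketch: first bound is 2^14 ns = 16.384µs, then double.
    let mut lo: u64 = 0;
    let mut hi: u64 = 1 << 14;
    for i in 0..10 {
        let upper = if i == 9 { u64::MAX } else { hi };
        println!("bucket {i}: {lo}ns..{upper}ns");
        lo = hi;
        hi = hi.saturating_mul(2);
    }
}
```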
