Provide some guidance for coming up with default histogram buckets for various metrics #316

trask · 2023-09-12T19:46:08Z

Coming up with default histogram buckets for various metrics can be challenging, as we've seen in #274.

I'd like to propose some guidance for coming up with default histogram buckets, which we can lean on each time the issue of defining default histogram buckets comes up.

Proposal:

If values align with service timings, use our default buckets, translated from millis to seconds, [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ] (dropping the zero bucket?)
Otherwise determine roughly the smallest and largest buckets that you care about, e.g. <0.01 seconds and >10 seconds, and use an exponential range in between (consider using base 2 if you need higher granularity and base 10 if you need lower granularity).

These would only be recommendations, so if there is a compelling reason to come up with a completely custom set of buckets for a particular metric, that would always be OK.
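
For illustration, here is a minimal Python sketch of the two options in the proposal. The function names and the `drop_zero` flag are hypothetical, and the millisecond list is just the proposal's seconds values multiplied by 1000; none of this is taken from the spec or an SDK.

```python
from decimal import Decimal

# Assumption: the millisecond boundaries behind the seconds list in the proposal.
DEFAULT_BOUNDARIES_MS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
                         1000, 2500, 5000, 7500, 10000]


def millis_to_seconds(boundaries_ms, drop_zero=False):
    """Translate millisecond boundaries to seconds, optionally dropping zero."""
    seconds = [b / 1000 for b in boundaries_ms]
    if drop_zero:
        seconds = [s for s in seconds if s != 0]
    return seconds


def exponential_boundaries(smallest, largest, base=10):
    """Boundaries smallest, smallest*base, smallest*base**2, ... up to largest.

    Decimal keeps values such as 0.01 * 10 free of binary rounding noise.
    """
    if smallest <= 0 or largest < smallest or base <= 1:
        raise ValueError("need 0 < smallest <= largest and base > 1")
    value = Decimal(str(smallest))
    limit = Decimal(str(largest))
    boundaries = []
    while value <= limit:
        boundaries.append(float(value))
        value *= base
    return boundaries


if __name__ == "__main__":
    print(millis_to_seconds(DEFAULT_BOUNDARIES_MS, drop_zero=True))
    print(exponential_boundaries(0.01, 10, base=10))  # lower granularity
    print(exponential_boundaries(0.01, 10, base=2))   # higher granularity
```

With `smallest=0.01` and `largest=10`, base 10 gives four boundaries while base 2 gives ten, which is the granularity trade-off the proposal mentions. Note that the base 2 sequence stops at 5.12 because the next value, 10.24, would exceed the largest boundary.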


trask · 2023-09-12T22:24:25Z

another option could be to follow the pattern [ .1, .25, .5, .75, 1] in between the low and high buckets, e.g.

[ 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]

and another similar option with fewer buckets:

[ 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 ]

or even fewer:

[ 0.01, 0.05, 0.1, 0.5, 1, 5, 10 ]
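
The three lists above all repeat a fixed set of multipliers in every decade between the low and high boundaries. A rough Python sketch of that pattern follows; the function name and its parameters are hypothetical, chosen only for this example.

```python
from decimal import Decimal


def decade_pattern_boundaries(low_exp, high_exp, multipliers=("1", "2.5", "5", "7.5")):
    """Repeat `multipliers` in each decade from 10**low_exp up to 10**high_exp.

    Multipliers are passed as strings so Decimal can represent them exactly.
    """
    boundaries = []
    for exp in range(low_exp, high_exp):
        decade = Decimal(10) ** exp
        boundaries.extend(float(decade * Decimal(m)) for m in multipliers)
    # Close the range with the top boundary itself.
    boundaries.append(float(Decimal(10) ** high_exp))
    return boundaries


if __name__ == "__main__":
    print(decade_pattern_boundaries(-2, 1))                    # 13 boundaries
    print(decade_pattern_boundaries(-2, 1, ("1", "2.5", "5")))  # 10 boundaries
    print(decade_pattern_boundaries(-2, 1, ("1", "5")))         # 7 boundaries
```

Dropping multipliers from the tuple is all it takes to move between the three variants, so the only per-metric decisions left are the low and high exponents and how dense each decade should be.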

jack-berg · 2023-10-27T21:25:35Z

Generalizing your last comment into an algorithm we could encode into the spec might look like:

Start with base 10 "outer" bucket boundaries.
How many and which base 10 orders of magnitude do measurements typically span? E.g. two orders of magnitude (three buckets) starting at 1 would be [1, 10], or [10, 100] if starting at ten. Three orders of magnitude (four buckets) starting at 1 would be [1, 10, 100], or [10, 100, 1_000] if starting at ten.
Decide how many inner buckets you need to get a decent distribution useful to most users most of the time. For each outer bucket boundary (skipping the first), the inner bucket boundaries are distributed evenly between [0, outer_bucket_bound]. Maybe we constrain it to multiples of two so the bucket boundaries don't fall on numbers with infinitely repeating decimals. For example:
- One inner bucket boundary with two outer bucket orders of magnitude starting at 1 yields: [1, 5, 10]
- One inner bucket boundary with three outer bucket orders of magnitude starting at 1 yields: [1, 5, 10, 50, 100]
- Three inner bucket boundaries with two outer bucket orders of magnitude starting at 1 yields: [1, 2.5, 5, 7.5, 10]
- Three inner bucket boundaries with three outer bucket orders of magnitude starting at 1 yields: [1, 2.5, 5, 7.5, 10, 25, 50, 75, 100]
The number of buckets becomes equal to (orders_of_magnitude - 1) * (inner_bucket_boundaries + 1) + 2, counting the bucket below the first boundary and the bucket above the last one.

The advantage of laying down some convention like this is while there are still domain specific decisions to debate like "how many orders of magnitude is typical?" and "how many inner buckets yield a useful distribution for most users?", it does narrow the solution space quite a bit. It also produces bucket boundaries which are nice even numbers with an intuitive explanation.
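
As a rough sketch of the algorithm described in this comment, assuming a positive integer starting boundary and using a hypothetical function name, the outer and inner boundaries could be generated like this in Python:

```python
from fractions import Fraction


def outer_inner_boundaries(start, outer_count, inner_count):
    """Base 10 outer boundaries with evenly spaced inner boundaries.

    start:       first outer boundary (a positive integer, e.g. 1 or 10)
    outer_count: number of outer boundaries, e.g. 3 for [1, 10, 100]
    inner_count: inner boundaries added below each outer boundary
                 (skipping the first), spaced evenly between 0 and it
    """
    outer = [start * 10 ** i for i in range(outer_count)]
    boundaries = [outer[0]]
    for previous, bound in zip(outer, outer[1:]):
        step = Fraction(bound, inner_count + 1)
        inner = [step * k for k in range(1, inner_count + 1)]
        # Keep only inner values above the previous outer boundary so the
        # result stays strictly increasing.
        boundaries.extend(x for x in inner if x > previous)
        boundaries.append(bound)
    return [float(x) for x in boundaries]


if __name__ == "__main__":
    for outer_count, inner_count in [(2, 1), (3, 1), (2, 3), (3, 3)]:
        bounds = outer_inner_boundaries(1, outer_count, inner_count)
        # N explicit boundaries define N + 1 buckets.
        print(bounds, "->", len(bounds) + 1, "buckets")
```

The four calls in the `__main__` block correspond to the four bulleted examples in the comment. Exact fractions are used for the inner steps so that an inner count such as 2 shows up as a visibly repeating decimal in the output instead of being hidden by rounding.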

trask added this to Spec: JVM runtime metric stability Sep 12, 2023

github-actions bot assigned arminru Sep 12, 2023

trask mentioned this issue Sep 12, 2023

Update jvm.gc.duration histogram buckets to [ 0.01, 0.1, 1, 10 ] #317

Merged

3 tasks

trask mentioned this issue Sep 21, 2023

Recommended histogram bucket sizes for HTTP connection duration #336

Open

trask removed this from Spec: JVM runtime metric stability Nov 13, 2023

github-actions bot added the Stale label Feb 14, 2024

joaopgrassi added Stale and removed Stale labels Feb 14, 2024

trask mentioned this issue Apr 5, 2024

Add db.client.operation.duration metric #735

Merged

2 tasks

This was referenced May 8, 2024

Add experimental go runtime metrics semantic conventions #981

Merged

Add node.js runtime metrics semantic conventions #991

Merged

trask mentioned this issue Oct 20, 2024

Cosmos DB: Operation Level Metrics #1438

Merged

3 tasks
