metrics: Increase the resolution of histogram metrics #7335
These metrics are using the default histogram buckets:
```
pub const DEFAULT_BUCKETS: &[f64; 11] = &[
    0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
];
```
This gives us a resolution of 5ms, which is good, but some subsystems, such as approval-voting or approval-distribution, process hundreds or even a few thousand messages per second, so it makes sense to increase the resolution of the buckets to better understand whether the processing time is in the range of microseconds.

The new bucket ranges will be:
```
[0.0001, 0.0004, 0.0016, 0.0064, 0.0256, 0.1024, 0.4096, 1.6384, 6.5536]
```
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
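For readers who want to see where the proposed bounds come from, here is a minimal sketch. It assumes the new list is a geometric series starting at 0.0001 s with a factor of 4, which matches the nine values quoted in the description. The helper names `exponential_bounds` and `bucket_index` are hypothetical and are not taken from the PR.

```rust
/// Default bucket upper bounds (in seconds), as quoted in the PR description.
const DEFAULT_BUCKETS: [f64; 11] = [
    0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
];

/// Hypothetical helper: builds `count` upper bounds, starting at `start`,
/// each one `factor` times larger than the previous one.
fn exponential_bounds(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Hypothetical helper: index of the first bucket whose upper bound is
/// greater than or equal to `seconds`, or `None` if the observation only
/// fits in the implicit +Inf bucket.
fn bucket_index(bounds: &[f64], seconds: f64) -> Option<usize> {
    bounds.iter().position(|&le| seconds <= le)
}
```

Calling `exponential_bounds(0.0001, 4.0, 9)` reproduces the proposed list. A 300 microsecond observation (`0.0003`) falls into the first default bucket together with everything else under 5ms, while in the proposed layout it falls into the `0.0004` bucket, which is the extra resolution the description is after.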
Thanks @alexggh, this is indeed a good idea. It might also be worthwhile, but up to you, to add 2 additional buckets between the last 2: [0.0001, 0.0004, 0.0016, 0.0064, 0.0256, 0.1024, 0.4096, 1.6384, <here>, 6.5536], just to give more granular info for any very high timings.
yes, makes sense, I won't be able to use the Done!
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
3b754ce to 9efcf3d
Unfortunately, I ran this PR in versi on a subset of nodes, and it seems that Grafana gets confused when some nodes output one range of buckets and the others output a different range. The graph for a 6h period on versi looks as if some data is missing. The data is there, and the graph is shown correctly if you select a time period when only one flavor of the buckets was in use, but this will affect the dashboards during the transition period. @sandreim any idea if this is something to be concerned about?
This happens when you aggregate metrics from nodes with different bucket configurations. I expect this to be fine. You should try to query the subset of nodes running your changes and select only the time period when only that bucket configuration is present. That should work.
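To illustrate why mixed bucket configurations produce odd-looking graphs, here is a simplified model. It is not Prometheus or Grafana code. It assumes each node exposes cumulative counts per upper bound and that the dashboard sums counts per bound across nodes. The names `NodeBuckets`, `sum_by_bound` and `is_cumulative` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Cumulative histogram of one node: (upper bound in seconds, observations <= bound).
type NodeBuckets = Vec<(f64, u64)>;

/// Sums the counts per upper bound across all nodes. A bound that only some
/// nodes expose is summed over those nodes only.
fn sum_by_bound(nodes: &[NodeBuckets]) -> Vec<(f64, u64)> {
    // Positive finite f64 values sort the same way as their bit patterns,
    // so the bits can serve as an ordered map key.
    let mut merged: BTreeMap<u64, u64> = BTreeMap::new();
    for node in nodes {
        for &(le, count) in node {
            *merged.entry(le.to_bits()).or_insert(0) += count;
        }
    }
    merged
        .into_iter()
        .map(|(bits, count)| (f64::from_bits(bits), count))
        .collect()
}

/// A well-formed cumulative histogram never decreases as the bound grows.
fn is_cumulative(buckets: &[(f64, u64)]) -> bool {
    buckets.windows(2).all(|pair| pair[0].1 <= pair[1].1)
}
```

In this model, merging a node with bounds `0.005, 0.01` and a node with bounds `0.0064, 0.0256` yields a series whose counts can go down as the bound goes up, so it is no longer a valid cumulative histogram. Restricting the query to nodes that share one configuration, as suggested above, avoids the problem.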
Yes, that's exactly what is happening.
Yes, that works.
bot merge
Error: Statuses failed for 9efcf3d
bot merge