monitoring: use histograms for api latency and cycle time metrics #164
I don't quite understand how the histograms are implemented in this visualization. Typically I see histograms like this: http://docs.grafana.org/features/panels/heatmap/#histograms-and-buckets What does the |
This visualization shows the 50th, 75th, 90th and max percentiles for each discoverer. It was confusing for me at first, but after reading a bit I now understand it better. The way I read the graphs is: "50% (or 75%, or 90%) of the requests are taking less than X milliseconds". If we have multiple discoverers, then we would indeed have 4 more data series on the graph. I'll let @rosskukulinski weigh in as well, as this is still somewhat new for me. |
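To make the "X% of requests take less than Y" reading concrete, here is a minimal sketch of how a percentile can be estimated from cumulative histogram buckets. It uses linear interpolation inside the matching bucket, in the spirit of PromQL's `histogram_quantile`, but it is a simplified illustration and not the Prometheus implementation. The function name `estimate_quantile` and the bucket values are assumptions made up for this example.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, where the last upper bound is math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The quantile falls in the +Inf bucket, so the best we can
                # report is the highest finite bucket boundary.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = upper, count
    return prev_bound


# Hypothetical latency buckets in seconds: (upper bound, cumulative count).
example_buckets = [(0.1, 50), (0.5, 80), (1.0, 95), (5.0, 100), (math.inf, 100)]

p50 = estimate_quantile(0.50, example_buckets)
p90 = estimate_quantile(0.90, example_buckets)
p99 = estimate_quantile(0.99, example_buckets)
```

With these made-up buckets, half of the requests fall at or below the first boundary, so the 50th percentile lands on 0.1 seconds, while the 90th and 99th percentiles are interpolated inside the wider buckets above it.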
@alexbrand is correct about what's being visualized. That said, @stevesloka, you're right that these are not actually histogram visualizations like the ones you linked to. This is partly because only the latest release of Grafana (#148) has a proper Prometheus histogram panel. It's also a common pattern to display the latency percentiles (especially @alexbrand recommend:
|
What do the @rosskukulinski Should we update the Grafana version and use the proper histogram panel? |
100% is basically Inf: all values are less than this number. It's the same as the maximum latency value recorded.
I do think we should update Grafana, but I do not think it's necessary for this PR. FWIW, I prefer a line-graph representation of latencies because you can display many different lines. Doing that with true histograms is more complex and harder to understand. |
@rosskukulinski @stevesloka If we remove the 100th percentile we would most likely be oblivious to the fact that some requests are taking a very long time. For example, in the attached screenshot, we can see that the 90th percentile of the Load Balancers Endpoint is below 20 seconds, but the 100th percentile goes up to 50 secs. What are your thoughts on showing 50th and 99th percentiles? |
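The point about a lower percentile hiding slow requests can be illustrated with a small sketch. It computes nearest-rank percentiles over a list of raw latencies. The sample values are invented to resemble the situation described above (90th percentile under 20 seconds, slowest requests at 50 seconds) and are not taken from the screenshot.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in seconds: mostly fast, with a slow tail.
latencies = [2.0] * 50 + [10.0] * 40 + [18.0] * 5 + [50.0] * 5

summary = {p: percentile(latencies, p) for p in (50, 90, 99, 100)}
for p, value in summary.items():
    print(f"p{p}: {value} s")
```

In this made-up data set the 90th percentile stays well below 20 seconds, while the 99th percentile already surfaces the 50-second requests, which is why plotting the 50th and 99th percentiles keeps the tail visible without needing the 100th.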
+1 for 50th and 99th |
I don't have an OS system up to test with, but latest attached image LGTM. |
LGTM, @alexbrand could you rebase to update? |
Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
Rebased. Thanks @stevesloka! |
Fixes #163
I suspect we'll have to make the bucket sizes configurable, but I think we can follow up on that when the time comes.
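As a rough idea of what configurable bucket sizes could involve, the sketch below parses a comma-separated list of bucket upper bounds, as might be supplied through a flag or environment variable. The function name `parse_buckets` and the input format are hypothetical; nothing in this PR defines them.

```python
def parse_buckets(spec):
    """Parse a comma-separated string of bucket upper bounds (in seconds)
    into a sorted list of unique positive floats."""
    bounds = sorted({float(part) for part in spec.split(",") if part.strip()})
    if not bounds:
        raise ValueError("at least one bucket boundary is required")
    if bounds[0] <= 0:
        raise ValueError("bucket boundaries must be positive")
    return bounds


# Hypothetical configuration value.
configured = parse_buckets("1, 0.1, 5, 0.5")
```

Sorting and de-duplicating up front is a deliberate choice here, since histogram bucket boundaries are expected to be strictly increasing.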