
Prometheus choice of bucket boundaries for http_request_duration_seconds #3196

Closed
luong-komorebi opened this issue Feb 25, 2021 · 3 comments · Fixed by #3214

Comments

@luong-komorebi
Contributor

Hi, first of all, thanks for OPA. Awesome work.

I am creating a Grafana dashboard for OPA. When visualizing http_request_duration_seconds, I ran into a problem: the average response time of all HTTP requests to OPA doesn't match the request duration percentiles.

Here are some images showing the problem.
You can see the same spike in both graphs, which is a good sign. On the average request duration graph, I can read the average request duration down to microseconds.

[screenshot: average request duration graph]

However, the quantile resolution stops at seconds

[screenshot: request duration quantile graph]

and the value doesn't change much over time

[screenshot: quantile values over time]

which leads me to think that we don't have the right bucket configuration for http_request_duration_seconds in our OPA.

From what I can see, #1638 is where the work for http_request_duration_seconds was done. It uses the default bucket configuration that Prometheus provides, but those buckets are meant for a typical web application, and in OPA's case they are not granular enough.
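To illustrate why the default buckets hide sub-millisecond latencies, here is a minimal sketch (not Prometheus's actual implementation) of the linear interpolation that `histogram_quantile()` performs. With the default buckets, any workload faster than 5 ms lands entirely in the first bucket, so the estimated median comes out around 2.5 ms no matter what the true latencies are:

```python
import math

# Prometheus default histogram bucket upper bounds, in seconds, plus +Inf.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5,
                   1.0, 2.5, 5.0, 10.0, math.inf]

def histogram_quantile(q, per_bucket_counts, bounds=DEFAULT_BUCKETS):
    # Prometheus stores buckets cumulatively (`le` series); for readability
    # this sketch takes per-bucket counts and accumulates them itself.
    cumulative, total = [], 0
    for c in per_bucket_counts:
        total += c
        cumulative.append(total)
    rank = q * total
    for i, (upper, cum) in enumerate(zip(bounds, cumulative)):
        if cum >= rank:
            lower = bounds[i - 1] if i > 0 else 0.0
            prev = cumulative[i - 1] if i > 0 else 0
            if math.isinf(upper):
                return lower  # quantile falls into the +Inf bucket
            # Linear interpolation within the bucket, as histogram_quantile does.
            return lower + (upper - lower) * (rank - prev) / (cum - prev)
    return bounds[-1]

# 1000 requests that all finished in under 5 ms land in the first bucket,
# so the estimated median is 2.5 ms regardless of the true durations.
print(histogram_quantile(0.5, [1000] + [0] * 11))  # -> 0.0025
```

With microsecond-scale evaluations, every observation falls into the first bucket, which would explain the flat quantile graphs above.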

It is also possible that my queries are wrong, but they are fairly standard, so I don't think I made a mistake there. Feel free to view the code at https://github.com/luong-komorebi/opa-grafana-dashboard

Steps to Reproduce the Problem

  • OPA version: 0.26.0
  • Have OPA and Prometheus configured and running
  • Make some HTTP calls to OPA
  • If you have Grafana up and running, install my Grafana dashboard from GitHub or Grafana.com
  • If you don't, go to Prometheus and query the average request duration as well as, for example, the 50th percentile of http_request_duration_seconds over a 5-minute interval
  • Compare the average request duration with the http_request_duration_seconds quantiles and see how much they differ
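For reference, the queries in the last step would look roughly like this (assuming the standard `_sum`/`_count`/`_bucket` series that Prometheus histograms expose):

```promql
# Average request duration over a 5-minute window
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])

# 50th percentile over the same window
histogram_quantile(0.5,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```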

Additional Info

If possible, and if my assumption is right, I suggest we lower the bucket boundaries.

@srenatus
Contributor

srenatus commented Mar 2, 2021

Thanks for bringing this up! 👏 We've started discussing this internally, the current thinking is that ditching some higher granularity buckets in favour of adding a few smaller ones is probably the right move. You'd agree, I presume? 😃

@luong-komorebi
Contributor Author

Yes, I totally agree @srenatus

@luong-komorebi
Contributor Author

luong-komorebi commented Mar 3, 2021

I tried to turn my idea into code, and it looks like this: #3214. Maybe this is what you are looking for.

luong-komorebi added a commit to luong-komorebi/opa that referenced this issue Mar 3, 2021
This pull request ditches some higher-granularity buckets in favour of adding a
few smaller ones. The buckets I chose were based on https://www.openpolicyagent.org/docs/latest/policy-performance/#high-performance-policy-decisions
where the expectation is that "policy evaluation has a budget on the order of 1 millisecond".
Also, I tried to stay within Prometheus's default of 10 buckets.

This fixes open-policy-agent#3196

Signed-off-by: Luong Vo <vo.tran.thanh.luong@gmail.com>
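For illustration, a sub-millisecond-oriented layout in the spirit of that commit message might look like the sketch below. These values are hypothetical and not necessarily the ones merged in #3214:

```python
# Hypothetical bucket upper bounds in seconds, skewed toward the ~1 ms
# policy-evaluation budget while staying within 10 buckets total.
buckets = [1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 0.01, 0.1, 1.0]

assert len(buckets) == 10          # match Prometheus's default bucket count
assert buckets == sorted(buckets)  # bounds must be strictly increasing
```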
tsandall pushed a commit that referenced this issue Mar 5, 2021