Prometheus choice of bucket boundaries for http_request_duration_seconds #3196
Thanks for bringing this up! 👏 We've started discussing this internally. The current thinking is that ditching some of the larger buckets in favour of adding a few smaller ones is probably the right move. You'd agree, I presume? 😃
Yes, I totally agree, @srenatus.
I tried to turn my idea into code; it's in #3214. Maybe this is what you're looking for.
luong-komorebi added a commit to luong-komorebi/opa that referenced this issue on Mar 3, 2021
This pull request ditches some higher buckets in favour of adding a few smaller ones. The buckets that I chose are based on https://www.openpolicyagent.org/docs/latest/policy-performance/#high-performance-policy-decisions, where the expectation is that "policy evaluation has a budget on the order of 1 millisecond". I also tried to stay within Prometheus's default 10 buckets. This fixes open-policy-agent#3196
Signed-off-by: Luong Vo <vo.tran.thanh.luong@gmail.com>
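To illustrate the kind of layout described in that commit (about 10 buckets centred on a ~1 ms budget), here is a minimal sketch. It is not the set of buckets the PR actually chose; the start value, factor, and count are hypothetical. `exponentialBuckets` is a local stand-in written to behave like client_golang's `prometheus.ExponentialBuckets`, so the snippet runs with the standard library only. In a real exporter, the resulting slice would go into the `Buckets` field of `prometheus.HistogramOpts`.

```go
package main

import "fmt"

// exponentialBuckets is a local stand-in that behaves like client_golang's
// prometheus.ExponentialBuckets: it returns count upper bounds, starting at
// start and multiplying by factor at each step.
func exponentialBuckets(start, factor float64, count int) []float64 {
	if count < 1 || start <= 0 || factor <= 1 {
		panic("exponentialBuckets: need count >= 1, start > 0, factor > 1")
	}
	b := make([]float64, count)
	for i := range b {
		b[i] = start
		start *= factor
	}
	return b
}

func main() {
	// Hypothetical layout: 10 buckets from 100 µs, doubling up to ~51 ms,
	// so a ~1 ms decision budget sits near the middle of the range.
	for _, ub := range exponentialBuckets(0.0001, 2, 10) {
		fmt.Printf("le=%g\n", ub)
	}
}
```

Doubling keeps relative resolution constant, so a 200 µs request and a 20 ms request are each placed with roughly the same proportional precision.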
tsandall pushed a commit that referenced this issue on Mar 5, 2021
This was referenced Mar 9, 2021
Hi, first of all, thanks for OPA. Awesome work.
I am creating a Grafana dashboard for OPA. When visualizing http_request_duration_seconds, I ran into a problem: the average response time across all HTTP requests to OPA doesn't match the request-duration percentiles.
Here are some images showing the problem.
Both graphs show the same spike, which is a good sign. In the average request duration graph, I can resolve values down to microseconds.
However, the quantile graph stops at seconds-level precision, and its value barely changes over time. This leads me to think that we don't have the right bucket configuration for http_request_duration_seconds in OPA.
As far as I can tell, #1638 is where http_request_duration_seconds was added. It uses the default bucket configuration that Prometheus provides, but those buckets are meant for typical web applications, and for OPA they are not granular enough.
Alternatively, my queries might be wrong, but they are pretty standard, so I don't think I messed anything up. Feel free to look at the code at https://github.com/luong-komorebi/opa-grafana-dashboard
Steps to Reproduce the Problem
Additional Info
If my assumption is right, I suggest lowering the bucket boundaries.