Allow `synapse_http_server_response_time_seconds` Grafana histogram quantiles to show values bigger than 10s #13478
Changes from 2 commits
ecd5a0d
71a8c55
54b3676
477fad6
62f9d35
400688b
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Fix the `synapse_http_server_response_time_seconds` metric not having buckets big enough for requests that take more than 10s. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,6 +43,28 @@ | |
"synapse_http_server_response_time_seconds", | ||
"sec", | ||
["method", "servlet", "tag", "code"], | ||
buckets=( | ||
0.005, | ||
0.01, | ||
0.025, | ||
0.05, | ||
0.075, | ||
0.1, | ||
0.25, | ||
0.5, | ||
0.75, | ||
1.0, | ||
2.5, | ||
5.0, | ||
7.5, | ||
10.0, | ||
This section matches the default buckets: https://github.com/prometheus/client_python/blob/5a5261dd45d65914b5e3d8225b94d6e0578882f3/prometheus_client/metrics.py#L544 ( I chose the default as a base because that is what it was using before. Do we want to tune these or eliminate any to reduce cardinality?
I've just noticed this comment. Perhaps we could drop the 0.075, 0.75 and 7.5 metrics? Then the remaining ones would be separated by roughly factors of two. We'd still be growing the number of buckets by 2 in that case though. If we wanted to avoid growing the cardinality we'd have to pick 2 more to drop.
We could drop
||
30.0, | ||
60.0, | ||
120.0, | ||
180.0, | ||
200.0, | ||
"+Inf", | ||
), | ||
I can create a separate PR to add a specific metric for But do we have any interest in adjusting the buckets for the general case? @erikjohnston mentioned if anything maybe wanting even more fidelity in the lower ranges. @richvdh do you have any interest in increasing for another endpoint? Our limiting factor is cardinality since this multiplies out to all of our servlets.
In terms of reducing cardinality, we could remove
||
) | ||
|
||
response_ru_utime = Counter( | ||
|
I'm trying to optimize the slow `/messages` requests, #13356, specifically those that take more than 10s. In order to track progress there, I'd like the metrics to capture them.
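
To illustrate why the bucket bounds matter here, below is a minimal sketch of bucket-based quantile estimation in plain Python. It is a simplified stand-in for what Prometheus' `histogram_quantile()` does, not its actual implementation: when the requested rank lands in the `+Inf` bucket, the estimate is reported as the highest finite bound, so with a top bucket of 10s nothing above 10s can ever be shown. The helper names (`cumulative_counts`, `estimate_quantile`) and the sample durations are hypothetical; the bucket bounds are the ones shown in this revision of the diff.

```python
import math
from bisect import bisect_left

# Bounds before this change (the client defaults) and as proposed in this diff revision.
OLD_BOUNDS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75,
              1.0, 2.5, 5.0, 7.5, 10.0)
NEW_BOUNDS = OLD_BOUNDS + (30.0, 60.0, 120.0, 180.0, 200.0)


def cumulative_counts(observations, upper_bounds):
    """Return [(upper_bound, cumulative_count), ...], ending with a +Inf bucket."""
    bounds = sorted(upper_bounds)
    if bounds[-1] != math.inf:
        bounds.append(math.inf)
    counts = [0] * len(bounds)
    for value in observations:
        # First bucket whose upper bound is >= value ("le" semantics).
        counts[bisect_left(bounds, value)] += 1
    buckets, running = [], 0
    for bound, count in zip(bounds, counts):
        running += count
        buckets.append((bound, running))
    return buckets


def estimate_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation inside the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, cum) in enumerate(buckets) if cum >= rank)
    upper, cum = buckets[index]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: the estimate is capped at the
        # highest finite bound.
        return buckets[-2][0]
    lower, prev = buckets[index - 1] if index > 0 else (0.0, 0)
    return lower + (upper - lower) * (rank - prev) / (cum - prev)


# Hypothetical request durations in seconds, three of them slower than 10s.
durations = [0.2, 0.4, 12.0, 25.0, 45.0]
old_p95 = estimate_quantile(0.95, cumulative_counts(durations, OLD_BOUNDS))
new_p95 = estimate_quantile(0.95, cumulative_counts(durations, NEW_BOUNDS))
print(old_p95, new_p95)
```

With the old bounds the estimated p95 is pinned at 10.0, while the extended bounds give an estimate of about 52.5s for the same sample data, which is the behaviour needed to track the slow `/messages` requests.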