-
Notifications
You must be signed in to change notification settings - Fork 803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per-user query metrics for series and bytes returned #4343
Add per-user query metrics for series and bytes returned #4343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns doing it a per-user basis. @tomwilkie just yesterday commented here #4259 (comment) on the fact that we have to pay attention each time we add a per-user histogram, cause they easily explode cardinality.
Understood. The approach here (to emit the metrics from the query-frontend) was taken based on a suggestion from @pstibrany to reduce the number of series (since there are usually only a few query frontends). I can definitely reduce the number of buckets of the histograms. |
Based on @tomwilkie 's comment it seems like what I should do is:
|
Yes please Nick! I'd probably go with logs in this case, but a summary would also be fine with 1 or 2 quantiles. |
Add stats included in query responses from the querier and distributor for measuring the number of series and bytes included in successful queries. These stats are emitted per-user as summaries from the query frontends. These stats are picked to add visibility into the same resources limited as part of #4179 and #4216. Fixes #4259 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @56quarters! I left few comments about naming to try to make it a bit more clear. A part from this LGTM (and no concern about cardinality since you moved away from histograms). Thanks for addressing my initial feedback! 🙏
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo a couple of final nits). Thanks!
Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
I think it would be useful as well but I'm not going to make the change here since I think it would involve a fairly large refactoring of the code. Nothing preventing us from doing this as a follow-up though. |
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
…ct#4343) * Add per-user query metrics for series and bytes returned Add stats included in query responses from the querier and distributor for measuring the number of series and bytes included in successful queries. These stats are emitted per-user as summaries from the query frontends. These stats are picked to add visibility into the same resources limited as part of cortexproject#4179 and cortexproject#4216. Fixes cortexproject#4259 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Formatting fix Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Fix changelog to match actual changes Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Typo Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Code review changes, rename things for clarity Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Apply suggestions from code review Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Code review changes, remove superfluous summaries Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>
What this PR does:
Add stats included in query responses from the querier and distributor
for measuring the number of series and bytes included in successful
queries. These stats are emitted per-user as counters from the query
frontends.
These stats are picked to add visibility into the same resources limited
as part of #4179 and #4216.
Fixes #4259
Notes to reviewers
Open issue:
The spanlog inpkg/querier/blocks_store_queryable.go
[1] computes the number of bytes for the series (countSeriesBytes()
) differently than the way chunk bytes are limited. I wasn't sure if I should update the spanlog to usecountChunkBytes()
the same way I did for the stats emitted or if it's supposed to be measuring something entirely different. As it is, the number bytes emitted by the spanlog is consistently lower thancountChunkBytes()
since it doesn't include the timestamps for each chunk.Looking for input:
The buckets picked for the histograms were chosen based on queries performed by the cortex-mixin dashboards. As they are now, nearly all queries fall into the largest bucket or lower (number of series and chunk bytes). If there are changes you'd like to see, let me know.uint64
fromint
and vice versa. It seemed from the PRs that added the limits that there wasn't consensus onint
vsuint64
. Let me know if you'd like to seeuint64
here changed to something else.Signed-off-by: Nick Pillitteri nick.pillitteri@grafana.com
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]