Add per-user query metrics for series and bytes returned #4343

56quarters · 2021-07-06T15:03:20Z

What this PR does:

Add stats included in query responses from the querier and distributor
for measuring the number of series and bytes included in successful
queries. These stats are emitted per-user as counters from the query
frontends.
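
As a rough illustration of the per-user accounting described above, here is a minimal sketch using only the Go standard library. The type and method names (`perUserStats`, `Observe`, `Totals`) are hypothetical and are not taken from the Cortex code; the real change reports these values through Prometheus metrics rather than an in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// queryStats holds running totals for one user. Field names are
// hypothetical and chosen for illustration only.
type queryStats struct {
	FetchedSeries     uint64
	FetchedChunkBytes uint64
}

// perUserStats accumulates stats from successful queries, keyed by user ID.
type perUserStats struct {
	mu    sync.Mutex
	users map[string]*queryStats
}

func newPerUserStats() *perUserStats {
	return &perUserStats{users: make(map[string]*queryStats)}
}

// Observe adds the series and chunk bytes of one successful query
// to the running totals of the given user.
func (p *perUserStats) Observe(userID string, series, chunkBytes uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	s, ok := p.users[userID]
	if !ok {
		s = &queryStats{}
		p.users[userID] = s
	}
	s.FetchedSeries += series
	s.FetchedChunkBytes += chunkBytes
}

// Totals returns a copy of the totals for one user, or zero values
// if the user has not been seen.
func (p *perUserStats) Totals(userID string) queryStats {
	p.mu.Lock()
	defer p.mu.Unlock()
	if s, ok := p.users[userID]; ok {
		return *s
	}
	return queryStats{}
}

func main() {
	stats := newPerUserStats()
	stats.Observe("user-1", 10, 2048)
	stats.Observe("user-1", 5, 1024)
	stats.Observe("user-2", 1, 128)
	fmt.Printf("%+v\n", stats.Totals("user-1"))
}
```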

These stats are picked to add visibility into the same resources limited
as part of #4179 and #4216.

Fixes #4259

Notes to reviewers

Open issue:

The spanlog in pkg/querier/blocks_store_queryable.go [1] computes the number of bytes for the series (countSeriesBytes()) differently from the way chunk bytes are limited. I wasn't sure whether I should update the spanlog to use countChunkBytes(), as I did for the emitted stats, or whether it's supposed to be measuring something entirely different. As it is, the number of bytes emitted by the spanlog is consistently lower than countChunkBytes() since it doesn't include the timestamps for each chunk.
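
To make the discrepancy concrete, the sketch below contrasts two ways of counting bytes. It is an illustration only: the `chunk` and `series` types, the function names, and the assumption that each chunk carries two 8-byte timestamps are all hypothetical and do not reproduce the actual countSeriesBytes() or countChunkBytes() implementations.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for a stored chunk: a time range
// plus the encoded sample data.
type chunk struct {
	MinTime, MaxTime int64
	Data             []byte
}

// series is a hypothetical series holding a list of chunks.
type series struct {
	Chunks []chunk
}

// timestampBytes is the assumed per-chunk overhead of two int64 timestamps.
const timestampBytes = 16

// dataBytes counts only the encoded sample data of every chunk.
func dataBytes(ss []series) uint64 {
	var n uint64
	for _, s := range ss {
		for _, c := range s.Chunks {
			n += uint64(len(c.Data))
		}
	}
	return n
}

// dataAndTimestampBytes counts the sample data plus the assumed
// timestamp overhead of every chunk.
func dataAndTimestampBytes(ss []series) uint64 {
	var n uint64
	for _, s := range ss {
		for _, c := range s.Chunks {
			n += uint64(len(c.Data)) + timestampBytes
		}
	}
	return n
}

func main() {
	ss := []series{{Chunks: []chunk{
		{MinTime: 0, MaxTime: 10, Data: make([]byte, 100)},
		{MinTime: 11, MaxTime: 20, Data: make([]byte, 50)},
	}}}
	fmt.Println(dataBytes(ss), dataAndTimestampBytes(ss))
}
```

Under these assumptions the second count exceeds the first by a fixed amount per chunk, which matches the observation that the spanlog value is consistently lower.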

Looking for input:

The buckets picked for the histograms were chosen based on queries performed by the cortex-mixin dashboards. As they are now, nearly all queries fall into the largest bucket or lower (number of series and chunk bytes). If there are changes you'd like to see, let me know.
Lots of casting to uint64 from int and vice versa. It seemed from the PRs that added the limits that there wasn't consensus on int vs uint64. Let me know if you'd like to see uint64 here changed to something else.

Signed-off-by: Nick Pillitteri nick.pillitteri@grafana.com

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci

I have some concerns about doing it on a per-user basis. Just yesterday @tomwilkie commented here #4259 (comment) that we have to pay attention each time we add a per-user histogram, because they easily explode cardinality.

56quarters · 2021-07-09T14:21:04Z

I have some concerns about doing it on a per-user basis. Just yesterday @tomwilkie commented here #4259 (comment) that we have to pay attention each time we add a per-user histogram, because they easily explode cardinality.

Understood. The approach here (to emit the metrics from the query-frontend) was taken based on a suggestion from @pstibrany to reduce the number of series (since there are usually only a few query frontends). I can definitely reduce the number of buckets of the histograms.
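
A back-of-the-envelope sketch of why the bucket count and the number of emitting instances matter. A Prometheus histogram exposes one series per explicit bucket, plus the +Inf bucket, plus `_sum` and `_count`, for every label combination. The user and query-frontend counts below are made-up numbers for illustration, not figures from this PR.

```go
package main

import "fmt"

// histogramSeries returns the number of time series one Prometheus
// histogram exposes per label combination: one per explicit bucket,
// plus the +Inf bucket, plus the _sum and _count series.
func histogramSeries(explicitBuckets int) int {
	return explicitBuckets + 3
}

// totalSeries multiplies the per-label-set series count by the number
// of users and the number of instances emitting the metric.
func totalSeries(users, instances, seriesPerLabelSet int) int {
	return users * instances * seriesPerLabelSet
}

func main() {
	// Hypothetical deployment: 1000 users and 2 query frontends.
	const users, frontends = 1000, 2
	for _, buckets := range []int{10, 4} {
		n := totalSeries(users, frontends, histogramSeries(buckets))
		fmt.Printf("%d buckets: %d series\n", buckets, n)
	}
}
```

Reducing the bucket count lowers the total, but the result still scales linearly with the number of users, which is the cardinality concern raised in the review.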

56quarters · 2021-07-09T14:27:36Z

Based on @tomwilkie's comment, it seems like what I should do is:

Keep emitting these metrics as part of the query log
Change these histograms to summaries? Or perhaps remove them entirely?

tomwilkie · 2021-07-09T14:29:41Z

Yes please Nick! I'd probably go with logs in this case, but a summary would also be fine with 1 or 2 quantiles.
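
For the query-log option, a minimal sketch of what such a log line could look like. The function name and the field names (`fetched_series_count`, `fetched_chunks_bytes`) are assumptions made for this example and are not confirmed by the discussion above.

```go
package main

import "fmt"

// formatQueryStats renders per-query stats as a logfmt-style line.
// The field names are hypothetical.
func formatQueryStats(userID string, series, chunkBytes uint64) string {
	return fmt.Sprintf(
		"msg=%q user=%s fetched_series_count=%d fetched_chunks_bytes=%d",
		"query stats", userID, series, chunkBytes,
	)
}

func main() {
	fmt.Println(formatQueryStats("user-1", 15, 3072))
}
```

A log line adds no new time series regardless of how many users exist, which is why it sidesteps the cardinality concern.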

Add stats included in query responses from the querier and distributor for measuring the number of series and bytes included in successful queries. These stats are emitted per-user as summaries from the query frontends. These stats are picked to add visibility into the same resources limited as part of #4179 and #4216. Fixes #4259 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

replay

LGTM with one comment

pkg/distributor/query.go

pracucci

Good job @56quarters! I left a few comments about naming to try to make it a bit clearer. Apart from this, LGTM (and no concern about cardinality since you moved away from histograms). Thanks for addressing my initial feedback! 🙏

pkg/querier/stats/stats.proto

pkg/querier/blocks_store_queryable.go

pkg/frontend/transport/handler.go

pracucci

LGTM (modulo a couple of final nits). Thanks!

pkg/frontend/transport/handler.go

pkg/querier/stats/stats_test.go

Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

pstibrany

LGTM, thank you!

pkg/frontend/transport/handler.go

56quarters · 2021-07-20T12:59:37Z

I think it would be useful to emit this metric in some non-successful queries too. The specific case I have in mind is when the query does indeed hit the limit: we should still have this metric emitted so we can detect whether any user hit the limit.

I think it would be useful as well, but I'm not going to make the change here since I think it would involve a fairly large refactoring of the code. Nothing prevents us from doing this as a follow-up, though.

…ct#4343)

* Add per-user query metrics for series and bytes returned

Add stats included in query responses from the querier and distributor for measuring the number of series and bytes included in successful queries. These stats are emitted per-user as summaries from the query frontends.

These stats are picked to add visibility into the same resources limited as part of cortexproject#4179 and cortexproject#4216.

Fixes cortexproject#4259

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Formatting fix

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Fix changelog to match actual changes

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Typo

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Code review changes, rename things for clarity

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Code review changes, remove superfluous summaries

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Alvin Lin <alvinlin@amazon.com>

pull-request-size bot added the size/L label Jul 6, 2021

pracucci reviewed Jul 9, 2021

56quarters changed the title from "Add per-user query histograms for series and bytes returned" to "Add per-user query metrics for series and bytes returned" Jul 13, 2021

56quarters added 3 commits July 13, 2021 16:16

Formatting fix

92e8ead

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

Fix changelog to match actual changes

94248c2

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters requested a review from pracucci July 14, 2021 13:11

Typo

9f41961

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

replay approved these changes Jul 14, 2021

pkg/distributor/query.go (outdated)

pracucci reviewed Jul 16, 2021

Code review changes, rename things for clarity

edb7880

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters requested a review from pracucci July 16, 2021 14:53

pracucci approved these changes Jul 16, 2021

pkg/frontend/transport/handler.go (outdated)

pkg/querier/stats/stats_test.go (outdated)

pkg/querier/stats/stats_test.go (outdated)

Apply suggestions from code review

e6c9d26

Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

jtlisi self-requested a review July 16, 2021 15:05

pstibrany approved these changes Jul 19, 2021

pkg/frontend/transport/handler.go (outdated)

Code review changes, remove superfluous summaries

c048c7b

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

pstibrany enabled auto-merge (squash) July 20, 2021 14:16

pstibrany merged commit 1e4e0ca into cortexproject:master Jul 20, 2021
