Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce shardable probabilistic topk for instant queries. (backport k227) #14765

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

loki-gh-app[bot]
Copy link
Contributor

@loki-gh-app loki-gh-app bot commented Nov 4, 2024

Backport 7b53f20 from #14243


What this PR does / why we need it:
This change introduces a very simplified shardable topk approximation through the new vector aggregation approx_topk.

We use a count min sketch and track all labels not just the top k. Since this list can grow quite large the feature is only supported for instant queries. Grouping is also not supported and should be handled by an inner sum by or sum without even though this might not be the same behaviour as topk by.

The sharding works by turning the approx_topk(k, inner) query into the following expression:

topk(
  k,
  eval_cms(
    __count_min_sketch__(inner, shard=1) ++ __count_min_sketch__(inner, shard=2)...
  )
)

__count_min_sketch__ is calculated for each shard and merge on the frontend. eval_cms iterates through the labels list and determines the count for each. topk selects then the top items.

The number of labels tracked on the querier side when evaluating __count_min_sketch__ is limited by the heap. It does count all values but the ke-value pairs might not be known.

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

…4243)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
(cherry picked from commit 7b53f20)
@loki-gh-app loki-gh-app bot requested review from a team as code owners November 4, 2024 18:24
@loki-gh-app loki-gh-app bot added area/helm backport size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories labels Nov 4, 2024
@loki-gh-app loki-gh-app bot requested a review from cstyan November 4, 2024 18:24
@cstyan cstyan merged commit 02eb024 into k227 Nov 4, 2024
68 checks passed
@cstyan cstyan deleted the backport-14243-to-k227 branch November 4, 2024 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm backport size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants