Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(blooms): Exclude label filters where label name is part of the series labels. #14661

Merged
merged 8 commits into from
Oct 31, 2024

Conversation

salvacorts
Copy link
Contributor

@salvacorts salvacorts commented Oct 30, 2024

What this PR does / why we need it:

Before this PR, running this would return no data:

{env=~".+"} | env="prod"

This is because env is an stream label, not structured metadata. If we not exclude it from the checks, it would not match any bloom filter and therefore all chunks would be discarded erroneously.

This PR fixes this by passing down the series labels to the FilterChunkRefs method of the bloom gateway. The series labels are passed (along the FP) via the logproto.GroupedChunkRefs struct. We pass the series labels to v1.ExtractTestableLabelMatchers which will exclude any matcher with a label name present in the series labels.

Note that we only pass the series to v1.ExtractTestableLabelMatchers deep in the bloom gateway as filtering series is relatively expensive. Upper in the filtering stack we call v1.ExtractTestableLabelMatchers w/o the series (just to return early if there are not filters at all).

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@salvacorts salvacorts force-pushed the salvacorts/fix-blooms-filter-series-label branch from 6efff01 to 8c0116d Compare October 30, 2024 10:57
@salvacorts salvacorts marked this pull request as ready for review October 30, 2024 11:56
@salvacorts salvacorts requested a review from a team as a code owner October 30, 2024 11:56
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we're getting this issue sorted out!

Comment on lines 260 to 263
series, err := g.indexQuerier.GetSeries(ctx, instanceID, req.From, req.Through, matchers...)
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the call to GetChunks on 223, the returned chunks matrix has a field called Metric labels.Labels alongside ChunkRef.

I think that might be the labels of the series (though it's named in a way that makes me less sure). But if it is the labels, we can remove this call in favour of the existing GetChunks call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT from looking at the code, looks like it does contain the stream labels

@@ -58,7 +58,7 @@ type IndexClientWithRange struct {
}

type BloomQuerier interface {
FilterChunkRefs(ctx context.Context, tenant string, from, through model.Time, chunks []*logproto.ChunkRef, plan plan.QueryPlan) ([]*logproto.ChunkRef, error)
FilterChunkRefs(ctx context.Context, tenant string, from, through model.Time, series []labels.Labels, chunks []*logproto.ChunkRef, plan plan.QueryPlan) ([]*logproto.ChunkRef, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a little unfortunate we have to pass series and chunks separately instead of passing through GroupedChunkRefs. I'm not sure it's worth fixing this though, so I don't mind it staying this way.

pkg/logproto/bloomgateway.proto Show resolved Hide resolved
//
// Unsupported LabelFilterExprs map to an UnsupportedLabelMatcher, for which
// bloom tests should always pass.
func ExtractTestableLabelMatchers(expr syntax.Expr) []LabelMatcher {
func ExtractTestableLabelMatchers(expr syntax.Expr, series ...labels.Labels) []LabelMatcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably ok to have this logic here, but I've never been happy with the cognitive overhead in bloom_tester.go and I suspect it might be easier to follow if we moved the series knowledge checks down to the bloom test itself.

WDYT?

If we do move the logic that far down, we'll need to update other functions to make sure the labels get propagated through properly, like inside partitionRequest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This way we can also filter out

{app="fake"} | app="noexist"
{app="fake"} | app=""

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! I was doing some final checks of other instances where we construct logproto.GroupedChunkRefs to see if labels are being lost anywhere important.

I found these:

I don't know if any of these matter, but all three look easy to fix and so might be worth updating anyway.

I'm also worried about what's happening in BloomQuerier.FilterChunkRefs; are we losing label information after we call bq.BlockResolver.Resolve? Is that potentially a problem?

pkg/storage/bloom/v1/bloom_tester.go Outdated Show resolved Hide resolved
pkg/storage/bloom/v1/bloom_tester.go Outdated Show resolved Hide resolved
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after last two comments! Thank you for fixing this!

pkg/storage/bloom/v1/bloom_tester.go Show resolved Hide resolved
Comment on lines 79 to 86
// We cannot support this test case until we can forward a list of structured metadata fields.
// We cannot check if the key is structured metadata using the bloom because these are probabilistic
// E.g. bloom.Test("env") may return true even if env is not structured metadata.
//{
// name: "filter label from series overridden by structured metadata",
// query: `{app="fake"} | app="fake"`, // app is set to other in the structured metadata
// match: false,
//},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm trying to understand this.

Shouldn't {app="fake"} | app="fake" only return false if every single log line in a chunk has app as structured metadata (and none of them are fake)? I'm not sure knowing the list of fields alone would be enough to ensure we're not filtering out chunks incorrectly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I overthought this one too much.

@salvacorts salvacorts merged commit d1668f6 into main Oct 31, 2024
59 checks passed
@salvacorts salvacorts deleted the salvacorts/fix-blooms-filter-series-label branch October 31, 2024 14:51
chaudum added a commit that referenced this pull request Nov 6, 2024
Pull request #14661 added the series labels to the FilterChunkRefs
request in order to be able to filter out full series that do not
contain the filter key when using bloom filters.

However, this chage incorrectly assumed that `Chunk.Metric` from the
`GetChunks()` call contains the labels of the stream. This information
is only populated once the chunk is fetched, though. Therefore the list
of labels was empty, and no chunk refs were filtered.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum pushed a commit that referenced this pull request Nov 6, 2024
chaudum added a commit that referenced this pull request Nov 6, 2024
Pull request #14661 added the series labels to the FilterChunkRefs
request in order to be able to filter out full series that do not
contain the filter key when using bloom filters.

However, this chage incorrectly assumed that `Chunk.Metric` from the
`GetChunks()` call contains the labels of the stream. This information
is only populated once the chunk is fetched, though. Therefore the list
of labels was empty, and no chunk refs were filtered.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum added a commit that referenced this pull request Nov 7, 2024
Pull request #14661 added the series labels to the FilterChunkRefs
request in order to be able to filter out full series that do not
contain the filter key when using bloom filters.

However, this chage incorrectly assumed that `Chunk.Metric` from the
`GetChunks()` call contains the labels of the stream. This information
is only populated once the chunk is fetched, though. Therefore the list
of labels was empty, and no chunk refs were filtered.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum added a commit that referenced this pull request Nov 7, 2024
…ered (#14807)

Pull request #14661 added the series labels to the `FilterChunkRefs` request in order to be able to filter out full series that do not contain the filter key when using bloom filters.

However, this change incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
loki-gh-app bot pushed a commit that referenced this pull request Nov 7, 2024
…ered (#14807)

Pull request #14661 added the series labels to the `FilterChunkRefs` request in order to be able to filter out full series that do not contain the filter key when using bloom filters.

However, this change incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
(cherry picked from commit 3b20cb0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants