-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(blooms): Exclude label filters where label name is part of the series labels. #14661
Conversation
6efff01
to
8c0116d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad we're getting this issue sorted out!
pkg/indexgateway/gateway.go
Outdated
series, err := g.indexQuerier.GetSeries(ctx, instanceID, req.From, req.Through, matchers...) | ||
if err != nil { | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the call to GetChunks
on 223, the returned chunks matrix has a field called Metric labels.Labels
alongside ChunkRef.
I think that might be the labels of the series (though it's named in a way that makes me less sure). But if it is the labels, we can remove this call in favour of the existing GetChunks call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT from looking at the code, looks like it does contain the stream labels
pkg/indexgateway/gateway.go
Outdated
@@ -58,7 +58,7 @@ type IndexClientWithRange struct { | |||
} | |||
|
|||
type BloomQuerier interface { | |||
FilterChunkRefs(ctx context.Context, tenant string, from, through model.Time, chunks []*logproto.ChunkRef, plan plan.QueryPlan) ([]*logproto.ChunkRef, error) | |||
FilterChunkRefs(ctx context.Context, tenant string, from, through model.Time, series []labels.Labels, chunks []*logproto.ChunkRef, plan plan.QueryPlan) ([]*logproto.ChunkRef, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's a little unfortunate we have to pass series
and chunks
separately instead of passing through GroupedChunkRefs. I'm not sure it's worth fixing this though, so I don't mind it staying this way.
// | ||
// Unsupported LabelFilterExprs map to an UnsupportedLabelMatcher, for which | ||
// bloom tests should always pass. | ||
func ExtractTestableLabelMatchers(expr syntax.Expr) []LabelMatcher { | ||
func ExtractTestableLabelMatchers(expr syntax.Expr, series ...labels.Labels) []LabelMatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably ok to have this logic here, but I've never been happy with the cognitive overhead in bloom_tester.go
and I suspect it might be easier to follow if we moved the series knowledge checks down to the bloom test itself.
WDYT?
If we do move the logic that far down, we'll need to update other functions to make sure the labels get propagated through properly, like inside partitionRequest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This way we can also filter out
{app="fake"} | app="noexist"
{app="fake"} | app=""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM! I was doing some final checks of other instances where we construct logproto.GroupedChunkRefs
to see if labels are being lost anywhere important.
I found these:
I don't know if any of these matter, but all three look easy to fix and so might be worth updating anyway.
I'm also worried about what's happening in BloomQuerier.FilterChunkRefs
; are we losing label information after we call bq.BlockResolver.Resolve
? Is that potentially a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after last two comments! Thank you for fixing this!
// We cannot support this test case until we can forward a list of structured metadata fields. | ||
// We cannot check if the key is structured metadata using the bloom because these are probabilistic | ||
// E.g. bloom.Test("env") may return true even if env is not structured metadata. | ||
//{ | ||
// name: "filter label from series overridden by structured metadata", | ||
// query: `{app="fake"} | app="fake"`, // app is set to other in the structured metadata | ||
// match: false, | ||
//}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm trying to understand this.
Shouldn't {app="fake"} | app="fake"
only return false if every single log line in a chunk has app
as structured metadata (and none of them are fake
)? I'm not sure knowing the list of fields alone would be enough to ensure we're not filtering out chunks incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I overthought this one too much.
Pull request #14661 added the series labels to the FilterChunkRefs request in order to be able to filter out full series that do not contain the filter key when using bloom filters. However, this chage incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Pull request #14661 added the series labels to the FilterChunkRefs request in order to be able to filter out full series that do not contain the filter key when using bloom filters. However, this chage incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Pull request #14661 added the series labels to the FilterChunkRefs request in order to be able to filter out full series that do not contain the filter key when using bloom filters. However, this chage incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…ered (#14807) Pull request #14661 added the series labels to the `FilterChunkRefs` request in order to be able to filter out full series that do not contain the filter key when using bloom filters. However, this change incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…ered (#14807) Pull request #14661 added the series labels to the `FilterChunkRefs` request in order to be able to filter out full series that do not contain the filter key when using bloom filters. However, this change incorrectly assumed that `Chunk.Metric` from the `GetChunks()` call contains the labels of the stream. This information is only populated once the chunk is fetched, though. Therefore the list of labels was empty, and no chunk refs were filtered. Signed-off-by: Christian Haudum <christian.haudum@gmail.com> (cherry picked from commit 3b20cb0)
What this PR does / why we need it:
Before this PR, running this would return no data:
This is because env is an stream label, not structured metadata. If we not exclude it from the checks, it would not match any bloom filter and therefore all chunks would be discarded erroneously.
This PR fixes this by passing down the series labels to the
FilterChunkRefs
method of the bloom gateway. The series labels are passed (along the FP) via thelogproto.GroupedChunkRefs
struct. We pass the series labels tov1.ExtractTestableLabelMatchers
which will exclude any matcher with a label name present in the series labels.Note that we only pass the series to
v1.ExtractTestableLabelMatchers
deep in the bloom gateway as filtering series is relatively expensive. Upper in the filtering stack we callv1.ExtractTestableLabelMatchers
w/o the series (just to return early if there are not filters at all).Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR