Line deduplication filter #3110
c1aed95 to b9b54e4
I've left a few comments/suggestions here and there but overall it's well done. Great job figuring out how the sausage is made and extending this.
Todo (Owen/Cyril):
- How will this affect sharding? Presumably `dedup` would need to be applied twice (at the edge and at the aggregation layer).
- The new node types need to be added to the new `Walk` functions, I think.
pkg/logql/log/dedup_filter.go (outdated)
```go
func NewLineDedupFilter(labelFilters []string, inverted bool) *LineDedupFilter {
	// create a map of labelFilters for O(1) lookups instead of O(n)
	var filterMap = make(map[string]interface{})
```
nit: can use `map[string]struct{}` here.
Ah yes, I missed this in my refactor - thanks 👍
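For readers unfamiliar with the idiom behind this nit, here is a minimal sketch of a string set built on `map[string]struct{}`. The helper names `newLabelSet` and `contains` are hypothetical and are not taken from the PR; the point is only that the empty struct value takes no space, so the map carries nothing but its keys.

```go
package main

import "fmt"

// newLabelSet is a hypothetical helper (not from the PR). It builds a set of
// label names; struct{} values occupy zero bytes, so only the keys are stored.
func newLabelSet(labels []string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, l := range labels {
		set[l] = struct{}{}
	}
	return set
}

// contains reports whether name is a member of the set, using the
// two-value map lookup form.
func contains(set map[string]struct{}, name string) bool {
	_, ok := set[name]
	return ok
}

func main() {
	set := newLabelSet([]string{"cluster", "namespace"})
	fmt.Println(contains(set, "cluster")) // true
	fmt.Println(contains(set, "pod"))     // false
}
```

Compared with `map[string]interface{}`, this also makes it obvious at the type level that the values are never read.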
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
0c37f18 to 198640e
@owen-d thought I'd also just drop in the benchmarking results to see if this is any cause for concern:
pkg/logql/log/dedup_filter.go (outdated)
```go
func NewLineDedupFilter(labelFilters []string, inverted bool) *LineDedupFilter {
	// create a map of labelFilters for O(1) lookups instead of O(n)
	var filterMap = make(map[string]struct{})
```
Suggested change:
```diff
- var filterMap = make(map[string]struct{})
+ var filterMap = make(map[string]struct{}, len(labelFilters))
```
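As a rough illustration of what the size argument does, the sketch below passes `len(labelFilters)` to `make` as a capacity hint. `buildFilterMap` is a hypothetical stand-in for the body of the constructor, not the PR's actual code. The hint only lets the runtime allocate space up front; the map still starts empty and grows if needed.

```go
package main

import "fmt"

// buildFilterMap is a hypothetical stand-in for the constructor body.
// The second argument to make is a capacity hint, not a length.
func buildFilterMap(labelFilters []string) map[string]struct{} {
	filterMap := make(map[string]struct{}, len(labelFilters))
	for _, name := range labelFilters {
		filterMap[name] = struct{}{}
	}
	return filterMap
}

func main() {
	m := buildFilterMap([]string{"cluster", "pod", "cluster"})
	fmt.Println(len(m)) // 2: duplicates collapse, the hint does not set the length
}
```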
I think this is great; I wanted to implement something similar. I'm wondering, though: wouldn't this be more useful if it somehow included the number of duplicates removed for each line? I think this would help the client emphasize a specific message, but then there is some UI work. I'm also wondering how it will look in the Explore histogram; we could get this deployed in dev and give it a try.
Can you share an example of that use-case? I'm curious to compare it with my use-cases, filtering duplicate errors like below using the …
Funnily enough, I implemented that initially and then disregarded it. How do you see this working though? I'm thinking of a meta-label like
That's almost exactly my use-case too; in my use-case I want to reduce the stream of logs by using …
I tried to quickly hack together a dedup counter, with that: dannykopping/loki@dannykopping/line_dedup_filter...dannykopping:dannykopping/line_dedup_filter_count
Even if I keep a reference to the …
Update: I see now that … `return line, p.builder.LabelsResult(), true`
What would you suggest here? I can't see a way right now to attach a meta-label to the one returned line using my existing mechanism of deduplication.
Not sure yet, but this is not your only problem. Right now, those pipelines are run on ingesters too, so you will be deduping in multiple places; you'll need to merge results a bit differently, and this is trickier than you would expect. There's also the dedupe from replication, which needs to happen first; otherwise replicas will count toward duplicates, and they should not. Currently it's not a big problem because we only add labels, but I want to solve this separately. I'm fine with the meta-label solution, but we should check with everyone on the team; you should join our weekly or community meeting.
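To make the merging concern concrete, here is a simplified sketch of what combining per-layer duplicate counts could look like. It is not Loki's pipeline: the `dedupCounts` type and both functions are hypothetical, the dedup key is just the raw line, and replication dedupe (which the comment above says must happen first) is deliberately ignored.

```go
package main

import "fmt"

// dedupCounts is a hypothetical type: it maps a dedup key (here simply the
// log line) to the number of lines that were collapsed into it.
type dedupCounts map[string]int

// dedupWithCounts dedups one partial stream, e.g. the lines seen by a single
// ingester, and remembers how many times each line occurred.
func dedupWithCounts(lines []string) dedupCounts {
	counts := make(dedupCounts, len(lines))
	for _, l := range lines {
		counts[l]++
	}
	return counts
}

// mergeCounts combines partial results at the aggregation layer. Counts must
// be summed per key; deduping the already-deduped lines again would lose them.
func mergeCounts(parts ...dedupCounts) dedupCounts {
	merged := make(dedupCounts)
	for _, p := range parts {
		for key, n := range p {
			merged[key] += n
		}
	}
	return merged
}

func main() {
	a := dedupWithCounts([]string{"timeout", "timeout", "oom"})
	b := dedupWithCounts([]string{"timeout"})
	merged := mergeCounts(a, b)
	fmt.Println(merged["timeout"], merged["oom"]) // 3 1
}
```

If the same entry reached both partial streams through replication, this naive sum would overcount it, which is exactly the ordering problem raised above.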
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
What this PR does / why we need it:
Introduces a `dedup` filter to reduce a log stream by grouping on unique label dimensions.
Which issue(s) this PR fixes:
None. I implemented this quickly on a whim so have not discussed whether we actually want this; feel free to reject on this basis.
Special notes for your reviewer:
I wrote this feature to scratch an itch I had when using Loki as a datasource for annotations in Grafana; LogQL only offers line/label filtering for reducing the returned lines in a log stream, which doesn't quite fit my use-case.
In my use-case, I have multiple log lines that are "duplicated", with only a unique `cluster` label in each. I didn't want to force the user of my dashboard to choose a cluster to filter on, since this data is generally consistent across clusters (and choosing an arbitrary cluster to filter on to achieve the same result doesn't express my intent clearly). I figured this feature might be useful to other folks too.
I tried to mirror the nomenclature used in the Grafana Explore view, although this may cause confusion:
Open to suggestions on this.
If this PR is approved, I'm happy to implement the deduplication by exact/number/signature logic in LogQL.
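The use-case described above (identical lines that differ only in their `cluster` label) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the `entry` type, `dedupKey`, and `dedup` are hypothetical, and keying on the line plus all non-ignored labels is one guess at the intended semantics.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical, simplified log entry: a line plus its labels.
type entry struct {
	line   string
	labels map[string]string
}

// dedupKey builds a key from the line and every label not listed in ignore.
func dedupKey(e entry, ignore map[string]struct{}) string {
	names := make([]string, 0, len(e.labels))
	for name := range e.labels {
		if _, skip := ignore[name]; !skip {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for a stable key
	var b strings.Builder
	b.WriteString(e.line)
	for _, name := range names {
		b.WriteByte(0) // separator byte between key parts
		b.WriteString(name)
		b.WriteByte('=')
		b.WriteString(e.labels[name])
	}
	return b.String()
}

// dedup keeps the first entry seen for each key and drops later ones.
func dedup(entries []entry, ignore map[string]struct{}) []entry {
	seen := make(map[string]struct{}, len(entries))
	var out []entry
	for _, e := range entries {
		key := dedupKey(e, ignore)
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []entry{
		{"deploy finished", map[string]string{"app": "api", "cluster": "eu"}},
		{"deploy finished", map[string]string{"app": "api", "cluster": "us"}},
		{"deploy finished", map[string]string{"app": "web", "cluster": "eu"}},
	}
	out := dedup(entries, map[string]struct{}{"cluster": {}})
	fmt.Println(len(out)) // 2
}
```

Here the two `app=api` entries collapse into one because `cluster` is ignored, while the `app=web` entry survives.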
Checklist