Line deduplication filter #3110

Closed
60 changes: 60 additions & 0 deletions docs/sources/logql/_index.md
@@ -88,6 +88,7 @@ A log pipeline can be composed of:
- [Line Format Expression](#Line-Format-Expression)
- [Labels Format Expression](#Labels-Format-Expression)
- [Unwrap Expression](#Unwrap-Expression)
- [Dedup Filter Expression](#Dedup-Filter-Expression)

The [unwrap Expression](#Unwrap-Expression) is a special expression that should only be used within metric queries.

@@ -682,3 +683,62 @@ quantile_over_time(
```

>Metric queries cannot contain errors; if errors are found during execution, Loki will return an error and an appropriate status code.

#### Dedup Filter Expression

The line deduplication filter (`dedup`) reduces the number of log lines returned by filtering on a given set of label dimensions.
Each log line is examined, and only the first line seen for each unique combination of values of these labels is returned; later lines with the same values are dropped.

`dedup` uses the same label grouping syntax as aggregations (`sum`, `avg`, etc.).

```logql
dedup without|by (<label list>)
```
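The difference between `by` and `without` comes down to which labels form the deduplication key. The following sketch uses a hypothetical `groupKey` helper (not Loki's implementation) operating on a plain label map:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// groupKey builds a deduplication key from a line's labels.
// With without=false ("by"), only the listed labels are used; with
// without=true ("without"), every label except the listed ones is used.
func groupKey(labels map[string]string, names []string, without bool) string {
	listed := make(map[string]bool, len(names))
	for _, n := range names {
		listed[n] = true
	}
	var keys []string
	if without {
		for n := range labels {
			if !listed[n] {
				keys = append(keys, n)
			}
		}
	} else {
		keys = append(keys, names...) // copy, so the caller's slice is not sorted
	}
	// Sorting makes the key independent of map iteration order.
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, n := range keys {
		// Quoting values keeps the key unambiguous if a value contains a comma.
		parts[i] = n + "=" + strconv.Quote(labels[n])
	}
	return strings.Join(parts, ",")
}

func main() {
	lbs := map[string]string{"level": "info", "org_id": "29", "ts": "2020-10-23T20:32:16.094668233Z"}
	fmt.Println(groupKey(lbs, []string{"org_id"}, false))
	fmt.Println(groupKey(lbs, []string{"org_id"}, true))
}
```

Two lines are then duplicates exactly when their keys are equal.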

`dedup` is useful when you need one representative log line per label combination rather than every matching line, for example with Grafana's [annotations](https://grafana.com/docs/grafana/latest/dashboards/annotations/#annotations) feature using [Loki as a datasource](https://grafana.com/docs/grafana/latest/datasources/loki/#annotations).

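**NOTE**: log lines are not _necessarily_ processed in chronological order, so the line returned for each label combination is not guaranteed to be the earliest one, and the order of the returned lines is not guaranteed either.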
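To see why ordering matters, consider a hypothetical `firstPerKey` helper fed the same two lines (with abbreviated, illustrative timestamps) in two different orders; whichever line arrives first is the one kept:

```go
package main

import "fmt"

// firstPerKey keeps the first line seen for each key, in processing order.
// keys[i] is the dedup key of lines[i].
func firstPerKey(lines, keys []string) []string {
	seen := make(map[string]bool)
	var out []string
	for i, l := range lines {
		if !seen[keys[i]] {
			seen[keys[i]] = true
			out = append(out, l)
		}
	}
	return out
}

func main() {
	// Two lines that share org_id=29.
	early, late := "ts=16 org_id=29", "ts=18 org_id=29"
	keys := []string{"29", "29"}
	fmt.Println(firstPerKey([]string{early, late}, keys)) // [ts=16 org_id=29]
	fmt.Println(firstPerKey([]string{late, early}, keys)) // [ts=18 org_id=29]
}
```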

#### Dedup Examples

Given these log lines:

```log
level=info ts=2020-10-23T20:32:16.094668233Z org_id=29
level=info ts=2020-10-23T20:32:17.068866235Z org_id=12
level=debug ts=2020-10-23T20:32:18.068866235Z org_id=29
level=info ts=2020-10-23T20:32:19.068866235Z org_id=29
```

If we apply a `dedup` filter by the `org_id` label:

```logql
{app="foo"}
| logfmt
| dedup by (org_id)
```

...this will reduce the log lines to:

```log
level=info ts=2020-10-23T20:32:16.094668233Z org_id=29
level=info ts=2020-10-23T20:32:17.068866235Z org_id=12
```
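The example above can be reproduced with a short, self-contained Go sketch. `parseLogfmt` is a hypothetical, simplified stand-in for the `logfmt` stage (it splits on whitespace and `=` and ignores quoting), and the lines are processed in the order listed, which, per the note above, Loki does not guarantee.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLogfmt is a simplified, hypothetical stand-in for the logfmt stage:
// it splits on whitespace, then on the first '=', and ignores quoting.
func parseLogfmt(line string) map[string]string {
	labels := make(map[string]string)
	for _, field := range strings.Fields(line) {
		if k, v, ok := strings.Cut(field, "="); ok {
			labels[k] = v
		}
	}
	return labels
}

// dedupByLabel keeps the first line for each distinct value of one label.
// In this sketch, lines without the label all share the empty value.
func dedupByLabel(lines []string, name string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, l := range lines {
		v := parseLogfmt(l)[name]
		if !seen[v] {
			seen[v] = true
			out = append(out, l)
		}
	}
	return out
}

// sample holds the example log lines, in order.
var sample = []string{
	"level=info ts=2020-10-23T20:32:16.094668233Z org_id=29",
	"level=info ts=2020-10-23T20:32:17.068866235Z org_id=12",
	"level=debug ts=2020-10-23T20:32:18.068866235Z org_id=29",
	"level=info ts=2020-10-23T20:32:19.068866235Z org_id=29",
}

func main() {
	// Prints the first two sample lines, matching the result shown above.
	for _, l := range dedupByLabel(sample, "org_id") {
		fmt.Println(l)
	}
}
```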

Multiple labels can be used with `dedup`; a line is then dropped only if it matches an earlier line on all of the listed labels:

```logql
{app="foo"}
| logfmt
| dedup by (level, org_id)
```

`dedup without (<label list>)` will use all labels except the given ones to perform this filtering.

```logql
{app="foo"}
| logfmt
| dedup without (org_id)
```

The result is identical to the input: because the `org_id` label is excluded, filtering is now performed on the remaining labels, such as `level` and `ts`. `ts` is unique for each log line, so every log line is returned.
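A minimal sketch of the `without` case, considering only the labels extracted by `logfmt`. The hypothetical `withoutKey` helper copies the labels, removes the excluded names, and formats the rest as the key:

```go
package main

import "fmt"

// withoutKey builds a dedup key from every label except the excluded ones.
// fmt prints map entries sorted by key, and %q quotes each name and value,
// so equal label sets always yield the same, unambiguous string.
func withoutKey(labels map[string]string, excluded ...string) string {
	rest := make(map[string]string, len(labels))
	for k, v := range labels {
		rest[k] = v
	}
	for _, k := range excluded {
		delete(rest, k)
	}
	return fmt.Sprintf("%q", rest)
}

func main() {
	// Labels extracted by logfmt from the four example lines.
	lines := []map[string]string{
		{"level": "info", "ts": "2020-10-23T20:32:16.094668233Z", "org_id": "29"},
		{"level": "info", "ts": "2020-10-23T20:32:17.068866235Z", "org_id": "12"},
		{"level": "debug", "ts": "2020-10-23T20:32:18.068866235Z", "org_id": "29"},
		{"level": "info", "ts": "2020-10-23T20:32:19.068866235Z", "org_id": "29"},
	}
	distinct := make(map[string]bool)
	for _, l := range lines {
		distinct[withoutKey(l, "org_id")] = true
	}
	// Every key differs because ts differs, so no line is dropped.
	fmt.Println(len(distinct), "of", len(lines), "lines have a unique key")
}
```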
22 changes: 22 additions & 0 deletions pkg/logql/ast.go
@@ -289,6 +289,26 @@ func (e *lineFilterExpr) Stage() (log.Stage, error) {
	return f.ToStage(), nil
}

type lineDedupFilterExpr struct {
	grouping *grouping

	implicit
}

func newLineDedupFilterExpr(grouping *grouping) *lineDedupFilterExpr {
	return &lineDedupFilterExpr{
		grouping: grouping,
	}
}

func (e *lineDedupFilterExpr) Stage() (log.Stage, error) {
	return log.NewLineDedupFilter(e.grouping.groups, e.grouping.without), nil
}

func (e *lineDedupFilterExpr) String() string {
	return fmt.Sprintf("%s %s %s", OpPipe, OpDedup, e.grouping.String())
}

type labelParserExpr struct {
	op    string
	param string
@@ -519,6 +539,8 @@ const (
	OpFmtLine  = "line_format"
	OpFmtLabel = "label_format"

	OpDedup = "dedup"

	OpPipe   = "|"
	OpUnwrap = "unwrap"

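The `Stage()` method above delegates to `log.NewLineDedupFilter`, whose implementation is not part of this diff. The sketch below is a hypothetical illustration of what such a stage has to do, written against a simplified `stage` interface invented for this example rather than Loki's real `log.Stage`, and it handles only the `by` form. Unlike a plain line filter, it is stateful.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stage is a simplified, hypothetical pipeline-stage interface: it may
// rewrite a line and reports whether the line should be kept.
type stage interface {
	Process(line string, labels map[string]string) (string, bool)
}

// lineDedupFilter is a hypothetical "by"-only dedup stage. It remembers
// every key it has already let through.
type lineDedupFilter struct {
	names []string
	seen  map[string]struct{}
}

func newLineDedupFilter(names ...string) *lineDedupFilter {
	return &lineDedupFilter{names: names, seen: make(map[string]struct{})}
}

// Process keeps the line only if its combination of grouping-label values
// has not been seen before. Values are quoted so the joined key is unambiguous.
func (f *lineDedupFilter) Process(line string, labels map[string]string) (string, bool) {
	parts := make([]string, len(f.names))
	for i, n := range f.names {
		parts[i] = strconv.Quote(labels[n])
	}
	key := strings.Join(parts, ",")
	if _, dup := f.seen[key]; dup {
		return line, false
	}
	f.seen[key] = struct{}{}
	return line, true
}

// Compile-time check that *lineDedupFilter satisfies the stage interface.
var _ stage = (*lineDedupFilter)(nil)

func main() {
	var s stage = newLineDedupFilter("org_id")
	for _, id := range []string{"29", "12", "29"} {
		_, keep := s.Process("org_id="+id, map[string]string{"org_id": id})
		fmt.Println(id, keep)
	}
}
```

Because the sketch keeps a set of every key it has seen, its memory grows with the number of distinct label combinations, and it is not safe for concurrent use without a mutex.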
9 changes: 8 additions & 1 deletion pkg/logql/expr.y
@@ -42,6 +42,7 @@ import (
DurationFilter log.LabelFilterer
LabelFilter log.LabelFilterer
UnitFilter log.LabelFilterer
LineDedupFilter *lineDedupFilterExpr
LineFormatExpr *lineFmtExpr
LabelFormatExpr *labelFmtExpr
LabelFormat log.LabelFmt
@@ -78,6 +79,7 @@ import (
%type <DurationFilter> durationFilter
%type <LabelFilter> labelFilter
%type <LineFilters> lineFilters
%type <LineDedupFilter> lineDedupFilter
%type <LineFormatExpr> lineFormatExpr
%type <LabelFormatExpr> labelFormatExpr
%type <LabelFormat> labelFormat
@@ -92,7 +94,7 @@ import (
OPEN_PARENTHESIS CLOSE_PARENTHESIS BY WITHOUT COUNT_OVER_TIME RATE SUM AVG MAX MIN COUNT STDDEV STDVAR BOTTOMK TOPK
BYTES_OVER_TIME BYTES_RATE BOOL JSON REGEXP LOGFMT PIPE LINE_FMT LABEL_FMT UNWRAP AVG_OVER_TIME SUM_OVER_TIME MIN_OVER_TIME
MAX_OVER_TIME STDVAR_OVER_TIME STDDEV_OVER_TIME QUANTILE_OVER_TIME BYTES_CONV DURATION_CONV DURATION_SECONDS_CONV
ABSENT_OVER_TIME LABEL_REPLACE
ABSENT_OVER_TIME LABEL_REPLACE DEDUP

// Operators are listed with increasing precedence.
%left <binOp> OR
@@ -213,6 +215,11 @@ pipelineStage:
| PIPE labelFilter { $$ = &labelFilterExpr{LabelFilterer: $2 }}
| PIPE lineFormatExpr { $$ = $2 }
| PIPE labelFormatExpr { $$ = $2 }
| PIPE lineDedupFilter { $$ = $2 }
;

lineDedupFilter:
DEDUP grouping { $$ = newLineDedupFilterExpr($2) }
;

lineFilters:
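The grammar rule `DEDUP grouping` reuses the existing `grouping` production. As a rough, hypothetical hand-written equivalent (not generated by goyacc and not Loki's parser), the following sketch parses the two accepted forms into a `grouping`-like struct and renders it back in a `| dedup ...` shape similar to the `String()` method in the diff; Loki's exact spacing may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// grouping is a hypothetical mirror of the grammar's grouping: a label list
// plus a flag recording whether it was written with "by" or "without".
type grouping struct {
	groups  []string
	without bool
}

// String renders the stage in a "| dedup by (a, b)" shape.
func (g grouping) String() string {
	kw := "by"
	if g.without {
		kw = "without"
	}
	return fmt.Sprintf("| dedup %s (%s)", kw, strings.Join(g.groups, ", "))
}

// parseDedup accepts "dedup by (<labels>)" or "dedup without (<labels>)".
// This sketch rejects an empty label list.
func parseDedup(s string) (grouping, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "dedup") {
		return grouping{}, errors.New(`expected "dedup"`)
	}
	s = strings.TrimSpace(strings.TrimPrefix(s, "dedup"))

	var g grouping
	switch {
	case strings.HasPrefix(s, "by"):
		s = strings.TrimPrefix(s, "by")
	case strings.HasPrefix(s, "without"):
		g.without = true
		s = strings.TrimPrefix(s, "without")
	default:
		return grouping{}, errors.New(`expected "by" or "without"`)
	}

	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "(") || !strings.HasSuffix(s, ")") {
		return grouping{}, errors.New("expected a parenthesized label list")
	}
	for _, name := range strings.Split(s[1:len(s)-1], ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			return grouping{}, errors.New("empty label name")
		}
		g.groups = append(g.groups, name)
	}
	return g, nil
}

func main() {
	for _, q := range []string{"dedup by (level, org_id)", "dedup without (org_id)", "dedup (org_id)"} {
		g, err := parseDedup(q)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(g)
	}
}
```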