Add logfmt selective label extraction (#6675)

**What this PR does / why we need it**:
This PR adds selective extraction of labels from `logfmt`-formatted log
lines, with an option to rename the extracted labels.

For example, this query:
```
{app="foo"} | logfmt msg,status
```

will extract the labels `msg` and `status` from the following logfmt
line:
```
level=error ts=2021-02-12T19:18:10.037940878Z caller=client.go:294 component=client host=observability-loki-gateway msg="final error sending batch" status=400 error="server returned HTTP status 400 Bad Request (400): entry with timestamp 2021-02-12 19:18:08.917452 +0000 UTC ignored, reason: 'entry out of order' for stream..."
```
With the following results:
```
msg="final error sending batch"
status="400"
```
--------------
Another possible scenario with label renaming:
```
{app="foo"} | logfmt message="msg", status
```
That produces the following results:
```
message="final error sending batch"
status="400"
```
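
The behaviour described above can be sketched in isolation. The following Go program is a minimal illustration and not the code from this PR: `parseLogfmt`, `param`, and `extract` are hypothetical names, and the tiny logfmt reader is an assumption that handles only `key=value` and `key="quoted value"` pairs without escape sequences.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLogfmt is a deliberately small logfmt reader for illustration only.
// It handles key=value and key="quoted value" pairs and does not process
// escape sequences inside quoted values.
func parseLogfmt(line string) map[string]string {
	fields := make(map[string]string)
	i := 0
	for i < len(line) {
		// Skip separating spaces.
		for i < len(line) && line[i] == ' ' {
			i++
		}
		// Read the key up to '=' or a space.
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if key == "" {
			break // end of input or malformed pair
		}
		if i >= len(line) || line[i] != '=' {
			fields[key] = "" // bare key without a value
			continue
		}
		i++ // consume '='
		if i < len(line) && line[i] == '"' {
			i++ // consume the opening quote
			end := strings.IndexByte(line[i:], '"')
			if end < 0 {
				fields[key] = line[i:] // unterminated quote: take the rest
				break
			}
			fields[key] = line[i : i+end]
			i += end + 1
			continue
		}
		// Unquoted value runs to the next space.
		start = i
		for i < len(line) && line[i] != ' ' {
			i++
		}
		fields[key] = line[start:i]
	}
	return fields
}

// param pairs an output label with the logfmt key it is read from
// (hypothetical type, not part of the PR).
type param struct {
	label string
	key   string
}

// extract keeps only the requested keys and stores them under their labels.
func extract(line string, params []param) map[string]string {
	fields := parseLogfmt(line)
	out := make(map[string]string, len(params))
	for _, p := range params {
		if v, ok := fields[p.key]; ok {
			out[p.label] = v
		}
	}
	return out
}

func main() {
	line := `level=error caller=client.go:294 msg="final error sending batch" status=400`
	// Corresponds to: | logfmt message="msg", status
	labels := extract(line, []param{
		{label: "message", key: "msg"},
		{label: "status", key: "status"},
	})
	fmt.Println(labels["message"])
	fmt.Println(labels["status"])
}
```

A parameter without a rename, such as `status`, is modelled here as a `param` whose label and key are the same.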


**Which issue(s) this PR fixes**:
Fixes #3355 

**Special notes for your reviewer**:

**Checklist**
- [x] Documentation added
- [x] Tests updated
- [x] Is this an important fix or new feature? Add an entry in the
`CHANGELOG.md`.
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
btaani and chaudum authored Feb 22, 2023
1 parent 030d323 commit 68a9fd6
Showing 20 changed files with 1,900 additions and 681 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,4 +1,5 @@
## Main/Unreleased
* [6675](https://github.com/grafana/loki/pull/6675) **btaani**: Add logfmt expression parser for selective extraction of labels from logfmt formatted logs

### All Changes

49 changes: 34 additions & 15 deletions docs/sources/logql/log_queries/_index.md
@@ -403,25 +403,44 @@ The **json** parser operates in two modes:

#### logfmt

-The **logfmt** parser can be added using the `| logfmt` and will extract all keys and values from the [logfmt](https://brandur.org/logfmt) formatted log line.
+The **logfmt** parser can operate in two modes:

-For example the following log line:
+1. **without** parameters:

-```logfmt
-at=info method=GET path=/ host=grafana.net fwd="124.133.124.161" service=8ms status=200
-```
+The **logfmt** parser can be added using `| logfmt` and will extract all keys and values from the [logfmt](https://brandur.org/logfmt) formatted log line.

-will get those labels extracted:
+For example the following log line:

-```kv
-"at" => "info"
-"method" => "GET"
-"path" => "/"
-"host" => "grafana.net"
-"fwd" => "124.133.124.161"
-"service" => "8ms"
-"status" => "200"
-```
+```logfmt
+at=info method=GET path=/ host=grafana.net fwd="124.133.124.161" service=8ms status=200
+```
+
+will result in having the following labels extracted:
+
+```kv
+"at" => "info"
+"method" => "GET"
+"path" => "/"
+"host" => "grafana.net"
+"fwd" => "124.133.124.161"
+"service" => "8ms"
+"status" => "200"
+```
+
+2. **with** parameters:
+
+Similar to [JSON](#json), using `| logfmt label="expression", another="expression"` in the pipeline will result in extracting only the fields specified by the labels.
+
+For example, `| logfmt host, fwd_ip="fwd"` will extract the labels `host` and `fwd` from the following log line:
+```logfmt
+at=info method=GET path=/ host=grafana.net fwd="124.133.124.161" service=8ms status=200
+```
+
+And rename `fwd` to `fwd_ip`:
+```kv
+"host" => "grafana.net"
+"fwd_ip" => "124.133.124.161"
+```

#### Pattern

13 changes: 0 additions & 13 deletions pkg/logql/log/json_expr.go

This file was deleted.

125 changes: 125 additions & 0 deletions pkg/logql/log/logfmt/lexer.go
@@ -0,0 +1,125 @@
package logfmt

import (
	"bufio"
	"fmt"
	"io"
	"text/scanner"
)

type Scanner struct {
	buf   *bufio.Reader
	data  []interface{}
	err   error
	debug bool
}

func NewScanner(r io.Reader, debug bool) *Scanner {
	return &Scanner{
		buf:   bufio.NewReader(r),
		debug: debug,
	}
}

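func (sc *Scanner) Error(s string) {
	// Use an explicit format verb so that a '%' in s is not
	// interpreted as a formatting directive.
	sc.err = fmt.Errorf("%s", s)
	fmt.Printf("syntax error: %s\n", s)
}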

func (sc *Scanner) Reduced(rule, state int, lval *LogfmtExprSymType) bool {
	if sc.debug {
		fmt.Printf("rule: %v; state %v; lval: %v\n", rule, state, lval)
	}
	return false
}

func (sc *Scanner) Lex(lval *LogfmtExprSymType) int {
	return sc.lex(lval)
}

func (sc *Scanner) lex(lval *LogfmtExprSymType) int {
	for {
		r := sc.read()

		if r == 0 {
			return 0
		}
		if isWhitespace(r) {
			continue
		}

		switch {
		case isStartIdentifier(r):
			sc.unread()
			lval.key = sc.scanField()
			return KEY
		case r == '"':
			sc.unread()
			lval.str = sc.scanStr()
			return STRING
		default:
			sc.err = fmt.Errorf("unexpected char %c", r)
			return 0
		}
	}
}

func isStartIdentifier(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_'
}

func isIdentifier(r rune) bool {
	return isStartIdentifier(r) || (r >= '0' && r <= '9')
}

func (sc *Scanner) scanField() string {
	var str []rune

	for {
		r := sc.read()
		if !isIdentifier(r) || isEndOfInput(r) {
			sc.unread()
			break
		}

		str = append(str, r)
	}
	return string(str)
}

// input is either terminated by EOF or null byte
func isEndOfInput(r rune) bool {
	return r == scanner.EOF || r == rune(0)
}

func (sc *Scanner) read() rune {
	ch, _, _ := sc.buf.ReadRune()
	return ch
}

func (sc *Scanner) scanStr() string {
	var str []rune
	// begin with ", end with "
	r := sc.read()
	if r != '"' {
		sc.err = fmt.Errorf("unexpected char %c", r)
		return ""
	}

	for {
		r := sc.read()
		if isEndOfInput(r) {
			break
		}

		if r == '"' || r == ']' {
			break
		}
		str = append(str, r)
	}
	return string(str)
}

func (sc *Scanner) unread() { _ = sc.buf.UnreadRune() }

func isWhitespace(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' }
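
To make the token stream of this lexer easier to picture, here is a simplified, standalone Go sketch of the same scanning rules. It is not the PR's `Scanner`: the `kind` constants and the `token` and `tokenize` names are hypothetical stand-ins for the goyacc-generated `KEY`/`STRING` constants and `LogfmtExprSymType`, it works on an in-memory string rather than a `bufio.Reader`, and it does not reproduce the `]` terminator or null-byte handling of `scanStr`.

```go
package main

import "fmt"

// kind is a stand-in for the KEY and STRING token constants that the
// generated parser would provide.
type kind int

const (
	kindKey kind = iota
	kindString
)

// token is a hypothetical replacement for the generated symbol type.
type token struct {
	kind kind
	text string
}

func isStartIdentifier(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_'
}

func isIdentifier(r rune) bool {
	return isStartIdentifier(r) || (r >= '0' && r <= '9')
}

// tokenize splits input into identifier (key) and quoted-string tokens,
// skipping whitespace and rejecting any other character.
func tokenize(input string) ([]token, error) {
	var tokens []token
	runes := []rune(input)
	i := 0
	for i < len(runes) {
		r := runes[i]
		switch {
		case r == ' ' || r == '\t' || r == '\n':
			i++
		case isStartIdentifier(r):
			start := i
			for i < len(runes) && isIdentifier(runes[i]) {
				i++
			}
			tokens = append(tokens, token{kindKey, string(runes[start:i])})
		case r == '"':
			i++ // skip the opening quote
			start := i
			for i < len(runes) && runes[i] != '"' {
				i++
			}
			tokens = append(tokens, token{kindString, string(runes[start:i])})
			if i < len(runes) {
				i++ // skip the closing quote
			}
		default:
			return nil, fmt.Errorf("unexpected char %c", r)
		}
	}
	return tokens, nil
}

func main() {
	for _, in := range []string{`fwd`, `"fwd"`, `a,b`} {
		tokens, err := tokenize(in)
		fmt.Printf("%q -> %v, err=%v\n", in, tokens, err)
	}
}
```

Note that characters such as `,` and `=` are rejected, which suggests the lexer is meant for a single parameter expression rather than the whole `label="expression", ...` list.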
39 changes: 39 additions & 0 deletions pkg/logql/log/logfmt/logfmtexpr.y
@@ -0,0 +1,39 @@


%{
package logfmt

func setScannerData(lex interface{}, data []interface{}) {
	lex.(*Scanner).data = data
}

%}

%union {
	str  string
	key  string
	list []interface{}
}

%token<str> STRING
%token<key> KEY

%type<str>  key value
%type<list> expressions

%%

logfmt:
	expressions { setScannerData(LogfmtExprlex, $1) }

expressions:
	key                 { $$ = []interface{}{$1} }
	| value             { $$ = []interface{}{$1} }
	| expressions value { $$ = append($1, $2) }
	;

key:
	KEY { $$ = $1 }

value:
	STRING { $$ = $1 }
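
Read as a whole, the grammar accepts one `KEY` or `STRING` followed by zero or more `STRING` tokens, and collects their text into a list. The hand-written Go function below is a rough equivalent for illustration only; it is not the goyacc-generated parser, and `kind`, `token`, and `parse` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// kind and token are stand-ins for the generated token constants and
// symbol type.
type kind int

const (
	kindKey kind = iota
	kindString
)

type token struct {
	kind kind
	text string
}

// parse mirrors the three productions of `expressions`:
//
//	key | value | expressions value
//
// The first token may be a key or a value; every following token must be
// a value.
func parse(tokens []token) ([]interface{}, error) {
	if len(tokens) == 0 {
		return nil, errors.New("empty expression")
	}
	// `key` or `value` starts the list.
	list := []interface{}{tokens[0].text}
	// `expressions value` appends further quoted values only.
	for _, t := range tokens[1:] {
		if t.kind != kindString {
			return nil, fmt.Errorf("unexpected key %q: only quoted values may follow", t.text)
		}
		list = append(list, t.text)
	}
	return list, nil
}

func main() {
	list, err := parse([]token{{kindKey, "fwd"}})
	fmt.Println(list, err)

	_, err = parse([]token{{kindString, "a"}, {kindKey, "b"}})
	fmt.Println(err)
}
```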