From bcc12fc2b964f311e04693c5f262eabdb5259116 Mon Sep 17 00:00:00 2001
From: Edward Welch
Date: Fri, 31 May 2019 13:39:52 -0400
Subject: [PATCH] Improving pipeline docs

---
 docs/logentry/processing-log-lines.md | 156 ++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 docs/logentry/processing-log-lines.md

diff --git a/docs/logentry/processing-log-lines.md b/docs/logentry/processing-log-lines.md
new file mode 100644
index 000000000000..5f8d162a270c
--- /dev/null
+++ b/docs/logentry/processing-log-lines.md
@@ -0,0 +1,156 @@
+# Processing Log Lines
+
+  * [Pipeline](#pipeline)
+  * [Stages](#stages)
+
+## Pipeline
+
+Pipeline stages implement the following interface:
+
+```go
+type Stage interface {
+	Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
+}
+```
+
+Any Stage is capable of modifying the `labels`, `extracted` data, `time`, and/or `entry`, though generally a Stage should modify only one of those things to reduce complexity.
+
+Stages are grouped into a pipeline, which executes its stages in sequence.
+
+More info on each field in the interface:
+
+##### labels
+
+A set of Prometheus-style labels which will be sent with the log line and will be indexed by Loki.
+
+##### extracted
+
+Metadata extracted during the pipeline execution which can be used by subsequent stages. This data is not sent with the logs and is dropped after the log entry is processed through the pipeline.
+
+For example, stages like [regex](#regex) and [json](#json) use expressions to extract data from a log line and store it in the `extracted` map, which following stages like [timestamp](#timestamp) or [output](#output) can use to manipulate the log line's `time` and `entry`.
+
+##### time
+
+The timestamp which Loki will store for the log line. If not set within the pipeline using the [timestamp](#timestamp) stage, it defaults to `time.Now()`.
+
+##### entry
+
+The log line which will be stored by Loki. The [output](#output) stage is capable of modifying this value; if no stage modifies it, the stored log line will match what was input to the system.
+
+## Stages
+
+  * [match](#match)
+  * [regex](#regex)
+  * [json](#json)
+  * [timestamp](#timestamp)
+  * [output](#output)
+  * [labels](#labels)
+  * [metrics](#metrics)
+
+### match
+
+A match stage takes the provided label `selector` and determines, based on the labels, whether a group of provided Stages will be executed.
+
+```yaml
+- match:
+    selector: "{app=\"loki\"}"       ①
+    pipeline_name: "loki_pipeline"   ②
+    stages:                          ③
+```
+① `selector` is **required** and uses LogQL label matcher expressions TODO LINK
+② `pipeline_name` is **optional**; when defined, it creates an additional label on the `pipeline_duration_seconds` histogram. The value for `pipeline_name` is concatenated with the `job_name` using an underscore: `job_name`_`pipeline_name`
+③ `stages` is a **required** list of additional pipeline stages which will only be executed if the defined `selector` matches the labels. The list is defined in exactly the same format as the root pipeline.
+
+
+[Example in unit test](../../pkg/logentry/match_test.go)
+
+### regex
+
+A regex stage takes the provided regex and sets the named groups as data in the `extracted` map.
+
+```yaml
+- regex:
+    expression:  ①
+```
+
+① `expression` is **required** and needs to be a [golang RE2 regex string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)` will be set into the `extracted` map; every capture group **must be named:** `(?P<name>re)`. The name will be used as the key in the map.
+
+##### Example:
+
+```yaml
+- regex:
+    expression: "^(?s)(?P