Merge pull request #710 from alanconway/forward-to-loki

New enhancment proposal: forward-to-Loki
openshift · Apr 15, 2021 · a0f9019 · a0f9019
2 parents ec37598 + 66a341c
commit a0f9019
Showing 1 changed file with 255 additions and 0 deletions.
diff --git a/enhancements/cluster-logging/forward-to-loki.md b/enhancements/cluster-logging/forward-to-loki.md
@@ -0,0 +1,255 @@
+# Forward to Loki
+
+## Release Sign-off Checklist
+
+- [X] Enhancement is `implementable`
+- [X] Design details are appropriately documented from clear requirements
+- [X] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [ ] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+Add a new output type to the `ClusterLogForwarder` API to forward logs to a Loki instance.
+An important part of using Loki is choosing the right _Loki labels_ to define log streams.
+We present a default set of Loki labels, and explain how the choice of labels affect performance and queries.
+
+## Motivation
+
+Loki will become our default log store in the near future.
+There are also several use cases for forwarding to an external Loki instance.
+
+### Goals
+
+- Forward logs to a Loki instance
+- Choose a good default set of Loki labels.
+- Provide configuration to control the choice of Loki labels
+- Describe how expected query patterns correspond to Loki labels and JSON payload
+
+### Non-Goals
+
+The following may be covered in future proposals, but are out of scope here:
+
+- Query tools.
+- Optimizing/reducing the JSON payload format.
+- Alternate payload formats.
+- Use of collectors other than `fluentd` (e.g. Promtail, FluentBit)
+
+## Proposal
+
+### Summary of Loki labels
+
+Loki and Elasticsearch take different approaches to indexing and search.
+Elasticsearch does content parsing and indexing _during ingest_ to optimize the _query_ path.
+Loki optimizes the _ingest_ path by using simple key-value `labels` for initial indexing,
+and deferring content parsing until _query_ time.
+
+The benefit is faster ingest, which reduces the risk of dropping logs.
+The trade-off is that need to carefully choose what to use as labels.
+
+For more details:
+- [Guide to labels in Loki](https://grafana.com/blog/2020/08/27/the-concise-guide-to-labels-in-loki/)
+- [Out-of-order streams issue]([https://github.com/grafana/loki/issues/1544)
+
+To summarize the key points:
+- Each unique combination of labels defines an ordered Loki _stream_.
+  - Too many streams degrades performance.
+  - _Cardinality_ refers to the number of unique label combinations.
+- The cardinality of `kubernetes.pod_name` was found to be
+  - too high in clusters with _very_ rapid turn-over of short-lived pods.
+  - acceptable in other (possibly more typical) clusters with high log data rates.
+- Minimize the _total number_ of labels per stream
+  - There is a limit of 30 max, fewer labels == better performance.
+- Log streams must be _ordered by collection timestamp_.
+  - Must not allow distinct nodes with separate clocks to contribute to a single stream.
+
+All logging meta-data will still be included in log records.
+Labels do not need to _identify the source of logs_, only to _partition the search space_.
+At query time labels reduce the search space, then Loki uses log content to complete the search.
+
+**Note**: Loki [label names][] must match the regex `[a-zA-Z_:][a-zA-Z0-9_:]*`.
+Meta-data names are translated to Loki labels by replacing illegal characters with '_'.
+
+Nested JSON objects are referenced by converting their JSON path to a label name.
+For example, if `kubernetes.namespace_name` is a label, then this JSON log record:
+
+``` json
+{"kubernetes.namespace_name":"foo","kubernetes":{"labels":{"foo":"bar"}}}
+```
+would match this LogQL filter:
+
+``` logql
+{ kubernetes_namespace_name="foo" } | json | kubernetes_labels_foo == "bar"
+```
+### Default Loki Labels
+
+The default label set should be suitable for most deployments.
+The user can configure a different set for specific needs.
+
+The default label set is:
+
+`log_type`: One of `application`, `infrastructure`, `audit`. Basic log categorization.
+
+`cluster`: The cluster name, i.e. the DNS authority part of the cluster master URL.\
+This name is DNS-unique, human readable, and known to all k8s API clients.
+It is recommend as a cluster ID by [this discussion](https://github.com/kubernetes/kubernetes/issues/2292).
+To print the name of the connected cluster:
+```bash
+oc config view -o jsonpath='{.clusters[].name}{"\n"}
+```
+
+`kubernetes.namespace_name`: namespace where the log originated.
+
+`kubernetes.pod_name`: name of the pod where the log originated.
+
+`kubernetes.host`: Host name of the cluster node where the log record originated.\
+This is *always included* to guarantee ordered streams, even if the user configures a label set without it.
+
+**Note:** `container_name` and `image_name` are *not* included in the defaults. They are not high-cardinality by themselves, but they are multipliers of `namespace/pod`, which is the highest cardinality we want to allow by default.
+
+**Note:** We use names rather than UID for cluster, namespace and pod.
+Names are more likely to be known and easier to use in queries.
+Using both names *and* UIDs is redundant, they partition the data in approximately the same way.
+
+UIDs (and all other meta-data) are still available in the log payload for filtering in queries.
+
+### Proposed API
+
+Add a new `loki` output type to the `ClusterLogForwarder` API:
+
+``` yaml
+- name: myLokiOutput
+  type: loki
+  url: ...
+  secret: ...
+```
+
+`url` and `secret` have the usual meaning with regard to TLS and certificates.
+The `secret` may also contain `username` and `password` fields for Loki.
+
+The following optional output fields are Loki-specific:
+
+- `labelKeys`: ([]string, default=_see [Default Loki Labels](#default-loki-labels))_ \
+  A list of meta-data keys to replace the default labels.\
+  Keys are translated to [label names][] as described in [Summary of Loki Labels](#summary-of-loki-labels)
+  Example: `kubernetes.labels.foo` => `kubernetes_labels_foo`.\
+  **Note**: `kubernetes.host` is *always* be included, even if not requested.
+  It is required to ensure ordered label streams.
+- `tenantKey`: (string, default=`"kubernetes.namespaceName"`) \
+  Use the value of this meta-data key as the Loki tenant ID.\
+  At least these keys are supported:
+  - `kubernetes.namespaceName`: Use the namespace name as the tenant ID.
+  - `kubernetes.labels.<key>`: Use the string value of kubernetes label with key `<key>`.
+  - `openshift.labels.<key>`: use the value of a label attached by the forwarder.
+
+The full set of meta-data keys is listed in [data model][].
+
+### Implementation Details
+
+The output will be implemented using this fluentd plugin: https://grafana.com/docs/loki/latest/clients/fluentd/
+
+Notes on plug-in configuration:
+
+- Security: configured by the `output.secret` as usual.
+- K8s labels as Loki labels: supported by the plug-in.
+- Always include `kubernetes.host` to avoid avoid out-of-order streams.
+- Tenant set from `output.tenantKey`
+- Output format is `json`, serializes the fluentd record like other outputs.
+- Static labels set as `extra_labels` to avoid extracting from each record.
+
+### User Stories
+
+#### Treat each namespace as a separate Loki tenant
+
+I want logs from each namespace to be directed to separate Loki tenants, using Loki's "soft tenancy" model:
+
+``` yaml
+- name: myLokiOutput
+  type: loki
+  url: ...
+  secret: ...
+  tenantKey: kubernetes.namespace_name
+```
+
+#### Query all logs from a namespace
+
+``` logql
+{ kubernetes_namespace_name="mynamespace"}
+```
+
+#### Query logs from a specific container in a named Pod
+
+``` logql
+{ kubernetes_namespace_name="mynamespace"" } |= kubernetes_pod_name == "mypod" |= kubernetes_container_name="mycontainer
+```
+
+#### Query logs from a labeled application
+
+Using the default configuration:
+
+``` logql
+{ } |= kubernetes_labels_app == "myapp""
+```
+
+By adding `labelKeys: [ kubernetes.labels.app ]` to the Loki output configuration this
+can be a faster label query:
+
+``` logql
+{ kubernetes_labels_app="myapp" }
+```
+
+#### Example Queries From The Field
+
+These are real queries taken from use in the field and translated.
+This deployment makes heavy use of the `app` and `deploymentconfig` kubernetes labels, so
+we assume this configuration to speed up queries:
+
+``` yaml
+- name: myLokiOutput
+  type: loki
+  url: ...
+  secret: ...
+  labelKeys: [ kubernetes.labels.app, kubernetes.labels.deploymentconfig ]
+```
+
+``` logql
+{cluster="dsaas",kubernetes_labels_app="f8notification"} |= 'level:"error"'
+
+{cluster="dsaas",kubernetes_labels_app="keycloak"} |=  message:"*503 Service Unavailable*"
+
+{cluster="rh-idev", kubernetes_labels_deploymentconfig="bayesian-pgbouncer"} !~ “client close request|coreapi tls=no|server_lifetime|new connection to server|client unexpected eof|LOG Stats|server idle timeout|unclean server”
+
+{cluster="rh-idev", kubernetes_labels_deploymentconfig="wwwopenshifio"} |= message:"Connection reset by peer"
+```
+
+### Risks and Mitigation
+
+#### Cardinality explosions for large scale clusters
+
+We provide a set of labels that should perform well in most cases, but allow the user to tune the labels to manage unusual situations.
+For example a CI test cluster that constantly creates and destroys randomly-named pods might want to omit the pod name if it becomes too high cardinality.
+
+## Design Details
+
+### Test Plan
+
+- Unit and e2e tests
+- Measure throughput/latency on busy cluster, should exceed Elasticsearch.
+- Find breaking point for log loss in stress test, should exceed Elasticsearch.
+- Measure query response on large data, must be acceptable compared to Elasticsearch.
+
+### Graduation Criteria
+
+Acceptable performance & passing stress tests.
+
+### Upgrade / Downgrade Strategy
+
+_**TODO** Migrating from Elasticsearch_
+
+## Implementation History
+
+None yet.
+
+[label names]: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
+[data model]: https://github.com/openshift/origin-aggregated-logging/blob/master/docs/com.redhat.viaq-openshift-project.asciidoc