PoC OpenTelemetry - general setup #10873

Closed
a-thaler opened this issue Mar 11, 2021 · 2 comments
Labels
area/logging Issues or PRs related to the logging module (deprecated)

@a-thaler
Contributor

Description

This is the first piece toward solving #10119.

Motivation:

  • Build up knowledge and understanding of the architecture of the OTel components, without focussing on the client side yet
  • Understand the processor pipeline of a collector and what to use it for
  • Validate the overall vision: can OTel solve the expected problems?

High-level outcome:

  • Explain opportunities and weaknesses: which problems can be solved nicely, and which problems do you see? Focus on the collector rather than on the metrics/traces themselves
  • Demo the collector in action and demonstrate how it solves our problems
  • Blueprint: a step-by-step guide on how to set up a playground with a dummy receiver and exporter

Technical goals:

  • Which deployment formats of the collector are there, and which one should we pick?
  • How to make the config dynamic, based on resources applied to the cluster?
  • How to send metrics/traces to multiple backends? How flexible are the selection criteria (can they be per namespace, per container, or even based on tags)?

@a-thaler a-thaler added the area/logging Issues or PRs related to the logging module (deprecated) label Mar 11, 2021
@suleymanakbas91
Contributor

suleymanakbas91 commented Mar 29, 2021

OpenTelemetry Collector

The Collector is a single binary that can be configured either as an Agent or as a Gateway. The Agent is supposed to do the lightweight job of only collecting the data and sending it to the Gateway, whereas the Gateway can do the more advanced, heavy-lifting filtering operations (similar to a FluentBit/Fluentd setup).

Deployment Type

There are three different deployment modes: DaemonSet, Deployment (default), sidecar.

We can first start with the Agent as DaemonSet setup, and see if it'd be sufficient. If not, we can also add a Gateway as Deployment and move the heavy-lifting parts there.
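As a rough sketch of that split, an Agent configuration could do nothing more than receive OTLP data, protect itself against memory pressure, batch, and forward to a Gateway. The Gateway host name otel-gateway and the limit values are assumptions made for illustration; the port is the one used in the sample configuration further down, and TLS settings are left out.

```yaml
# Agent (one per Node): collect and forward only.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Keep the Agent light: guard memory and batch, no heavy filtering.
  memory_limiter:
    check_interval: 1s
    limit_mib: 200        # assumed value, tune per Node size
  batch:

exporters:
  otlp:
    # Hypothetical Service name of the Gateway Deployment.
    endpoint: otel-gateway:55680

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```

The Gateway would then run its own configuration with an otlp receiver and the heavier processors in front of the real backends.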

The Collector can be deployed with either the OpenTelemetry Operator or a Helm chart. The Operator makes the configuration easier through its CRDs, but it is yet another piece of software to maintain. That's why I think the Helm chart would be the less troublesome start for us.
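For comparison, this is roughly what the Operator route looks like. The sketch assumes the Operator's v1alpha1 OpenTelemetryCollector resource, where the Collector configuration is embedded as a string; the resource name and the Gateway endpoint are hypothetical.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent            # hypothetical name
spec:
  mode: daemonset             # alternatives include deployment and sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      otlp:
        endpoint: otel-gateway:55680   # hypothetical Gateway Service
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
```

Switching the deployment type is a one-line change of the mode field, which is the convenience the CRD buys at the cost of running the Operator.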

Configuration

The configuration is built around three concepts for collecting and modifying data: receivers, processors, and exporters. They correspond to inputs, filters, and outputs in a FluentBit configuration.

Defining them has no effect until they are used in a pipeline, though. A pipeline is the execution recipe: it combines a set of receivers, processors, and exporters. Each pipeline is of type traces, metrics, or logs, and there can be multiple pipelines of the same type.
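To illustrate multiple pipelines of the same type, and as one possible answer to the multiple-backends question, here is a minimal sketch that sends the same traces to two backends through two named pipelines. It relies on the type/name syntax for components and pipelines; the backend host names and ports are made up.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  # Two instances of the same exporter type, distinguished by name.
  otlp/backend-a:
    endpoint: backend-a.example.com:4317   # hypothetical backend
  otlp/backend-b:
    endpoint: backend-b.example.com:4317   # hypothetical backend

service:
  pipelines:
    traces/a:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend-a]
    traces/b:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend-b]
```

If both backends should receive identical data, a single traces pipeline listing both exporters would also work; separate pipelines become useful once each backend needs its own processors, for example a different filter per backend.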

There are also extensions that provide further functionality. They are primarily meant for tasks that do not involve processing telemetry data, such as health monitoring, service discovery, and data forwarding. To take effect, an extension needs to be defined in the extensions section and enabled under service.extensions. Extensions are optional.

In the end, a sample configuration looks like this:

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: otelcol:55680

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check,pprof,zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

Note: Dynamic loading of the config is not possible at the moment and is not planned for GA. There is a nice proposal to allow remote configuration of Collectors, which could also be useful for our case.

Processors

Processors are used to modify or to filter telemetry data. We can use the Filter Processor for metrics and the Span Processor for traces to filter data based on regexes. We can also make use of the Memory Limiter Processor to prevent out-of-memory issues. Additionally, the Batch Processor would be useful: it groups data into batches, which compresses better and reduces the number of outgoing connections.
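A minimal sketch of how these processors could be combined in a metrics pipeline follows. The metric name pattern and all limit values are assumptions for illustration, and the otlp receiver and exporter are assumed to be defined as in the sample configuration above.

```yaml
processors:
  # Should come first in the pipeline so it can push back early.
  memory_limiter:
    check_interval: 1s
    limit_mib: 400          # assumed values
    spike_limit_mib: 100

  # Drop metrics whose names match a regex.
  filter/drop-debug:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - ^debug_.*       # hypothetical pattern

  # Batch last, just before export.
  batch:
    timeout: 5s
    send_batch_size: 1024

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop-debug, batch]
      exporters: [otlp]
```

The order in the processors list is the order of execution, so the filter runs before batching and only the remaining metrics are exported.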

FluentBit Subprocess Extension

There is an extension called FluentBit Subprocess Extension that runs FluentBit as a subprocess. It either sends the collected logs to OpenTelemetry Collector using a Forward plugin or uses outputs defined in FluentBit configuration. We can make use of this extension to have only one agent Pod running on each Node instead of two separate FluentBit and Collector Pods.

Prometheus Exporter vs Prometheus Remote Write Exporter

There are two different exporters for Prometheus. The Prometheus Exporter creates an endpoint from which Prometheus can scrape the collected metrics, whereas the Prometheus Remote Write Exporter sends the collected metrics directly to an external Prometheus-compatible backend like Cortex. Using the Prometheus Remote Write Exporter would free us from running a Prometheus instance in the cluster.
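The difference shows up directly in the exporter configuration. In this sketch the listen port and the Cortex URL are assumptions, and the otlp receiver is assumed to be defined as in the sample configuration above.

```yaml
exporters:
  # Pull model: the Collector exposes an endpoint that Prometheus scrapes.
  prometheus:
    endpoint: 0.0.0.0:8889

  # Push model: the Collector writes to a remote-write compatible backend.
  prometheusremotewrite:
    endpoint: https://cortex.example.com/api/v1/push   # hypothetical backend URL

service:
  pipelines:
    metrics:
      receivers: [otlp]
      # Pick one; both are listed here only to show they can coexist.
      exporters: [prometheus, prometheusremotewrite]
```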

The OTLP Protocol

The OpenTelemetry Protocol (OTLP) specification describes the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors, and telemetry backends. With the sidecar or the DaemonSet deployment mode, app-specific data is transformed into OTLP close to the source, so that the Collector receives its input in a consistent format. With that, we can easily swap out the whole shipment part for something else.

Downsides

There are a lot of moving pieces, and nearly every documentation page carries warnings about possible changes and removals.

Example

The easiest way to see the Collector in action is to follow this blog post.

@suleymanakbas91
Contributor

Results will be consolidated with the other PoC results in the community repo.
