-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenTelemetry output plugin #9228
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to distance this a bit from prometheus if possible (otel has no direct relationship). Could also use tests.
@ssoroka friendly reminder |
This change adds a receiver for [InfluxData line protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/), the native wire transfer protocol for [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/) and [InfluxDB](https://www.influxdata.com/products/influxdb/). The conversion logic lives in [package `influx2otel`](https://github.com/influxdata/influxdb-observability/tree/main/influx2otel), and is also imported by the [proposed Telegraf OpenTelemetry output plugin](influxdata/telegraf#9228). The objective of this strategy is to create a canonical mapping, and an open-source implementation, of OpenTelemetry <-> Telegraf/InfluxDB translation. **Link to tracking Issue:** #2951 **Testing:** The translation logic is tested in [package `influx2otel`](https://github.com/influxdata/influxdb-observability/tree/main/influx2otel). **Documentation:** README.md included in new receiver.
@jacobmarble this output plugins looks good, but the library it is using is marked experimental. Do you think this plugin is ready for end users and be planned to ship with the next Telegraf feature release? |
I ran a manual test using telegraf with this plugin to export data to an opentelemetry collector locally and didn't see any data being generated. here's the output from the telegraf logs:
and the data as seen on the collector side:
Haven't dug into the code to understand what is happening yet though. Please let me know if other details would be helpful. |
I'll work on tests to reproduce and prevent future regression. |
Fair point. I have added integration tests to the library, removed "experimental" markers, and released v0.1.0. That exposed one bug and a fix (testing FTW!). |
@codeboten I wrote two sets of tests:
All of the above tests work as intended. Your error "gauge field not found" is a schema issue. I suspect you are consuming metrics with a line protocol schema that is incompatible with your chosen If you are using a schema different to these two, then we need to agree on a strategy for converting other schemata. Not a hard thing, but could we punt it to follow-on issues and PRs against this plugin? |
@ssoroka @codeboten PTAL |
should this be possible? shouldn't all line protocol be convertible to valid OTEL ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look fine. are there supposed to be test files now?
I agree, any valid line protocol should find its way into some kind of Otel struct. In the example shared by @codeboten , I would like to punt this to influxdata/influxdb-observability#2 |
I have added various tests to the library responsible for conversion. This plugin adapts the conversion library for Telegraf, so there isn't a lot to test. Share your thoughts? |
I tested locally again with both schema options and both are giving me errors. With
With
This is testing with a default configuration generated by running
|
I should add that I recompiled telegraf using the following commit: |
This PR is taking a long time. I'm sorry about that. I have reproduced the bug that @codeboten has shared. It's clear that requiring a Telegraf/Prometheus schema in this plugin is unreasonable. To properly fix this issue, I'll do a refactor to |
@codeboten PTAL Turns out that many input plugins type their output (gauge, counter, etc) without any particular schema, so that needed to be accounted for. The |
We're pretty bound by backwards-compatibility; which can make it hard to change later without breaking things. Is there a better option than gauge by default? Is this a case of missing configuration? Edit: I see you're making a change for it to infer the schema from the metric. Seems like a good call. |
@ssoroka there is no "untyped" metric in OpenTelemetry, unfortunately. |
The failing tests appear to be flakey. Let me know if I'm wrong about that. |
Looks like new artifacts were built from this PR. Get them here!Artifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've confirmed that metrics are now being generated, thanks for addressing that. Only question i have remaining is regarding the headers
configuration option.
tls.ClientConfig | ||
Timeout config.Duration `toml:"timeout"` | ||
Compression string `toml:"compression"` | ||
Headers map[string]string `toml:"headers"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing where this is currently used, these headers should be setting grpc metadata in the client connections. See https://github.com/influxdata/telegraf/pull/9109/files#diff-2702903f71c7896e7e1d358e50075b5dc48150603438ee5eaea24948b8230d18R111
Merged! thanks @jacobmarble. @codeboten let's fix that in a separate PR |
* origin/master: (183 commits) fix: CrateDB replace dots in tag keys with underscores (influxdata#9566) feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386) fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401) fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583) fix: Refactor ec2 init for config-api (influxdata#9576) fix: sort logs by timestamp before writing to Loki (influxdata#9571) fix: muting tests for udp_listener (influxdata#9578) fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524) fix: Fixing k8s nodes and pods parsing error (influxdata#9581) feat: OpenTelemetry output plugin (influxdata#9228) feat: Support AWS Web Identity Provider (influxdata#9411) fix: upgraded sensu/go to v2.9.0 (influxdata#9577) fix: Normalize unix socket path (influxdata#9554) docs: fix aws ec2 readme inconsistency (influxdata#9567) feat: Modbus Rtu over tcp enhancement (influxdata#9570) docs: information on new conventional commit format (influxdata#9573) docs: Add logo (influxdata#9574) docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569) Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565) Update vmware/govmomi to v0.26.0 (influxdata#9552) Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550) fix test so it hits a fake service (influxdata#9564) Update changelog Fix procstat plugin README to match sample config (influxdata#9553) Fix metrics reported as written but not actually written (influxdata#9526) Prevent segfault in persistent volume claims (influxdata#9549) Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488) Fix attempt to connect to an empty list of servers. (influxdata#9503) Fix handling bool in sql input plugin (influxdata#9540) Suricata alerts (influxdata#9322) Linter fixes for plugins/inputs/[fg]* (influxdata#9387) For Prometheus Input add ability to query Consul Service catalog (influxdata#5464) Support Landing page on Prometheus landing page (influxdata#8641) [Docs] Clarify tagging behavior (influxdata#9461) Change the timeout from all queries to per query (influxdata#9471) Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438) feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529) Bug Fix Snmp empty metric name (influxdata#9519) Worktable workfile stats (influxdata#8587) Update Go to v1.16.6 (influxdata#9542) ...
Is there any plugin that converts Influx Line Protocol Metrix to OTEL Traces. |
@shruthikudlur There is the OpenTelemetry output plugin |
Related:
This change adds an output plugin opentelemetry, which sends OTLP signals via gRPC.
OpenTelemetry is an open-source telemetry ecosystem, supporting traces, metrics and logs. The mapping of the OpenTelemetry data model to line protocol is a work-in-progress, and I welcome any feedback to that document.
The core conversion logic lives in influxdata/influxdb-observability/influx2otel, intended to stand as a canonical implementation of line protocol->OTel.
Note that, unlike the OpenTelemetry input plugin, this output plugin only supports metrics, and not traces or logs. That may be added later.