Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bridges/prometheus] "unknown aggregation" error when transforming ScopeMetrics #5296

Closed
lukasmalkmus opened this issue Mar 19, 2024 · 4 comments · Fixed by open-telemetry/opentelemetry-go#5100
Labels
bug Something isn't working
Milestone

Comments

@lukasmalkmus
Copy link

lukasmalkmus commented Mar 19, 2024

Description

I'm using Prometheus to instrument a large codebase. Instead of migrating to the OTel SDK, a simpler alternative (for now) seems to be using the Prometheus bridge.

While this works, I'm frequently seeing an error reported through the error handler set via otel.SetErrorHandler:

1 errors occurred transforming ScopeMetrics:
    * invalid metric (name: "go_gc_duration_seconds", description: "A summary of the pause duration of garbage collection cycles.", unit: ""): unknown aggregation: metricdata.Summary

The metric causing problems is from the "default" Prometheus Go Collector: https://github.com/prometheus/client_golang/blob/release-1.19/prometheus/go_collector.go#L266.

I'm creating the metric reader like this:

reader := sdkmetric.NewPeriodicReader(
	exporter, // OTLP+HTTP exporter pointing to a backend of my choice, could as well be OTel Collector.
	sdkmetric.WithProducer(prometheus.NewMetricProducer()),
)),

Environment

  • OS: Linux
  • Architecture: x86
  • Go Version: 1.21
  • go.opentelemetry.io/contrib version: v0.49.0 (go.opentelemetry.io/contrib/bridges/prometheus)

Steps To Reproduce

Still lacking a proper reproducer, will provide one ASAP.

Expected behaviour

I expected no error to be reported.

@lukasmalkmus lukasmalkmus added the bug Something isn't working label Mar 19, 2024
@dashpole
Copy link
Contributor

Thanks for reporting this. I believe the issue is that the prometheus bridge is properly translating the prometheus summary to an OTel summary, but the OTLP exporter doesn't support summary metrics (yet).

I believe the error is coming from here: https://github.com/open-telemetry/opentelemetry-go/blob/a5172ab422f4b9e09e8cfeba0e986f5f90c26092/exporters/otlp/otlpmetric/otlpmetrichttp/internal/transform/error.go#L76

Summary doesn't appear to be listed here: https://github.com/open-telemetry/opentelemetry-go/blob/a5172ab422f4b9e09e8cfeba0e986f5f90c26092/exporters/otlp/otlpmetric/otlpmetrichttp/internal/transform/metricdata.go#L77

@lukasmalkmus
Copy link
Author

Thanks @dashpole, that makes total sense!

I wasn't paying attention apparently! Summary != Sum. Given that Summary is considered legacy, I wonder if the SDK will ever support it?

Why is the bridge not using Sum? Is it non-trivial to express a Prometheus Summary using Sum?

@dashpole
Copy link
Contributor

A summary is similar to a histogram. Like a histogram, it has a Sum of observations, and a Count of observations. But instead of buckets (which have the count of observations that fell in each range), it directly computes the quantile (e.g. 95th percentile). You can see why histograms are generally recommended over summaries here: https://prometheus.io/docs/practices/histograms/#histograms-and-summaries. Summary metrics in OTel exist primarily to support existing Prometheus instrumentation, which has many summary-typed metrics (as you've discovered).

The underlying issue for this metric is actually that the stats provided by the go runtime have quantiles (like a summary), rather than buckets (like a histogram): https://pkg.go.dev/runtime/debug#GCStats.

I don't expect OpenTelemetry to add summary support to the API or SDK, as users should generally use Histograms (or even better, exponential histograms). But they may come from bridges.

@dashpole
Copy link
Contributor

You can also see the fix here: open-telemetry/opentelemetry-go#5100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants