[prometheusremotewrite] invalid temporality and type combination when remote write to thanos backend #15281

Closed
Dongqi-Guo opened this issue Oct 19, 2022 · 25 comments

Comments

@Dongqi-Guo

What happened?

Description

I tried to use the Prometheus remote write exporter with a Thanos backend and then display the metrics in Grafana. I found many errors in the otel collector log like "Permanent error: invalid temporality and type combination". As a result, Thanos is missing many metrics used in the Grafana dashboards. Any idea or solution for this?

Steps to Reproduce

  1. fluentbit node_metrics output
  2. prometheus remote write exporter
  3. thanos as the backend (see the config sketch below)
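
For context, a minimal collector configuration matching these steps might look like the sketch below. This is an illustration only, not the reporter's actual config (none was shared): the otlp receiver is a stand-in for whatever ingests the Fluent Bit output, and the Thanos Receive endpoint is a placeholder.

# Hedged sketch: receiver and endpoint are placeholders, not taken from this issue.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheusremotewrite:
    # Placeholder: a typical Thanos Receive remote-write endpoint.
    endpoint: http://thanos-receive:19291/api/v1/receive

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]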

Expected Result

Actual Result

Collector version

0.61.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

2022-10-19T11:27:07.375+0800    error   exporterhelper/queued_retry.go:395      Exporting failed. The error is not retryable. Dropping data.    {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination;

Additional context

No response

@Dongqi-Guo Dongqi-Guo added bug Something isn't working needs triage New item requiring triage labels Oct 19, 2022
@evan-bradley evan-bradley added priority:p2 Medium exporter/prometheusremotewrite and removed needs triage New item requiring triage labels Oct 19, 2022
@github-actions
Contributor

Pinging code owners: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@krupyansky

The problem also reproduces with VictoriaMetrics as the backend.

@krupyansky

Clarification

The error occurs because https://github.com/open-telemetry/opentelemetry-go sends metrics that have 0 DataPoints.

@Aneurysm9
Member

Clarification

The error occurs because https://github.com/open-telemetry/opentelemetry-go sends metrics that have 0 DataPoints.

Is there an existing issue on the OTel-Go repo that can be linked here? If not, can you create one with steps to reproduce?

@krupyansky

krupyansky commented Oct 31, 2022

Exactly, @Aneurysm9.

I created an issue for https://github.com/open-telemetry/opentelemetry-go:

open-telemetry/opentelemetry-go#3394

@montag

montag commented Dec 14, 2022

I'm having the same issue with prometheusremotewrite writing to prometheus. The collector is dropping thousands of metrics.

2022-12-14T06:07:04.831Z error exporterhelper/queued_retry.go:394 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination", "dropped_items": 20}

@krupyansky

@montag

montag commented Dec 14, 2022

Thanks, @krupyansky. I'm using the otel collector helm chart via terraform.

@krupyansky

krupyansky commented Dec 14, 2022

@montag are you sending metrics from your application to the otel collector via https://github.com/open-telemetry/opentelemetry-go?

@montag

montag commented Dec 14, 2022

@krupyansky I'm using the python open-telemetry instrumentation libs to send to the otel collector (otlp receiver), which then uses the prometheusremotewrite exporter to push to prom. I see the above error in the collector logs every few minutes.

@krupyansky

@montag try filing an issue against the Python OpenTelemetry instrumentation libraries, similar to my issue open-telemetry/opentelemetry-go#3394.

Most likely the error occurs in your case because the Python OpenTelemetry SDK sends metrics that have 0 DataPoints.

@montag

montag commented Dec 14, 2022

@krupyansky Any idea how I might verify that?
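
One way to check (which a later comment in this thread effectively does) is to add the logging exporter with detailed verbosity next to prometheusremotewrite, so the collector prints every exported metric; an instrument that recorded no data points shows up with DataType: Empty in that output. This is a sketch only, assuming the logging exporter's verbosity option and reusing whatever receivers and exporters are already configured:

# Sketch: print each exported metric so zero-data-point metrics become visible.
exporters:
  logging:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]                            # existing receiver(s)
      exporters: [logging, prometheusremotewrite]  # existing exporter plus logging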

@edawader

In the same boat as @montag - no idea how we could be sending metrics with 0 DataPoints.

@GorodentsevD

GorodentsevD commented Mar 29, 2023

Hi! Could you please clarify: is there anything blocking an upgrade to an opentelemetry-go version with the fix? I got the same bug on version 0.74.0 of opentelemetry-collector-contrib. If I am correct, the collector seems to be using opentelemetry-go v1.0.0-rc8 (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/prometheusremotewrite/go.mod#L11), and the version with the fix starts from v1.11.2.

@Aneurysm9
Member

Hi! Could you please clarify: is there anything blocking an upgrade to an opentelemetry-go version with the fix? I got the same bug on version 0.74.0 of opentelemetry-collector-contrib. If I am correct, the collector seems to be using opentelemetry-go v1.0.0-rc8 (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/prometheusremotewrite/go.mod#L11), and the version with the fix starts from v1.11.2.

That dependency is on the pdata module from the collector core repository, not on the OTel Go SDK. The issue here is with data produced by an application using the OTel Go SDK and not any use of the SDK within the collector framework or components.

@GorodentsevD

Thank you for the answer! Sorry, I misunderstood the discussion; now I get it. I have the same issue with the Kong StatsD plugin.

@github-actions
Contributor

github-actions bot commented Jun 5, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 5, 2023
@eugene-chernyshenko

eugene-chernyshenko commented Jul 24, 2023

I have the same issue with opentelemetry-collector-contrib 0.81.0

2023-07-24T21:17:44.155Z	info	service/service.go:148	Everything is ready. Begin running and processing data.
2023-07-24T21:18:37.504Z	info	MetricsExporter	{"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 2}
2023-07-24T21:18:37.504Z	info	ResourceMetrics #0
Resource SchemaURL: 
Resource attributes:
     -> telemetry.sdk.language: Str(python)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.19.0)
     -> namespace: Str(develop)
     -> service.name: Str(scrape)
     -> telemetry.auto.version: Str(0.40b0)
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope opentelemetry.instrumentation.flask 0.40b0
Metric #0
Descriptor:
     -> Name: http.server.active_requests
     -> Description: measures the number of concurrent HTTP requests that are currently in-flight
     -> Unit: requests
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.method: Str(GET)
     -> http.host: Str(127.0.0.1:5000)
     -> http.scheme: Str(http)
     -> http.flavor: Str(1.1)
     -> http.server_name: Str(0.0.0.0)
StartTimestamp: 2023-07-24 20:43:23.343899867 +0000 UTC
Timestamp: 2023-07-24 21:18:37.404773474 +0000 UTC
Value: 0
Metric #1
Descriptor:
     -> Name: http.server.duration
     -> Description: measures the duration of the inbound HTTP request
     -> Unit: ms
     -> DataType: Histogram
     -> AggregationTemporality: Cumulative
HistogramDataPoints #0
Data point attributes:
     -> http.method: Str(GET)
     -> http.host: Str(127.0.0.1:5000)
     -> http.scheme: Str(http)
     -> http.flavor: Str(1.1)
     -> http.server_name: Str(0.0.0.0)
     -> net.host.port: Int(5000)
     -> http.status_code: Int(200)
StartTimestamp: 2023-07-24 20:43:23.346501178 +0000 UTC
Timestamp: 2023-07-24 21:18:37.404773474 +0000 UTC
Count: 7
Sum: 16.000000
Min: 1.000000
Max: 3.000000
ExplicitBounds #0: 0.000000
ExplicitBounds #1: 5.000000
ExplicitBounds #2: 10.000000
ExplicitBounds #3: 25.000000
ExplicitBounds #4: 50.000000
ExplicitBounds #5: 75.000000
ExplicitBounds #6: 100.000000
ExplicitBounds #7: 250.000000
ExplicitBounds #8: 500.000000
ExplicitBounds #9: 750.000000
ExplicitBounds #10: 1000.000000
ExplicitBounds #11: 2500.000000
ExplicitBounds #12: 5000.000000
ExplicitBounds #13: 7500.000000
ExplicitBounds #14: 10000.000000
Buckets #0, Count: 0
Buckets #1, Count: 7
Buckets #2, Count: 0
Buckets #3, Count: 0
Buckets #4, Count: 0
Buckets #5, Count: 0
Buckets #6, Count: 0
Buckets #7, Count: 0
Buckets #8, Count: 0
Buckets #9, Count: 0
Buckets #10, Count: 0
Buckets #11, Count: 0
Buckets #12, Count: 0
Buckets #13, Count: 0
Buckets #14, Count: 0
Buckets #15, Count: 0
	{"kind": "exporter", "data_type": "metrics", "name": "logging"}
2023-07-24T21:19:37.422Z	info	MetricsExporter	{"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 1}
2023-07-24T21:19:37.422Z	info	ResourceMetrics #0
Resource SchemaURL: 
Resource attributes:
     -> telemetry.sdk.language: Str(python)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.19.0)
     -> namespace: Str(develop)
     -> service.name: Str(scrape)
     -> telemetry.auto.version: Str(0.40b0)
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope opentelemetry.instrumentation.flask 0.40b0
Metric #0
Descriptor:
     -> Name: http.server.active_requests
     -> Description: measures the number of concurrent HTTP requests that are currently in-flight
     -> Unit: requests
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.method: Str(GET)
     -> http.host: Str(127.0.0.1:5000)
     -> http.scheme: Str(http)
     -> http.flavor: Str(1.1)
     -> http.server_name: Str(0.0.0.0)
StartTimestamp: 2023-07-24 20:43:23.343899867 +0000 UTC
Timestamp: 2023-07-24 21:19:37.407265471 +0000 UTC
Value: 0
Metric #1
Descriptor:
     -> Name: http.server.duration
     -> Description: measures the duration of the inbound HTTP request
     -> Unit: ms
     -> DataType: Empty
	{"kind": "exporter", "data_type": "metrics", "name": "logging"}
2023-07-24T21:19:37.423Z	error	exporterhelper/queued_retry.go:391	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"http.server.duration\"", "dropped_items": 1}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector/exporter@v0.81.0/exporterhelper/queued_retry.go:391
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/collector/exporter@v0.81.0/exporterhelper/metrics.go:125
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector/exporter@v0.81.0/exporterhelper/queued_retry.go:195
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector/exporter@v0.81.0/exporterhelper/internal/bounded_memory_queue.go:47

@github-actions github-actions bot removed the Stale label Jul 25, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 16, 2023
@ElementalWarrior

We were experiencing this for the python metrics instrumentation.

You can resolve it with a filter processor:

processors:
  filter/empty_http_server_duration:
    error_mode: ignore
    metrics:
      metric:
          - 'name == "http.server.duration" and type != METRIC_DATA_TYPE_HISTOGRAM'
          - 'name == "http.client.duration" and type != METRIC_DATA_TYPE_HISTOGRAM'

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 30, 2024
@francescopotenziani

Any updates? I have the same issue as @montag. I'm using:

  • otel/opentelemetry-collector-contrib:0.67.0
  • "@opentelemetry/sdk-node": "0.33.0"

@github-actions github-actions bot removed the Stale label Feb 28, 2024
@Starefossen

Getting the same with the most recent version of the collector:

2024-03-05T07:46:50.226Z        error   exporterhelper/common.go:95     Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"app_currency_counter\"", "dropped_items": 1}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter.createMetricsExporter.WithQueue.func2.1
        go.opentelemetry.io/collector/exporter@v0.93.0/exporterhelper/common.go:95
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
        go.opentelemetry.io/collector/exporter@v0.93.0/exporterhelper/queue_sender.go:117
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue[...]).Consume
        go.opentelemetry.io/collector/exporter@v0.93.0/exporterhelper/internal/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*QueueConsumers[...]).Start.func1
        go.opentelemetry.io/collector/exporter@v0.93.0/exporterhelper/internal/consumers.go:43

codeboten pushed a commit that referenced this issue Mar 13, 2024
…lure to translate metrics (#29729)

Don't drop a whole batch in case of a failure to translate from OTel to Prometheus. Instead, with this PR we try to send to Prometheus all the metrics that were properly translated and log a warning for the ones that failed to translate.

This PR also adds support for telemetry in this component so that it is possible to inspect how the translation process is happening and identify failed translations.

I opted not to include the number of time series that failed translation because I don't want to make assumptions about how the `FromMetrics` function works. Instead, we just publish whether there was any failure during the translation process and the number of time series returned.

**Link to tracking Issue:** #15281

**Testing:** UTs were added to account for the case that you have mixed
metrics, with some succeeding the translation and some failing.

---------

Signed-off-by: Raphael Silva <rapphil@gmail.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: bryan-aguilar <46550959+bryan-aguilar@users.noreply.github.com>
Co-authored-by: Bryan Aguilar <bryaag@amazon.com>
@github-actions
Contributor

github-actions bot commented May 6, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 6, 2024
@github-actions
Contributor

github-actions bot commented Jul 5, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 5, 2024