
Datadog exporter panics and OTEL Collector crashes for empty ExponentialHistogram #26103

Closed
lenin-jaganathan opened this issue Aug 25, 2023 · 10 comments · Fixed by DataDog/opentelemetry-mapping-go#158 or #26355
Assignees: mx-psi
Labels: bug (Something isn't working) · data:metrics (Metric related issues) · exporter/datadog (Datadog components) · priority:p2 (Medium)

Comments

@lenin-jaganathan

Component(s)

exporter/datadog

What happened?

Description

When an exponential histogram datapoint with empty buckets is sent, the Datadog exporter panics and crashes the OTEL Collector (using the latest collector-contrib, v0.83.0).

Steps to Reproduce

Export a metric datapoint with the information below:

Metric #0
Descriptor:
     -> Name: test_exponential_histogram
     -> Description: Test histogram with empty values
     -> Unit: byte
     -> DataType: ExponentialHistogram
     -> AggregationTemporality: Delta
ExponentialHistogramDataPoints #0
StartTimestamp: 2023-08-25 10:41:00 +0000 UTC
Timestamp: 2023-08-25 10:41:10 +0000 UTC
Count: 0
Sum: 0.000000
Max: 0.000000
Min: 0.000000
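For illustration, here is a minimal Go sketch (not part of the original report) that builds an equivalent empty datapoint using the collector's pdata API; the function name and timestamps are illustrative, and only the zeroed count/sum/min/max and empty buckets matter.

```go
package main

import (
	"time"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// emptyExponentialHistogram builds a metric matching the dump above:
// an ExponentialHistogram datapoint with count 0, sum 0, and no buckets.
func emptyExponentialHistogram() pmetric.Metrics {
	md := pmetric.NewMetrics()
	m := md.ResourceMetrics().AppendEmpty().ScopeMetrics().AppendEmpty().Metrics().AppendEmpty()
	m.SetName("test_exponential_histogram")
	m.SetDescription("Test histogram with empty values")
	m.SetUnit("byte")

	eh := m.SetEmptyExponentialHistogram()
	eh.SetAggregationTemporality(pmetric.AggregationTemporalityDelta)

	dp := eh.DataPoints().AppendEmpty()
	dp.SetStartTimestamp(pcommon.NewTimestampFromTime(time.Now().Add(-10 * time.Second)))
	dp.SetTimestamp(pcommon.NewTimestampFromTime(time.Now()))
	dp.SetCount(0)
	dp.SetSum(0)
	dp.SetMin(0)
	dp.SetMax(0)
	// Positive/negative buckets and the zero count stay at their defaults (empty/0).
	return md
}

func main() {
	md := emptyExponentialHistogram()
	_ = md // send this through an OTLP pipeline to the datadog exporter to reproduce
}
```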

Expected Result

The datapoint is either sent to the back-end with value 0 or dropped gracefully.

Actual Result

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x3746eac]

goroutine 110 [running]:
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapExponentialHistogramMetrics(0xc0004cbe30, {0x922d858, 0xc000d4b5c0}, {0x9215dc0, 0xc000d74ea0}, 0xc0011f30c8, {0x0?}, 0x1)
        github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.7.0/exponential_histograms_translator.go:158 +0xe2c
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapToDDFormat(0xc0004cbe30, {0x922d858, 0xc000d4b5c0}, {0x1016350?}, {0x9215dc0?, 0xc000d74ea0?}, {0xc000d9e440, 0x1, 0x4}, {0xc00069d1a0, ...}, ...)
        github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.7.0/metrics_translator.go:774 +0xc68
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).MapMetrics(0xc0004cbe30, {0x922d858, 0xc000d4b5c0}, {0x0?}, {0x9215dc0, 0xc000d74ea0?})
        github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.7.0/metrics_translator.go:716 +0x97c
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsData(0xc000c85d40, {0x922d858, 0xc000d4b5c0}, {0xc000de4e10?})
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.83.0/metrics_exporter.go:208 +0x20d
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsDataScrubbed(0xc000c85d40, {0x922d858?, 0xc000d4b5c0?}, {0x0?})
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.83.0/metrics_exporter.go:181 +0x28
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x0?, {0x922d858?, 0xc000d4b5c0?})
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/metrics.go:54 +0x34
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc000ee2018, {0x9256d30, 0xc000e03560})
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/common.go:197 +0x96
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send(0xc000ede460, {0x9256d30?, 0xc000e03560?})
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/queued_retry.go:355 +0x190
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc000c8ade0, {0x9256d30, 0xc000e03560})
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/metrics.go:125 +0x88
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1({0x9256d30, 0xc000e03560})
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/queued_retry.go:195 +0x39
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1()
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/internal/bounded_memory_queue.go:47 +0xb6
created by go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers
        go.opentelemetry.io/collector/exporter@v0.83.0/exporterhelper/internal/bounded_memory_queue.go:42 +0x45

Collector version

0.83.0

Environment information

Environment

OS: macOS/Ubuntu

OpenTelemetry Collector configuration

extensions:
  health_check:
  zpages:
    endpoint: 0.0.0.0:55679
receivers:
  otlp:
    protocols:
      grpc:
      http:
  zipkin:
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 8000
    spike_limit_mib: 2000
exporters:
  logging:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 10
  datadog:
    hostname: somehost
    metrics:
      endpoint: https://somehost
    api:
      key: <key>
    tls:
      insecure_skip_verify: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
  extensions: [health_check, zpages]

Log output

No response

Additional context

No response

@lenin-jaganathan lenin-jaganathan added bug Something isn't working needs triage New item requiring triage labels Aug 25, 2023
@github-actions github-actions bot added the exporter/datadog Datadog components label Aug 25, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@songy23 songy23 added priority:p2 Medium and removed needs triage New item requiring triage labels Aug 25, 2023
@mx-psi mx-psi self-assigned this Aug 28, 2023
@mx-psi
Member

mx-psi commented Aug 28, 2023

Thanks for the report @lenin-jaganathan, assigning to me to fix this by dropping empty points.

@mx-psi mx-psi added the data:metrics Metric related issues label Aug 28, 2023
@lenin-jaganathan
Author

I am not sure how this generally works in Datadog (or what OpenTelemetry suggests).

Do we need to drop the 0-valued datapoints or ingest them as zeros?

@mx-psi
Member

mx-psi commented Aug 29, 2023

I am not sure how this generally works in Datadog (or what OpenTelemetry suggests).

For Datadog distributions coming from DogStatsD, dropping would be the closest behavior (IIUC the client library just doesn't produce a point in this case). In the case of OpenTelemetry, I don't think there is a prescription as to what to do here: empty points are supported for OTLP Histograms (and, based on the wording of the spec, I would guess also for OTLP Exponential Histograms), but the spec generally does not give guidance on how to map its types other than for well-known systems such as Prometheus.

Do we need to drop the 0-valued datapoints or ingest them as zeros?

I believe at present we have no choice but to drop the point. The Datadog distributions payload our intake accepts has an 'average' field, which would have an indeterminate value for this kind of empty point (since it would be $\frac{0}{0}$). We may be able to change the payload eventually, but I want to provide a shorter-term solution since a crash is pretty bad.
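A minimal sketch of the kind of guard described above, assuming the translator derives an average from sum/count; the package and function names are hypothetical and this is not the actual opentelemetry-mapping-go code:

```go
package translatorsketch // hypothetical package, for illustration only

import "go.opentelemetry.io/collector/pdata/pmetric"

// averageOrDrop is a hypothetical guard, not the actual mapping-go code:
// it skips empty exponential histogram points so that count == 0 never
// reaches the average computation.
func averageOrDrop(dp pmetric.ExponentialHistogramDataPoint) (avg float64, ok bool) {
	if dp.Count() == 0 {
		// Empty point: the average would be 0/0, so drop it gracefully.
		return 0, false
	}
	return dp.Sum() / float64(dp.Count()), true
}
```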

@lenin-jaganathan
Author

lenin-jaganathan commented Aug 29, 2023

I want to provide a shorter-term solution since a crash is pretty bad.

Full agreement here.

In the case of OpenTelemetry, I don't think there is a prescription as to what to do here

I would hope so, since this is a consumer-side concern. I brought it up because I have seen some back-ends that use different aggregation windows and have problems with non-continuous data streams. But if Datadog can handle that, there is no real concern here.

@mx-psi
Member

mx-psi commented Aug 29, 2023

I brought it up because I have seen some back-ends that use different aggregation windows and have problems with non-continuous data streams. But if Datadog can handle that, there is no real concern here.

I don't think those would be an issue in Datadog's case, but I am open to feedback if you find any issues when trying out the fixed version.

@mx-psi
Member

mx-psi commented Aug 29, 2023

DataDog/opentelemetry-mapping-go/pull/158 fixes this; my intention is for this to be available on ~~v0.84.0~~ v0.85.0.

@lenin-jaganathan
Author

@mx-psi Did this not make the cut for 0.84?

@mx-psi
Member

mx-psi commented Aug 30, 2023

Sorry, v0.85.0 🤦; it was already too late for v0.84.0 when I wrote this message.

mx-psi added a commit that referenced this issue Aug 31, 2023
**Description:** 

Bumps `github.com/DataDog/opentelemetry-mapping-go` to v0.8.0, fixes
crash with empty exponential histograms.

**Link to tracking Issue:** Fixes #26103
@mx-psi
Member

mx-psi commented Aug 31, 2023

After merging #26355 I can confirm the fix will be available on v0.85.0. Apologies for the confusion in my previous message.
