Compression_error and crashing #9050

Open
jparrattwork opened this issue Jun 29, 2023 · 3 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments


jparrattwork commented Jun 29, 2023

Component(s)

No response

What happened?

Description

We have an application sending OTLP data to the collector via gRPC. We then send that from the collector to our Splunk backend. The collector seems to stop sending any logs/traces/metrics and both the collector and our application eventually crash.

Steps to Reproduce

Send OTLP via gRPC to the collector

Expected Result

The collector produces no warnings and doesn't crash

Actual Result

The collector prints the following warning over and over:
warn zapgrpc/zapgrpc.go:195 [transport] transport: http2Server.HandleStreams failed to read frame: connection error: COMPRESSION_ERROR {"grpc_log": true}

and then eventually crashes

Collector version

v0.68.0

Environment information

Environment

OS: Windows Server 2019
Compiler (if manually compiled): go version go1.19.5 windows/amd64

OpenTelemetry Collector configuration

extensions:
  # Enables health check endpoint for otel collector - https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/healthcheckextension
  health_check:
  # Opens up zpages for dev/debugging - https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension
  zpages:
    endpoint: localhost:55679

receivers:
  # For dotnet apps
  otlp:
    protocols:
      grpc:
      http:

  # FluentD
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Otel Internal Metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otelcol' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']
  
  # System Metrics
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      # System load average metrics https://en.wikipedia.org/wiki/Load_(computing)
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Aggregated system process count metrics
      processes:
      # System processes metrics, disabled by default
      # process:  

processors:
  batch: # Batches data when sending
  resourcedetection:
    detectors: [gce, ecs, ec2, azure, system]
    timeout: 2s
    override: false
  transform/body-empty:
    log_statements:
      - context: log
        statements:
          - set(body, "body-empty") where body == nil
  groupbyattrs:
    keys:
    - service.name
    - service.version
    - host.name
  # Enabling the memory_limiter is strongly recommended for every pipeline.
  # Configuration is based on the amount of memory allocated to the collector.
  # For more information about memory limiter, see
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiter/README.md
  memory_limiter:
    check_interval: 2s
    limit_mib: 256              
 
exporters:
  splunk_hec/logs:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""

  splunk_hec/traces:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""
 
  splunk_hec/metrics:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""      

service:
  # zpages port : 55679

  pipelines:
    logs:
      receivers: [otlp, fluentforward]
      processors: [resourcedetection, transform/body-empty, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/traces]
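
(A side note on the pipelines above: the memory_limiter README linked in the config comment recommends running memory_limiter as the first processor in each pipeline so it can push back on receivers before other processors do work; here it runs just before batch. This may be unrelated to the error, but a reordered logs pipeline would look roughly like this:)

service:
  pipelines:
    logs:
      receivers: [otlp, fluentforward]
      processors: [memory_limiter, resourcedetection, transform/body-empty, groupbyattrs, batch]
      exporters: [splunk_hec/logs]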

Log output

1.6879002944284434e+09	warn	zapgrpc/zapgrpc.go:195	[transport] transport: http2Server.HandleStreams failed to read frame: connection error: COMPRESSION_ERROR	{"grpc_log": true}

Additional context

No response

jparrattwork added the bug label Jun 29, 2023

atoulme commented Jun 30, 2023

Please try the latest release. Which distribution are you using? Which OS are you running on?

Please see here to elevate log level and troubleshoot further: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md
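
For example, the collector's own log level can be raised via the service telemetry settings (a minimal sketch; adjust as needed for your distribution):

service:
  telemetry:
    logs:
      level: debug

That should print more context around the COMPRESSION_ERROR warnings and around the crash.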

bryan-aguilar added the question label Aug 11, 2023
github-actions bot commented Nov 13, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label Nov 13, 2023

atoulme commented Dec 6, 2023

This might relate to #9022. I will transfer the issue over.

atoulme transferred this issue from open-telemetry/opentelemetry-collector-contrib Dec 6, 2023
github-actions bot removed the Stale label Dec 8, 2023