
[exporter/loadbalancing] couldn't find the exporter for the endpoint bug #35153

Open

Frapschen opened this issue Sep 12, 2024 · 4 comments
@Frapschen (Contributor)

Component(s)

exporter/loadbalancing

What happened?

Description

Using this exporter triggers a bug. I have tried both the dns and k8s resolvers; both fail with the same error.

Using the dns resolver with this config:

  loadbalancing/jaeger:
    routing_key: "traceID"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported, except the endpoint
        timeout: 1s
        tls:
          insecure: true
        sending_queue:
          enabled: true
          storage: file_storage/all_settings
        retry_on_failure:
          enabled: true
          max_elapsed_time: 500s
    resolver:
      dns:
        hostname: jaeger-collector-headless.insight-system.svc.cluster.local
        port: "4317"
        interval: 1s
        timeout: 200ms
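
For reference, the sending_queue.storage value file_storage/all_settings points to a storage extension that must be declared and enabled elsewhere in the collector configuration. A minimal sketch of such a declaration, assuming the contrib file_storage extension (the directory path below is illustrative, not taken from this issue):

  extensions:
    file_storage/all_settings:
      # assumed path; the directory must exist and be writable by the collector
      directory: /var/lib/otelcol/file_storage

  service:
    extensions: [ file_storage/all_settings ]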

Error logs:

[screenshot of error logs]

Using the k8s resolver with this config:

  loadbalancing/jaeger:
    routing_key: "traceID"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported, except the endpoint
        timeout: 1s
        tls:
          insecure: true
        sending_queue:
          enabled: true
          storage: file_storage/all_settings
        retry_on_failure:
          enabled: true
          max_elapsed_time: 500s
    resolver:
      k8s:
        service: insight-jaeger-collector
        ports:
          - 4317

Error logs:

[screenshot of error logs]

I have noticed this comment in the exporter's source code:

// something is really wrong... how come we couldn't find the exporter??

which suggests the code itself treats this as an unexpected state.

Collector version

v0.109.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

Frapschen added the bug and needs triage labels on Sep 12, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme (Contributor) commented Nov 9, 2024

Please provide your complete configuration file and indicate the version of the collector used.

atoulme added the waiting for author label and removed the needs triage label on Nov 9, 2024
jpkrohling self-assigned this on Dec 2, 2024
@jpkrohling (Member) commented Dec 2, 2024

It looks like I can reproduce this with a fairly simple config, like:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-loadbalancer
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true
            sending_queue:
              enabled: true
              storage: file_storage/all_settings
            retry_on_failure:
              enabled: true
              max_elapsed_time: 500s
        resolver:
          dns:
            hostname: otelcol-backend-collector-headless

    service:
      extensions: [ ]
      pipelines:
        traces:
          receivers:  [ otlp ]
          processors: [  ]
          exporters:  [ loadbalancing ]

And with this backend:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-backend
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0
  replicas: 10
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}

    exporters:
      debug: {}

    service:
      extensions: [ ]
      pipelines:
        traces:
          receivers:  [ otlp  ]
          processors: [  ]
          exporters:  [ debug ]

This seems to be connected to the persistent sending queue: removing the storage setting from the sending queue makes the exporter work again.
When everything is working, the following metrics are available:

# HELP otelcol_loadbalancer_backend_outcome Number of successes and failures for each endpoint.
# TYPE otelcol_loadbalancer_backend_outcome counter
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.31:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.32:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.33:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.34:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.35:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.36:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.37:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.38:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.39:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.40:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261

With the persistent sending queue enabled, all spans end up being refused, like this:

otelcol_receiver_refused_spans{receiver="otlp",service_instance_id="e3fc5786-a50e-4df2-bd6f-71167e37a26e",service_name="otelcol-contrib",service_version="0.114.0",transport="grpc"} 1536
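
For comparison, here is a minimal variant of the same loadbalancing exporter configuration with the persistent storage removed from the sending queue, which is the setup that works as noted above (a sketch derived from the repro config, not a verified fix):

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true
            sending_queue:
              enabled: true  # in-memory queue only; no storage extension referenced
            retry_on_failure:
              enabled: true
              max_elapsed_time: 500s
        resolver:
          dns:
            hostname: otelcol-backend-collector-headless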

github-actions bot commented Feb 3, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Feb 3, 2025