[loadbalancingexporter] using the loadbalancingexporter k8s resolver breaks the internal metrics #30697

Closed
grzn opened this issue Jan 21, 2024 · 9 comments
Assignees: jpkrohling
Labels: bug, exporter/loadbalancing

Comments

grzn commented Jan 21, 2024

Component(s)

exporter/loadbalancing

What happened?

Description

The collector's internal metrics endpoint fails to serve metrics.

Steps to Reproduce

Enable the load balancing exporter with the k8s resolver:

    loadbalancing/traces:
      resolver:
        k8s:
          service: opentelemetry-collector.default

Expected Result

`curl http://<podip>:9090/metrics` to succeed

Actual Result

$ curl http://10.0.46.15:9090/metrics
An error has occurred while serving metrics:

collected metric "otelcol_exporter_queue_size" { label:{name:"exporter" value:"loadbalancing/traces"} label:{name:"service_instance_id" value:"16abe78b-1a05-4f33-909c-cc9e9cb4b73e


Collector version

0.92.0

Environment information

Docker image `otel/opentelemetry-collector-contrib:0.92.0`

OpenTelemetry Collector configuration

```yaml
exporters:
  loadbalancing/traces:
    resolver:
      k8s:
        service: opentelemetry-collector.default
    protocol:
      otlp:
        tls:
          insecure: true
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
  prometheus:
    config:
      scrape_configs:
      - job_name: opentelemetry-agent
        scrape_interval: 10s
        static_configs:
        - targets:
          - ${K8S_POD_IP}:9090
service:
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:9090
  extensions:
  - health_check
  - memory_ballast
  pipelines:
    traces:
      exporters:
      - loadbalancing/traces
      processors:
      - memory_limiter
      - k8sattributes
      - resource
      - resource/add_cluster_name
      - resource/add_environment
      - resourcedetection
      receivers:
      - otlp
```

Log output

The logs are clean.

Additional context

No response

grzn added the bug and needs triage labels on Jan 21, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

grzn commented Jan 21, 2024

Looks like this is happening after the k8s service restarts and the pods change

grzn commented Jan 21, 2024

In our setup, we have an agent daemonset and a collector deployment; the agent sends metrics to the k8s service for that deployment, using the config mentioned in the description.

To reproduce:

  1. start the collector/deployment
  2. start the agent/daemonset
  3. curl to one of the agent pods at http://:8080/metrics, all works
  4. rollout restart the collector/deployment
  5. run the curl again, them metrics endpoint will return an error as mentioned in the issue description
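
A minimal shell sketch of those steps, assuming the collector deployment is named `opentelemetry-collector` in the `default` namespace and using the 9090 telemetry port from the configuration in the description (names and IPs here are illustrative, not from the original report):

```shell
# 3. scrape an agent pod's internal metrics endpoint; this succeeds at first
curl http://<agent-pod-ip>:9090/metrics

# 4. restart the collector deployment so its pods (and the resolved endpoints) change
kubectl -n default rollout restart deployment/opentelemetry-collector

# 5. scrape the same agent pod again; the endpoint now fails with the
#    duplicate "otelcol_exporter_queue_size" error from the description
curl http://<agent-pod-ip>:9090/metrics
```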

Everything works when doing the same but with the sending queue disabled, e.g.

      loadbalancing/traces:
        protocol:
          otlp:
            retry_on_failure:
              enabled: false
            sending_queue:
              enabled: false
            timeout: 3s
            tls:
              insecure: true
        resolver:
          k8s:
            service: opentelemetry-collector.default

jpkrohling (Member) commented:

Might be related to #16826

jpkrohling removed the needs triage label on Jan 25, 2024
jpkrohling self-assigned this on Jan 25, 2024

Juliaj commented Jan 28, 2024

@jpkrohling, we're also hitting this issue with our trace ingestion infra. Our setup is: Tier 1 (2 OTel load balancers, deployment, k8s resolver) -> Tier 2 (3 OTel Collectors, statefulset) -> trace storage backend. I can reproduce this on demand, with steps similar to those @grzn outlined above, by terminating one of the OTel Collectors at Tier 2. I'm debugging the issue at the moment and would like to collaborate if possible.

This doesn't repro if the OTel load balancer uses the DNS resolver.
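
For comparison, a rough sketch of the same exporter using the DNS resolver instead of the k8s resolver (the hostname below is illustrative; it should point at a headless service so it resolves to the individual backend pod IPs):

```yaml
exporters:
  loadbalancing/traces:
    resolver:
      dns:
        # illustrative headless service name; re-resolved periodically
        hostname: opentelemetry-collector-headless.default.svc.cluster.local
    protocol:
      otlp:
        tls:
          insecure: true
```

The DNS resolver re-resolves the hostname on an interval rather than watching Kubernetes endpoints, which may explain why the duplicate-metric collision was not observed with it.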

grzn changed the title from "[loadbalancingexporter] using the loadbalancingexporter breaks the internal metrics" to "[loadbalancingexporter] using the loadbalancingexporter k8s resolver breaks the internal metrics" on Jan 29, 2024

Juliaj commented Feb 2, 2024

@grzn, a few of us are investigating this. See more info in the issue mentioned above, #30477.

jpkrohling (Member) commented:

Given that #30477 seems to have been fixed in 0.94.0, can you confirm whether this is still reproducible with the latest version, @grzn?


grzn commented Feb 15, 2024

We've been running v0.94.0 for a few hours; looks good so far.

jpkrohling (Member) commented:

Alright, I'm closing this, but let me know if this needs to be reopened.
