[loadbalancingexporter] using the loadbalancingexporter k8s resolved breaks the internal metrics #30697
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Looks like this is happening after the k8s service restarts and the pods change.
In our setup, we have an agent daemonset and a collector deployment; the agent sends metrics to the k8s service for that deployment, using the config mentioned in the description. To reproduce:
Everything works when doing the same but with the sending queue disabled, e.g.
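For illustration, disabling the sending queue on the loadbalancing exporter's underlying OTLP exporter might look like the following minimal sketch; the exporter name, resolver service, and namespace are assumptions, not taken from the original report:

```yaml
exporters:
  loadbalancing/traces:
    protocol:
      otlp:
        # Assumption: queue disabled on the wrapped OTLP exporter,
        # which is the workaround described in this comment.
        sending_queue:
          enabled: false
    resolver:
      k8s:
        # Hypothetical Service name; replace with your collector Service.
        service: otel-collector.observability
```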
Might be related to #16826
@jpkrohling, we're also hitting this issue with our trace ingestion infra. Our setup is: Tier 1 (2 OTel load balancers, deployment, k8s resolver) -> Tier 2 (3 OTel Collectors, statefulset) -> trace storage backend. I can reproduce this on demand, with steps similar to those @grzn outlined above, by terminating one of the OTel Collectors at Tier 2. I'm debugging the issue at the moment and would like to collaborate if possible. This doesn't reproduce if the OTel load balancer uses the DNS resolver.
We're running v0.94.0 for a few hours, looks good so far.
Alright, I'm closing this, but let me know if this needs to be reopened. |
Component(s)
exporter/loadbalancing
What happened?
Description
The internal metrics endpoint fails to serve when the loadbalancing exporter is used with the k8s resolver.
Steps to Reproduce
Enable the load balancing exporter, then query the internal metrics endpoint:
$ curl http://10.0.46.15:9090/metrics
An error has occurred while serving metrics:
collected metric "otelcol_exporter_queue_size" { label:{name:"exporter" value:"loadbalancing/traces"} label:{name:"service_instance_id" value:"16abe78b-1a05-4f33-909c-cc9e9cb4b73e
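For context, the first repro step (enabling the load balancing exporter with the k8s resolver) might resemble this minimal sketch; the exporter name and resolver service are assumptions, since the original config from the description is not shown here:

```yaml
exporters:
  loadbalancing/traces:
    routing_key: traceID
    protocol:
      otlp:
        # The sending queue is left at its default (enabled),
        # which is the configuration that triggers the error above.
        tls:
          insecure: true
    resolver:
      k8s:
        # Hypothetical Service for the collector deployment.
        service: otel-collector.observability
```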
Log output
Additional context
No response