
Dropping labels configured in scraping config drops more metrics than requested #36061

Closed
xzizka opened this issue Oct 29, 2024 · 4 comments

Labels: exporter/prometheusremotewrite, question (Further information is requested), receiver/prometheus (Prometheus receiver)

Comments

xzizka commented Oct 29, 2024

Component(s)

receiver/prometheus

What happened?

Description

This is very much related to the issue mentioned here: #36060
The environment is the same, the only difference is the config.

The scraping is configured like this:

...
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
            relabel_configs:
...

This scraping should return the following metrics:

  • kubelet_pleg_relist_interval_seconds_count
  • kubelet_pleg_relist_interval_seconds_sum
  • kubelet_pleg_relist_interval_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_count
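
As a sanity check, this standalone Go sketch (not part of the collector; the regex and series names are taken from the config above, and Prometheus anchors relabel regexes, which the snippet mimics with ^(?:...)$) shows which series names the keep rule matches:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The keep rule's regex from the scrape config. Prometheus anchors
	// relabel regexes, so wrap it in ^(?:...)$ to mimic that behaviour.
	pattern := `(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)`
	keep := regexp.MustCompile(`^(?:` + pattern + `)$`)

	// Candidate series names as exposed on the kubelet /metrics endpoint.
	series := []string{
		"kubelet_pleg_relist_duration_seconds_bucket",
		"kubelet_pleg_relist_duration_seconds_count",
		"kubelet_pleg_relist_duration_seconds_sum",
		"kubelet_pleg_relist_interval_seconds_bucket",
		"kubelet_pleg_relist_interval_seconds_count",
		"kubelet_pleg_relist_interval_seconds_sum",
	}
	for _, s := range series {
		fmt.Printf("%-48s keep=%v\n", s, keep.MatchString(s))
	}
}

Note that kubelet_pleg_relist_duration_seconds_sum is not matched by the keep rule, which is why it is missing from the list above.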

These metrics are then visible in Prometheus.

Debug output from the collector log.

testuser@testvm:~/otel-logs $ kubectl logs -l component=otel-collector --follow | grep pleg
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
...

If I add a drop action to the scrape config for kubelet_pleg_relist_interval_seconds_count and kubelet_pleg_relist_interval_seconds_sum:

...
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_interval_seconds_count|kubelet_pleg_relist_interval_seconds_sum)'
                  action: drop
            relabel_configs:
...

Then all the kubelet_pleg_relist_interval_seconds_.+ metrics are removed and not visible in Prometheus/Grafana.

Debug output from the collector log.

testuser@testvm:~/otel-logs $ kubectl logs -l component=otel-collector --follow | grep pleg
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
...

Steps to Reproduce

Use the config above for metrics collection with opentelemetry-collector-contrib.

Expected Result

To drop just the requested metrics.

Actual Result

Instead of dropping just the kubelet_pleg_relist_interval_seconds_count and kubelet_pleg_relist_interval_seconds_sum metrics, kubelet_pleg_relist_interval_seconds_bucket is dropped as well.

Collector version

0.112.0-amd64

Environment information

Environment

K8S 1.29
K8S 1.30

OpenTelemetry Collector configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-monitoring-collector-conf
  namespace: otel-system
  labels:
    app: opentelemetry
    component: otel-monitoring-collector-conf
data:
  otel-monitoring-collector-config: |
    exporters:
      prometheusremotewrite:
        endpoint: https://prometheus-dev:28080/api/v1/push
        tls:
          insecure_skip_verify: true
        headers: 
          X-Scope-OrgID: k8s-nprod-otel
        external_labels:
          cluster: "k8s-nprod-2856"
          otel_component: "otel-collector"
      debug/metrics:
        verbosity: detailed
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
            relabel_configs:
                - replacement: kubernetes.default.svc.cluster.local:443
                  target_label: __address__
                - regex: (.+)
                  replacement: /api/v1/nodes/$${1}/proxy/metrics
                  source_labels:
                    - __meta_kubernetes_node_name
                  target_label: __metrics_path__
            scheme: https
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: false
                server_name: kubernetes
    processors:
      batch/metrics:
      memory_limiter/metrics:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 20
    extensions:
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    service:
      extensions: [health_check]
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [memory_limiter/metrics, batch/metrics]
          exporters: [debug/metrics, prometheusremotewrite]

Log output

Logs are mentioned in the description part of this issue.

Additional context

If we do the same scraping with the Prometheus agent, it works as expected.

xzizka added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 29, 2024
github-actions bot added the receiver/prometheus (Prometheus receiver) label on Oct 29, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dashpole (Contributor) commented

Those timeseries aren't dropped; they are just combined into an OpenTelemetry histogram. The actual names of your metrics are kubelet_pleg_relist_duration_seconds and kubelet_pleg_relist_interval_seconds (grep your /metrics endpoint for # TYPE kubelet_pleg_relist_duration_seconds or # TYPE kubelet_pleg_relist_interval_seconds).
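
For illustration, the relevant part of the kubelet /metrics page looks roughly like this (the numbers are made up; the point is the single # TYPE ... histogram declaration and the suffixed series beneath it):

# TYPE kubelet_pleg_relist_interval_seconds histogram
kubelet_pleg_relist_interval_seconds_bucket{le="0.005"} 0
kubelet_pleg_relist_interval_seconds_bucket{le="0.01"} 3
kubelet_pleg_relist_interval_seconds_bucket{le="+Inf"} 12345
kubelet_pleg_relist_interval_seconds_sum 1234.5
kubelet_pleg_relist_interval_seconds_count 12345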

Prometheus represents a histogram metric using multiple timeseries, with _bucket, _sum, and _count suffixes to denote the bucket counts, and the overall sum and count. OpenTelemetry represents a histogram metric using a complex type, in which the bucket counts, sum and count are all part of one data structure.
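
To make the difference concrete, here is a minimal sketch of the OpenTelemetry side using the collector's pdata API (go.opentelemetry.io/collector/pdata/pmetric); the metric name comes from this issue, the numbers are illustrative:

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	metrics := pmetric.NewMetrics()
	metric := metrics.ResourceMetrics().AppendEmpty().
		ScopeMetrics().AppendEmpty().
		Metrics().AppendEmpty()

	// One metric named after the histogram itself; no _bucket/_sum/_count suffixes.
	metric.SetName("kubelet_pleg_relist_interval_seconds")
	hist := metric.SetEmptyHistogram()
	hist.SetAggregationTemporality(pmetric.AggregationTemporalityCumulative)

	// Bucket counts, sum, and count all live on the same data point.
	dp := hist.DataPoints().AppendEmpty()
	dp.SetCount(12345)
	dp.SetSum(1234.5)
	dp.ExplicitBounds().FromRaw([]float64{0.005, 0.01, 0.025, 0.05, 0.1})
	dp.BucketCounts().FromRaw([]uint64{0, 3, 10, 100, 1000, 11232})

	fmt.Println(metric.Name(), "histogram data points:", hist.DataPoints().Len())
}

The scraped _bucket, _sum, and _count series all feed that single data point, so a relabel rule that drops some of them removes inputs to one and the same histogram.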

Prometheus relabel rules work on the timeseries, which means your initial configuration was correct.

The prometheusremotewrite exporter adds the _bucket, _sum, and _count suffixes when it converts back to Prometheus' representation of the histogram. For example, this appends the _bucket suffix to the histogram series name:

labels := createLabels(baseName+bucketStr, baseLabels, leStr, boundStr)

The debug exporter is printing out the OpenTelemetry representation of the histogram, which does not have Prometheus' suffixes.

dashpole added the question (Further information is requested) and exporter/prometheusremotewrite labels and removed the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 29, 2024

Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

xzizka (Author) commented Oct 29, 2024

Thank you, @dashpole, for your answer. I did a few tests with my colleagues, and I think we now understand how OTel handles this differently from "pure" Prometheus.
I think your answer also answers my other question (#36060), so I will go ahead and close that one as well.

Thank you for your explanation.
