
Dropping labels configured in scraping config drops more metrics than requested #36061

Closed
xzizka opened this issue Oct 29, 2024 · 4 comments

Labels: exporter/prometheusremotewrite, question (Further information is requested), receiver/prometheus (Prometheus receiver)

Comments

xzizka commented Oct 29, 2024

Component(s)

receiver/prometheus

What happened?

Description

This is very much related to the issue mentioned here: #36060
The environment is the same, the only difference is the config.

The scraping is configured like this:

...
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
            relabel_configs:
...

This scraping should return the following metrics:

  • kubelet_pleg_relist_interval_seconds_count
  • kubelet_pleg_relist_interval_seconds_sum
  • kubelet_pleg_relist_interval_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_count
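
As a sanity check, this standalone Go sketch (not part of the collector; the regex and series names are taken from the config above, and Prometheus anchors relabel regexes, which the snippet mimics with ^(?:...)$) shows which series names the keep rule matches:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The keep rule's regex from the scrape config. Prometheus anchors
	// relabel regexes, so wrap it in ^(?:...)$ to mimic that behaviour.
	pattern := `(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)`
	keep := regexp.MustCompile(`^(?:` + pattern + `)$`)

	// Candidate series names as exposed on the kubelet /metrics endpoint.
	series := []string{
		"kubelet_pleg_relist_duration_seconds_bucket",
		"kubelet_pleg_relist_duration_seconds_count",
		"kubelet_pleg_relist_duration_seconds_sum",
		"kubelet_pleg_relist_interval_seconds_bucket",
		"kubelet_pleg_relist_interval_seconds_count",
		"kubelet_pleg_relist_interval_seconds_sum",
	}
	for _, s := range series {
		fmt.Printf("%-48s keep=%v\n", s, keep.MatchString(s))
	}
}

Note that kubelet_pleg_relist_duration_seconds_sum is not matched by the keep rule, which is why it is missing from the list above.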

These metrics are then visible in Prometheus.

Debug output from the collector log.

testuser@testvm:~/otel-logs $ kubectl logs -l component=otel-collector --follow | grep pleg
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_interval_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
...

If I add a drop action to the scrape config for kubelet_pleg_relist_interval_seconds_count and kubelet_pleg_relist_interval_seconds_sum:

...
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_interval_seconds_count|kubelet_pleg_relist_interval_seconds_sum)'
                  action: drop
            relabel_configs:
...

Then all the kubelet_pleg_relist_interval_seconds_.+ metrics are removed and not visible in Prometheus/Grafana.

Debug output from the collector log.

testuser@testvm:~/otel-logs $ kubectl logs -l component=otel-collector --follow | grep pleg
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
     -> Name: kubelet_pleg_relist_duration_seconds
...

Steps to Reproduce

Use the config above for metrics collection with opentelemetry-collector-contrib.

Expected Result

To drop just the requested metrics.

Actual Result

Instead of dropping just the kubelet_pleg_relist_interval_seconds_count and kubelet_pleg_relist_interval_seconds_sum metrics, kubelet_pleg_relist_interval_seconds_bucket is dropped as well.

Collector version

0.112.0-amd64

Environment information

Environment

K8S 1.29
K8S 1.30

OpenTelemetry Collector configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-monitoring-collector-conf
  namespace: otel-system
  labels:
    app: opentelemetry
    component: otel-monitoring-collector-conf
data:
  otel-monitoring-collector-config: |
    exporters:
      prometheusremotewrite:
        endpoint: https://prometheus-dev:28080/api/v1/push
        tls:
          insecure_skip_verify: true
        headers: 
          X-Scope-OrgID: k8s-nprod-otel
        external_labels:
          cluster: "k8s-nprod-2856"
          otel_component: "otel-collector"
      debug/metrics:
        verbosity: detailed
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: integrations/kubernetes/kubelet
            scrape_interval: 15s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
                - role: node
            metric_relabel_configs:
                - source_labels: [__name__]
                  regex: '(kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_.*)'
                  action: keep
                - action: labeldrop
                  regex: container_id|id|image_id|uid
            relabel_configs:
                - replacement: kubernetes.default.svc.cluster.local:443
                  target_label: __address__
                - regex: (.+)
                  replacement: /api/v1/nodes/$${1}/proxy/metrics
                  source_labels:
                    - __meta_kubernetes_node_name
                  target_label: __metrics_path__
            scheme: https
            tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: false
                server_name: kubernetes
    processors:
      batch/metrics:
      memory_limiter/metrics:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 20
    extensions:
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    service:
      extensions: [health_check]
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [memory_limiter/metrics, batch/metrics]
          exporters: [debug/metrics, prometheusremotewrite]

Log output

Logs are mentioned in the description part of this issue.

Additional context

If we do the same scraping with the Prometheus agent, it works as expected.

xzizka added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 29, 2024
github-actions bot added the receiver/prometheus (Prometheus receiver) label on Oct 29, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dashpole (Contributor) commented

Those timeseries aren't dropped; they are just combined into an OpenTelemetry histogram. The actual names of your metrics are kubelet_pleg_relist_duration_seconds and kubelet_pleg_relist_interval_seconds (grep your /metrics endpoint for # TYPE kubelet_pleg_relist_duration_seconds or # TYPE kubelet_pleg_relist_interval_seconds).
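
For illustration, the relevant part of the kubelet /metrics page looks roughly like this (the numbers are made up; the point is the single # TYPE ... histogram declaration and the suffixed series beneath it):

# TYPE kubelet_pleg_relist_interval_seconds histogram
kubelet_pleg_relist_interval_seconds_bucket{le="0.005"} 0
kubelet_pleg_relist_interval_seconds_bucket{le="0.01"} 3
kubelet_pleg_relist_interval_seconds_bucket{le="+Inf"} 12345
kubelet_pleg_relist_interval_seconds_sum 1234.5
kubelet_pleg_relist_interval_seconds_count 12345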

Prometheus represents a histogram metric using multiple timeseries, with _bucket, _sum, and _count suffixes to denote the bucket counts, and the overall sum and count. OpenTelemetry represents a histogram metric using a complex type, in which the bucket counts, sum and count are all part of one data structure.
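
To make the difference concrete, here is a minimal sketch of the OpenTelemetry side using the collector's pdata API (go.opentelemetry.io/collector/pdata/pmetric); the metric name comes from this issue, the numbers are illustrative:

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	metrics := pmetric.NewMetrics()
	metric := metrics.ResourceMetrics().AppendEmpty().
		ScopeMetrics().AppendEmpty().
		Metrics().AppendEmpty()

	// One metric named after the histogram itself; no _bucket/_sum/_count suffixes.
	metric.SetName("kubelet_pleg_relist_interval_seconds")
	hist := metric.SetEmptyHistogram()
	hist.SetAggregationTemporality(pmetric.AggregationTemporalityCumulative)

	// Bucket counts, sum, and count all live on the same data point.
	dp := hist.DataPoints().AppendEmpty()
	dp.SetCount(12345)
	dp.SetSum(1234.5)
	dp.ExplicitBounds().FromRaw([]float64{0.005, 0.01, 0.025, 0.05, 0.1})
	dp.BucketCounts().FromRaw([]uint64{0, 3, 10, 100, 1000, 11232})

	fmt.Println(metric.Name(), "histogram data points:", hist.DataPoints().Len())
}

The scraped _bucket, _sum, and _count series all feed that single data point, so a relabel rule that drops some of them removes inputs to one and the same histogram.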

Prometheus relabel rules work on the timeseries, which means your initial configuration was correct.

The prometheusremotewrite exporter adds the _bucket, _sum, and _count suffixes when it converts back to Prometheus' representation of the histogram. For example, this appends the _bucket suffix to the histogram series name:

labels := createLabels(baseName+bucketStr, baseLabels, leStr, boundStr)

The debug exporter is printing out the OpenTelemetry representation of the histogram, which does not have Prometheus' suffixes.

dashpole added the question (Further information is requested) and exporter/prometheusremotewrite labels and removed the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 29, 2024

Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

xzizka (Author) commented Oct 29, 2024

Thank you, @dashpole, for your answer. I did a few tests with my colleagues, and I think we now understand how OTel handles this differently from "pure" Prometheus.
I think your answer also answers my other question (#36060), so I will go ahead and close that one as well.

Thank you for your explanation.
