
Prometheus scraper ignores __address__ overrides when it is used with kubernetes_sd_configs=pod #32306

Closed
sfc-gh-aivanou opened this issue Apr 10, 2024 · 7 comments

Comments

sfc-gh-aivanou commented Apr 10, 2024

Component(s)

Prometheus scraper

What happened?

Description

When the Prometheus scraper is used with a kubernetes_sd_configs role: pod config:

          scrape_configs:
            - job_name: "my-job"
              scrape_interval: 10s
              metrics_path: /metrics
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels: [__address__]
                  regex: ^(.*):\d+$
                  target_label: __address__
                  replacement: $$1:9000

It ignores the relabel_configs and does not override the port to 9000.

Steps to Reproduce

K8s ConfigMap with the OTel Collector config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-conf
  labels:
    test: value
data:
  config.yaml: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: "my-job"
              scrape_interval: 10s
              metrics_path: /metrics
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels: [__address__]
                  regex: ^(.*):\d+$
                  target_label: __address__
                  replacement: $$1:9000

    extensions:
      zpages: {}
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9001"
        resource_to_telemetry_conversion:
          enabled: true
      logging:
        loglevel: debug
        sampling_initial: 5
        sampling_thereafter: 200

    service:
      telemetry:
        logs:
          level: "debug"
      extensions: [zpages]
      
      pipelines:
        metrics/1:
          receivers: [prometheus]
          processors: [
            memory_limiter,
            batch]
          exporters: [prometheus]

Expected Result

Requests are always sent to port 9000.

Actual Result

Requests are sometimes sent to port 9000 and sometimes to port 80, seemingly at random.

2024-04-10T22:00:25.545Z	warn	internal/transaction.go:123	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712786425544, "target_labels": "{__name__=\"up\", instance=\"10.244.1.9:80\", job=\"managed-service\"}"}
2024-04-10T22:00:25.659Z	debug	scrape/scrape.go:1384	Scrape failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "managed-service", "target": "http://10.244.0.5:80/metrics", "error": "Get \"http://10.244.0.5:80/metrics\": dial tcp 10.244.0.5:80: connect: connection refused"}
2024-04-10T22:00:25.659Z	warn	internal/transaction.go:123	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712786425658, "target_labels": "{__name__=\"up\", instance=\"10.244.0.5:80\", job=\"managed-service\"}"}
2024-04-10T22:00:27.088Z	debug	scrape/scrape.go:1384	Scrape failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "managed-service", "target": "http://10.244.0.33:9000/metrics", "error": "Get \"http://10.244.0.33:9000/metrics\": dial tcp 10.244.0.33:9000: connect: connection refused"}

Collector version

0.85.0

Environment information

Environment

OS: CentOS

OpenTelemetry Collector configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-conf
  labels:
    test: value
data:
  config.yaml: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: "my-job"
              scrape_interval: 10s
              metrics_path: /metrics
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels: [__address__]
                  regex: ^(.*):\d+$
                  target_label: __address__
                  replacement: $$1:9000

    extensions:
      zpages: {}
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9001"
        resource_to_telemetry_conversion:
          enabled: true
      logging:
        loglevel: debug
        sampling_initial: 5
        sampling_thereafter: 200

    service:
      telemetry:
        logs:
          level: "debug"
      extensions: [zpages]
      
      pipelines:
        metrics/1:
          receivers: [prometheus]
          processors: [
            memory_limiter,
            batch]
          exporters: [prometheus]


Log output

2024-04-10T22:00:25.545Z	warn	internal/transaction.go:123	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712786425544, "target_labels": "{__name__=\"up\", instance=\"10.244.1.9:80\", job=\"managed-service\"}"}
2024-04-10T22:00:25.659Z	debug	scrape/scrape.go:1384	Scrape failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "managed-service", "target": "http://10.244.0.5:80/metrics", "error": "Get \"http://10.244.0.5:80/metrics\": dial tcp 10.244.0.5:80: connect: connection refused"}
2024-04-10T22:00:25.659Z	warn	internal/transaction.go:123	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712786425658, "target_labels": "{__name__=\"up\", instance=\"10.244.0.5:80\", job=\"managed-service\"}"}
2024-04-10T22:00:27.088Z	debug	scrape/scrape.go:1384	Scrape failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "managed-service", "target": "http://10.244.0.33:9000/metrics", "error": "Get \"http://10.244.0.33:9000/metrics\": dial tcp 10.244.0.33:9000: connect: connection refused"}


Additional context

No response
sfc-gh-aivanou added the bug and needs triage labels on Apr 10, 2024
sfc-gh-aivanou changed the title from "Prometheus scraper is broken when it is used with kubernetes_sd_configs=pod" to "Prometheus scraper ignores __address__ overrides when it is used with kubernetes_sd_configs=pod" on Apr 10, 2024
@JBodkin-Amphora

I suspect that the regex doesn't match the __address__ label, as you've used a greedy match which will also capture the port number in the group.

You can try out this instead: ([^:]+)(?::\d+)?
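
For illustration, here is a minimal sketch (not part of the original report; the addresses are made-up) using Go's regexp package, which implements the same RE2 syntax that Prometheus relabeling uses. It shows why the anchored `^(.*):\d+$` pattern only matches targets whose `__address__` already contains a port:

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored mimics Prometheus relabeling, which wraps the configured
// regex in ^(?:...)$ before matching.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

func main() {
	// Hypothetical __address__ values: one with an explicit port, one without.
	addrs := []string{"10.244.0.33:8080", "10.244.1.9"}

	original := anchored(`^(.*):\d+$`)        // regex from the issue
	suggested := anchored(`([^:]+)(?::\d+)?`) // regex suggested above

	for _, a := range addrs {
		fmt.Printf("%-18s original: %-5v suggested: %v\n",
			a, original.MatchString(a), suggested.MatchString(a))
	}
	// 10.244.0.33:8080   original: true  suggested: true
	// 10.244.1.9         original: false suggested: true
	//
	// When the regex does not match, the replace rule is a no-op, so a pod
	// whose __address__ has no port keeps it unchanged and is scraped on the
	// HTTP default port 80, which would explain the mixed :80 / :9000 targets
	// in the logs above.
}
```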


rooque commented Apr 17, 2024

I'm having the same problem, and I think it is related to what @JBodkin-Amphora said.

I tried copying the Prometheus rules from this file and putting them into the collector config -> https://raw.githubusercontent.com/cilium/cilium/1.15.3/examples/kubernetes/addons/prometheus/monitoring-example.yaml

    scrape_configs:
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
      - job_name: 'kubernetes-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_k8s_app]
            action: keep
            regex: cilium
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: (.+)(?::\d+);(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
  
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L156
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            action: keep
            regex: \d+

It didn't work, but when I changed the regex of this part (replace the address with the correct port) in all jobs to the one @JBodkin-Amphora mentioned, it worked well.

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)  ----> ([^:]+)(?::\d+)?
            replacement: ${1}:${2}

I think regexes in the collector work differently from how they work in Prometheus, because these rules work fine in Prometheus without changing the regex.

@JBodkin-Amphora

Hello @rooque,

The regex pattern in the Prometheus example is different from the one you've specified in your initial configuration. In their example, the regex pattern is ([^:]+)(?::\d+)?;(\d+).

Note: There is an additional capturing group compared to the regex pattern I gave above, which takes the port from the annotation instead of a hardcoded value.
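
As a sketch of how that two-group pattern behaves (again illustrative, with made-up values, not taken from the issue): Prometheus joins the values of multiple source_labels with the separator ; (the default) before matching, and Go's regexp package uses the same RE2 syntax, so the rewrite can be reproduced like this:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// With two source_labels, the label values are joined with ";" before
	// the regex is applied. Hypothetical values:
	//   __address__                                              = "10.244.0.33:8080"
	//   __meta_kubernetes_service_annotation_prometheus_io_port  = "9000"
	joined := "10.244.0.33:8080;9000"

	// Prometheus anchors the configured regex as ^(?:...)$.
	re := regexp.MustCompile(`^(?:([^:]+)(?::\d+)?;(\d+))$`)

	// Replacement $1:$2 keeps the host from the first group and takes the
	// port from the annotation captured in the second group.
	fmt.Println(re.ReplaceAllString(joined, "$1:$2")) // 10.244.0.33:9000

	// An __address__ without a port still matches, since the :port part is optional.
	fmt.Println(re.ReplaceAllString("10.244.1.9;9000", "$1:$2")) // 10.244.1.9:9000
}
```

(In the collector config itself the replacement has to be written with $$, as noted further down in the thread; the string that actually reaches the relabeler is $1:$2.)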

Do you recall where you found the regex pattern?


rooque commented Apr 24, 2024

@JBodkin-Amphora I tried with the regex ([^:]+)(?::\d+)?;(\d+) and it isn't working either.

@JBodkin-Amphora

This is the full configuration that I'm using:

    prometheus:
      config:
        scrape_configs:
          - job_name: kubernetes
            scrape_interval: 30s
            kubernetes_sd_configs:
              - role: pod
                selectors:
                  - role: pod
                    field: spec.nodeName==${env:K8S_NODE_NAME}
            relabel_configs:
              - source_labels:
                  - __meta_kubernetes_pod_annotation_opentelemetry_io_scrape
                action: keep
                regex: "true"
              - source_labels:
                  - __address__
                  - __meta_kubernetes_pod_annotation_opentelemetry_io_scrape_port
                target_label: __address__
                action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $$1:$$2
              - source_labels:
                  - __meta_kubernetes_pod_annotation_opentelemetry_io_scrape_path
                target_label: __metrics_path__
                action: replace
                regex: (.+)

Note: In the replacement, you need to use $$ to escape environment variable expansion. This is mentioned in the README under Getting Started.

Then my pods are annotated with:

    opentelemetry.io/scrape: "true"
    opentelemetry.io/scrape_port: "3100"

@sfc-gh-aivanou
Author

thank you very much @JBodkin-Amphora !!
