Duplicate tolerations causing issue with prometheus >= 2.52.0 #2390

Open
rgarcia89 opened this issue May 14, 2024 · 7 comments · May be fixed by #2559
Labels: kind/bug, triage/accepted

@rgarcia89

What happened:
Starting with version 2.52.0, Prometheus introduced a mechanism to detect duplicate series during scraping. This leads to error logs when Prometheus scrapes kube-state-metrics and a pod's tolerations array contains duplicate entries, because each duplicate is exposed as an identical kube_pod_tolerations series.

Prometheus debug logs:

ts=2024-05-13T19:21:09.190Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"calico-system\",pod=\"calico-kube-controllers-75c647b46c-pg9cr\",uid=\"bf944c52-17bd-438b-bbf1-d97f8671bd6b\",key=\"CriticalAddonsOnly\",operator=\"Exists\"}"
ts=2024-05-13T19:21:09.207Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

There might be a need to deduplicate the toleration entries or add an index to entries with existing duplicates.

How to reproduce it (as minimally and precisely as possible):

Create the following deployment and look at the kube_pod_tolerations metrics that kube-state-metrics produces for its pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: something
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test-container
        image: nginx
      tolerations:
       - key: CriticalAddonsOnly
         operator: Exists
       - key: CriticalAddonsOnly
         operator: Exists

Anything else we need to know?:
Issue report I opened on the Prometheus project: prometheus/prometheus#14089

Environment:

  • kube-state-metrics version: 2.12.0
  • Kubernetes version (use kubectl version): 1.27.9
  • Cloud provider or hardware configuration: AKS
  • Other info:

@dgrisonnet
Member

/assign
/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on May 16, 2024
@dgrisonnet
Member

Quoting yourself from the issue you opened against Kubernetes:

A validation check within the Kubernetes API server to reject manifests with duplicate tolerations, ensuring adherence to Kubernetes best practices and avoiding potential issues related to duplicate toleration definitions would be great.

This is also what I would expect to be in the kube-apiserver. I don't think we should handle this scenario at kube-state-metrics' level since the object data is erroneous.

I am closing this issue in favor of the Kubernetes one. Feel free to reopen if the Kubernetes maintainers think we should handle this scenario here.

@RiRa12621

RiRa12621 commented Nov 21, 2024

kubernetes/kubernetes#124881 (comment)

It seems this got bounced back here, @dgrisonnet.

@k8s-ci-robot
Contributor

@RiRa12621: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
kubernetes/kubernetes#124881 (comment)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dgrisonnet
Member

Thanks for the heads up @RiRa12621 :)

/reopen

/unassign
/help

If anyone is interested in contributing the logic to make sure that there are only unique tolerations, feel free to self-assign the issue and draft a PR.

@k8s-ci-robot
Contributor

@dgrisonnet: Reopened this issue.

@RiRa12621

RiRa12621 commented Nov 21, 2024

/assign @RiRa12621
Not sure if this is the most elegant way, but it should do the job: #2559

It takes all tolerations, keeps only the unique ones, and then applies the regular logic to those.
