Kube state metrics only shows metrics related to the namespace where it is running #2211

Closed
ThiagoScodeler opened this issue Sep 29, 2023 · 13 comments
Labels: kind/bug, triage/accepted

Comments

@ThiagoScodeler

What happened:
Kube state metrics only shows metrics related to the namespace where it is running.

What you expected to happen:
Kube state metrics shows metrics related to all Kubernetes cluster namespaces.

How to reproduce it (as minimally and precisely as possible):

I have an AWS EKS cluster with kube-state-metrics installed in a namespace called "monitoring". This installation uses a ServiceMonitor and other components (see YAML files below).
In this cluster, there is also a Prometheus agent running and selecting the kube-state-metrics ServiceMonitor.

kube-state-metrics is listed on the Prometheus targets page, but when I add a dashboard in Grafana to visualize these metrics, I can only see kube-state-metrics data related to the "monitoring" namespace. The EKS cluster has other namespaces, and kube-state-metrics should display metrics for all of them.

I have a similar setup for cAdvisor, and it works fine, showing metrics related to all namespaces.

Any idea why kube-state-metrics is showing only data related to the namespace it is running in?

prometheus.yaml

---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: agent
  namespace: monitoring
spec:
  version: v2.39.1
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      component: prometheus-agent
  serviceMonitorNamespaceSelector:
    matchLabels:
      monitoring: prometheus-agent
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 1Gi
  replicas: 1
  logLevel: debug
  logFormat: logfmt
  scrapeInterval: 30s
  remoteWrite:
  - url: https://prometheus-workspace
    sigv4:
      region: us-east-1
    queueConfig:
      maxSamplesPerSend: 1000
      maxShards: 200
      capacity: 2500
  containers:
  - name: prometheus
    args:
    - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
    - --storage.agent.path=/prometheus
    - --enable-feature=agent
    - --web.enable-lifecycle
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - monitoring
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - prometheus
          topologyKey: kubernetes.io/hostname
        weight: 100
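
Note: in the spec above, serviceMonitorNamespaceSelector is a label selector over Namespace objects, so ServiceMonitors are only discovered in namespaces that carry the monitoring: prometheus-agent label. A minimal sketch of labeling the monitoring namespace accordingly (assuming it is managed declaratively; the label key/value come from the Prometheus spec above):

---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    # must match spec.serviceMonitorNamespaceSelector.matchLabels above
    monitoring: prometheus-agent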

kube-state-metrics:

service-monitor.yaml

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: monitoring
  labels:
    component: prometheus-agent
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: http-metrics

service.yaml

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
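
As a quick sanity check (a sketch, not part of the original report), you can confirm that the ServiceMonitor's selector matches this Service and that the headless Service resolves to the kube-state-metrics pod:

# should return the kube-state-metrics Service defined above
kubectl -n monitoring get service -l app.kubernetes.io/name=kube-state-metrics
# should list the pod IP(s) behind the headless Service on ports 8080/8081
kubectl -n monitoring get endpoints kube-state-metrics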

deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.6.0
    spec:
      automountServiceAccountToken: true
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
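
For context (an observation, not from the thread): kube-state-metrics watches all namespaces by default, and the deployment above passes no flags that would change that. The single-namespace symptom described in this issue is what you would see if the container were started with a namespace restriction, as in this hypothetical args snippet:

      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
        args:
        # hypothetical restriction; NOT present in the deployment above
        - --namespaces=monitoring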

cluster-role-binding.yaml

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring

cluster-role.yaml

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - serviceaccounts
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - list
  - watch
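
Since missing list/watch permissions would also surface as missing metrics, the binding can be checked with kubectl auth can-i (a sketch; the service account name and namespace come from the manifests in this issue):

# should print "yes" if the ClusterRoleBinding grants cluster-wide pod listing
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:monitoring:kube-state-metrics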

service-account.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring

Environment:
- kube-state-metrics version: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
- Kubernetes version (use kubectl version): v1.27.5-eks-43840fb
- Cloud provider or hardware configuration: AWS EKS cluster
@ThiagoScodeler added the kind/bug label Sep 29, 2023
@k8s-ci-robot added the needs-triage label Sep 29, 2023
@dashpole
Contributor

dashpole commented Oct 5, 2023

/assign @CatherineF-dev
/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Oct 5, 2023
@CatherineF-dev
Contributor

Hi, could you try KSM v2.7 and see whether it still has this issue?

I have a cluster running v2.7 and it doesn't have this issue.

@ThiagoScodeler
Author

Hi @CatherineF-dev, I just tried v2.7.0 and got the same issue; I only see metrics related to the "monitoring" namespace:

[screenshot: Grafana dashboard showing metrics only for the monitoring namespace]

@ThiagoScodeler
Author

@CatherineF-dev kube-state-metrics pod logs:

I1009 17:19:23.557406       1 wrapper.go:78] Starting kube-state-metrics
I1009 17:19:23.557917       1 server.go:125] "Used default resources"
I1009 17:19:23.558027       1 types.go:184] "Using all namespaces"
I1009 17:19:23.558099       1 server.go:166] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
W1009 17:19:23.558189       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1009 17:19:23.560029       1 server.go:311] "Tested communication with server"
I1009 17:19:23.575931       1 server.go:316] "Run with Kubernetes cluster version" major="1" minor="27+" gitVersion="v1.27.4-eks-2d98532" gitTreeState="clean" gitCommit="3d90c097c72493c2f1a9dd641e4a22d24d15be68" platform="linux/amd64"
I1009 17:19:23.576116       1 server.go:317] "Communication with server successful"
I1009 17:19:23.576396       1 server.go:263] "Started metrics server" metricsServerAddress="[::]:8080"
I1009 17:19:23.576399       1 metrics_handler.go:97] "Autosharding disabled"
I1009 17:19:23.576689       1 server.go:69] level=info msg="Listening on" address=[::]:8080
I1009 17:19:23.576770       1 server.go:69] level=info msg="TLS is disabled." http2=false address=[::]:8080
I1009 17:19:23.576812       1 server.go:252] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I1009 17:19:23.576860       1 server.go:69] level=info msg="Listening on" address=[::]:8081
I1009 17:19:23.576876       1 server.go:69] level=info msg="TLS is disabled." http2=false address=[::]:8081
I1009 17:19:23.578081       1 builder.go:257] "Active resources" activeStoreNames="certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments"
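
The "Using all namespaces" line above indicates KSM is not namespace-restricted. A way to double-check the running pod's flags (a sketch, assuming the deployment name from the manifests above):

# prints the container args; an empty result means KSM runs with defaults (all namespaces)
kubectl -n monitoring get deploy kube-state-metrics \
  -o jsonpath='{.spec.template.spec.containers[0].args}'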

@CatherineF-dev
Contributor

Could you curl KSM endpoint directly to list all metrics?
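
For anyone reproducing this, a minimal sketch of that check (resource names taken from the manifests above):

# forward the metrics port from the kube-state-metrics pod
kubectl -n monitoring port-forward deploy/kube-state-metrics 8080:8080 &
# kube_namespace_created should list every namespace KSM can see
curl -s http://localhost:8080/metrics | grep '^kube_namespace_created'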

@ThiagoScodeler
Author

ThiagoScodeler commented Oct 9, 2023

@CatherineF-dev yes, I can curl KSM and visualize all metrics. I'm using this Grafana dashboard: https://grafana.com/grafana/dashboards/13332-kube-state-metrics-v2/

@CatherineF-dev
Contributor

Could you paste one KSM metric around pods?

@ThiagoScodeler
Author

@CatherineF-dev here are some metrics for a namespace other than "monitoring":

kube_pod_container_resource_limits{namespace="test-development",pod="test-deployment-475bc6cdc9-cjfr9",uid="8a5de859-f203-48fe-8113-afd002056e5648",container="test-container",node="ip-1-1-1-150.ec2.internal",resource="cpu",unit="core"} 1
::
kube_pod_container_resource_requests{namespace="test-development",pod="test-deployment-475bc6cdc9-cjfr9",uid="8a5de859-f203-48fe-8113-afd002056e5648",container="test-container",node="ip-1-1-1-150.ec2.internal",resource="cpu",unit="core"} 0.5
::
kube_pod_container_state_started{namespace="test-development",pod="test-deployment-475bc6cdc9-cjfr9",uid="8a5de859-f203-48fe-8113-afd002056e5648",container="test-container"} 1.696455268e+09
::
kube_pod_container_status_ready{namespace="test-development",pod="test-deployment-475bc6cdc9-cjfr9",uid="8a5de859-f203-48fe-8113-afd002056e5648",container="test-container"} 1

@CatherineF-dev
Contributor

It does show metrics in other namespaces, so I suspect it's an issue with the Grafana dashboard. Maybe you can contact the team that provides this dashboard.
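
One way to confirm that from Grafana (a sketch, not from the thread) is to run a query against the remote-write workspace that counts pods per namespace; results for more than one namespace mean KSM is fine and the dashboard's variables/filters are the problem:

count by (namespace) (kube_pod_info)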

Do you have other questions? If not, we will close this issue.

@CatherineF-dev
Contributor

/remove kind/bug

@ThiagoScodeler
Author

ThiagoScodeler commented Oct 10, 2023

@CatherineF-dev I'll get in contact with them. Do you have any other recommended Grafana dashboard?
Thank you for your support; no more questions from my side.

@CatherineF-dev
Contributor

I searched inside this repo and didn't find a Grafana dashboard for monitoring a cluster.
You can contribute one if you have time and would like to.

/close

@k8s-ci-robot
Contributor

@CatherineF-dev: Closing this issue.

In response to this:

I searched inside this repo and didn't find a Grafana dashboard for monitoring a cluster.
You can contribute one if you have time and would like to.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
