[prometheus-kube-stack] New cAdvisorMetricRelabelings are a bit too restrictive by default #2279

Closed
dotdc opened this issue Jul 15, 2022 · 7 comments
Labels: bug (Something isn't working)

dotdc (Member) commented Jul 15, 2022

Describe the bug

Hi,

I think two of the new rules introduced in 37.0.0 by @SuperQ are a bit too restrictive to be enabled by default.

Rules concerned:

      # Drop cgroup metrics with no pod.
      - sourceLabels: [id, pod]
        action: drop
        regex: '.+;'
      # Drop cgroup metrics with no container.
      - sourceLabels: [id, container]
        action: drop
        regex: '.+;'

These two rules drop every metric without a pod and/or a container.
This makes some basic queries like container_cpu_usage_seconds_total{id="/"} stop working out of the box.
I would at least comment these two out to limit the number of impacted users.
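
For illustration, a minimal values.yaml sketch (assuming the kubelet.serviceMonitor.cAdvisorMetricRelabelings value exposed by recent kube-prometheus-stack releases) of how a user could opt out of the new drop rules:

```yaml
kubelet:
  serviceMonitor:
    # Overriding this value replaces the chart's default relabeling list, so
    # list only the rules you want to keep. An empty list disables them all.
    cAdvisorMetricRelabelings: []
```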

What do you think?

What's your helm version?

3.9.0

What's your kubectl version?

1.24.2

Which chart?

prometheus-kube-stack

What's the chart version?

37.2.0

What happened?

No response

What did you expect to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you executed that is failing/misfunctioning.

container_cpu_usage_seconds_total{id="/"}

Anything else we need to know?

No response

SuperQ (Contributor) commented Jul 16, 2022

Adding my comments from the PR:

These are mostly generated by systemd cgroups, which can be pretty noisy for Kubernetes users. This is why I added them as the default.

The goal here is to monitor Kubernetes resources.

Also, I'm not sure what the use of monitoring the CPU of the root cgroup is, when node_cpu_seconds_total covers that.
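
As a hedged sketch of that substitution (exact label matchers depend on the setup), cluster-wide CPU usage can come from the node exporter instead of the root cgroup series:

```promql
# Root-cgroup query that the new default rules break:
sum(rate(container_cpu_usage_seconds_total{id="/"}[5m]))

# Roughly equivalent cluster usage from the node exporter:
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))
```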

dotdc (Member, Author) commented Jul 16, 2022

Good point. I can't remember why I didn't use node_cpu_seconds_total to calculate global cluster usage; I'll check again and update this.

BeckYeh commented Jul 18, 2022

Because this metric is used by prometheus-adapter for the node resource query (kubectl top nodes), I hope this filter won't be enabled by default.
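
For context, a hedged sketch (assuming prometheus-adapter's resource rules config format; the exact query adopted upstream may differ) of pointing the adapter's node CPU query at node_cpu_seconds_total so it no longer depends on the dropped root-cgroup series:

```yaml
resourceRules:
  cpu:
    # Node CPU from the node exporter rather than container_cpu_usage_seconds_total{id="/"}.
    # The node label on node_cpu_seconds_total must match what the adapter expects,
    # which may require relabeling on the node-exporter scrape.
    nodeQuery: |
      sum by (<<.GroupBy>>) (
        rate(node_cpu_seconds_total{mode!="idle",<<.LabelMatchers>>}[4m])
      )
```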

SuperQ (Contributor) commented Jul 18, 2022

Ooof, the prometheus-adapter seems to have a bunch of incorrect queries built in. Those should be fixed.

SuperQ (Contributor) commented Jul 18, 2022

kubernetes-sigs/prometheus-adapter#516 should fix up the prometheus-adapter bug.

dotdc (Member, Author) commented Jul 18, 2022

Thank you @SuperQ, I updated all my affected queries to use node_cpu_seconds_total instead of container_cpu_usage_seconds_total, and LGTM.

Can I close the issue or should we wait to see if other related charts and projects are using similar queries?

SuperQ (Contributor) commented Jul 19, 2022

I think we can close it. Other projects can find this issue for advice on how to fix their query use.

dotdc closed this as completed Jul 19, 2022