CPU starvation on worker nodes caused by the Kubelet not setting cpu.cfs_quota_us in the kubepods.slice cgroup. #129811


Open
sudheernv opened this issue Jan 24, 2025 · 5 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@sudheernv

sudheernv commented Jan 24, 2025

Any idea why the kubelet doesn't set a value for cpu.cfs_quota_us on the parent cgroup "kubepods.slice" and instead leaves it at the default of -1 (no limit)? This is causing CPU starvation on the node: burstable pods end up consuming 100% of the CPU, even though CPU reservations are configured in the kubelet's kubeReserved and systemReserved settings, as shown below. Because the parent cgroup has no CPU quota set, these reservations aren't enforced, and nothing is left for system processes or the kubelet.

################
Kubelet Config:
################

kubeReserved:
  cpu: "2000m"
systemReserved:
  cpu: "2000m"
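
As a rough illustration of what these two reservations are meant to leave for pods, the arithmetic can be sketched as below. The 8-core capacity is a hypothetical figure (the report does not state the node size), the helper names are made up for this example, and the 100000 µs period is the usual CFS default rather than a value read from this node. The quota computed at the end only shows what a hard cap of that size would look like; it is not a value the kubelet is confirmed to write.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "2000m" or "2" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores.
    return int(round(float(quantity) * 1000))


def allocatable_millicores(capacity: str, kube_reserved: str, system_reserved: str) -> int:
    """CPU left for pods after subtracting both reservations from capacity."""
    return (
        parse_cpu_millicores(capacity)
        - parse_cpu_millicores(kube_reserved)
        - parse_cpu_millicores(system_reserved)
    )


def equivalent_cfs_quota_us(millicores: int, period_us: int = 100_000) -> int:
    """CFS quota (microseconds per period) that would cap usage at `millicores`."""
    return millicores * period_us // 1000


if __name__ == "__main__":
    # Hypothetical 8-core node with the reservations from the config above.
    alloc = allocatable_millicores("8", "2000m", "2000m")
    print(alloc)                           # 4000
    print(equivalent_cfs_quota_us(alloc))  # 400000
```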

################
CGroup "kubepods.slice" setting for cpu quota:
################

$ cat /sys/fs/cgroup/cpu/kubepods.slice/cpu.cfs_quota_us
-1
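
To make the reading above easier to interpret, here is a minimal sketch that collects the CPU settings of a cgroup v1 directory in one place. It assumes the cgroup v1 layout used in the command above, where cpu.cfs_period_us and cpu.shares sit next to cpu.cfs_quota_us. The function names are illustrative. A quota of -1 means the CFS scheduler applies no hard cap, so the only remaining CPU control on that cgroup is its relative weight (cpu.shares), which matters only under contention.

```python
from pathlib import Path
from typing import Optional


def cpu_limit_cores(quota_us: int, period_us: int) -> Optional[float]:
    """Return the hard CPU cap in cores, or None when the quota is negative (no cap)."""
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cgroup_v1_cpu(cgroup_dir: str) -> dict:
    """Read quota, period, and shares from a cgroup v1 cpu controller directory."""
    base = Path(cgroup_dir)

    def read_int(name: str) -> int:
        return int((base / name).read_text().strip())

    quota = read_int("cpu.cfs_quota_us")
    period = read_int("cpu.cfs_period_us")
    return {
        "quota_us": quota,
        "period_us": period,
        "shares": read_int("cpu.shares"),
        "limit_cores": cpu_limit_cores(quota, period),
    }
```

On a node like the one in this report, calling `read_cgroup_v1_cpu("/sys/fs/cgroup/cpu/kubepods.slice")` would be expected to return `limit_cores` as `None`, matching the -1 shown above.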

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ffromani
Contributor

/sig node
/cc

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 25, 2025
@sudheernv
Author

Hi Team!
Any update on this issue?

@Aaina26
Contributor

Aaina26 commented Mar 13, 2025

Hi! Are you using cgroup v2 or v1?
Also, which Kubernetes version are you seeing this issue on?
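
One way to answer the cgroup question is sketched below. It relies on the fact that a cgroup v2 unified hierarchy exposes a cgroup.controllers file at its root, while a v1 mount does not. Treat it as a heuristic: hybrid setups may need a closer look, and the function names are illustrative. The second helper reflects that on cgroup v2 the CPU cap lives in cpu.max (which reads "max" followed by the period when no cap is set) rather than in cpu.cfs_quota_us.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1."""
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def cpu_quota_file(version: int) -> str:
    """Name of the file that holds the CPU hard cap for the given cgroup version."""
    return "cpu.max" if version == 2 else "cpu.cfs_quota_us"


if __name__ == "__main__":
    version = cgroup_version()
    print(f"cgroup v{version}, quota file: {cpu_quota_file(version)}")
```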

@Aaina26
Contributor

Aaina26 commented Mar 13, 2025

I think a similar issue is being tracked here: #97445
You might want to check it out.

Projects
None yet
Development

No branches or pull requests

4 participants