CPU starvation on worker nodes caused by the Kubelet not setting cpu.cfs_quota_us in the kubepods.slice cgroup. #129811
Labels: needs-triage, sig/node
Any idea why the kubelet isn't setting a value for cpu.cfs_quota_us on the parent cgroup "kubepods.slice", leaving it at the default of -1? This is causing CPU starvation on the node: burstable pods can consume 100% of the CPU even though CPU reservations are configured via the kubelet's kubeReserved and systemReserved, as shown below. Because the parent cgroup has no CPU quota set, those reservations are not enforced, and nothing is left for system processes or the kubelet itself.
################
Kubelet Config:
################
kubeReserved:
  cpu: "2000m"
systemReserved:
  cpu: "2000m"
################
CGroup "kubepods.slice" setting for cpu quota:
################
$ cat /sys/fs/cgroup/cpu/kubepods.slice/cpu.cfs_quota_us
-1
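For comparison: my understanding is that the kubelet enforces Node Allocatable CPU on the kubepods cgroup through cpu.shares (relative weight) rather than a CFS quota, since CPU is treated as a compressible resource. A rough sketch of the expected cpu.shares value under that assumption, using the kubelet's millicores-to-shares conversion (shares = milliCPU * 1024 / 1000); the 16-CPU node size here is purely illustrative:

```shell
# Hypothetical node capacity (illustrative, not from this node):
NODE_CPUS_MILLI=16000
# Reservations from the kubelet config above:
KUBE_RESERVED_MILLI=2000
SYSTEM_RESERVED_MILLI=2000

# Node Allocatable CPU = capacity - kubeReserved - systemReserved
ALLOCATABLE_MILLI=$((NODE_CPUS_MILLI - KUBE_RESERVED_MILLI - SYSTEM_RESERVED_MILLI))

# Millicores -> cgroup cpu.shares (1 CPU = 1024 shares)
SHARES=$((ALLOCATABLE_MILLI * 1024 / 1000))
echo "$SHARES"   # 12288 for this illustrative 16-CPU node

# Compare against the live value on the node:
# cat /sys/fs/cgroup/cpu/kubepods.slice/cpu.shares
```

If cpu.shares on kubepods.slice matches this kind of value, the reservations are being expressed as a scheduling weight, which only throttles pods relative to other runnable cgroups rather than capping them at a hard quota.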