Commit d045e7d

Blog post: New conversion from cgroup v1 CPU shares to v2 CPU weight
Signed-off-by: Itamar Holder <iholder@redhat.com>
1 parent b97d217 commit d045e7d

5 files changed: 152 additions, 0 deletions
@@ -0,0 +1,152 @@
---
layout: blog
title: 'New conversion from cgroup v1 CPU shares to v2 CPU weight'
date: 2025-10-25T05:00:00-08:00
slug: new-cgroup-v1-to-v2-cpu-conversion-formula
author: >
  [Itamar Holder](https://github.com/iholder101) (Red Hat)
---

We're excited to announce the implementation of an improved conversion formula
from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses
critical issues with CPU priority allocation for Kubernetes workloads when
running on systems with cgroup v2.

## Background

Kubernetes was originally designed with cgroup v1 in mind, where CPU shares
were derived directly from the container's CPU request, scaled so that one
full CPU corresponds to 1024 shares.

For example, a container requesting 1 CPU (1000m) would get
`cpu.shares = 1024`.

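As a rough illustration of that mapping, here is a minimal Python sketch. It assumes the usual scaling of 1024 shares per full CPU (1000m), which is consistent with the 100m example used later in this post. The function name `milli_cpu_to_shares` and the constant names are made up for this example and are not taken from the Kubernetes source.

```python
SHARES_PER_CPU = 1024      # cgroup v1 shares granted per full CPU (assumed scaling)
MILLI_CPU_PER_CPU = 1000   # 1 CPU == 1000m
MIN_SHARES = 2             # lowest cgroup v1 cpu.shares value
MAX_SHARES = 262144        # highest cgroup v1 cpu.shares value


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicpu to a cgroup v1 cpu.shares value."""
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    # Keep the result inside the valid cgroup v1 range.
    return max(MIN_SHARES, min(MAX_SHARES, shares))


print(milli_cpu_to_shares(1000))  # 1024
print(milli_cpu_to_shares(100))   # 102
```
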

Over time, cgroup v1 started to be replaced by its successor, cgroup v2. In
cgroup v2, the concept of CPU shares (which ranges from 2 to 262144, or from
2^1 to 2^18) was replaced with CPU weight (which ranges from 1 to 10000, or
from 10^0 to 10^4).

With the transition to cgroup v2,
[KEP-2254](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2)
introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU
weight. The conversion formula is defined as:

`cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142) // convert from [2-262144] to [1-10000]`.

This formula linearly maps values from `[2^1 - 2^18]` to `[10^0 - 10^4]`.
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png)

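To make the linear mapping concrete, the formula above can be sketched in Python. The function name `legacy_shares_to_weight` is hypothetical, and both the integer (floor) division and the clamping of out-of-range inputs are assumptions of this sketch.

```python
def legacy_shares_to_weight(shares: int) -> int:
    """Linearly map cgroup v1 cpu.shares [2, 262144] to cgroup v2 cpu.weight [1, 10000]."""
    # Clamp to the valid cgroup v1 range first (an assumption of this sketch).
    shares = max(2, min(262144, shares))
    # Integer division mirrors the formula when evaluated on integers.
    return 1 + ((shares - 2) * 9999) // 262142


for shares in (2, 102, 1024, 262144):
    print(shares, "->", legacy_shares_to_weight(shares))
```

Under these assumptions, 1024 shares map to a weight of 39 and 102 shares map to a weight of 4, while the two end points map to 1 and 10000.
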

While this approach is simple, the linear mapping causes a few significant
problems that affect both performance and configuration granularity.

## Problems with Current Conversion Formula

The current conversion formula creates two major issues:

### 1. Reduced Priority Against Non-Kubernetes Workloads

In cgroup v1, the default value of CPU shares is `1024`, meaning a container
requesting 1 CPU has equal priority with system processes that live outside
of Kubernetes' scope.
However, in cgroup v2, the default CPU weight is `100`, but the current
formula converts 1 CPU (1000m) to a weight of only `~39`, less than 40% of the
default.

**Example:**
- Container requesting 1 CPU (1000m)
- cgroup v1: `cpu.shares = 1024` (equal to default)
- cgroup v2 (current): `cpu.weight = 39` (much lower than default 100)

This means that after moving to cgroup v2, Kubernetes workloads would
de facto have lower CPU priority relative to non-Kubernetes processes. The
problem can be severe for setups that run many system daemons outside of
Kubernetes' scope and expect Kubernetes workloads to have
priority, especially in situations of resource starvation.

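To get a feel for the effect, recall that under contention sibling cgroups receive CPU time in proportion to their shares (v1) or weights (v2). The sketch below assumes a deliberately simplified scenario with exactly two sibling cgroups that are both fully CPU-bound. The helper name `cpu_fraction` is made up for this illustration, and real cgroup hierarchies are nested and more complex.

```python
def cpu_fraction(own: int, siblings: list[int]) -> float:
    """Fraction of CPU time a busy cgroup gets when competing with busy siblings."""
    return own / (own + sum(siblings))


# cgroup v1: a 1 CPU request (1024 shares) against a default cgroup (1024 shares).
print(f"v1: {cpu_fraction(1024, [1024]):.0%}")  # v1: 50%
# cgroup v2, linear formula: weight 39 against a default cgroup (weight 100).
print(f"v2: {cpu_fraction(39, [100]):.0%}")     # v2: 28%
```

In this simplified setting, the container's share of CPU time under contention drops from 50% to roughly 28% purely because of the conversion.
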

### 2. Unmanageable Granularity

The current formula produces very low values for small CPU requests,
limiting the ability to create sub-cgroups within containers for
fine-grained resource distribution.

**Example:**
- Container requesting 100m CPU
- cgroup v1: `cpu.shares = 102`
- cgroup v2 (current): `cpu.weight = 4` (too low for sub-cgroup
  configuration)

With cgroup v1, requesting 100m CPU, which led to 102 CPU shares, was manageable
in the sense that sub-cgroups could be created inside the main
container, assigning fine-grained CPU priorities to different groups of
processes. With cgroup v2, however, a weight of 4 is very hard to
distribute between sub-cgroups since it's not granular enough.

With plans to allow [writable cgroups for unprivileged containers](https://github.com/kubernetes/enhancements/issues/5474),
this becomes even more relevant.

## New Conversion Formula

### Description
The new formula is more complicated, but does a much better job mapping
between cgroup v1 CPU shares and cgroup v2 CPU weight:

`cpu.weight = ⌈10^(L²/612 + 125L/612 - 7/34)⌉` where `L = log₂(cpu.shares)`.

The exponent is a quadratic function of `L`, chosen so that the curve passes
through the following points:
- (2, 1): The minimum values for both ranges.
- (1024, 100): The default values for both ranges.
- (262144, 10000): The maximum values for both ranges.

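A minimal Python sketch of the formula above can be used to check these three points. The function name `shares_to_weight` and the clamping of out-of-range inputs are assumptions of this sketch, and it is not a copy of any runtime's implementation.

```python
import math


def shares_to_weight(shares: int) -> int:
    """Map cgroup v1 cpu.shares to cgroup v2 cpu.weight using the quadratic formula."""
    # Clamp at the range boundaries so the end points are exact (assumption).
    if shares <= 2:
        return 1
    if shares >= 262144:
        return 10000
    log_shares = math.log2(shares)
    # L^2/612 + 125L/612 - 7/34, with the first two terms combined.
    exponent = (log_shares * log_shares + 125 * log_shares) / 612 - 7 / 34
    return math.ceil(10 ** exponent)


for shares in (2, 102, 1024, 262144):
    print(shares, "->", shares_to_weight(shares))
```

For these inputs the sketch yields weights of 1, 17, 100 and 10000, so the minimum, default and maximum of the two ranges line up, and a small request such as 102 shares lands well above the single-digit values of the linear mapping.
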

Visually, the new function looks as follows:
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png)

And if we zoom in to the important part:
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png)

The new formula is "close to linear", yet it is carefully designed so that
the curve passes through the three important points above.

### How It Solves the Problems

**1. Better Priority Alignment:**
- Container requesting 1 CPU (1000m, `cpu.shares = 1024`) will now get
  `cpu.weight = 100`, which equals cgroup v2's default, since (1024, 100) is
  one of the points the formula is built to pass through.
- This restores the intended priority relationship between Kubernetes
  workloads and system processes.

**2. Improved Granularity:**
- Container requesting 100m CPU will get `cpu.weight = 17` (see
  [here](https://go.dev/play/p/sLlAfCg54Eg)).
- Enables better fine-grained resource distribution within containers.

## Adoption and integration

This change was implemented at the OCI layer.
In other words, it is not implemented in Kubernetes itself; therefore the
adoption of the new conversion formula depends solely on the OCI runtime
adoption.

For example:
- runc: The new formula is enabled from [version 1.4.0-rc.1](https://github.com/opencontainers/runc/releases/tag/v1.4.0-rc.1).
- crun: The new formula is enabled from [version 1.23](https://github.com/containers/crun/releases/tag/1.23).

## Where Can I Learn More?

For those interested in this enhancement:

- [Kubernetes GitHub Issue #131216](https://github.com/kubernetes/kubernetes/issues/131216) - Detailed technical
  analysis and examples, including discussions and reasoning for choosing the
  above formula.
- [KEP-2254: cgroup v2](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2) -
  Original cgroup v2 implementation in Kubernetes.
- [Kubernetes cgroup documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) -
  Current resource management guidance.

## How Do I Get Involved?

For those interested in getting involved with Kubernetes node-level
features, join the [Kubernetes Node Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-node).
We always welcome new contributors and diverse perspectives on resource management
challenges.
