Description
Only relevant for Linux.
As of now, Zig is not container-aware. This means that functions like getCpuCount and totalSystemMemory do not take into account cgroup-imposed limits when running inside containers.
For example, a Zig application attempting to allocate 70% of total system memory might be OOM-killed immediately if the process is subject to a lower memory limit via cgroups.
Current Implementation
memory
flowchart
A["std.process.totalSystemMemory"]
B["sysinfo syscall"]
A-->B
cpu
flowchart
A["std.Thread.getCpuCount"]
B["std.posix.sched_getaffinity"]
C["std.posix.CPU_COUNT"]
A-->B
B-->C
Discovering 'cgroup' Limits
Since cgroup v2 is now the standard, cgroup v1 could be disregarded.
A process can determine which cgroup it belongs to by reading the file /proc/self/cgroup. Each line of this file has the format:
<id>:<subsystem>:/path/to/cgroup
The fields <id> and <subsystem> are only meaningful for cgroup v1 and can be ignored; for cgroup v2 the entry has the form 0::/path/to/cgroup. The important part is the path, which is relative to the cgroup filesystem mount point, typically /sys/fs/cgroup. Combining the two gives the process its own cgroup directory.
Each cgroup is represented as a directory containing files that control different resources. Depending on which resources are controlled, a cgroup directory may contain the files cpu.max and memory.max.
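The discovery step above can be sketched as follows. This is an illustrative Python sketch rather than proposed Zig code; the function name `cgroup_v2_dir` and the default mount point are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Assumption: the cgroup v2 filesystem is mounted at its typical location.
CGROUP_MOUNT = "/sys/fs/cgroup"


def cgroup_v2_dir(proc_self_cgroup: str, mount: str = CGROUP_MOUNT):
    """Return the process's cgroup v2 directory, or None if there is no v2 entry.

    `proc_self_cgroup` is the text content of /proc/self/cgroup.
    """
    for line in proc_self_cgroup.splitlines():
        # Split into at most three fields so a ':' inside the path survives.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # The cgroup v2 entry has hierarchy ID 0 and an empty controller list.
        if hierarchy_id == "0" and controllers == "":
            return str(PurePosixPath(mount) / path.lstrip("/"))
    return None
```

On a pure cgroup v1 system no line matches and the function returns None, which corresponds to the "cgroup v1" fallback case.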
memory.max file
This file specifies the maximum amount of memory (in bytes) that processes in the cgroup are allowed to use.
If the file contains the string max, it means there is no memory limit imposed.
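A minimal sketch of parsing this file, written in Python for illustration only. The name `parse_memory_max` is hypothetical, and returning None for "no limit" is a choice made for the example.

```python
def parse_memory_max(contents: str):
    """Parse the text of a memory.max file.

    Returns the limit in bytes, or None when the file contains "max"
    (no limit). Raises ValueError for anything else.
    """
    value = contents.strip()
    if value == "max":
        return None
    return int(value)
```

A caller would treat None the same way as a read failure and fall back to the system-wide value.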
cpu.max file
The file is structured as:
<quota> <period>
- quota: the maximum amount of CPU time (in microseconds) the cgroup can use during each period, or the string max for no CPU limit
- period: the length of each enforcement window, also in microseconds
The CPU share a process is allowed can be calculated as:
cpu share = quota / period
This can yield values such as 0.5 (half a CPU) or 2 (two CPUs).
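The calculation can be sketched like this, in Python for illustration. The name `parse_cpu_max` is hypothetical, and None is used here to signal "no CPU limit".

```python
def parse_cpu_max(contents: str):
    """Parse the text of a cpu.max file ("<quota> <period>").

    Returns quota / period as a float, or None when quota is "max".
    Raises ValueError on malformed input.
    """
    fields = contents.split()
    if len(fields) != 2:
        raise ValueError("expected '<quota> <period>'")
    quota, period = fields
    if quota == "max":
        return None
    period_us = int(period)
    if period_us <= 0:
        raise ValueError("period must be positive")
    return int(quota) / period_us
```

For example, the content `50000 100000` gives 0.5 and `200000 100000` gives 2.0.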
Pitfall
The values read from cpu.max or memory.max may be misleading or incomplete, because a cgroup's own files only describe limits set on that cgroup and do not reflect limits set on its parent cgroups.
flowchart
A["/sys/fs/cgroup/parent (memory limit = 1GB)"]
B["/sys/fs/cgroup/parent/child (memory limit = max)"]
A-->|parent of| B
In the example above, the child cgroup reports memory.max = max, suggesting there is no memory limit. However, the parent has a limit of 1GB, which implicitly constrains the child. The child cannot exceed the parent's limit, regardless of its own setting.
This becomes even more problematic in environments using namespacing. In such cases, a process cannot 'see' its parent cgroup and therefore has no visibility into inherited limits. As a result, applications relying solely on their local memory.max or cpu.max values might make incorrect assumptions about available resources.
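One possible mitigation, not part of the proposal itself, is to walk up the directory tree and take the smallest limit among the ancestors that are visible. The sketch below is Python for illustration; `effective_memory_limit` is a hypothetical name, and inside a cgroup namespace it can only see ancestors up to the namespace root, so the result is best-effort.

```python
import os


def effective_memory_limit(cgroup_dir: str, mount: str = "/sys/fs/cgroup"):
    """Smallest memory.max among cgroup_dir and its visible ancestors.

    Returns None if every visible level is unlimited ("max") or unreadable.
    """
    limit = None
    current = os.path.normpath(cgroup_dir)
    mount = os.path.normpath(mount)
    while True:
        try:
            with open(os.path.join(current, "memory.max")) as f:
                value = f.read().strip()
            if value != "max":
                bytes_limit = int(value)
                limit = bytes_limit if limit is None else min(limit, bytes_limit)
        except (OSError, ValueError):
            # For example, the root cgroup has no memory.max file.
            pass
        parent = os.path.dirname(current)
        if current == mount or parent == current:
            break
        current = parent
    return limit
```

Applied to the example above, the child reports max but the walk picks up the parent's 1GB limit.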
Proposed Implementation
memory
flowchart
A["std.process.totalSystemMemory"]
Z["sysinfo syscall"]
B["discover cgroup"]
C["read memory limit"]
D["return value"]
A-->B
B-->C
B-->|fail|Z
B-->|cgroup v1|Z
C-->|fail|Z
C-->|max|Z
C-->|bytes|D
Any failure to interact with the cgroup filesystem, a cgroup v1 hierarchy, or a limit of max falls back to the current behavior (the sysinfo syscall).
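The memory flow can be sketched end to end as below. This is Python for illustration, not the proposed Zig implementation: `total_system_memory` and `physical_memory` are hypothetical names, the paths are parameters so the logic can be exercised against a fake tree, and `os.sysconf` stands in for the sysinfo syscall.

```python
import os


def physical_memory() -> int:
    # Stand-in for the sysinfo syscall that is used today.
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def total_system_memory(proc_cgroup="/proc/self/cgroup",
                        mount="/sys/fs/cgroup",
                        fallback=physical_memory) -> int:
    try:
        # Discover cgroup: the cgroup v2 entry has the form "0::<path>".
        with open(proc_cgroup) as f:
            lines = f.read().splitlines()
        path = next(line[3:] for line in lines if line.startswith("0::"))
        # Read memory limit.
        with open(os.path.join(mount, path.lstrip("/"), "memory.max")) as f:
            value = f.read().strip()
        if value == "max":
            return fallback()
        return int(value)
    except (OSError, StopIteration, ValueError):
        # Unreadable files, cgroup v1 only, or malformed content.
        return fallback()
```

Every branch other than a successfully parsed byte count ends in the fallback, matching the flowchart.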
cpu
flowchart
A["std.Thread.getCpuCount"]
X["std.posix.sched_getaffinity"]
Z["std.posix.CPU_COUNT"]
B["discover cgroup"]
C["read cpu limit"]
D["calculate and return cpu share"]
A-->B
B-->|successful read|C
B-->|fail|X
B-->|cgroup v1|X
C-->|non max|D
C-->|max|X
C-->|fail|X
X-->Z
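The CPU flow can be sketched in the same way. This is Python for illustration only; `get_cpu_count` and `affinity_cpu_count` are hypothetical names. The proposal does not say how a fractional share maps to an integer count, so rounding up with a minimum of 1 is an assumption of this sketch.

```python
import os


def affinity_cpu_count() -> int:
    # Stand-in for sched_getaffinity followed by CPU_COUNT (Linux only).
    return len(os.sched_getaffinity(0))


def get_cpu_count(proc_cgroup="/proc/self/cgroup",
                  mount="/sys/fs/cgroup",
                  fallback=affinity_cpu_count) -> int:
    try:
        # Discover cgroup: the cgroup v2 entry has the form "0::<path>".
        with open(proc_cgroup) as f:
            lines = f.read().splitlines()
        path = next(line[3:] for line in lines if line.startswith("0::"))
        # Read cpu limit: "<quota> <period>".
        with open(os.path.join(mount, path.lstrip("/"), "cpu.max")) as f:
            quota, period = f.read().split()
        if quota == "max":
            return fallback()
        q, p = int(quota), int(period)
        # Assumption: round quota / period up and never report fewer than 1.
        return max(1, (q + p - 1) // p)
    except (OSError, StopIteration, ValueError, ZeroDivisionError):
        # Unreadable files, cgroup v1 only, or malformed content.
        return fallback()
```

Whether the result should also be capped by the affinity count is left open here, since the flowchart does not specify it.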
Questions
- Does Zig want to be container-aware?
- Does Zig also care about cgroup v1?
Sidenotes / References
- Kernel cgroup documentation (docs)
- Java: general container awareness in java 10 (release notes)
- Go
  - cpu: implemented in 1.25 (release notes, code and more code)
  - memory: todo (issue)
- memory.high vs memory.max in cockroachdb (issue)
I would be interested in working on this if permitted.