server/status: prefer `memory.high` for memory limit detection for cgroups v2

cgroups v2 introduces `memory.high`, a new soft limit: when the processes in a cgroup exceed it, the kernel throttles them and puts them under heavy reclaim pressure (also referred to as "pressure stall"). See https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files and https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html

Software can then use this memory pressure information to decide whether it should release memory back to the OS. In practice, Kubernetes uses this for an (alpha) feature, [MemoryQoS](https://kubernetes.io/blog/2023/05/05/qos-memory-resources/), which calculates a suitable per-container `memory.high` value based on node allocatable memory, requested memory, and memory limits. I think this is a more suitable number for CRDB's memory use to target, since we know that CRDB can sometimes exceed its detected (hard) memory limits.
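For reference, here is a minimal sketch of the `memory.high` calculation as I understand it from the linked Kubernetes blog post. The function and parameter names are mine, not Kubernetes identifiers, and the throttling factor (0.9 by default according to that post) is passed in rather than assumed:

```go
package main

import "math"

// memoryHigh sketches the MemoryQoS formula from the linked blog post:
//
//	floor((request + factor*(limitOrAllocatable - request)) / pageSize) * pageSize
//
// limitOrAllocatableBytes is the container memory limit, or node allocatable
// memory when no limit is set. All names here are hypothetical.
func memoryHigh(requestBytes, limitOrAllocatableBytes int64, throttlingFactor float64, pageSize int64) int64 {
	raw := float64(requestBytes) + throttlingFactor*float64(limitOrAllocatableBytes-requestBytes)
	pages := math.Floor(raw / float64(pageSize))
	return int64(pages) * pageSize
}
```

With a 1GiB request and a 2GiB limit, `memoryHigh(1<<30, 2<<30, 0.9, 4096)` lands a little under 1.9GiB, rounded down to a page boundary.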

The reclaim pressure provided by `memory.high` can also prevent k8s from evicting CRDB for high memory use when CRDB's page cache usage is high (see https://github.com/kubernetes/kubernetes/issues/43916).

cgroups v2 currently doesn't have a way of determining the effective memory limit of a child cgroup that is constrained by a parent cgroup. For example, if cgroup `/kubepods.slice` has a `memory.max` of `6GiB`, and `/kubepods.slice/my-container.slice` has a `memory.max` of `max` (the default k8s uses when no memory limit is set), then `/sys/fs/cgroup/memory.max` from within `my-container` will report `max`, even though the effective limit is actually `6GiB`.

This is different from cgroups v1 where `memory.limit_in_bytes` from inside the child cgroup did actually report effective memory limits.

CRDB currently only checks `memory.max` and `memory.limit_in_bytes` for cgroup memory limits.

My suggestion is that CRDB checks `memory.high` before `memory.max` and prefers the limit specified by `memory.high` when one is set. This should make it behave in a more Kubernetes-friendly way on cgroups v2, at least when `MemoryQoS` is enabled.

Related internal Slack thread: https://cockroachlabs.slack.com/archives/C04HQCNHGEP/p1700517717112919

Jira issue: CRDB-33675

server/status: prefer `memory.high` for memory limit detection for cgroups v2 #114774
