server/status: prefer memory.high for memory limit detection for cgroups v2 #114774

@1lann

cgroups v2 introduces memory.high, a new soft limit: when the processes in a cgroup exceed it, the kernel throttles them and puts them under reclaim pressure (the resulting stalls are what "pressure stall" refers to). See https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files and https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html

Software can then use this memory pressure information to decide whether it should release memory back to the OS. In practice, Kubernetes uses this for MemoryQoS, an alpha feature that calculates a suitable per-container memory.high value from pod allocatable memory, requested memory, and memory limits. I think this is a more suitable number for CRDB's memory use to target, since we know that CRDB can sometimes exceed its detected (hard) memory limit.

The reclaim pressure provided by memory.high can also prevent k8s from evicting CRDB for high memory use when CRDB's page cache usage is high (see kubernetes/kubernetes#43916).

cgroups v2 currently has no way to determine the effective memory limit of a child cgroup that is constrained by a parent cgroup. For example, if cgroup /kubepods.slice has a memory.max of 6GiB, and /kubepods.slice/my-container.slice has a memory.max of max (the default k8s uses when no memory limit is set), then /sys/fs/cgroup/memory.max read from within my-container reports max, even though the effective limit is actually 6GiB.
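To make the "effective limit" in the example above concrete, here is a minimal sketch that takes the smallest memory.max along a cgroup's ancestry. It assumes the whole hierarchy is readable, which is typically only true on the host and not from inside a container (that visibility gap is the problem described above). The function names `read_limit` and `effective_memory_max` are hypothetical and not part of CRDB.

```python
import math
from pathlib import Path


def read_limit(path):
    """Return a limit in bytes, or math.inf for "max" or a missing file."""
    try:
        raw = Path(path).read_text().strip()
    except FileNotFoundError:
        # The root cgroup has no memory.max file; treat it as unlimited.
        return math.inf
    return math.inf if raw == "max" else int(raw)


def effective_memory_max(cgroup_root, cgroup_path):
    """Walk from cgroup_root down to cgroup_path, keeping the smallest memory.max.

    A child cannot use more memory than any of its ancestors allows, so the
    effective hard limit is the minimum over the whole chain.
    """
    current = Path(cgroup_root)
    limit = read_limit(current / "memory.max")
    for part in cgroup_path.strip("/").split("/"):
        if not part:
            continue
        current = current / part
        limit = min(limit, read_limit(current / "memory.max"))
    return limit
```

With the layout from the example (6GiB on /kubepods.slice, max on my-container.slice), `effective_memory_max("/sys/fs/cgroup", "/kubepods.slice/my-container.slice")` would return 6442450944 when run where both levels are visible.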

This is different from cgroups v1 where memory.limit_in_bytes from inside the child cgroup did actually report effective memory limits.

CRDB currently only checks memory.max and memory.limit_in_bytes for cgroup memory limits.

My suggestion is that CRDB check memory.high before memory.max, and prefer the limit specified by memory.high when one is set. This should make it behave in a more Kubernetes-friendly way on cgroups v2, at least when MemoryQoS is enabled.
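The proposed lookup order can be sketched as follows. This is an illustration only, not CRDB's actual detection code: `read_cgroup_limit` and `detect_memory_limit` are hypothetical names, the default path assumes cgroups v2 is mounted at /sys/fs/cgroup, and the cgroups v1 fallback (memory.limit_in_bytes) is left out for brevity.

```python
from pathlib import Path


def read_cgroup_limit(path):
    """Return the limit in bytes, or None if the file is unreadable or set to "max"."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    if raw == "max":
        # "max" means no limit is configured at this level.
        return None
    return int(raw)


def detect_memory_limit(cgroup_dir="/sys/fs/cgroup"):
    """Prefer memory.high (soft limit), then fall back to memory.max (hard limit).

    Returns (limit_in_bytes, source_file), or (None, None) if neither is set.
    """
    for name in ("memory.high", "memory.max"):
        limit = read_cgroup_limit(Path(cgroup_dir) / name)
        if limit is not None:
            return limit, name
    return None, None
```

Returning the source file alongside the value lets the caller log which limit was picked, which may help when debugging why a node targets less memory than its hard limit.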

Related internal Slack thread: https://cockroachlabs.slack.com/archives/C04HQCNHGEP/p1700517717112919

Jira issue: CRDB-33675

Labels: A-cc-enablement, A-cli-server, A-orchestration, C-enhancement, O-sre, T-db-server
