KEP-5507: Container-level OOM kill mode configuration #5496
Conversation
Signed-off-by: utam0k <k0ma@utam0k.jp>
### Goals

- Add a per-container `oomKillMode` field to allow container-level OOM behavior configuration
Currently, we have four types of containers: regular, init, sidecar, and ephemeral. Will the `oomKillMode` field be supported for all of them? If so, it might be better to state this explicitly. Also, since ephemeral containers use a separate struct, the API change would likely need to be applied there as well.
https://github.com/kubernetes/kubernetes/blob/v1.33.4/pkg/apis/core/types.go#L4240
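For illustration, the kind of change being discussed might look roughly like the sketch below; the type name, constants, and field placement are assumptions based on the KEP text rather than the final API, and the point is only that `EphemeralContainerCommon` would need the field duplicated since it does not embed `Container`.

```go
// Rough sketch only: assumed names, mirroring the field into both structs.
package core

// OOMKillMode describes how the OOM killer treats a container's processes.
type OOMKillMode string

const (
	OOMKillModeSingle OOMKillMode = "Single" // kill only the offending process
	OOMKillModeGroup  OOMKillMode = "Group"  // kill every process in the container's cgroup
)

type Container struct {
	// ... existing fields ...

	// OOMKillMode controls OOM kill behavior for this container.
	// +optional
	OOMKillMode *OOMKillMode
}

// EphemeralContainerCommon does not embed Container, so the field would
// have to be added here as well for ephemeral containers to support it.
type EphemeralContainerCommon struct {
	// ... existing fields ...

	// +optional
	OOMKillMode *OOMKillMode
}
```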
These reports led to [PR #122813](https://github.com/kubernetes/kubernetes/pull/122813), which attempted to add a kubelet flag but was closed after community discussion concluded that container-level configuration was the proper solution[^3]. The consensus was that node-level configuration cannot adequately address the needs of heterogeneous workloads.

This KEP also deprecates the `singleProcessOOMKill` kubelet flag for removal in v1.38 (GA). Container-level configuration provides better granularity and eliminates the complexity of maintaining both node-level and container-level settings.
If you’re only running multi-process applications (e.g., using a managed database service in the cloud), it seems more convenient from a UX perspective to configure this once at the kubelet level. If configuration is only allowed at the container level, you’d likely need to rely on a Mutating Admission Policy instead.
Since the logic that allows the kubelet to determine a default value for `memory.oom.group` based on the OS and cgroup settings remains when the `oomKillMode` field is unset, I don’t think keeping the kubelet flag to override that default will add much complexity to the implementation. I think it will be more difficult for users who currently rely on the `singleProcessOOMKill` kubelet flag to migrate away from it within just three releases.
So personally, I don’t think the kubelet flag needs to be removed. What do you think?
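For what it's worth, keeping the flag as a node-level default underneath the new field could stay quite small. A minimal sketch of the resolution order, assuming the proposed `v1.OOMKillMode` type and the existing `SingleProcessOOMKill` kubelet configuration field (the helper name is made up):

```go
// Sketch of a possible precedence: container field > kubelet flag > platform default.
// resolveGroupOOMKill and its plumbing are illustrative, not the KEP's implementation.
func resolveGroupOOMKill(mode *v1.OOMKillMode, singleProcessOOMKill *bool, cgroupV2 bool) bool {
	if !cgroupV2 {
		// memory.oom.group only exists on cgroup v2.
		return false
	}
	if mode != nil {
		// An explicit container-level setting wins.
		return *mode == v1.OOMKillModeGroup
	}
	if singleProcessOOMKill != nil {
		// Otherwise fall back to the node-level kubelet configuration.
		return !*singleProcessOOMKill
	}
	// Current default on cgroup v2: group kill.
	return true
}
```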
### Validation Rules

API validation will be added in `pkg/apis/core/validation/validation.go`:
If the Pod spec's `OS` field is set to `windows`, it might be better to prevent users from specifying `OOMKillMode`.
https://github.com/kubernetes/kubernetes/blob/v1.33.4/pkg/apis/core/types.go#L3680
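Something along these lines might work for the validation; this is a sketch only, assuming a `validateOOMKillMode` helper and the usual `field.ErrorList` plumbing (names and error wording are placeholders, not the KEP's code):

```go
// Illustrative validation sketch, including the suggested Windows restriction.
var supportedOOMKillModes = sets.New(string(core.OOMKillModeSingle), string(core.OOMKillModeGroup))

func validateOOMKillMode(mode *core.OOMKillMode, podOS *core.PodOS, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if mode == nil {
		return allErrs
	}
	if !supportedOOMKillModes.Has(string(*mode)) {
		allErrs = append(allErrs, field.NotSupported(fldPath, *mode, sets.List(supportedOOMKillModes)))
	}
	// Reject the field outright for Windows pods, where it has no effect.
	if podOS != nil && podOS.Name == core.Windows {
		allErrs = append(allErrs, field.Forbidden(fldPath, "may not be set when spec.os.name is windows"))
	}
	return allErrs
}
```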
- cgroups are managed at the container level

3. **Consistency with Existing APIs**: Follows the pattern of other container-level resource configurations:
   - Resources (limits/requests) are container-level
nit: suggest changing "Resources (limits/requests) are container-level" to "Resources (limits/requests) can be set at both pod and container level".
		useGroupKill = false
	case v1.OOMKillModeGroup:
		if !isCgroup2UnifiedMode() {
			klog.Warningf("Container %s/%s requests Group mode but system uses cgroup v1, falling back to single process",
End users probably don’t look at kubelet logs much, so it would be nice to notify them via Kubernetes Events. However, with the current kubelet implementation, that might be tricky.
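If it does turn out to be feasible, the kubelet's existing event recorder could surface the fallback on the pod, roughly like this (the reason string is made up for illustration):

```go
// Sketch: emit a Warning event instead of (or in addition to) the log line.
recorder.Eventf(pod, v1.EventTypeWarning, "OOMKillModeFallback",
	"container %s requested oomKillMode=Group but the node uses cgroup v1; falling back to single-process OOM kill",
	container.Name)
```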
// OOMKillMode specifies how the OOM killer behaves for this container.
// - "Single": only the process that triggered OOM is killed
// - "Group": all processes in the container are killed (cgroup v2 default)
// If not specified, the behavior is determined by the kubelet configuration.
If `OOMKillMode` is not specified, the kubelet will determine the default value. It might be helpful to write the determined value into the `ContainerStatus` so that users can check it.
https://github.com/kubernetes/kubernetes/blob/v1.33.4/staging/src/k8s.io/api/core/v1/types.go#L3152
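If that suggestion is adopted, the status change might look something like the following; this is purely hypothetical and not part of the current proposal:

```go
// Hypothetical: surface the effective, post-defaulting mode in ContainerStatus.
type ContainerStatus struct {
	// ... existing fields ...

	// OOMKillMode reports the OOM kill behavior in effect for this container
	// ("Single" or "Group") after the kubelet applies its defaulting.
	// +optional
	OOMKillMode *OOMKillMode `json:"oomKillMode,omitempty"`
}
```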
   - SecurityContext can be set at both pod and container level
   - The existing `singleProcessOOMKill` implementation operates per container

### API Design
This API is heavily influenced by the Linux configuration that controls this behavior. Some things we may consider here by expanding the scope of the feature and integrating it into a wider scope:
- Will we ever want the same or a similar setting for eviction logic? Will the kubelet ever be extended to understand that a single container, or even a single process in a container, can be "evicted"? Especially with PSI metrics support.
- Do we need a "WholePod" option? Especially in conjunction with the restart policies (Container restart rules to customize the pod restart policy #5307), mostly for use cases where an init container will need to be re-run if an OOM kill happened.
There may be more things this could align with, but those two are the first that came to mind.
I agree we should consider how the OOM kill could work at the container level as well as the pod level. In general, I feel this should be handled by the kubelet as an action after the container has died for whatever reason (e.g. a KillPod action on container exit). Container OOM would be one of those cases, so defining WholePod here is probably not the cleanest solution.
On the other hand, is this `OOMKillMode` something that's better specified at the container level or the pod level?
- Platform-specific limitations (cgroup v1, Windows)
- Another field for users to understand and configure

## Alternatives
Some alternatives include implementing this as an NRI plugin or as a privileged init container that configures this for a cgroup. You may want to list these alternatives with the rejection reasons.
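For context on the privileged init container variant: it essentially boils down to writing `memory.oom.group` in the target cgroup directly, as in the sketch below. Cgroup path discovery is hand-waved; this assumes cgroup v2, a writable cgroup filesystem, and the environment variable is made up for illustration.

```go
// Minimal sketch of the "privileged helper flips memory.oom.group" approach.
package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	// Assumption: the pod/container cgroup v2 directory is known or mounted in.
	cgroupDir := os.Getenv("TARGET_CGROUP_DIR") // e.g. /sys/fs/cgroup/kubepods.slice/...
	target := filepath.Join(cgroupDir, "memory.oom.group")

	// "0" keeps per-process OOM kills; "1" kills the whole cgroup on OOM.
	if err := os.WriteFile(target, []byte("0"), 0o644); err != nil {
		log.Fatalf("writing %s: %v", target, err)
	}
}
```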