|
38 | 38 | - [API changes](#api-changes) |
39 | 39 | - [Resize Restart Policy](#resize-restart-policy) |
40 | 40 | - [Implementation Details](#implementation-details) |
41 | | - - [[Scoped for GA] Memory Manager](#scoped-for-ga-memory-manager) |
42 | | - - [[Scoped for GA] CPU Manager](#scoped-for-ga-cpu-manager) |
43 | | - - [[Scoped for GA] Topology Manager](#scoped-for-ga-topology-manager) |
44 | | - - [[Scoped for GA] User Experience Survey](#scoped-for-ga-user-experience-survey) |
45 | 41 | - [[Scoped for Beta] Surfacing Pod Resource Requirements](#scoped-for-beta-surfacing-pod-resource-requirements) |
46 | 42 | - [The Challenge of Determining Effective Pod Resource Requirements](#the-challenge-of-determining-effective-pod-resource-requirements) |
47 | 43 | - [Goals of surfacing Pod Resource Requirements](#goals-of-surfacing-pod-resource-requirements) |
48 | 44 | - [Implementation Details](#implementation-details-1) |
49 | 45 | - [Notes for implementation](#notes-for-implementation) |
50 | | - - [[Scoped for Beta] VPA](#scoped-for-beta-vpa) |
51 | | - - [[Scoped for Beta] Cluster Autoscaler](#scoped-for-beta-cluster-autoscaler) |
52 | | - - [[Scoped for Beta] Support for Windows](#scoped-for-beta-support-for-windows) |
| 46 | + - [[Scoped for Beta] HPA](#scoped-for-beta-hpa) |
| 47 | + - [Cluster Autoscaler](#cluster-autoscaler) |
| 48 | + - [VPA](#vpa) |
| 49 | + - [[Future KEP Consideration in 1.35] Support for Windows](#future-kep-consideration-in-135-support-for-windows) |
| 50 | + - [[Future KEP Consideration in 1.35] Memory Manager](#future-kep-consideration-in-135-memory-manager) |
| 51 | + - [[Future KEP Consideration] CPU Manager](#future-kep-consideration-cpu-manager) |
| 52 | + - [[Future KEP Consideration] Topology Manager](#future-kep-consideration-topology-manager) |
| 53 | + - [[Scoped for GA] User Experience Survey](#scoped-for-ga-user-experience-survey) |
53 | 54 | - [Test Plan](#test-plan) |
54 | 55 | - [Unit tests](#unit-tests) |
55 | 56 | - [e2e tests](#e2e-tests) |
|
71 | 72 | - [Implementation History](#implementation-history) |
72 | 73 | - [Drawbacks](#drawbacks) |
73 | 74 | - [Alternatives](#alternatives) |
74 | | - - [VPA](#vpa) |
| 75 | + - [VPA](#vpa-1) |
75 | 76 | <!-- /toc --> |
76 | 77 |
|
77 | 78 |
|
@@ -1359,93 +1360,6 @@ either modify the pod-level resources to accommodate ephemeral containers or |
1359 | 1360 | supply resources at the container level for ephemeral containers and Kubernetes will |
1360 | 1361 | resize the pod to accommodate the ephemeral containers. |
1361 | 1362 |
|
1362 | | - |
1363 | | -#### [Scoped for GA] Memory Manager |
1364 | | - |
1365 | | -The Memory Manager currently allocates memory resources at |
1366 | | -the container level through its |
1367 | | -[Allocate](https://github.com/kubernetes/kubernetes/blob/849a82b727b1cc1e77b58149b3cacbfa5ada30fd/pkg/kubelet/cm/memorymanager/memory_manager.go#L261) |
1368 | | -method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration. |
1369 | | - |
1370 | | - |
1371 | | -With the introduction of Pod Level Resources, the following modifications are needed: |
1372 | | - |
1373 | | -1. Memory Manager Interface Extension: |
1374 | | -Add a new AllocatePodLevel method to the Memory Manager interface to handle |
1375 | | -resource allocation at the pod level. This method will complement the existing container-level Allocate method. |
1376 | | - |
1377 | | -2. Topology Manager Integration: Modify the (Topology Manager)[https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150] to conditionally |
1378 | | -call AllocatePodLevel when pod-level resources are configured. Maintain |
1379 | | -backward compatibility by continuing to use the existing Allocate method for |
1380 | | -container-level allocation scenarios |
1381 | | - |
1382 | | -Note: The BestEffort policy (Windows-only) is explicitly out of scope for this |
1383 | | -change, as Windows implementation is not covered by the Pod Level Resources KEP. |
1384 | | - |
1385 | | -#### [Scoped for GA] CPU Manager |
1386 | | - |
1387 | | -The Memory Manager currently allocates memory resources at |
1388 | | -the container level through its |
1389 | | -[Allocate](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/cpumanager/cpu_manager.go#L255) |
1390 | | -method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration. |
1391 | | - |
1392 | | -With the introduction of Pod Level Resources, the following modifications are required: |
1393 | | - |
1394 | | -1. CPU Manager Interface Extension: Add a new AllocatePodLevel method to the CPU |
1395 | | -Manager interface to handle resource allocation at the pod level. This method |
1396 | | -will complement the existing container-level Allocate method. |
1397 | | - |
1398 | | -2. Topology Manager Integration: Modify the (Topology |
1399 | | -Manager)[https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150] |
1400 | | -to conditionally call AllocatePodLevel when pod-level resources are |
1401 | | -configured. Backward compatibility will be maintained by continuing to use the |
1402 | | -existing Allocate method for container-level allocation scenarios. |
1403 | | - |
1404 | | -3. Policy-Specific Modifications: Not all existing CPU Manager policies remain |
1405 | | - compatible with Pod Level Resources. Following are policy-specific |
1406 | | - adaptations: |
1407 | | - |
1408 | | -* distribute-cpus-across-numa: This policy is incompatible with pod-level |
1409 | | - resources. Distributing CPUs across NUMA nodes requires detailed knowledge of |
1410 | | - bandwidth-intensive containers, which is explicitly abstracted away by |
1411 | | - pod-level resources. Without workload-specific information, the system cannot |
1412 | | - optimally distribute containers across NUMA nodes, and incorrect placement |
1413 | | - could degrade performance (How to distribute M containers across N NUMA |
1414 | | - nodes). |
1415 | | - |
1416 | | -* distribute-cpus-across-cores: Similarly, this policy is incompatible. Users focused on core-level optimization for individual containers would likely not opt for pod-level resources in the first place. |
1417 | | - |
1418 | | -* full-pcpus-only: This policy is compatible and highly beneficial for multi-tenant pods requiring inter-pod isolation, as it helps prevent hyperthread contention. The CPU Manager will be extended to allocate full physical cores at the pod level and implement a shared CPU pool within pod boundaries. |
1419 | | - |
1420 | | -* align-by-socket: This policy is compatible. It ensures all a pod's CPUs remain on the same socket when possible, reducing inter-socket latencies and benefiting containers that share L3 cache or communicate frequently. The socket alignment logic will be extended to work with pod-level CPU pools. |
1421 | | - |
1422 | | -* strict-cpu-reservation: This policy is compatible and crucial for guaranteed workloads, preventing interference from burstable and best-effort pods. We'll update the CPU reservation logic to consider pod-level requests and limits. |
1423 | | - |
1424 | | -* prefer-align-cpus-by-uncorecache: This policy is compatible. It optimizes CPU allocation across uncore cache groups, enhancing shared cache locality for containers within the pod. The allocation logic will be updated to consider pod-level requests and limits. |
1425 | | - |
1426 | | -Note: This is a prelimnary analysis, and we might have real usecases to support |
1427 | | - distribute-cpus-across-numa and distribute-cpus-across-cores with pod-level |
1428 | | - resources. We can re-visit this again during the GA planning cycle. |
1429 | | - |
1430 | | -#### [Scoped for GA] Topology Manager |
1431 | | - |
1432 | | -Currently, scope=pod aggregates resource requirements from a pod's individual |
1433 | | -containers to determine overall pod-level needs. With the introduction of Pod |
1434 | | -Level Resources, scope=pod will directly use the pod-level resource values |
1435 | | -specified in the Pod object for topology alignment. |
1436 | | - |
1437 | | -Besides, scope=container won't be supported for pods with Pod Level Resources. This is because these pods lack per-container resource specifications, leaving the Topology Manager without the granular information needed to make informed container-level topology decisions. If a user attempts to configure scope=container for such a pod, the Topology Manager will explicitly disallow it and provide an informative message. This message will guide the user to use scope=pod or to configure per-container resources if fine-grained container-level topology is truly desired. |
1438 | | - |
1439 | | -#### [Scoped for GA] User Experience Survey |
1440 | | - |
1441 | | -Before promoting the feature to GA, we plan to conduct a UX survey to |
1442 | | -understand user expectations for setting various combinations of requests and |
1443 | | -limits at both the pod and container levels. This will help us gather use cases |
1444 | | -for different combinations, enabling us to enhance the feature's usability. If we |
1445 | | -identify the need for significant changes to the defaulting logic based on this |
1446 | | -feedback, we'll release another Beta version of Pod-Level Resources to |
1447 | | -incorporate those adjustments. |
1448 | | - |
1449 | 1363 | #### [Scoped for Beta] Surfacing Pod Resource Requirements |
1450 | 1364 |
|
1451 | 1365 | ##### The Challenge of Determining Effective Pod Resource Requirements |
@@ -1558,32 +1472,149 @@ KEPs. The first change doesn’t present any user visible change, and if |
1558 | 1472 | implemented, will in a small way reduce the effort for both of those KEPs by |
1559 | 1473 | providing a single place to update the pod resource calculation. |
1560 | 1474 |
|
1561 | | -#### [Scoped for Beta] VPA |
1562 | | -
|
1563 | | -TBD. Do not review for the alpha stage. |
1564 | | -
|
1565 | | -#### [Scoped for Beta] Cluster Autoscaler |
| 1475 | +#### [Scoped for Beta] HPA |
| 1476 | +For accurate scaling decisions, HPA must be able to correctly calculate the |
| 1477 | +resources requested by a pod, regardless of whether those requests are defined |
| 1478 | +at the pod or container level. Currently, HPA calculates pod requests by simply |
| 1479 | +aggregating the requests of all containers within a pod, which ignores pod-level |
| 1480 | +values. To address this, HPA should leverage the helper method found at |
| 1481 | +https://github.com/kubernetes/kubernetes/blob/988cf21f0975cf95444a619481c13d2503d8ec6a/staging/src/k8s.io/component-helpers/resource/helpers.go |
| 1482 | +for more precise pod request computations. The changes are being worked on by |
| 1483 | +sig-autoscaling: [#132237](https://github.com/kubernetes/kubernetes/issues/132237) |
| 1484 | +
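A minimal sketch of the computation this enables, assuming the `PodRequests` helper in the `k8s.io/component-helpers/resource` package linked above; the wiring shown here is illustrative, not the final HPA implementation tracked in #132237:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	resourcehelper "k8s.io/component-helpers/resource"
)

func main() {
	// A pod that sets only pod-level requests; its container carries none.
	pod := &v1.Pod{
		Spec: v1.PodSpec{
			Resources: &v1.ResourceRequirements{
				Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("2")},
			},
			Containers: []v1.Container{{Name: "app"}},
		},
	}

	// Naive per-container aggregation would see 0 CPU for this pod. The helper
	// accounts for pod-level requests when they are set, so HPA sees "2".
	reqs := resourcehelper.PodRequests(pod, resourcehelper.PodResourcesOptions{})
	fmt.Println(reqs.Cpu().String())
}
```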
|
| 1485 | +#### Cluster Autoscaler |
| 1486 | +
|
| 1487 | +The Cluster Autoscaler uses resourcehelper.PodRequests to calculate Pod resource |
| 1488 | +requirements for scaling decisions since version 1.4.0. This automatically |
| 1489 | +includes Pod-level resource requests when the PodLevelResources feature gate is |
| 1490 | +enabled, ensuring accurate node scaling and utilization calculations. |
| 1491 | +
|
| 1492 | +#### VPA |
| 1493 | +
|
| 1494 | +Collaboration with sig-autoscaling has been established to integrate support for |
| 1495 | +VPA with Pod-level resources, slated for VPA 1.34. The changes to support pod-level |
| 1496 | +resources in VPA will be worked on in two phases: |
| 1497 | +* [Scoped for Beta] Phase 1: Necessary changes |
| 1498 | +  The necessary changes include augmenting the recommendation algorithm to |
| 1499 | +  provide pod-level resource recommendations within RecommendedPodResources, in |
| 1500 | +  addition to existing per-container recommendations, when pod-level resources are |
| 1501 | +  set (a populated example follows this list). |
| 1502 | + ```go |
| 1503 | + type RecommendedPodResources struct { |
| 1504 | + ContainerRecommendations []RecommendedContainerResources |
| 1505 | + // NEW: Pod-level resources |
| 1506 | + PodLevelResources *ResourceList |
| 1507 | + } |
| 1508 | + ``` |
| 1509 | + Note: Detailed KEP design is owned and being worked on by |
| 1510 | + sig-autoscaling: [#7571](https://github.com/kubernetes/autoscaler/issues/7571) |
1566 | 1511 |
|
1567 | | -Cluster Autoscaler won't work as expected with pod-level resources in alpha since |
1568 | | -it relies on container-level values to be specified. If a user specifies only |
1569 | | -pod-level resources, the CA will assume that the pod requires no resources since |
1570 | | -container-level values are not set. As a result, the CA won't scale the number of |
1571 | | -nodes to accommodate this pod. Meanwhile, the scheduler will evaluate the |
1572 | | -pod-level resource requests but may be unable to find a suitable node to fit the |
1573 | | -pod. Consequently, the pod will not be scheduled. While this behavior is |
1574 | | -acceptable for the alpha implementation, it is anticipated that Cluster |
1575 | | -Autoscaler support will be addressed in the Beta phase with pod resource |
1576 | | -requirements surfaced in a helper library/function that autoscalers can use to |
1577 | | -make autoscaling decisions. |
| 1512 | +* [Scoped for GA] Phase 2: Improving the recommendation algorithm |
| 1513 | +  Pod-Level Resources permits pod-level limits to exceed the aggregated |
| 1514 | +  container limits so that containers can share idle resources with each other. |
| 1515 | +  Integrating this functionality with VPA necessitates the development of a |
| 1516 | +  complex new recommendation algorithm. Concepts such as proportionate pod- |
| 1517 | +  and container-level recommendations have been proposed and |
| 1518 | +  require further discussion. |
1578 | 1519 |
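Illustrative only: a Phase 1 recommendation populating the proposed PodLevelResources field alongside today's per-container recommendations. The types are re-declared here from the sketch above for self-containment; the final VPA API may differ (see #7571).

```go
package recommender

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Re-declared from the sketch above; names are proposals, not the final API.
type RecommendedContainerResources struct {
	ContainerName string
	Target        v1.ResourceList
}

type RecommendedPodResources struct {
	ContainerRecommendations []RecommendedContainerResources
	PodLevelResources        *v1.ResourceList // NEW: pod-level recommendation
}

// A pod-level target of 1 CPU / 1Gi, surfaced together with a per-container
// target for the pod's lone container.
var sample = RecommendedPodResources{
	ContainerRecommendations: []RecommendedContainerResources{{
		ContainerName: "app",
		Target:        v1.ResourceList{v1.ResourceCPU: resource.MustParse("500m")},
	}},
	PodLevelResources: &v1.ResourceList{
		v1.ResourceCPU:    resource.MustParse("1"),
		v1.ResourceMemory: resource.MustParse("1Gi"),
	},
}
```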
|
1579 | | -#### [Scoped for Beta] Support for Windows |
| 1520 | +#### [Future KEP Consideration in 1.35] Support for Windows |
1580 | 1521 |
|
1581 | 1522 | Pod-level resource specifications are a natural extension of Kubernetes' existing |
1582 | 1523 | resource management model. Although this new feature is expected to function with |
1583 | 1524 | Windows containers, careful testing and consideration are required due to |
1584 | 1525 | platform-specific differences. As the introduction of pod-level resources is a |
1585 | 1526 | major change in itself, full support for Windows will be addressed in future |
1586 | | -stages, beyond the initial alpha release. |
| 1527 | +KEPs, beyond the scope of this one. |
| 1528 | +
|
| 1529 | +#### [Future KEP Consideration in 1.35] Memory Manager |
| 1530 | +
|
| 1531 | +The Memory Manager currently allocates memory resources at |
| 1532 | +the container level through its |
| 1533 | +[Allocate](https://github.com/kubernetes/kubernetes/blob/849a82b727b1cc1e77b58149b3cacbfa5ada30fd/pkg/kubelet/cm/memorymanager/memory_manager.go#L261) |
| 1534 | +method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration. |
| 1535 | +
|
| 1536 | +
|
| 1537 | +With the introduction of Pod Level Resources, the following modifications are needed: |
| 1538 | +
|
| 1539 | +1. Memory Manager Interface Extension: |
| 1540 | +Add a new AllocatePodLevel method to the Memory Manager interface to handle |
| 1541 | +resource allocation at the pod level. This method will complement the existing container-level Allocate method (see the interface sketch after this list). |
| 1542 | +
|
| 1543 | +2. Topology Manager Integration: Modify the [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) to conditionally |
| 1544 | +call AllocatePodLevel when pod-level resources are configured. Maintain |
| 1545 | +backward compatibility by continuing to use the existing Allocate method for |
| 1546 | +container-level allocation scenarios. |
| 1547 | +
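A minimal sketch of the extension described in item 1, assuming the current Allocate signature in pkg/kubelet/cm/memorymanager; AllocatePodLevel is a name proposed by this KEP, not a merged API:

```go
package memorymanager

import v1 "k8s.io/api/core/v1"

// Manager is a trimmed view of the existing interface; unrelated methods
// (Start, state accessors, hint generation, ...) are elided.
type Manager interface {
	// Allocate reserves memory for a single container; the Topology Manager
	// calls this today via its hint-provider integration.
	Allocate(pod *v1.Pod, container *v1.Container) error

	// AllocatePodLevel (NEW, proposed) reserves memory once for the whole pod
	// when pod-level resources are set, complementing Allocate.
	AllocatePodLevel(pod *v1.Pod) error
}
```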
|
| 1548 | +#### [Future KEP Consideration] CPU Manager |
| 1549 | +
|
| 1550 | +The CPU Manager currently allocates CPU resources at |
| 1551 | +the container level through its |
| 1552 | +[Allocate](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/cpumanager/cpu_manager.go#L255) |
| 1553 | +method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration. |
| 1554 | +
|
| 1555 | +With the introduction of Pod Level Resources, the following modifications are required: |
| 1556 | +
|
| 1557 | +1. CPU Manager Interface Extension: Add a new AllocatePodLevel method to the CPU |
| 1558 | +Manager interface to handle resource allocation at the pod level. This method |
| 1559 | +will complement the existing container-level Allocate method. |
| 1560 | +
|
| 1561 | +2. Topology Manager Integration: Modify the [Topology |
| 1562 | +Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) |
| 1563 | +to conditionally call AllocatePodLevel when pod-level resources are |
| 1564 | +configured (see the dispatch sketch at the end of this section). Backward |
| 1565 | +compatibility will be maintained by continuing to use the existing Allocate method for container-level allocation scenarios. |
| 1566 | +
|
| 1567 | +3. Policy-Specific Modifications: Not all existing CPU Manager policies remain |
| 1568 | + compatible with Pod Level Resources. The policy-specific |
| 1569 | + adaptations are as follows: |
| 1570 | + |
| 1571 | +* distribute-cpus-across-numa: This policy is incompatible with pod-level |
| 1572 | + resources. Distributing CPUs across NUMA nodes requires detailed knowledge of |
| 1573 | + bandwidth-intensive containers, which is explicitly abstracted away by |
| 1574 | + pod-level resources. Without workload-specific information, the system cannot |
| 1575 | + optimally distribute containers across NUMA nodes, and incorrect placement |
| 1576 | + could degrade performance (there is no clear answer to how M containers |
| 1577 | + should be distributed across N NUMA nodes). |
| 1578 | +
|
| 1579 | +* distribute-cpus-across-cores: Similarly, this policy is incompatible. Users focused on core-level optimization for individual containers would likely not opt for pod-level resources in the first place. |
| 1580 | +
|
| 1581 | +* full-pcpus-only: This policy is compatible and highly beneficial for multi-tenant pods requiring inter-pod isolation, as it helps prevent hyperthread contention. The CPU Manager will be extended to allocate full physical cores at the pod level and implement a shared CPU pool within pod boundaries. |
| 1582 | +
|
| 1583 | +* align-by-socket: This policy is compatible. It ensures all of a pod's CPUs remain on the same socket when possible, reducing inter-socket latencies and benefiting containers that share L3 cache or communicate frequently. The socket alignment logic will be extended to work with pod-level CPU pools. |
| 1584 | +
|
| 1585 | +* strict-cpu-reservation: This policy is compatible and crucial for guaranteed workloads, preventing interference from burstable and best-effort pods. We'll update the CPU reservation logic to consider pod-level requests and limits. |
| 1586 | +
|
| 1587 | +* prefer-align-cpus-by-uncorecache: This policy is compatible. It optimizes CPU allocation across uncore cache groups, enhancing shared cache locality for containers within the pod. The allocation logic will be updated to consider pod-level requests and limits. |
| 1588 | +
|
| 1589 | +Note: This is a preliminary analysis, and we might have real use cases to support |
| 1590 | + distribute-cpus-across-numa and distribute-cpus-across-cores with pod-level |
| 1591 | + resources. We can revisit this during the GA planning cycle. |
| 1592 | +
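A hypothetical sketch of the conditional dispatch described in item 2 above. The HintProvider shape mirrors today's Topology Manager contract, but PodLevelAllocator and isPodLevelResourcesSet are illustrative names, not existing kubelet APIs:

```go
package topologymanager

import v1 "k8s.io/api/core/v1"

// HintProvider mirrors the per-container contract the Topology Manager uses today.
type HintProvider interface {
	Allocate(pod *v1.Pod, container *v1.Container) error
}

// PodLevelAllocator is the proposed optional extension a provider (CPU or
// Memory Manager) would implement for pod-level allocation.
type PodLevelAllocator interface {
	AllocatePodLevel(pod *v1.Pod) error
}

func allocateAlignedResources(pod *v1.Pod, container *v1.Container, providers []HintProvider) error {
	for _, p := range providers {
		// Pod-level resources configured: allocate once per pod via the new path.
		if pla, ok := p.(PodLevelAllocator); ok && isPodLevelResourcesSet(pod) {
			if err := pla.AllocatePodLevel(pod); err != nil {
				return err
			}
			continue
		}
		// Otherwise keep the existing per-container behavior.
		if err := p.Allocate(pod, container); err != nil {
			return err
		}
	}
	return nil
}

func isPodLevelResourcesSet(pod *v1.Pod) bool {
	return pod.Spec.Resources != nil &&
		(len(pod.Spec.Resources.Requests) > 0 || len(pod.Spec.Resources.Limits) > 0)
}
```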
|
| 1593 | +#### [Future KEP Consideration] Topology Manager |
| 1594 | +
|
| 1595 | +Currently, scope=pod aggregates resource requirements from a pod's individual |
| 1596 | +containers to determine overall pod-level needs. With the introduction of Pod |
| 1597 | +Level Resources, scope=pod will directly use the pod-level resource values |
| 1598 | +specified in the Pod object for topology alignment. |
| 1599 | +
|
| 1600 | +In addition, scope=container won't be supported for pods with Pod Level Resources. This |
| 1601 | +is because these pods lack per-container resource specifications, leaving the |
| 1602 | +Topology Manager without the granular information needed to make informed |
| 1603 | +container-level topology decisions. If a user creates a pod with pod-level |
| 1604 | +resources and the pod is scheduled on a node where the Kubelet's Topology Manager |
| 1605 | +is configured with scope=container, the Topology Manager will not perform |
| 1606 | +resource alignment for that pod and will explicitly raise an error with an |
| 1607 | +informative message. This message will guide the user to use scope=pod or to configure per-container resources if fine-grained container-level topology is truly desired. |
| 1608 | +
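A self-contained sketch of the scope=container guard described above, assuming PodSpec.Resources carries the pod-level values; the function name and error text are illustrative, not merged kubelet code:

```go
package topologymanager

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// rejectPodLevelResourcesInContainerScope returns the error the container
// scope could raise at admission for pods that set pod-level resources.
func rejectPodLevelResourcesInContainerScope(pod *v1.Pod) error {
	// No pod-level resources: the existing per-container path applies.
	if pod.Spec.Resources == nil {
		return nil
	}
	return fmt.Errorf(
		"topology manager scope=container cannot align pod %q: pod-level resources are set; "+
			"use scope=pod or specify per-container resources", pod.Name)
}
```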
|
| 1609 | +#### [Scoped for GA] User Experience Survey |
| 1610 | +
|
| 1611 | +Before promoting the feature to GA, we plan to conduct a UX survey to |
| 1612 | +understand user expectations for setting various combinations of requests and |
| 1613 | +limits at both the pod and container levels. This will help us gather use cases |
| 1614 | +for different combinations, enabling us to enhance the feature's usability. If we |
| 1615 | +identify the need for significant changes to the defaulting logic based on this |
| 1616 | +feedback, we'll release another Beta version of Pod-Level Resources to |
| 1617 | +incorporate those adjustments. |
1587 | 1618 |
|
1588 | 1619 | ### Test Plan |
1589 | 1620 |
|
|