Commit 3da0389

ndixita committed
Adding Beta requirement for Pod Level Resources KEP
Signed-off-by: ndixita <ndixita@google.com>
1 parent bd60469 commit 3da0389

3 files changed: +149 additions, −116 deletions
Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 kep-number: 2837
 alpha:
-  approver: "@jpbetz"
+  approver: "@jpbetz"
+beta:
+  approver: "@jpbetz"

keps/sig-node/2837-pod-level-resource-spec/README.md

Lines changed: 144 additions & 113 deletions
@@ -38,18 +38,19 @@
 - [API changes](#api-changes)
 - [Resize Restart Policy](#resize-restart-policy)
 - [Implementation Details](#implementation-details)
-- [[Scoped for GA] Memory Manager](#scoped-for-ga-memory-manager)
-- [[Scoped for GA] CPU Manager](#scoped-for-ga-cpu-manager)
-- [[Scoped for GA] Topology Manager](#scoped-for-ga-topology-manager)
-- [[Scoped for GA] User Experience Survey](#scoped-for-ga-user-experience-survey)
 - [[Scoped for Beta] Surfacing Pod Resource Requirements](#scoped-for-beta-surfacing-pod-resource-requirements)
 - [The Challenge of Determining Effective Pod Resource Requirements](#the-challenge-of-determining-effective-pod-resource-requirements)
 - [Goals of surfacing Pod Resource Requirements](#goals-of-surfacing-pod-resource-requirements)
 - [Implementation Details](#implementation-details-1)
 - [Notes for implementation](#notes-for-implementation)
-- [[Scoped for Beta] VPA](#scoped-for-beta-vpa)
-- [[Scoped for Beta] Cluster Autoscaler](#scoped-for-beta-cluster-autoscaler)
-- [[Scoped for Beta] Support for Windows](#scoped-for-beta-support-for-windows)
+- [[Scoped for Beta] HPA](#scoped-for-beta-hpa)
+- [Cluster Autoscaler](#cluster-autoscaler)
+- [VPA](#vpa)
+- [[Future KEP Consideration in 1.35] Support for Windows](#future-kep-consideration-in-135-support-for-windows)
+- [[Future KEP Consideration in 1.35] Memory Manager](#future-kep-consideration-in-135-memory-manager)
+- [[Future KEP Consideration] CPU Manager](#future-kep-consideration-cpu-manager)
+- [[Future KEP Consideration] Topology Manager](#future-kep-consideration-topology-manager)
+- [[Scoped for GA] User Experience Survey](#scoped-for-ga-user-experience-survey)
 - [Test Plan](#test-plan)
 - [Unit tests](#unit-tests)
 - [e2e tests](#e2e-tests)
@@ -71,7 +72,7 @@
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-  - [VPA](#vpa)
+  - [VPA](#vpa-1)
 <!-- /toc -->
 
 
@@ -1359,93 +1360,6 @@ either modify the pod-level resources to accommodate ephemeral containers or
 supply resources at container-level for ephemeral containers and kubernetes will
 resize the pod to accommodate the ephemeral containers.
 
-
-#### [Scoped for GA] Memory Manager
-
-The Memory Manager currently allocates memory resources at
-the container level through its
-[Allocate](https://github.com/kubernetes/kubernetes/blob/849a82b727b1cc1e77b58149b3cacbfa5ada30fd/pkg/kubelet/cm/memorymanager/memory_manager.go#L261)
-method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration.
-
-With the introduction of Pod Level Resources, the following modifications are needed:
-
-1. Memory Manager Interface Extension: Add a new AllocatePodLevel method to the
-Memory Manager interface to handle resource allocation at the pod level. This
-method will complement the existing container-level Allocate method.
-
-2. Topology Manager Integration: Modify the [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) to conditionally
-call AllocatePodLevel when pod-level resources are configured. Maintain
-backward compatibility by continuing to use the existing Allocate method for
-container-level allocation scenarios.
-
-Note: The BestEffort policy (Windows-only) is explicitly out of scope for this
-change, as the Windows implementation is not covered by the Pod Level Resources KEP.
-
-#### [Scoped for GA] CPU Manager
-
-The CPU Manager currently allocates CPU resources at
-the container level through its
-[Allocate](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/cpumanager/cpu_manager.go#L255)
-method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration.
-
-With the introduction of Pod Level Resources, the following modifications are required:
-
-1. CPU Manager Interface Extension: Add a new AllocatePodLevel method to the CPU
-Manager interface to handle resource allocation at the pod level. This method
-will complement the existing container-level Allocate method.
-
-2. Topology Manager Integration: Modify the [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150)
-to conditionally call AllocatePodLevel when pod-level resources are
-configured. Backward compatibility will be maintained by continuing to use the
-existing Allocate method for container-level allocation scenarios.
-
-3. Policy-Specific Modifications: Not all existing CPU Manager policies remain
-compatible with Pod Level Resources. The following are the policy-specific
-adaptations:
-
-* distribute-cpus-across-numa: This policy is incompatible with pod-level
-resources. Distributing CPUs across NUMA nodes requires detailed knowledge of
-bandwidth-intensive containers, which is explicitly abstracted away by
-pod-level resources. Without workload-specific information, the system cannot
-optimally distribute containers across NUMA nodes, and incorrect placement
-could degrade performance (how to distribute M containers across N NUMA
-nodes is then underdetermined).
-
-* distribute-cpus-across-cores: Similarly, this policy is incompatible. Users focused on core-level optimization for individual containers would likely not opt for pod-level resources in the first place.
-
-* full-pcpus-only: This policy is compatible and highly beneficial for multi-tenant pods requiring inter-pod isolation, as it helps prevent hyperthread contention. The CPU Manager will be extended to allocate full physical cores at the pod level and implement a shared CPU pool within pod boundaries.
-
-* align-by-socket: This policy is compatible. It ensures all of a pod's CPUs remain on the same socket when possible, reducing inter-socket latencies and benefiting containers that share L3 cache or communicate frequently. The socket alignment logic will be extended to work with pod-level CPU pools.
-
-* strict-cpu-reservation: This policy is compatible and crucial for guaranteed workloads, preventing interference from burstable and best-effort pods. We'll update the CPU reservation logic to consider pod-level requests and limits.
-
-* prefer-align-cpus-by-uncorecache: This policy is compatible. It optimizes CPU allocation across uncore cache groups, enhancing shared cache locality for containers within the pod. The allocation logic will be updated to consider pod-level requests and limits.
-
-Note: This is a preliminary analysis, and we might have real use cases to support
-distribute-cpus-across-numa and distribute-cpus-across-cores with pod-level
-resources. We can revisit this during the GA planning cycle.
-
-#### [Scoped for GA] Topology Manager
-
-Currently, scope=pod aggregates resource requirements from a pod's individual
-containers to determine overall pod-level needs. With the introduction of Pod
-Level Resources, scope=pod will directly use the pod-level resource values
-specified in the Pod object for topology alignment.
-
-Besides, scope=container won't be supported for pods with Pod Level Resources. This is because these pods lack per-container resource specifications, leaving the Topology Manager without the granular information needed to make informed container-level topology decisions. If a user attempts to configure scope=container for such a pod, the Topology Manager will explicitly disallow it and provide an informative message. This message will guide the user to use scope=pod or to configure per-container resources if fine-grained container-level topology is truly desired.
-
-#### [Scoped for GA] User Experience Survey
-
-Before promoting the feature to GA, we plan to conduct a UX survey to
-understand user expectations for setting various combinations of requests and
-limits at both the pod and container levels. This will help us gather use cases
-for different combinations, enabling us to enhance the feature's usability. If we
-identify the need for significant changes to the defaulting logic based on this
-feedback, we'll release another Beta version of Pod-Level Resources to
-incorporate those adjustments.
-
 #### [Scoped for Beta] Surfacing Pod Resource Requirements
 
 ##### The Challenge of Determining Effective Pod Resource Requirements
@@ -1558,32 +1472,149 @@ KEPs. The first change doesn’t present any user visible change, and if
 implemented, will in a small way reduce the effort for both of those KEPs by
 providing a single place to update the pod resource calculation.
 
-#### [Scoped for Beta] VPA
-
-TBD. Do not review for the alpha stage.
-
-#### [Scoped for Beta] Cluster Autoscaler
+#### [Scoped for Beta] HPA
+
+For accurate scaling decisions, HPA needs to be able to correctly calculate the
+resources requested by a pod, regardless of whether those requests are defined
+at the pod or container level. Currently, HPA calculates pod requests by simply
+aggregating the requests of all containers within a pod. To address this, HPA
+should leverage the helper method found at
+https://github.com/kubernetes/kubernetes/blob/988cf21f0975cf95444a619481c13d2503d8ec6a/staging/src/k8s.io/component-helpers/resource/helpers.go
+for more precise pod request computations. The changes are being worked on by
+sig-autoscaling: [#132237](https://github.com/kubernetes/kubernetes/issues/132237)
+
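To make the HPA change concrete, the following is a minimal, illustrative Go sketch of computing effective pod requests with the component-helpers library linked above. The pod-level `spec.resources` field and the zero-value `PodResourcesOptions` behavior are assumptions based on the current helpers; this sketch is not part of the commit:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	resourcehelper "k8s.io/component-helpers/resource"
)

func main() {
	// A pod that only sets pod-level requests; its container sets none.
	pod := &v1.Pod{
		Spec: v1.PodSpec{
			Resources: &v1.ResourceRequirements{ // pod-level resources (alpha field)
				Requests: v1.ResourceList{
					v1.ResourceCPU:    resource.MustParse("2"),
					v1.ResourceMemory: resource.MustParse("1Gi"),
				},
			},
			Containers: []v1.Container{{Name: "app"}},
		},
	}

	// PodRequests understands both layouts, so a caller such as HPA does not
	// need to special-case pod-level versus container-level requests.
	reqs := resourcehelper.PodRequests(pod, resourcehelper.PodResourcesOptions{})
	fmt.Printf("cpu=%v memory=%v\n", reqs.Cpu(), reqs.Memory())
}
```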
+#### Cluster Autoscaler
+
+The Cluster Autoscaler uses resourcehelper.PodRequests to calculate Pod resource
+requirements for scaling decisions since version 1.4.0. This automatically
+includes Pod-level resource requests when the PodLevelResources feature gate is
+enabled, ensuring accurate node scaling and utilization calculations.
+
+#### VPA
+
+Collaboration with sig-autoscaling has been established to integrate support for
+VPA with Pod-level resources, slated for VPA 1.34. The changes to support pod-level
+resources in VPA will be worked on in two phases:
+
+* [Scoped for Beta] Phase 1: Necessary changes
+The necessary changes include augmenting the recommendation algorithm to
+provide pod-level resource recommendations within RecommendedPodResources, in
+addition to the existing per-container recommendations, when pod-level resources
+are set.
+```
+type RecommendedPodResources struct {
+    ContainerRecommendations []RecommendedContainerResources
+    // NEW: Pod-level resources
+    PodLevelResources *ResourceList
+}
+```
+Note: The detailed KEP design is owned and being worked on by
+sig-autoscaling: [#7571](https://github.com/kubernetes/autoscaler/issues/7571)
 
-Cluster Autoscaler won't work as expected with pod-level resources in alpha since
-it relies on container-level values to be specified. If a user specifies only
-pod-level resources, the CA will assume that the pod requires no resources since
-container-level values are not set. As a result, the CA won't scale the number of
-nodes to accommodate this pod. Meanwhile, the scheduler will evaluate the
-pod-level resource requests but may be unable to find a suitable node to fit the
-pod. Consequently, the pod will not be scheduled. While this behavior is
-acceptable for the alpha implementation, it is anticipated that Cluster
-Autoscaler support will be addressed in the Beta phase with pod resource
-requirements surfaced in a helper library/function that autoscalers can use to
-make autoscaling decisions.
+* [Scoped for GA] Phase 2: Improving the recommendation algorithm
+Pod-Level Resources allows pod-level limits to be greater than the aggregated
+container limits so that containers can share idle resources with each other.
+Integrating this functionality with VPA necessitates the development of a
+complex new recommendation algorithm. Concepts such as proportionate pod-level
+and container-level recommendations have been proposed and require further
+discussion; an illustrative manifest of the shared-headroom scenario follows.
 
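As a hypothetical illustration of the Phase 2 scenario (manifest shape per the pod-level resources API; names and values invented), the pod-level limits here exceed the container aggregate, creating shared headroom that a recommender must reason about:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-headroom-demo
spec:
  # Pod-level limits (3 CPU / 2Gi) exceed the aggregate of the container
  # limits below (2 CPU / 1Gi); either container may burst into the slack.
  resources:
    limits:
      cpu: "3"
      memory: 2Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
  - name: sidecar
    image: registry.k8s.io/pause:3.9
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
```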
-#### [Scoped for Beta] Support for Windows
+#### [Future KEP Consideration in 1.35] Support for Windows
 
 Pod-level resource specifications are a natural extension of Kubernetes' existing
 resource management model. Although this new feature is expected to function with
 Windows containers, careful testing and consideration are required due to
 platform-specific differences. As the introduction of pod-level resources is a
 major change in itself, full support for Windows will be addressed in future
-stages, beyond the initial alpha release.
+KEPs, beyond the scope of this KEP.
+
+#### [Future KEP Consideration in 1.35] Memory Manager
+
+The Memory Manager currently allocates memory resources at
+the container level through its
+[Allocate](https://github.com/kubernetes/kubernetes/blob/849a82b727b1cc1e77b58149b3cacbfa5ada30fd/pkg/kubelet/cm/memorymanager/memory_manager.go#L261)
+method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration.
+
+With the introduction of Pod Level Resources, the following modifications are needed:
+
+1. Memory Manager Interface Extension: Add a new AllocatePodLevel method to the
+Memory Manager interface to handle resource allocation at the pod level. This
+method will complement the existing container-level Allocate method (see the
+interface sketch below).
+
+2. Topology Manager Integration: Modify the [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) to conditionally
+call AllocatePodLevel when pod-level resources are configured. Maintain
+backward compatibility by continuing to use the existing Allocate method for
+container-level allocation scenarios.
+
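The interface sketch referenced above, in Go. `AllocatePodLevel` is hypothetical and its final signature would be settled in the future KEP; `Allocate` mirrors the existing container-level method:

```go
// Package sketch illustrates the proposed Memory Manager extension.
package sketch

import v1 "k8s.io/api/core/v1"

// Manager is a trimmed view of the kubelet Memory Manager interface.
type Manager interface {
	// Allocate is the existing container-level entry point invoked via the
	// Topology Manager's hint-provider integration.
	Allocate(pod *v1.Pod, container *v1.Container) error

	// AllocatePodLevel is the proposed pod-level entry point, used when a
	// pod specifies pod-level resources instead of per-container values.
	AllocatePodLevel(pod *v1.Pod) error
}
```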
+#### [Future KEP Consideration] CPU Manager
+
+The CPU Manager currently allocates CPU resources at
+the container level through its
+[Allocate](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/cpumanager/cpu_manager.go#L255)
+method. The [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150) calls this Allocate method as part of its hint provider integration.
+
+With the introduction of Pod Level Resources, the following modifications are required:
+
+1. CPU Manager Interface Extension: Add a new AllocatePodLevel method to the CPU
+Manager interface to handle resource allocation at the pod level. This method
+will complement the existing container-level Allocate method.
+
+2. Topology Manager Integration: Modify the [Topology Manager](https://github.com/kubernetes/kubernetes/blob/fd53f7292c7d5899135fddd928c0dc3844126820/pkg/kubelet/cm/topologymanager/scope.go#L150)
+to conditionally call AllocatePodLevel when pod-level resources are
+configured. Backward compatibility will be maintained by continuing to use the
+existing Allocate method for container-level allocation scenarios (see the
+dispatch sketch after this section).
+
+3. Policy-Specific Modifications: Not all existing CPU Manager policies remain
+compatible with Pod Level Resources. The following are the policy-specific
+adaptations:
+
+* distribute-cpus-across-numa: This policy is incompatible with pod-level
+resources. Distributing CPUs across NUMA nodes requires detailed knowledge of
+bandwidth-intensive containers, which is explicitly abstracted away by
+pod-level resources. Without workload-specific information, the system cannot
+optimally distribute containers across NUMA nodes, and incorrect placement
+could degrade performance (how to distribute M containers across N NUMA
+nodes is then underdetermined).
+
+* distribute-cpus-across-cores: Similarly, this policy is incompatible. Users focused on core-level optimization for individual containers would likely not opt for pod-level resources in the first place.
+
+* full-pcpus-only: This policy is compatible and highly beneficial for multi-tenant pods requiring inter-pod isolation, as it helps prevent hyperthread contention. The CPU Manager will be extended to allocate full physical cores at the pod level and implement a shared CPU pool within pod boundaries.
+
+* align-by-socket: This policy is compatible. It ensures all of a pod's CPUs remain on the same socket when possible, reducing inter-socket latencies and benefiting containers that share L3 cache or communicate frequently. The socket alignment logic will be extended to work with pod-level CPU pools.
+
+* strict-cpu-reservation: This policy is compatible and crucial for guaranteed workloads, preventing interference from burstable and best-effort pods. We'll update the CPU reservation logic to consider pod-level requests and limits.
+
+* prefer-align-cpus-by-uncorecache: This policy is compatible. It optimizes CPU allocation across uncore cache groups, enhancing shared cache locality for containers within the pod. The allocation logic will be updated to consider pod-level requests and limits.
+
+Note: This is a preliminary analysis, and we might have real use cases to support
+distribute-cpus-across-numa and distribute-cpus-across-cores with pod-level
+resources. We can revisit this during the GA planning cycle.
+
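The dispatch sketch referenced in item 2 above, common to both managers. `Allocator` and `hasPodLevelResources` are illustrative stand-ins rather than the kubelet's actual types:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// Allocator stands in for a hint provider (CPU or Memory Manager) extended
// with the proposed AllocatePodLevel method.
type Allocator interface {
	Allocate(pod *v1.Pod, container *v1.Container) error
	AllocatePodLevel(pod *v1.Pod) error
}

// hasPodLevelResources mirrors the kind of check the Topology Manager would
// perform before choosing the pod-level allocation path.
func hasPodLevelResources(pod *v1.Pod) bool {
	r := pod.Spec.Resources
	return r != nil && (len(r.Requests) > 0 || len(r.Limits) > 0)
}

// allocate conditionally calls AllocatePodLevel, preserving the existing
// container-level path for backward compatibility.
func allocate(a Allocator, pod *v1.Pod) error {
	if hasPodLevelResources(pod) {
		return a.AllocatePodLevel(pod)
	}
	for i := range pod.Spec.Containers {
		if err := a.Allocate(pod, &pod.Spec.Containers[i]); err != nil {
			return err
		}
	}
	return nil
}
```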
+#### [Future KEP Consideration] Topology Manager
+
+Currently, scope=pod aggregates resource requirements from a pod's individual
+containers to determine overall pod-level needs. With the introduction of Pod
+Level Resources, scope=pod will directly use the pod-level resource values
+specified in the Pod object for topology alignment.
+
+In addition, scope=container won't be supported for pods with Pod Level Resources. This
+is because these pods lack per-container resource specifications, leaving the
+Topology Manager without the granular information needed to make informed
+container-level topology decisions. If a user creates a pod with pod-level
+resources and the pod is scheduled on a node where the Kubelet's Topology Manager
+is configured with scope=container, the Topology Manager will not perform
+resource alignment for that pod and will explicitly raise an error with an
+informative message. This message will guide the user to use scope=pod or to
+configure per-container resources if fine-grained container-level topology is
+truly desired.
+
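For context, the scope in question is set in the kubelet configuration; pods with pod-level resources only receive topology alignment on nodes configured with the pod scope (illustrative KubeletConfiguration excerpt):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node
# scope=container is rejected for pods with pod-level resources, as
# described above; use the pod scope for such workloads.
topologyManagerScope: pod
```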
+#### [Scoped for GA] User Experience Survey
+
+Before promoting the feature to GA, we plan to conduct a UX survey to
+understand user expectations for setting various combinations of requests and
+limits at both the pod and container levels. This will help us gather use cases
+for different combinations, enabling us to enhance the feature's usability. If we
+identify the need for significant changes to the defaulting logic based on this
+feedback, we'll release another Beta version of Pod-Level Resources to
+incorporate those adjustments.
 
 ### Test Plan
 
keps/sig-node/2837-pod-level-resource-spec/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -21,12 +21,12 @@ see-also: []
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.32"
+latest-milestone: "v1.34"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
