Commit 4622617

update test plan for GA
1 parent 0b00086 commit 4622617

1 file changed

+181
-105
lines changed
  • keps/sig-node/1287-in-place-update-pod-resources

keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 181 additions & 105 deletions
Original file line numberDiff line numberDiff line change
@@ -1161,123 +1161,198 @@ implementing this enhancement to ensure the enhancements have also solid foundat
11611161
#### Unit Tests
11621162

11631163
Unit tests will cover the sanity of code changes that implement the feature,
1164-
and the policy controls that are introduced as part of this feature.
1164+
and the policy controls that are introduced as part of this feature. This is
1165+
not exhaustive, but a few specifics are covered below:
1166+
1167+
##### Allocation Manager
1168+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/pkg/kubelet/allocation/allocation_manager_test.go
1169+
1170+
The allocation manager is responsible for determining whether a resize can be allocated.
1171+
Unit tests cover this logic, including:
1172+
- Resizes with unsupported features such as static CPU/memory manager policies or swap are marked infeasible.
1173+
- Resizes for which the node does not currently have room are marked as deferred.
1174+
- Deferred resizes are retried according to the desired priority.
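
The allocation outcomes listed above can be illustrated with a minimal sketch. It is not the kubelet's allocation manager: `classifyResize`, `ResizeDecision`, and the single-resource (CPU millicores) model are hypothetical names and simplifications chosen only for illustration.

```go
package main

import "fmt"

// ResizeDecision is a hypothetical label for the outcome of an allocation check.
type ResizeDecision string

const (
	Allocated  ResizeDecision = "Allocated"
	Deferred   ResizeDecision = "Deferred"
	Infeasible ResizeDecision = "Infeasible"
)

// classifyResize decides how a requested CPU amount (millicores) is handled.
//   - allocatable:   total CPU the node can ever hand out to pods
//   - inUseByOthers: CPU currently allocated to the other pods on the node
//   - unsupported:   true when the pod relies on a feature that cannot be
//     resized in place (assumption: modelled here as a plain flag)
func classifyResize(requested, allocatable, inUseByOthers int64, unsupported bool) ResizeDecision {
	// A resize that can never fit, or that uses an unsupported feature, is infeasible.
	if unsupported || requested > allocatable {
		return Infeasible
	}
	// A resize that could fit on the node, but not right now, is deferred.
	if requested > allocatable-inUseByOthers {
		return Deferred
	}
	return Allocated
}

func main() {
	fmt.Println(classifyResize(500, 4000, 1000, false))  // fits now
	fmt.Println(classifyResize(3500, 4000, 1000, false)) // fits only once others shrink
	fmt.Println(classifyResize(5000, 4000, 1000, false)) // larger than the node
	fmt.Println(classifyResize(500, 4000, 1000, true))   // unsupported feature
}
```

A deferred resize stays pending and can be retried when capacity frees up, whereas an infeasible one cannot succeed on this node regardless of what other pods do.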
1175+
1176+
##### Kuberuntime Manager
1177+
Tests:
1178+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/pkg/kubelet/kuberuntime/kuberuntime_manager_test.go#L3048
1179+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/pkg/kubelet/kuberuntime/kuberuntime_manager_test.go#L2320
1180+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/pkg/kubelet/kuberuntime/kuberuntime_manager_test.go#L3290
1181+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/pkg/kubelet/kuberuntime/kuberuntime_manager_test.go#L3668
1182+
1183+
The kuberuntime manager is responsible for actuating a resize after it has been allocated.
1184+
Unit tests cover this logic, including:
1185+
- Validation of the resize, i.e. that memory limits cannot be resized below the usage
1186+
- The logic for determining whether a pod resize is in progress (and that the corresponding pod condition gets added)
1187+
- Computation of what resize actions need to be performed
1188+
- The mock container manager has the expected cgroup values post-resize.
1189+
1190+
##### CRI unit tests
11651191

11661192
CRI unit tests are updated to reflect use of ContainerResources object in
11671193
UpdateContainerResources and ContainerStatus APIs.
11681194

11691195
#### Integration tests
11701196

1171-
Comprehensive E2E tests provide good coverage for alpha. We may replicate and/or move
1172-
some of the E2E tests functionality into integration tests before Beta using data from
1173-
any issues we uncover that are not covered by planned and implemented tests.
1197+
Comprehensive E2E tests provide good coverage. The following integration tests are also
1198+
added for additional coverage:
1199+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/integration/pods/pods_test.go#L852
1200+
- https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/integration/scheduler/queueing/queue.go#L287
11741201

11751202
#### Pod Resize E2E Tests
11761203

1204+
##### How the tests perform verification
1205+
11771206
End-to-End tests resize a Pod via PATCH to Pod's Spec.Containers[i].Resources.
11781207
The e2e tests use docker as container runtime.
11791208
- Resizing of Requests is verified by querying the values in Pod's
11801209
Status.ContainerStatuses[i].AllocatedResources field.
11811210
- Resizing of Limits is verified by querying the cgroup limits of the Pod's
11821211
containers.
1212+
- Pending resizes have the corresponding condition set in the Pod Status.
1213+
Completed resizes have their resize status cleared.
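
As a rough illustration of the PATCH described above, the sketch below builds a JSON body that changes the resources of one container under `spec.containers`. The struct types and the `buildResizePatch` helper are hypothetical and exist only for this example; how the body is sent (client, endpoint, patch type) is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal, hypothetical mirror of the Pod fields touched by a resize patch.
type resources struct {
	Requests map[string]string `json:"requests,omitempty"`
	Limits   map[string]string `json:"limits,omitempty"`
}

type container struct {
	Name      string    `json:"name"`
	Resources resources `json:"resources"`
}

type podSpec struct {
	Containers []container `json:"containers"`
}

type podPatch struct {
	Spec podSpec `json:"spec"`
}

// buildResizePatch returns a patch body that sets requests == limits for the
// named container, as a Guaranteed pod would need.
func buildResizePatch(name, cpu, memory string) (string, error) {
	r := map[string]string{"cpu": cpu, "memory": memory}
	p := podPatch{Spec: podSpec{Containers: []container{
		{Name: name, Resources: resources{Requests: r, Limits: r}},
	}}}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := buildResizePatch("c1", "200m", "256Mi")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

After such a patch is applied, the tests read the requests back from the Pod status and the limits from the container cgroups, as the bullets above describe.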
1214+
1215+
##### Success test cases for Guaranteed Pods with one container
1216+
1217+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L116-L127
11831218

1184-
E2E test cases for Guaranteed class Pod with one container:
1219+
For these tests, all pods had a restartable initContainer attached.
1220+
1221+
Resize operations performed:
11851222
1. Increase, decrease Requests & Limits for CPU only.
11861223
1. Increase, decrease Requests & Limits for memory only.
1187-
1. Increase, decrease Requests & Limits for CPU and memory.
1188-
1. Increase CPU and decrease memory.
1189-
1. Decrease CPU and increase memory.
1190-
1. Add memory request & limit for CPU only container.
1191-
1. Remove memory request & limit for CPU & memory container.
1192-
1193-
E2E test cases for Burstable class single container Pod that specifies
1194-
both CPU & memory:
1195-
1. Increase, decrease Requests - CPU only.
1196-
1. Increase, decrease Requests - memory only.
1197-
1. Increase, decrease Requests - both CPU & memory.
1198-
1. Increase, decrease Limits - CPU only.
1199-
1. Increase, decrease Limits - memory only.
1200-
1. Increase, decrease Limits - both CPU & memory.
1201-
1. Increase, decrease Requests & Limits - CPU only.
1202-
1. Increase, decrease Requests & Limits - memory only.
1203-
1. Increase, decrease Requests & Limits - both CPU and memory.
1204-
1. Increase CPU (Requests+Limits) & decrease memory(Requests+Limits).
1205-
1. Decrease CPU (Requests+Limits) & increase memory(Requests+Limits).
1206-
1. Increase CPU Requests while decreasing CPU Limits.
1207-
1. Decrease CPU Requests while increasing CPU Limits.
1208-
1. Increase memory Requests while decreasing memory Limits.
1209-
1. Decrease memory Requests while increasing memory Limits.
1210-
1. CPU: increase Requests, decrease Limits, Memory: increase Requests, decrease Limits.
1211-
1. CPU: decrease Requests, increase Limits, Memory: decrease Requests, increase Limits.
1212-
1. Set requests == limits, ensure QOS class remains Burstable
1213-
1214-
E2E tests for Burstable class single container Pod that specifies CPU only:
1215-
1. Increase, decrease CPU - Requests only.
1216-
1. Increase, decrease CPU - Limits only.
1217-
1. Increase, decrease CPU - both Requests & Limits.
1218-
1219-
E2E tests for Burstable class single container Pod that specifies memory only:
1220-
1. Increase, decrease memory - Requests only.
1221-
1. Increase, decrease memory - Limits only.
1222-
1. Increase, decrease memory - both Requests & Limits.
1223-
1224-
E2E tests for BestEffort class single container Pod:
1225-
1. Add CPU requests & limits, QOS class remains BestEffort
1226-
2. Add Memory requests & limits, QOS class remains BestEffort
1227-
1228-
E2E tests for Guaranteed class Pod with three containers (c1, c2, c3):
1229-
1. Increase CPU & memory for all three containers.
1230-
1. Decrease CPU & memory for all three containers.
1231-
1. Increase CPU, decrease memory for all three containers.
1232-
1. Decrease CPU, increase memory for all three containers.
1233-
1. Increase CPU for c1, decrease c2, c3 unchanged - no net CPU change.
1234-
1. Increase memory for c1, decrease c2, c3 unchanged - no net memory change.
1235-
1. Increase CPU for c1, decrease c2 & c3 - net CPU decrease for Pod.
1236-
1. Increase memory for c1, decrease c2 & c3 - net memory decrease for Pod.
1237-
1. Increase CPU for c1 & c3, decrease c2 - net CPU increase for Pod.
1238-
1. Increase memory for c1 & c3, decrease c2 - net memory increase for Pod.
1239-
1240-
E2E tests for sidecar containers
1241-
1. InitContainer, then sidecar - can increase & decrease CPU & memory of sidecar
1242-
2. Sidecar then InitContainer - can increase & decrease CPU & memory of sidecar
1243-
3. Resize sidecar along with container
1244-
1245-
#### CRI E2E Tests
1246-
1247-
1. E2E test is added to verify UpdateContainerResources API with containerd runtime.
1248-
1. E2E test is added to verify ContainerStatus API using containerd runtime.
1249-
1. E2E test is added to verify backward compatibility using containerd runtime.
1250-
1251-
#### Resource Quota and Limit Ranges
1252-
1253-
Setup a namespace with ResourceQuota and a single, valid Pod.
1254-
1. Resize the Pod within resource quota - CPU only.
1255-
1. Resize the Pod within resource quota - memory only.
1256-
1. Resize the Pod within resource quota - both CPU and memory.
1257-
1. Resize the Pod to exceed resource quota - CPU only.
1258-
1. Resize the Pod to exceed resource quota - memory only.
1259-
1. Resize the Pod to exceed resource quota - both CPU and memory.
1260-
1261-
Setup a namespace with min and max LimitRange and create a single, valid Pod.
1262-
1. Increase, decrease CPU within min/max bounds.
1263-
1. Increase CPU to exceed max value.
1264-
1. Decrease CPU to go below min value.
1265-
1. Increase memory to exceed max value.
1266-
1. Decrease memory to go below min value.
1267-
1268-
#### Resize Policy Tests
1269-
1270-
Setup a guaranteed class Pod with two containers (c1 & c2).
1271-
1. No resize policy specified, defaults to NotRequired. Verify that CPU and
1272-
memory are resized without restarting containers.
1273-
1. NotRequired (cpu, memory) policy for c1, RestartContainer (cpu, memory) for c2.
1274-
Verify that c1 is resized without restart, c2 is restarted on resize.
1275-
1. NotRequired cpu, RestartContainer memory policy for c1. Resize c1 CPU only,
1276-
verify container is resized without restart.
1277-
1. NotRequired cpu, RestartContainer memory policy for c1. Resize c1 memory only,
1278-
verify container is resized with restart.
1279-
1. NotRequired cpu, RestartContainer memory policy for c1. Resize c1 CPU & memory,
1280-
verify container is resized with restart.
1224+
1. Increase, decrease Requests & Limits for CPU and memory in the same direction.
1225+
1. Increase, decrease Requests & Limits for CPU and memory in opposite directions.
1226+
1227+
The following cases are tested against all the above resize operations:
1228+
1. No restart policy; no resize of init container.
1229+
1. No restart policy + resize of init container.
1230+
1. Memory restart policy; no resize of init container.
1231+
1. CPU restart policy; no resize of init container.
1232+
1. CPU + Memory restart policy; no resize of init container.
1233+
1. CPU + Memory restart policy + resize of init container.
1234+
1235+
##### Success test cases for Guaranteed Pods with multiple containers
1236+
1237+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L130
1238+
1239+
1. 3 containers - increase cpu & mem on c1, c2, decrease cpu & mem on c3 - net increase
1240+
1. 3 containers - increase cpu & mem on c1, decrease cpu & mem on c2, c3 - net decrease
1241+
1. 3 containers - increase: CPU (c1,c3), memory (c2, c3) ; decrease: CPU (c2)
1242+
1243+
##### Success test cases for Burstable Pods with one container
1244+
1245+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L208-L220
1246+
1247+
For these tests, there were no initContainers (since that is covered by the Guaranteed Pods cases).
1248+
1249+
Resize operations performed:
1250+
1. Increase, decrease CPU Requests
1251+
1. Increase, decrease CPU Limits
1252+
1. Increase, decrease memory Requests
1253+
1. Increase, decrease memory Limits
1254+
1. Increase, decrease CPU & memory Requests and Limits in the same direction
1255+
1. Increase, decrease CPU and memory in opposite directions
1256+
1. Increase, decrease Requests & Limits in opposite directions
1257+
1258+
The following cases are tested against all the above resize operations:
1259+
1. No restart policy
1260+
1. Memory restart policy
1261+
1. CPU restart policy
1262+
1. CPU + Memory restart policy
1263+
1264+
##### Other success test cases for Burstable Pods
1265+
1266+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L228
1267+
1268+
1. 6 containers - various operations performed (including adding limits and requests)
1269+
1. Resizing with equivalents (e.g. 2m -> 1m)
1270+
1271+
##### Memory limit decrease
1272+
1273+
Test: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L548
1274+
1275+
This test covers that memory limits can be decreased, but not below the current usage.
1276+
1277+
##### Patch error tests
1278+
1279+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/common/node/pod_resize.go#L307
1280+
1281+
These tests cover that the following attempts to patch a pod for resize will be rejected by the API server:
1282+
1. Best Effort pod - request memory
1283+
1. Best Effort pod - request CPU
1284+
1. Guaranteed pod - remove cpu & memory limits
1285+
1. Burstable pod - remove cpu & memory limits + increase requests
1286+
1. Burstable pod - remove memory requests
1287+
1. Burstable pod - remove cpu requests
1288+
1. Burstable pod - reorder containers
1289+
1. Guaranteed pod - rename containers
1290+
1. Burstable pod - set requests == limits
1291+
1. Burstable pod - resize ephemeral storage
1292+
1. Burstable pod - nonrestartable initContainer
1293+
1294+
##### Scheduler logic tests
1295+
1296+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/node/pod_resize.go#L494
1297+
1298+
These tests cover the scheduler logic with respect to in-place pod resize and the deferred / infeasible
1299+
conditions. The flow of this test is:
1300+
1. Create pod1 and pod2 on node such that pod1 has enough CPU to be scheduled, but pod2 does not.
1301+
1. Resize pod2 down so that it fits on the node and can be scheduled.
1302+
1. Verify that pod2 gets scheduled and comes up and running.
1303+
1. Create pod3 that requests more CPU than available, verify that it is pending.
1304+
1. Resize pod1 down so that pod3 gets room to be scheduled.
1305+
1. Verify that pod3 is scheduled and running.
1306+
1. Attempt to scale up pod1 to request more CPU than available, verify the resize is deferred.
1307+
1. Delete pod2 + pod3 to make room for pod1's resize.
1308+
1. Verify that pod1 resize has completed.
1309+
1. Attempt to scale up pod1 to request more cpu than the node has, verify the resize is infeasible.
1310+
1311+
##### Retry of deferred resizes
1312+
1313+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/node/pod_resize.go#L690
1314+
1315+
These tests cover the logic for retrying deferred resizes in the following cases:
1316+
1. Deferred resizes succeed after the scale down of another pod. (Deletion case is covered in the previous tests).
1317+
1. Deferred resizes are attempted according to the desired priority.
1318+
1. Place 4 pods on the node; delete the first one and verify the chain reaction of deferred resizes succeeding. The
1319+
resources are carefully chosen such that
1320+
- deletion of pod1 should make room for pod2's resize (but not pod3 or pod4).
1321+
- pod2's resize should make room for pod3's resize (but not pod4).
1322+
- pod3's resize should make room for pod4's resize.
1323+
1324+
##### Resource Quota tests
1325+
1326+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/node/pod_resize.go#L47
1327+
1328+
1. Exceed max CPU
1329+
1. Exceed max memory
1330+
1. Exceed max CPU and memory
1331+
1. Valid increase of CPU
1332+
1. Valid increase of memory
1333+
1. Valid increase of CPU and memory
1334+
1335+
##### Limit Ranger tests
1336+
1337+
Tests: https://github.com/kubernetes/kubernetes/blob/ad82c3d39f5e9f21e173ffeb8aa57953a0da4601/test/e2e/node/pod_resize.go#L218
1338+
1339+
1. Exceed max CPU
1340+
1. Exceed max memory
1341+
1. Exceed max CPU and memory
1342+
1. Valid increase of CPU
1343+
1. Valid increase of memory
1344+
1. Valid increase of CPU and memory
1345+
1. Go below min CPU
1346+
1. Go below min memory
1347+
1. Go below min CPU and memory
1348+
1. Valid decrease of CPU
1349+
1. Valid decrease of memory
1350+
1. Valid decrease of CPU and memory
1351+
1352+
##### Coverage of the READ and REPLACE endpoints
1353+
1354+
The previous tests are planned to use the PATCH endpoint, but we also need coverage of READ and REPLACE endpoints.
1355+
A basic test will be added that uses REPLACE to perform a resize, and the READ endpoint to verify the result.
12811356

12821357
#### Backward Compatibility and Negative Tests
12831358

@@ -1292,7 +1367,6 @@ Setup a guaranteed class Pod with two containers (c1 & c2).
12921367
values of AllocatedResources and ResizePolicy fields being dropped.
12931368
1. Verify that only CPU and memory resources are mutable by user.
12941369

1295-
TODO: Identify more cases
12961370

12971371
### Graduation Criteria
12981372

@@ -1670,12 +1744,14 @@ _This section must be completed when targeting beta graduation to a release._
16701744
- Add instrumentation section
16711745
- Priority of resize requests
16721746
- 2025-09-22 - Correct KEP details to match actual implementation
1673-
- revert PreferNoRestart resize policy back to NotRequired
1674-
- add more details about the resize status
1675-
- document kubelet-triggered eviction for critical pods
1676-
- update outdated notes regarding static CPU
1677-
- correct details about instrumentation
1747+
- revert PreferNoRestart resize policy back to NotRequired
1748+
- add more details about the resize status
1749+
- document kubelet-triggered eviction for critical pods
1750+
- update outdated notes regarding static CPU
1751+
- correct details about instrumentation
16781752
- 2025-09-22 - Update in-place pod resize for GA
1753+
- Update test plan
1754+
16791755

16801756
## Drawbacks
16811757
