Commit 31e1403 ("Remarks")

1 parent 295014f commit 31e1403

File tree: 1 file changed, +135 -5 lines
  • keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs

keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/README.md
Lines changed: 135 additions & 5 deletions
@@ -724,8 +724,9 @@ in back-to-back releases.
 #### Beta
 
 - Address reviews and bug reports from Alpha users
-- Propose and implement metrics
+- Implement the `job_finished_indexes_total` metric
 - E2e tests are in Testgrid and linked in KEP
+- Move the [new reason declarations](https://github.com/kubernetes/kubernetes/blob/dc28eeaa3a6e18ef683f4b2379234c2284d5577e/pkg/controller/job/job_controller.go#L82-L89) from Job controller to the API package
 - Evaluate performance of Job controller for jobs using backoff limit per index
   with benchmarks at the integration or e2e level (discussion pointers from Alpha
   review: [thread1](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1261694406) and [thread2](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1263862076))
@@ -756,6 +757,9 @@ A downgrade to a version which does not support this feature should not require
 any additional configuration changes. Jobs which specified
 `.spec.backoffLimitPerIndex` (to make use of this feature) will be
 handled in a default way, ie. using the `.spec.backoffLimit`.
+However, since the `.spec.backoffLimit` defaults to the max int32 value
+(see [here](#job-api)), it might require manually setting `.spec.backoffLimit`
+to ensure failed pods are not retried indefinitely.
 
 <!--
 If applicable, how will the component be upgraded and downgraded? Make sure
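To illustrate that manual mitigation, a minimal sketch (not part of this change): it assumes `.spec.backoffLimit` is mutable in the running version and uses the regular default of 6 only as an example value; `<job-name>` is a placeholder.

```sh
# After downgrading, cap retries for a Job that previously relied on per-index
# backoff, so its failed pods are not retried (practically) indefinitely.
kubectl patch job <job-name> --type=merge -p '{"spec":{"backoffLimit":6}}'
```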
@@ -876,7 +880,8 @@ The Job controller starts to handle pod failures according to the specified
 
 ###### Are there any tests for feature enablement/disablement?
 
-No. The tests will be added in Alpha.
+Yes, there is an [integration test](https://github.com/kubernetes/kubernetes/blob/dc28eeaa3a6e18ef683f4b2379234c2284d5577e/test/integration/job/job_test.go#L763)
+which tests the following path: enablement -> disablement -> re-enablement.
 
 <!--
 The e2e framework does not currently support enabling or disabling feature
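A sketch of how such an integration test can be run locally from a kubernetes/kubernetes checkout (not part of this change); the `-run` pattern below is only a guess and should be narrowed to the test linked above.

```sh
# Run the Job integration tests; tighten the -run regexp to the test that
# covers the enablement -> disablement -> re-enablement path.
make test-integration WHAT=./test/integration/job KUBE_TEST_ARGS="-run TestJob"
```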
@@ -899,7 +904,16 @@ This section must be completed when targeting beta to a release.
 
 ###### How can a rollout or rollback fail? Can it impact already running workloads?
 
-The change is opt-in, it doesn't impact already running workloads.
+This change does not affect how a rollout or rollback can fail.
+
+The change is opt-in, thus a rollout doesn't impact already running pods.
+
+A rollback might affect how pod failures are handled, since they will
+be counted only against `.spec.backoffLimit`, which defaults to the max int32
+value when `.spec.backoffLimitPerIndex` is used (see [here](#job-api)).
+Thus, similarly to a downgrade (see [here](#downgrade)),
+it might be required to manually set `.spec.backoffLimit` to ensure failed pods
+are not retried indefinitely.
 
 <!--
 Try to be as paranoid as possible - e.g., what if some components will restart
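As a sketch (not part of this change), one way to find the Jobs whose `.spec.backoffLimit` may need to be set manually after a rollback; the column names are arbitrary.

```sh
# Jobs that show a PER_INDEX value together with a BACKOFF_LIMIT of
# 2147483647 (max int32) are the ones to review after a rollback.
kubectl get jobs -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,PER_INDEX:.spec.backoffLimitPerIndex,BACKOFF_LIMIT:.spec.backoffLimit'
```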
@@ -932,7 +946,109 @@ that might indicate a serious problem?
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
-It will be tested manually prior to beta launch.
+The upgrade->downgrade->upgrade path was tested manually using the `alpha`
+version in 1.28 with the following steps:
+
+1. Start the cluster with the `JobBackoffLimitPerIndex` feature gate enabled:
+```sh
+kind create cluster --name per-index --image kindest/node:v1.28.0 --config config.yaml
+```
+using `config.yaml`:
+```yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+featureGates:
+  "JobBackoffLimitPerIndex": true
+nodes:
+- role: control-plane
+- role: worker
+```
+
+Then, create the job using `.spec.backoffLimitPerIndex=1`:
+
+```sh
+kubectl create -f job.yaml
+```
+using `job.yaml`:
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: job-longrun
+spec:
+  parallelism: 3
+  completions: 3
+  completionMode: Indexed
+  backoffLimitPerIndex: 1
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+      - name: sleep
+        image: busybox:1.36.1
+        command: ["sleep"]
+        args: ["1800"] # 30min
+        imagePullPolicy: IfNotPresent
+```
+
+Wait for the pods to be running, then delete the index-0 pod:
+```sh
+kubectl delete pods -l job-name=job-longrun -l batch.kubernetes.io/job-completion-index=0 --grace-period=1
+```
+Wait for the replacement pod to be created and repeat the deletion.
+
+Check the job status and confirm `.status.failedIndexes="0"`:
+
+```sh
+kubectl get jobs -ljob-name=job-longrun -oyaml
+```
+Also, notice that `.status.active=2`, because the pod for a failed index is not
+re-created.
+
2. Simulate downgrade by disabling the feature for api server and control-plane:
1009+
1010+
```sh
1011+
docker exec -it per-index-control-plane sed -i 's/JobBackoffLimitPerIndex=true/JobBackoffLimitPerIndex=false/' /etc/kubernetes/manifests/kube-controller-manager.yaml
1012+
docker exec -it per-index-control-plane sed -i 's/JobBackoffLimitPerIndex=true/JobBackoffLimitPerIndex=false/' /etc/kubernetes/manifests/kube-apiserver.yaml
1013+
```
1014+
Await for the control-plane and apiserver pods to be restarted.
1015+
1016+
Verify that 3 pods are running again, and the `.status.failedIndexes` is gone by
1017+
```sh
1018+
kubectl get jobs -ljob-name=job-longrun -oyaml
1019+
```
1020+
this will produce output similar to:
1021+
```yaml
1022+
...
1023+
status:
1024+
active: 3
1025+
failed: 2
1026+
ready: 2
1027+
startTime: "2023-09-29T14:00:28Z"
1028+
uncountedTerminatedPods: {}
1029+
```
1030+
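A sketch (not part of this change) for step 2: watch the static control-plane pods restart after the manifest edits, then confirm the per-index field is no longer reported.

```sh
# Watch the kubeadm static pods; re-run if the watch drops while the
# kube-apiserver itself is restarting.
kubectl -n kube-system get pods -l tier=control-plane -w

# With the feature gate disabled, this should print an empty value.
kubectl get job job-longrun -o jsonpath='{.status.failedIndexes}{"\n"}'
```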
1031+
3. Simulate upgrade by re-enabling the feature for api server and control-plane:
1032+
```sh
1033+
docker exec -it per-index-control-plane sed -i 's/JobBackoffLimitPerIndex=false/JobBackoffLimitPerIndex=true/' /etc/kubernetes/manifests/kube-controller-manager.yaml
1034+
docker exec -it per-index-control-plane sed -i 's/JobBackoffLimitPerIndex=false/JobBackoffLimitPerIndex=true/' /etc/kubernetes/manifests/kube-apiserver.yaml
1035+
```
1036+
1037+
Await for the pods to be running and delete 1-indexed pod:
1038+
```sh
1039+
kubectl delete pods -l job-name=job-longrun -l batch.kubernetes.io/job-completion-index=1 --grace-period=1
1040+
```
1041+
Await for the replacement pod to be created and repeat the deletion.
1042+
1043+
Check job status and confirm `.status.failedIndexes="1"`
1044+
1045+
```sh
1046+
kubectl get jobs -ljob-name=job-longrun -oyaml
1047+
```
1048+
Also, notice that `.status.active=2`, because the pod for a failed index is not
1049+
re-created.
1050+
1051+
This demonstrates that the feature is working again for the job.
9361052

9371053
<!--
9381054
Describe manual testing that was done and the outcomes.
@@ -1023,7 +1139,7 @@ are marked failed,
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
-For Beta we will consider introduction of a new metric `job_finished_indexes_total`
+For Beta we will introduce a new metric `job_finished_indexes_total`
 with labels `status=(failed|succeeded)`, and `backoffLimit=(perIndex|global)`.
 It will count the number of failed and succeeded indexes across jobs using
 `backoffLimitPerIndex`, or regular Indexed Jobs (using only `.spec.backoffLimit`).
@@ -1169,6 +1285,20 @@ This through this both in small and large cases, again with respect to the
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 -->
 
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No. This feature does not introduce any resource-exhausting operations.
+
+<!--
+Focus not just on happy cases, but primarily on more pathological cases
+(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+If any of the resources can be exhausted, how this is mitigated with the existing limits
+(e.g. pods per node) or new limits added by this KEP?
+
+Are there any tests that were run/should be run to understand performance characteristics better
+and validate the declared limits?
+-->
+
 ### Troubleshooting
 
 <!--
