
Commit 1d19fe2

mimowotengqmsftim committed: Review remarks

Co-authored-by: Qiming Teng <tengqm@outlook.com>
Co-authored-by: Tim Bannister <22591623+sftim@users.noreply.github.com>
1 parent 0d500ee commit 1d19fe2

File tree

3 files changed: +40 −34 lines


content/en/docs/concepts/workloads/controllers/job.md

Lines changed: 2 additions & 0 deletions
@@ -383,6 +383,8 @@ from failed Jobs is not lost inadvertently.
 
 ### Backoff limit per index {#backoff-limit-per-index}
 
+{{< feature-state feature_gate_name="JobBackoffLimitPerIndex" >}}
+
 When you run an [indexed](#completion-mode) Job, you can choose to handle retries
 for pod failures independently for each index. To do so, set the
 `.spec.backoffLimitPerIndex` to specify the maximal number of pod failures
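For context on the field this change documents: `.spec.backoffLimitPerIndex` applies only to Indexed Jobs. A minimal, hypothetical manifest illustrating it (names and values are illustrative, not taken from this commit) could look like:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-per-index-backoff    # hypothetical name
spec:
  completions: 4
  parallelism: 2
  completionMode: Indexed            # backoffLimitPerIndex only works with Indexed Jobs
  backoffLimitPerIndex: 2            # each index tolerates up to 2 pod failures, counted independently
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/busybox:1.36
        command: ["sh", "-c", "exit 1"]   # always fails, to exercise the per-index retries
```

With this spec, every index fails repeatedly; each index is retried up to twice before being marked failed, independently of the other indexes.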

content/en/docs/tasks/job/pod-failure-policy.md

Lines changed: 36 additions & 32 deletions
@@ -244,49 +244,53 @@ The cluster automatically cleans up the Pods.
 
 ## Using Pod Failure Policy to avoid unnecessary Pod retries per index
 
-With the following example, you can learn how to use Pod failure policy and
-Backoff limit per index features to avoid unnecessary Pod restarts per index.
+To avoid unnecessary Pod restarts per index, you can use the _Pod failure policy_ and
+_backoff limit per index_ features. This section of the page shows how to use these features
+together.
 
-1. First, create a Job based on the config:
+1. Save the following manifest as `job-backoff-limit-per-index-failindex.yaml`:
+
+   {{% code_sample file="/controllers/job-backoff-limit-per-index-failindex.yaml" %}}
+
+1. Apply the manifest:
 
-   {{% code_sample file="/controllers/job-backoff-limit-per-index-failindex.yaml" %}}
+   ```sh
+   kubectl create -f job-backoff-limit-per-index-failindex.yaml
+   ```
 
-   by running:
+1. After around 15 seconds, inspect the status of the Pods for the Job. You can do that by running:
 
-   ```sh
-   kubectl create -f job-backoff-limit-per-index-failindex.yaml
-   ```
+   ```shell
+   kubectl get pods -l job-name=job-backoff-limit-per-index-failindex
+   ```
 
-2. After a while inspect the status of the job's Pods by running:
+   You will see output similar to this:
 
-   ```sh
-   kubectl get pods -l job-name=job-backoff-limit-per-index-failindex -o yaml
-   ```
+   ```none
+   NAME                                            READY   STATUS      RESTARTS   AGE
+   job-backoff-limit-per-index-failindex-0-4g4cm   0/1     Error       0          4s
+   job-backoff-limit-per-index-failindex-0-fkdzq   0/1     Error       0          15s
+   job-backoff-limit-per-index-failindex-1-2bgdj   0/1     Error       0          15s
+   job-backoff-limit-per-index-failindex-2-vs6lt   0/1     Completed   0          11s
+   job-backoff-limit-per-index-failindex-3-s7s47   0/1     Completed   0          6s
+   ```
 
-   You will see output similar to this:
-   ```yaml
-   NAME                                            READY   STATUS      RESTARTS   AGE
-   job-backoff-limit-per-index-failindex-0-4g4cm   0/1     Error       0          4s
-   job-backoff-limit-per-index-failindex-0-fkdzq   0/1     Error       0          15s
-   job-backoff-limit-per-index-failindex-1-2bgdj   0/1     Error       0          15s
-   job-backoff-limit-per-index-failindex-2-vs6lt   0/1     Completed   0          11s
-   job-backoff-limit-per-index-failindex-3-s7s47   0/1     Completed   0          6s
-   ```
+   Note that the output shows the following:
 
-   Note that there are two Pods with index 0, because the backoff limit allowed
-   for one retry of the index. At the same time, there is only one Pod with index
-   1, because the exit code of the failed Pod matched the Pod failure policy with
-   the FailIndex action.
+   * Two Pods have index 0, because the backoff limit allowed for one retry
+     of the index.
+   * Only one Pod has index 1, because the exit code of the failed Pod matched
+     the Pod failure policy with the `FailIndex` action.
 
-3. Inspect the status of the Job by running:
+1. Inspect the status of the Job by running:
 
-   ```sh
-   kubectl get jobs -l job-name=job-backoff-limit-per-index-failindex -o yaml
-   ```
+   ```sh
+   kubectl get jobs -l job-name=job-backoff-limit-per-index-failindex -o yaml
+   ```
 
-   In the Job status, see the `failedIndexes` field shows "0,1", because both
-   indexes failed. Since the index 1 was not retried the number of failed Pods,
-   indicated by the status field "failed" equals 3.
+   In the Job status, see that the `failedIndexes` field shows "0,1", because
+   both indexes failed. Because index 1 was not retried, the number of failed
+   Pods, indicated by the status field `failed`, equals 3.
 
 ### Cleaning up
 
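To make the walkthrough above concrete: the `FailIndex` action lives in the Job's `.spec.podFailurePolicy`. A sketch of the relevant fields, reconstructed from the page's description (abridged; not the verbatim example file):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-failindex
spec:
  completions: 4
  parallelism: 2
  completionMode: Indexed
  backoffLimitPerIndex: 1       # allows one retry per index, hence two Pods for index 0
  podFailurePolicy:
    rules:
    - action: FailIndex         # fail the index immediately, without retry
      onExitCodes:
        operator: In
        values: [42]            # the exit code used by the Pod for index 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/python:3
        # command omitted; the example manifest runs a script that fails
        # index 0 with exit code 1 and index 1 with exit code 42
```

With these values, index 0 (exit code 1) is retried once under `backoffLimitPerIndex`, while index 1 (exit code 42) matches the `FailIndex` rule and fails without retry, which explains the Pod counts in the sample output above.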

content/en/examples/controllers/job-backoff-limit-per-index-failindex.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,10 +12,10 @@ spec:
       restartPolicy: Never
       containers:
       - name: main
-        image: python
+        image: docker.io/library/python:3
         command:
         # The script:
-        # - fails the Pod with index 0 with exit code 1 which result in retry,
+        # - fails the Pod with index 0 with exit code 1, which results in one retry;
         # - fails the Pod with index 1 with exit code 42 which results
         #   in failing the index without retry.
         # - succeeds Pods with any other index.
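The behavior the comments above describe can be sketched as a short Python program (a hypothetical reconstruction for illustration; the manifest's actual inline command may differ):

```python
import os
import sys

def exit_code_for(index: int) -> int:
    """Map a completion index to the exit code the manifest comments describe."""
    if index == 0:
        return 1    # generic failure: counts against backoffLimitPerIndex, so index 0 is retried
    if index == 1:
        return 42   # matches the podFailurePolicy FailIndex rule: index 1 fails without retry
    return 0        # every other index succeeds

if __name__ == "__main__":
    # Indexed Jobs expose each Pod's index via the JOB_COMPLETION_INDEX env var.
    sys.exit(exit_code_for(int(os.environ["JOB_COMPLETION_INDEX"])))
```

Run inside an Indexed Job, this produces exactly the Pod pattern shown earlier: index 0 errors twice, index 1 errors once, indexes 2 and 3 complete.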
