
Commit 0d500ee

Add an example for PodFailurePolicy with FailIndex
1 parent 4fe1751 commit 0d500ee


2 files changed: +96, -0 lines

content/en/docs/tasks/job/pod-failure-policy.md

Lines changed: 56 additions & 0 deletions
@@ -242,6 +242,62 @@ kubectl delete jobs/job-pod-failure-policy-config-issue

The cluster automatically cleans up the Pods.

## Using Pod Failure Policy to avoid unnecessary Pod retries per index

With the following example, you can learn how to use the Pod failure policy
and backoff limit per index features together to avoid unnecessary Pod
retries per index.

1. First, create a Job based on the config:

   {{% code_sample file="/controllers/job-backoff-limit-per-index-failindex.yaml" %}}

   by running:

   ```sh
   kubectl create -f job-backoff-limit-per-index-failindex.yaml
   ```

2. After a while, inspect the status of the Job's Pods by running:

   ```sh
   kubectl get pods -l job-name=job-backoff-limit-per-index-failindex
   ```

   You will see output similar to this:

   ```
   NAME                                            READY   STATUS      RESTARTS   AGE
   job-backoff-limit-per-index-failindex-0-4g4cm   0/1     Error       0          4s
   job-backoff-limit-per-index-failindex-0-fkdzq   0/1     Error       0          15s
   job-backoff-limit-per-index-failindex-1-2bgdj   0/1     Error       0          15s
   job-backoff-limit-per-index-failindex-2-vs6lt   0/1     Completed   0          11s
   job-backoff-limit-per-index-failindex-3-s7s47   0/1     Completed   0          6s
   ```

   Note that there are two Pods with index 0, because `backoffLimitPerIndex: 1`
   allows one retry of the index. At the same time, there is only one Pod with
   index 1, because the exit code of the failed Pod matched the Pod failure
   policy rule with the `FailIndex` action, so the index failed without a retry.

3. Inspect the status of the Job by running:

   ```sh
   kubectl get jobs -l job-name=job-backoff-limit-per-index-failindex -o yaml
   ```

   In the Job status, note that the `failedIndexes` field shows `"0,1"`, because
   both indexes failed. Because index 1 was not retried, the number of failed
   Pods, indicated by the `failed` status field, equals 3.

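To check the outcome of step 3 programmatically, you can expand `failedIndexes` into a list of integers. The following is a minimal sketch, assuming the field uses the same comma-separated interval format as `completedIndexes` (for example `"1,3-5"`). The function names `parse_indexes` and `summarize` and the sample status dictionary are hypothetical; the sample simply mirrors the values described above.

```python
from typing import Any


def parse_indexes(indexes: str) -> list[int]:
    """Expand an interval string such as "0,1" or "1,3-5" into a sorted list."""
    result: list[int] = []
    if not indexes:
        return result
    for part in indexes.split(","):
        if "-" in part:
            # A range such as "3-5" covers both endpoints.
            start, end = part.split("-")
            result.extend(range(int(start), int(end) + 1))
        else:
            result.append(int(part))
    return sorted(result)


def summarize(status: dict[str, Any]) -> tuple[list[int], int]:
    """Return the failed indexes and the failed Pod count from a Job status."""
    failed_indexes = parse_indexes(status.get("failedIndexes", ""))
    failed_pods = status.get("failed", 0)
    return failed_indexes, failed_pods


if __name__ == "__main__":
    # Hypothetical status excerpt mirroring the values described in step 3.
    status = {"failedIndexes": "0,1", "failed": 3}
    print(summarize(status))
```

Against a live cluster, you could load the JSON printed by `kubectl get job job-backoff-limit-per-index-failindex -o json` with `json.loads` and pass its `status` object to `summarize`.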
### Cleaning up

Delete the Job you created:

```sh
kubectl delete jobs/job-backoff-limit-per-index-failindex
```

The cluster automatically cleans up the Pods.

## Alternatives

You could rely solely on the
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-failindex
spec:
  completions: 4
  parallelism: 2
  completionMode: Indexed
  backoffLimitPerIndex: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: python
        command:
        # The script:
        # - fails the Pod with index 0 with exit code 1, which results in a retry,
        # - fails the Pod with index 1 with exit code 42, which results
        #   in failing the index without a retry,
        # - succeeds Pods with any other index.
        - python3
        - -c
        - |
          import os, sys
          index = int(os.environ.get("JOB_COMPLETION_INDEX"))
          if index == 0:
            sys.exit(1)
          elif index == 1:
            sys.exit(42)
          else:
            sys.exit(0)
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailIndex
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
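The interaction between `backoffLimitPerIndex` and the `FailIndex` rule in this manifest can be reasoned about offline. The sketch below is a simplified, hypothetical model, not the Job controller's code: it ignores `parallelism`, timing and the global `backoffLimit`, assumes each failed Pod is first checked against the `onExitCodes` rule (`operator: In`, `values: [42]`), and otherwise retries the index until it has failed more than `backoffLimitPerIndex` times. All names in it are illustrative.

```python
# Simplified, hypothetical model of the example Job; not the real Job controller.
BACKOFF_LIMIT_PER_INDEX = 1   # spec.backoffLimitPerIndex
COMPLETIONS = 4               # spec.completions
FAIL_INDEX_EXIT_CODES = {42}  # onExitCodes with operator In, values [42]


def exit_code_for(index: int) -> int:
    """Mirror the container script: index 0 -> 1, index 1 -> 42, others -> 0."""
    if index == 0:
        return 1
    if index == 1:
        return 42
    return 0


def run_index(index: int) -> tuple[bool, int]:
    """Create Pods for one index; return (succeeded, number of failed Pods)."""
    failures = 0
    while True:
        code = exit_code_for(index)
        if code == 0:
            return True, failures
        failures += 1
        if code in FAIL_INDEX_EXIT_CODES:
            # FailIndex: mark the index as failed without further retries.
            return False, failures
        if failures > BACKOFF_LIMIT_PER_INDEX:
            # The per-index backoff limit is exhausted.
            return False, failures


def simulate() -> tuple[list[int], int]:
    """Return the failed indexes and the total number of failed Pods."""
    failed_indexes: list[int] = []
    failed_pods = 0
    for index in range(COMPLETIONS):
        succeeded, failures = run_index(index)
        failed_pods += failures
        if not succeeded:
            failed_indexes.append(index)
    return failed_indexes, failed_pods


if __name__ == "__main__":
    print(simulate())  # ([0, 1], 3)
```

Under these assumptions, index 0 produces two failed Pods (the first attempt plus one retry) and index 1 produces one, which is consistent with the three failed Pods and the `failedIndexes` value of `"0,1"` described in the walkthrough.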