TEP-0121: Implement Retries in TaskRun
This commit implements `Retries` in `TaskRun` and removes the
logic by which the PipelineRun controller relies on `RetriesStatus`
to determine whether a TaskRun or CustomRun/Run has terminated.

Key Changes:
- New `Retries` field in both `v1beta1.TaskRun` and `v1.TaskRun`
- Archive retry attempt history in `RetriesStatus` for a failed
  `TaskRun` before Kubernetes and cloud events are sent at the end
  of the reconcile loop.
- Unit tests for the `TaskRun` object changes, especially the
  changes to `status.conditions` and `status.retriesStatus` after
  a single reconcile loop.
XinruZhang committed Nov 30, 2022
1 parent 83af167 commit 415cccc
Showing 26 changed files with 3,332 additions and 1,840 deletions.
62 changes: 55 additions & 7 deletions docs/pipeline-api.md
@@ -1148,6 +1148,18 @@ TaskRunSpecStatusMessage
</tr>
<tr>
<td>
<code>retries</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Retries represents how many times this taskRun should be retried in case of task failure.</p>
</td>
</tr>
<tr>
<td>
<code>timeout</code><br/>
<em>
<a href="https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Duration">
@@ -1157,7 +1169,7 @@ Kubernetes meta/v1.Duration
</td>
<td>
<em>(Optional)</em>
<p>Time after which the build times out. Defaults to 1 hour.
<p>Time after which one retry attempt times out. Defaults to 1 hour.
Specified build timeout should be less than 24h.
Refer Go&rsquo;s ParseDuration documentation for expected format: <a href="https://golang.org/pkg/time/#ParseDuration">https://golang.org/pkg/time/#ParseDuration</a></p>
</td>
@@ -4781,6 +4793,18 @@ TaskRunSpecStatusMessage
</tr>
<tr>
<td>
<code>retries</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Retries represents how many times this taskRun should be retried in case of task failure.</p>
</td>
</tr>
<tr>
<td>
<code>timeout</code><br/>
<em>
<a href="https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Duration">
@@ -4790,7 +4814,7 @@ Kubernetes meta/v1.Duration
</td>
<td>
<em>(Optional)</em>
<p>Time after which the build times out. Defaults to 1 hour.
<p>Time after which one retry attempt times out. Defaults to 1 hour.
Specified build timeout should be less than 24h.
Refer Go&rsquo;s ParseDuration documentation for expected format: <a href="https://golang.org/pkg/time/#ParseDuration">https://golang.org/pkg/time/#ParseDuration</a></p>
</td>
@@ -7887,7 +7911,7 @@ TaskRunSpecStatus
</td>
<td>
<em>(Optional)</em>
<p>Used for cancelling a taskrun (and maybe more later on)</p>
<p>Used for cancelling a TaskRun (and maybe more later on)</p>
</td>
</tr>
<tr>
@@ -7906,6 +7930,18 @@ TaskRunSpecStatusMessage
</tr>
<tr>
<td>
<code>retries</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Retries represents how many times this TaskRun should be retried in case of Task failure.</p>
</td>
</tr>
<tr>
<td>
<code>timeout</code><br/>
<em>
<a href="https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Duration">
@@ -7915,7 +7951,7 @@ Kubernetes meta/v1.Duration
</td>
<td>
<em>(Optional)</em>
<p>Time after which the build times out. Defaults to 1 hour.
<p>Time after which one retry attempt times out. Defaults to 1 hour.
Specified build timeout should be less than 24h.
Refer Go&rsquo;s ParseDuration documentation for expected format: <a href="https://golang.org/pkg/time/#ParseDuration">https://golang.org/pkg/time/#ParseDuration</a></p>
</td>
@@ -12811,7 +12847,7 @@ TaskRunSpecStatus
</td>
<td>
<em>(Optional)</em>
<p>Used for cancelling a taskrun (and maybe more later on)</p>
<p>Used for cancelling a TaskRun (and maybe more later on)</p>
</td>
</tr>
<tr>
@@ -12830,6 +12866,18 @@ TaskRunSpecStatusMessage
</tr>
<tr>
<td>
<code>retries</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Retries represents how many times this TaskRun should be retried in case of Task failure.</p>
</td>
</tr>
<tr>
<td>
<code>timeout</code><br/>
<em>
<a href="https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Duration">
@@ -12839,7 +12887,7 @@ Kubernetes meta/v1.Duration
</td>
<td>
<em>(Optional)</em>
<p>Time after which the build times out. Defaults to 1 hour.
<p>Time after which one retry attempt times out. Defaults to 1 hour.
Specified build timeout should be less than 24h.
Refer Go&rsquo;s ParseDuration documentation for expected format: <a href="https://golang.org/pkg/time/#ParseDuration">https://golang.org/pkg/time/#ParseDuration</a></p>
</td>
@@ -12926,7 +12974,7 @@ Kubernetes core/v1.ResourceRequirements
(<em>Appears on:</em><a href="#tekton.dev/v1beta1.TaskRunSpec">TaskRunSpec</a>)
</p>
<div>
<p>TaskRunSpecStatus defines the taskrun spec status the user can provide</p>
<p>TaskRunSpecStatus defines the TaskRun spec status the user can provide</p>
</div>
<h3 id="tekton.dev/v1beta1.TaskRunSpecStatusMessage">TaskRunSpecStatusMessage
(<code>string</code> alias)</h3>
21 changes: 18 additions & 3 deletions docs/taskruns.md
@@ -26,6 +26,7 @@ weight: 300
- [Specifying `Sidecars`](#specifying-sidecars)
- [Overriding `Task` `Steps` and `Sidecars`](#overriding-task-steps-and-sidecars)
- [Specifying `LimitRange` values](#specifying-limitrange-values)
- [Specifying `Retries`](#specifying-retries)
- [Configuring the failure timeout](#configuring-the-failure-timeout)
- [Specifying `ServiceAccount` credentials](#specifying-serviceaccount-credentials)
- [Monitoring execution status](#monitoring-execution-status)
@@ -698,11 +699,25 @@ object(s), if present. Any `Request` or `Limit` specified by the user (on `Task`

For more information, see the [`LimitRange` support in Pipeline](./compute-resources.md#limitrange-support).

### Specifying `Retries`
You can use the `retries` field to set how many times a failed `TaskRun` is retried.
All TaskRun failures are retriable except for `Cancellation`.

For a retriable `TaskRun`, when an error occurs:
- The error status is archived in `.status.RetriesStatus`
- The `Succeeded` condition in `.status` is updated:
```
Type: Succeeded
Status: Unknown
Reason: ToBeRetried
```
- `status.StartTime` and `status.PodName` are unset to trigger another retry attempt.
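
The three steps above can be sketched in Go as follows. This is a minimal illustration, not the controller's actual code: `Status`, `Condition`, `newFailedStatus`, and `archiveAndReset` are hypothetical stand-ins, the reason string is assumed to be the `ToBeRetried` value of the `TaskRunReasonToBeRetried` constant added in this commit, and the condition type is assumed to be `Succeeded`.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// Status is a hypothetical, trimmed-down TaskRun status.
type Status struct {
	Condition     Condition
	StartTime     *time.Time
	PodName       string
	RetriesStatus []Status
}

// newFailedStatus builds a status that looks like a failed first attempt.
func newFailedStatus(podName string) *Status {
	now := time.Now()
	return &Status{
		Condition: Condition{Type: "Succeeded", Status: "False", Reason: "Failed"},
		StartTime: &now,
		PodName:   podName,
	}
}

// archiveAndReset applies the three documented steps to a failed, retriable run.
func archiveAndReset(s *Status) {
	// 1. Archive a copy of the failed attempt, without nested history.
	failed := *s
	failed.RetriesStatus = nil
	s.RetriesStatus = append(s.RetriesStatus, failed)
	// 2. Mark the run as still in progress (reason string is an assumption).
	s.Condition = Condition{Type: "Succeeded", Status: "Unknown", Reason: "ToBeRetried"}
	// 3. Unset the start time and pod name so that a new attempt can begin.
	s.StartTime = nil
	s.PodName = ""
}

func main() {
	s := newFailedStatus("demo-pod")
	archiveAndReset(s)
	fmt.Println(len(s.RetriesStatus), s.RetriesStatus[0].Reason(), s.Condition.Status, s.Condition.Reason)
}

// Reason returns the archived attempt's condition reason.
func (s Status) Reason() string { return s.Condition.Reason }
```

Running the sketch shows one archived attempt with reason `Failed`, while the live condition is `Unknown` with reason `ToBeRetried`.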

### Configuring the failure timeout

You can use the `timeout` field to set the `TaskRun's` desired timeout value. If you do not specify this
value for the `TaskRun`, the global default timeout value applies. If you set the timeout to 0, the `TaskRun` will
have no timeout and will run until it completes successfully or fails from an error.
You can use the `timeout` field to set the `TaskRun's` desired timeout value for each retry attempt. If you do
not specify this value for the `TaskRun`, the global default timeout value applies. If you set the timeout to 0,
the `TaskRun` will have no timeout and will run until it completes successfully or fails from an error.
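
Because the timeout applies to each retry attempt, the elapsed time is measured from the start of the current attempt rather than from the creation of the `TaskRun`. The sketch below illustrates that idea under stated assumptions: `attemptTimedOut` is a hypothetical helper, not Tekton's `HasTimedOut`, and the strict `>` comparison is a choice made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// attemptTimedOut reports whether the current attempt has run longer than
// timeout. A zero timeout is treated as "no timeout".
func attemptTimedOut(sinceStart, timeout time.Duration) bool {
	if timeout == 0 {
		return false
	}
	return sinceStart > timeout
}

func main() {
	timeout := 10 * time.Minute
	created := time.Date(2022, 11, 30, 12, 0, 0, 0, time.UTC)

	// Attempt 1 starts at creation time and is checked 11 minutes later.
	now := created.Add(11 * time.Minute)
	fmt.Println(attemptTimedOut(now.Sub(created), timeout)) // true

	// The retry gets a fresh start time, so its clock restarts from zero.
	retryStart := now
	now = created.Add(15 * time.Minute)
	fmt.Println(attemptTimedOut(now.Sub(retryStart), timeout)) // false

	// A timeout of 0 disables the check entirely.
	fmt.Println(attemptTimedOut(48*time.Hour, 0)) // false
}
```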

The global default timeout is set to 60 minutes when you first install Tekton. You can set
a different global default timeout value using the `default-timeout-minutes` field in
9 changes: 8 additions & 1 deletion pkg/apis/pipeline/v1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 6 additions & 1 deletion pkg/apis/pipeline/v1/swagger.json
@@ -1787,6 +1787,11 @@
"description": "PodTemplate holds pod specific configuration",
"$ref": "#/definitions/pod.Template"
},
"retries": {
"description": "Retries represents how many times this taskRun should be retried in case of task failure.",
"type": "integer",
"format": "int32"
},
"serviceAccountName": {
"type": "string",
"default": ""
@@ -1825,7 +1830,7 @@
"$ref": "#/definitions/v1.TaskSpec"
},
"timeout": {
"description": "Time after which the build times out. Defaults to 1 hour. Specified build timeout should be less than 24h. Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration",
"description": "Time after which one retry attempt times out. Defaults to 1 hour. Specified build timeout should be less than 24h. Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration",
"$ref": "#/definitions/v1.Duration"
},
"workspaces": {
5 changes: 4 additions & 1 deletion pkg/apis/pipeline/v1/taskrun_types.go
@@ -54,7 +54,10 @@ type TaskRunSpec struct {
// Status message for cancellation.
// +optional
StatusMessage TaskRunSpecStatusMessage `json:"statusMessage,omitempty"`
// Time after which the build times out. Defaults to 1 hour.
// Retries represents how many times this taskRun should be retried in case of task failure.
// +optional
Retries int `json:"retries,omitempty"`
// Time after which one retry attempt times out. Defaults to 1 hour.
// Specified build timeout should be less than 24h.
// Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration
// +optional
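
The `json:"retries,omitempty"` tag on the new field means a zero value is left out of the serialized spec, so existing `TaskRun` objects that never set `retries` serialize as before. A minimal sketch of that behavior, using a hypothetical trimmed-down `spec` type rather than the real `TaskRunSpec`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spec is a hypothetical stand-in that carries only the new field.
type spec struct {
	Retries int `json:"retries,omitempty"`
}

// encode marshals s to JSON and returns it as a string.
func encode(s spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(spec{}))           // {}
	fmt.Println(encode(spec{Retries: 2})) // {"retries":2}
}
```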
11 changes: 9 additions & 2 deletions pkg/apis/pipeline/v1beta1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions pkg/apis/pipeline/v1beta1/swagger.json
@@ -2793,6 +2793,11 @@
"resources": {
"$ref": "#/definitions/v1beta1.TaskRunResources"
},
"retries": {
"description": "Retries represents how many times this TaskRun should be retried in case of Task failure.",
"type": "integer",
"format": "int32"
},
"serviceAccountName": {
"type": "string",
"default": ""
@@ -2807,7 +2812,7 @@
"x-kubernetes-list-type": "atomic"
},
"status": {
"description": "Used for cancelling a taskrun (and maybe more later on)",
"description": "Used for cancelling a TaskRun (and maybe more later on)",
"type": "string"
},
"statusMessage": {
@@ -2831,7 +2836,7 @@
"$ref": "#/definitions/v1beta1.TaskSpec"
},
"timeout": {
"description": "Time after which the build times out. Defaults to 1 hour. Specified build timeout should be less than 24h. Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration",
"description": "Time after which one retry attempt times out. Defaults to 1 hour. Specified build timeout should be less than 24h. Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration",
"$ref": "#/definitions/v1.Duration"
},
"workspaces": {
2 changes: 2 additions & 0 deletions pkg/apis/pipeline/v1beta1/taskrun_conversion.go
@@ -71,6 +71,7 @@ func (trs *TaskRunSpec) ConvertTo(ctx context.Context, sink *v1.TaskRunSpec) err
}
sink.Status = v1.TaskRunSpecStatus(trs.Status)
sink.StatusMessage = v1.TaskRunSpecStatusMessage(trs.StatusMessage)
sink.Retries = trs.Retries
sink.Timeout = trs.Timeout
sink.PodTemplate = trs.PodTemplate
sink.Workspaces = nil
@@ -141,6 +142,7 @@ func (trs *TaskRunSpec) ConvertFrom(ctx context.Context, source *v1.TaskRunSpec)
}
trs.Status = TaskRunSpecStatus(source.Status)
trs.StatusMessage = TaskRunSpecStatusMessage(source.StatusMessage)
trs.Retries = source.Retries
trs.Timeout = source.Timeout
trs.PodTemplate = source.PodTemplate
trs.Workspaces = nil
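
Copying `Retries` in both `ConvertTo` and `ConvertFrom` keeps the value intact across a v1beta1 to v1 to v1beta1 round trip. The check below sketches that property with hypothetical single-field types (`v1beta1Spec`, `v1Spec`); the real conversion functions take a context and copy many more fields.

```go
package main

import "fmt"

// v1beta1Spec and v1Spec are hypothetical stand-ins for the two API versions.
type v1beta1Spec struct{ Retries int }
type v1Spec struct{ Retries int }

// convertTo copies the field into the v1 type.
func (s *v1beta1Spec) convertTo(sink *v1Spec) { sink.Retries = s.Retries }

// convertFrom copies the field back from the v1 type.
func (s *v1beta1Spec) convertFrom(source *v1Spec) { s.Retries = source.Retries }

// roundTrip converts to v1 and back, returning the resulting v1beta1 spec.
func roundTrip(in v1beta1Spec) v1beta1Spec {
	var mid v1Spec
	in.convertTo(&mid)
	var out v1beta1Spec
	out.convertFrom(&mid)
	return out
}

func main() {
	in := v1beta1Spec{Retries: 3}
	fmt.Println(roundTrip(in) == in) // true
}
```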
24 changes: 17 additions & 7 deletions pkg/apis/pipeline/v1beta1/taskrun_types.go
@@ -50,13 +50,16 @@ type TaskRunSpec struct {
TaskRef *TaskRef `json:"taskRef,omitempty"`
// +optional
TaskSpec *TaskSpec `json:"taskSpec,omitempty"`
// Used for cancelling a taskrun (and maybe more later on)
// Used for cancelling a TaskRun (and maybe more later on)
// +optional
Status TaskRunSpecStatus `json:"status,omitempty"`
// Status message for cancellation.
// +optional
StatusMessage TaskRunSpecStatusMessage `json:"statusMessage,omitempty"`
// Time after which the build times out. Defaults to 1 hour.
// Retries represents how many times this TaskRun should be retried in case of Task failure.
// +optional
Retries int `json:"retries,omitempty"`
// Time after which one retry attempt times out. Defaults to 1 hour.
// Specified build timeout should be less than 24h.
// Refer Go's ParseDuration documentation for expected format: https://golang.org/pkg/time/#ParseDuration
// +optional
@@ -85,7 +88,7 @@ type TaskRunSpec struct {
ComputeResources *corev1.ResourceRequirements `json:"computeResources,omitempty"`
}

// TaskRunSpecStatus defines the taskrun spec status the user can provide
// TaskRunSpecStatus defines the TaskRun spec status the user can provide
type TaskRunSpecStatus string

const (
@@ -166,9 +169,11 @@ const (
TaskRunReasonSuccessful TaskRunReason = "Succeeded"
// TaskRunReasonFailed is the reason set when the TaskRun completed with a failure
TaskRunReasonFailed TaskRunReason = "Failed"
// TaskRunReasonCancelled is the reason set when the Taskrun is cancelled by the user
// TaskRunReasonToBeRetried is the reason set when the last TaskRun retry attempt failed
TaskRunReasonToBeRetried TaskRunReason = "ToBeRetried"
// TaskRunReasonCancelled is the reason set when the TaskRun is cancelled by the user
TaskRunReasonCancelled TaskRunReason = "TaskRunCancelled"
// TaskRunReasonTimedOut is the reason set when the Taskrun has timed out
// TaskRunReasonTimedOut is the reason set when the TaskRun has timed out
TaskRunReasonTimedOut TaskRunReason = "TaskRunTimeout"
// TaskRunReasonResolvingTaskRef indicates that the TaskRun is waiting for
// its taskRef to be asynchronously resolved.
@@ -417,7 +422,7 @@ type TaskRunList struct {
Items []TaskRun `json:"items"`
}

// GetPipelineRunPVCName for taskrun gets pipelinerun
// GetPipelineRunPVCName for TaskRun gets pipelinerun
func (tr *TaskRun) GetPipelineRunPVCName() string {
if tr == nil {
return ""
@@ -446,7 +451,7 @@ func (tr *TaskRun) IsDone() bool {
return !tr.Status.GetCondition(apis.ConditionSucceeded).IsUnknown()
}

// HasStarted function check whether taskrun has valid start time set in its status
// HasStarted function check whether TaskRun has valid start time set in its status
func (tr *TaskRun) HasStarted() bool {
return tr.Status.StartTime != nil && !tr.Status.StartTime.IsZero()
}
@@ -471,6 +476,11 @@ func (tr *TaskRun) IsTaskRunResultDone() bool {
return !tr.Status.GetCondition(apis.ConditionType(TaskRunConditionResultsVerified.String())).IsUnknown()
}

// IsRetriable returns true if the TaskRun's Retries is not exhausted.
func (tr *TaskRun) IsRetriable() bool {
return len(tr.Status.RetriesStatus) < tr.Spec.Retries
}
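
Since `IsRetriable` compares the number of archived attempts with `Spec.Retries`, a run that keeps failing makes `retries + 1` attempts in total: the initial one plus one per allowed retry. The simulation below illustrates that counting with hypothetical simplified types (`taskRun`, `attempt`, `runAlwaysFailing`); it is not the reconciler's control flow.

```go
package main

import "fmt"

// attempt is a hypothetical record of one archived, failed attempt.
type attempt struct{ failed bool }

// taskRun is a hypothetical stand-in for the fields IsRetriable reads.
type taskRun struct {
	retries       int       // stands in for Spec.Retries
	retriesStatus []attempt // stands in for Status.RetriesStatus
}

// isRetriable mirrors the comparison made by IsRetriable in the diff.
func (tr *taskRun) isRetriable() bool {
	return len(tr.retriesStatus) < tr.retries
}

// runAlwaysFailing simulates a run whose every attempt fails and returns
// the total number of attempts made before retries are exhausted.
func runAlwaysFailing(retries int) int {
	tr := &taskRun{retries: retries}
	attempts := 0
	for {
		attempts++ // one attempt runs and fails
		if !tr.isRetriable() {
			return attempts
		}
		// Archive the failed attempt before trying again.
		tr.retriesStatus = append(tr.retriesStatus, attempt{failed: true})
	}
}

func main() {
	for _, r := range []int{0, 1, 3} {
		fmt.Printf("retries=%d -> attempts=%d\n", r, runAlwaysFailing(r))
	}
	// Prints:
	// retries=0 -> attempts=1
	// retries=1 -> attempts=2
	// retries=3 -> attempts=4
}
```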

// HasTimedOut returns true if the TaskRun runtime is beyond the allowed timeout
func (tr *TaskRun) HasTimedOut(ctx context.Context, c clock.PassiveClock) bool {
if tr.Status.StartTime.IsZero() {