Set pipeline status when all tasks complete #2774

afrittoli · 2020-06-07T11:29:45Z

Changes

We used to set the pipeline status to failed as soon as the first
task in the pipeline failed or was cancelled.

As soon as the first task in the pipeline fails or is cancelled, we
stop scheduling new tasks, as we did before, but we will report
status Unknown until all Tasks are complete, with reason "stopping".

This allows us to:

- set the completion time at the same time that the status is set,
  and avoid inconsistencies
- wait until all tasks are complete before we clean up the pipeline
  artifact storage and the affinity assistant, and record metrics
- report the correct number of failed / cancelled tasks, as there
  may be more than one. Other tasks that were already running
  when the first failure happened may fail too
- prepare the pipeline controller for more complex workflows, where
  the controller may continue scheduling after failures

Add test coverage for isSkipped and extend the pipelineresolution
module unit test coverage to capture more input pipeline state.

The DAG / GetNextTasks functions have not been touched at all.
The pipeline run reconciler, when the pipeline run is in stopping
mode, stops asking for next tasks to run. Once all running tasks
finish, the pipeline run finally moves to the failed state.

When the pipeline run is in stopping mode, tasks that have not
been started yet are also counted as skipped in the status message
reported to the user.
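
The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the controller's code: `TaskState`, `Condition` and `pipelineCondition` are hypothetical names, and the message format is an assumption. Only the reason strings are taken from this PR.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified state of one task in a pipeline run.
type TaskState int

const (
	NotStarted TaskState = iota
	Running
	Succeeded
	Failed
	Cancelled
)

// Condition is a simplified stand-in for the Succeeded condition on the run.
type Condition struct {
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// pipelineCondition derives the run condition from the state of its tasks.
func pipelineCondition(tasks []TaskState) Condition {
	counts := map[TaskState]int{}
	for _, t := range tasks {
		counts[t]++
	}
	// The run is "stopping" as soon as any task has failed or was cancelled.
	stopping := counts[Failed]+counts[Cancelled] > 0

	// In stopping mode nothing new is scheduled, so tasks that have not
	// started yet are reported as skipped.
	skipped := 0
	if stopping {
		skipped = counts[NotStarted]
	}
	msg := fmt.Sprintf("succeeded: %d, failed: %d, cancelled: %d, running: %d, skipped: %d",
		counts[Succeeded], counts[Failed], counts[Cancelled], counts[Running], skipped)

	switch {
	case stopping && counts[Running] > 0:
		// Still waiting for running tasks: status stays Unknown.
		return Condition{"Unknown", "PipelineRunStopping", msg}
	case stopping:
		// Every scheduled task is done: only now report the failure.
		return Condition{"False", "Failed", msg}
	case counts[Running]+counts[NotStarted] > 0:
		return Condition{"Unknown", "Running", msg}
	default:
		return Condition{"True", "Succeeded", msg}
	}
}

func main() {
	// One task failed while another is still running and a third never started.
	fmt.Printf("%+v\n", pipelineCondition([]TaskState{Failed, Running, NotStarted}))
	// The running task has finished: the run is now marked as failed.
	fmt.Printf("%+v\n", pipelineCondition([]TaskState{Failed, Succeeded, NotStarted}))
}
```

Because the final status is computed only once nothing is running, the failed and cancelled counts in the message cover every task that ended badly, not just the first one.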

Fixes #1680

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

[WIP] Includes tests (if functionality changed/added)
Includes docs (if user facing)
Commit messages follow commit message best practices

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

If you are adding a new binary/image to the cmd dir, please update
the release Task to build and release this image.

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

In case of task run failure or cancellation, the pipeline run stops scheduling new tasks, as before.
However, the status is marked as failed only once all the task runs already scheduled are complete.

A new reason "PipelineRunStopping" indicates that the pipeline run has found a failure and is waiting for task runs to complete.

tekton-robot · 2020-06-07T11:30:21Z

This PR cannot be merged: expecting exactly one kind/ label

Available kind/ labels are:

kind/bug: Categorizes issue or PR as related to a bug.
kind/flake: Categorizes issue or PR as related to a flakey test
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/design: Categorizes issue or PR as related to design.
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
kind/misc: Categorizes issue or PR as a miscellaneous one.

tekton-robot · 2020-06-07T21:17:04Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	89.3%	-1.2

tekton-robot · 2020-06-08T05:03:36Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	89.3%	-1.2

afrittoli · 2020-06-08T09:40:42Z

/kind feature

tekton-robot · 2020-06-08T10:51:39Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	89.3%	-1.2

tekton-robot · 2020-06-08T15:36:05Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	90.1%	-0.3

tekton-robot · 2020-06-08T15:50:13Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/pipelinerun.go	81.7%	81.5%	-0.2
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	90.1%	-0.3

pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go

tekton-robot · 2020-06-10T10:11:13Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/pipelinerun.go	81.7%	81.5%	-0.2
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	91.8%	1.3

vdemeester

/lgtm

pritidesai · 2020-06-10T16:59:58Z

pkg/apis/pipeline/v1beta1/pipelinerun_types.go

@@ -231,6 +231,9 @@ const (
 	PipelineRunReasonCancelled PipelineRunReason = "Cancelled"
 	// PipelineRunReasonTimedOut is the reason set when the PipelineRun has timed out
 	PipelineRunReasonTimedOut PipelineRunReason = "PipelineRunTimeout"
+	// ReasonStopping indicates that no new Tasks will be scheduled by the controller, and the
+	// pipeline will stop once all running tasks complete their work
+	PipelineRunReasonStopping PipelineRunReason = "PipelineRunStopping"


I think it looks a little odd. I ran a simple pipeline with these changes and the state transitions from stopping to failed; the pipeline is stopping while a task is still running 😕

kubectl get pr
NAME                                  SUCCEEDED   REASON                STARTTIME   COMPLETIONTIME
pipelinerun-one-failure-two-success   Unknown     PipelineRunStopping   21s

kubectl get tr
NAME                                               SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
pipelinerun-one-failure-two-success-task-a-qdphk   False       Failed      25s         14s
pipelinerun-one-failure-two-success-task-b-z8ctx   True        Succeeded   25s         7s
pipelinerun-one-failure-two-success-task-c-pwkhv   Unknown     Running     25s

kubectl get pr
NAME                                  SUCCEEDED   REASON   STARTTIME   COMPLETIONTIME
pipelinerun-one-failure-two-success   False       Failed   31s         5s

Thanks for trying it out on a real cloud - I kind of relied on unit and E2E tests :)

The reason changes as follows:

start / running / stopping / failed
which is what I was hoping to achieve.
When we extend the pipeline with finally, it could look something like:

start / running / stopping / running-finally / failed

The message could also help by providing more details.
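
As a rough illustration of the two sequences above, the allowed reason changes can be written as a small transition table. This is only a sketch: the `Reason` values are the informal names used in this comment, not the constants from `pipelinerun_types.go`, `running-finally` is hypothetical, and the table covers failing runs only.

```go
package main

import "fmt"

// Reason is a simplified stand-in for the reason reported on the pipeline run.
type Reason string

const (
	Start          Reason = "start"
	Running        Reason = "running"
	Stopping       Reason = "stopping"
	RunningFinally Reason = "running-finally" // hypothetical, only sketched above
	Failed         Reason = "failed"
)

// allowed lists, for each reason, the reasons that may follow it.
// It encodes only the two sequences discussed above.
var allowed = map[Reason][]Reason{
	Start:          {Running},
	Running:        {Stopping},
	Stopping:       {RunningFinally, Failed},
	RunningFinally: {Failed},
}

// canFollow reports whether reason "to" may directly follow reason "from".
func canFollow(from, to Reason) bool {
	for _, r := range allowed[from] {
		if r == to {
			return true
		}
	}
	return false
}

// validSequence checks every consecutive pair of an observed reason history.
func validSequence(seq []Reason) bool {
	for i := 1; i < len(seq); i++ {
		if !canFollow(seq[i-1], seq[i]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validSequence([]Reason{Start, Running, Stopping, Failed}))
	fmt.Println(validSequence([]Reason{Start, Running, Stopping, RunningFinally, Failed}))
	// Going back from stopping to running is not part of either sequence.
	fmt.Println(validSequence([]Reason{Start, Running, Stopping, Running, Failed}))
}
```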

tekton-robot · 2020-06-10T17:44:48Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	91.8%	1.3

tekton-robot · 2020-06-10T17:53:06Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	91.8%	1.3

pritidesai · 2020-06-10T17:57:20Z

pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go

+		return false
+	}
+
+	c := t.TaskRun.Status.GetCondition(apis.ConditionSucceeded)


A task with a condition should be covered here; that would help mark the pipeline as started when a condition container has started and the condition is executing.
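
A minimal sketch of what this suggestion could look like, using hypothetical, trimmed-down types rather than the real `ResolvedPipelineRunTask` (the field names below are assumptions made for illustration only):

```go
package main

import "fmt"

// resolvedTask is a hypothetical stand-in for a resolved pipeline task.
type resolvedTask struct {
	// TaskRunStarted is true when the TaskRun exists and reports a Succeeded condition.
	TaskRunStarted bool
	// ConditionCheckStarted has one entry per condition check guarding the task.
	ConditionCheckStarted []bool
}

// isStarted treats a task as started if its TaskRun has started, or if any of
// its condition checks has started executing.
func isStarted(t resolvedTask) bool {
	if t.TaskRunStarted {
		return true
	}
	for _, started := range t.ConditionCheckStarted {
		if started {
			return true
		}
	}
	return false
}

// anyStarted reports whether the pipeline as a whole can be marked as started.
func anyStarted(tasks []resolvedTask) bool {
	for _, t := range tasks {
		if isStarted(t) {
			return true
		}
	}
	return false
}

func main() {
	tasks := []resolvedTask{
		{TaskRunStarted: false, ConditionCheckStarted: []bool{true}}, // condition is executing
		{TaskRunStarted: false},                                      // nothing started yet
	}
	fmt.Println(anyStarted(tasks))
}
```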

pritidesai · 2020-06-10T18:02:26Z

this looks great, thanks @afrittoli
I have some work to do now 😢 on the finally PR; it wouldn't make sense to have the pipeline go:
started -> running -> stopping -> running -> failed, maybe / maybe not 🤔, will verify ✍️
/lgtm

tekton-robot · 2020-06-10T18:16:29Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/pipelinerun.go	82.1%	82.0%	-0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go	90.5%	91.8%	1.3

pritidesai · 2020-06-10T18:22:35Z

pkg/reconciler/pipelinerun/pipelinerun.go

+
+// runNextSchedulableTask gets the next schedulable Tasks from the dag based on the current
+// pipeline run state, and starts them
+func (c *Reconciler) runNextSchedulableTask(ctx context.Context, pr *v1beta1.PipelineRun, d *dag.Graph, pipelineState resources.PipelineRunState, as artifacts.ArtifactStorageInterface) error {


NIT: how can we avoid adding more reconciler receiver functions without, at the same time, exploding Reconcile itself? 🤔

Good point!
Getting the recorder and the logger from the context helps with that.
This function depends on other receiver functions, so it still needs c, but perhaps it is possible now to reduce the number of receiver functions - in a separate PR

afrittoli · 2020-06-10T19:51:30Z

> this looks great, thanks @afrittoli
> I have some work to do now 😢 on the finally PR; it wouldn't make sense to have the pipeline go:
> started -> running -> stopping -> running -> failed, maybe / maybe not 🤔, will verify ✍️
> /lgtm

I think it could be like that as a start, it would work fine still. We could improve on the reason as a PR on top. I would be happy to help with that.

pritidesai · 2020-06-11T00:57:35Z

/lgtm

This PR will impact the finally PR #2661. I am fine with merging this before we merge the finally PR, and will incorporate the necessary changes into that PR.

vdemeester

/meow

tekton-robot · 2020-06-11T08:18:53Z

@vdemeester:

In response to this:

/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot · 2020-06-11T08:18:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2020

tekton-robot requested review from bobcatfish and a user June 7, 2020 11:29

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 7, 2020

afrittoli mentioned this pull request Jun 7, 2020

Use a helper for setting the Succeeded condition on PipelineRun. #2749

Merged

afrittoli force-pushed the pipeline_fail_later branch from f4513a1 to e72a32e Compare June 7, 2020 21:14

afrittoli force-pushed the pipeline_fail_later branch from e72a32e to e7ea796 Compare June 8, 2020 05:00

tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 8, 2020

afrittoli force-pushed the pipeline_fail_later branch from e7ea796 to 25ca466 Compare June 8, 2020 10:48

afrittoli force-pushed the pipeline_fail_later branch from 25ca466 to 16d7973 Compare June 8, 2020 15:33

afrittoli force-pushed the pipeline_fail_later branch from 16d7973 to f57d553 Compare June 8, 2020 15:47

afrittoli changed the title ~~WIP Set pipeline status when all tasks complete~~ Set pipeline status when all tasks complete Jun 8, 2020

tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2020

bobcatfish added this to the Pipelines 0.13.1 🐱 milestone Jun 8, 2020

GregDritschler reviewed Jun 8, 2020

pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go

afrittoli changed the title ~~Set pipeline status when all tasks complete~~ WIP Set pipeline status when all tasks complete Jun 8, 2020

tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2020

afrittoli force-pushed the pipeline_fail_later branch from f57d553 to 4e90952 Compare June 8, 2020 20:39

tekton-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 8, 2020

vdemeester reviewed Jun 10, 2020

tekton-robot assigned vdemeester Jun 10, 2020

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020

pritidesai reviewed Jun 10, 2020

afrittoli force-pushed the pipeline_fail_later branch from 96e646f to f79dfc9 Compare June 10, 2020 17:42

tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020

afrittoli force-pushed the pipeline_fail_later branch from f79dfc9 to 6712c13 Compare June 10, 2020 17:49

pritidesai reviewed Jun 10, 2020

tekton-robot assigned pritidesai Jun 10, 2020

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020

afrittoli force-pushed the pipeline_fail_later branch from 6712c13 to 4318bec Compare June 10, 2020 18:13

tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020

pritidesai reviewed Jun 10, 2020

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2020

vdemeester approved these changes Jun 11, 2020

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2020

tekton-robot merged commit 3b95494 into tektoncd:master Jun 11, 2020

pritidesai mentioned this pull request Jun 12, 2020

implementing pipeline level finally #2661

Merged

3 tasks

pritidesai mentioned this pull request Aug 21, 2020

Pipelinerun status will not be set to PipelineRunStopping if pipeline contains finally tasks #3119

Closed

pritidesai mentioned this pull request Mar 14, 2022

Allow tasks to retry when PipelineRun stops #4651

Merged

5 tasks
