Set pipeline status when all tasks complete #2774
Conversation
This PR cannot be merged: expecting exactly one kind/ label Available
Force-pushed from f4513a1 to e72a32e
The following is the coverage report on the affected files.
Force-pushed from e72a32e to e7ea796
The following is the coverage report on the affected files.
/kind feature
Force-pushed from e7ea796 to 25ca466
The following is the coverage report on the affected files.
Force-pushed from 25ca466 to 16d7973
The following is the coverage report on the affected files.
Force-pushed from 16d7973 to f57d553
The following is the coverage report on the affected files.
Force-pushed from f57d553 to 4e90952
The following is the coverage report on the affected files.
/lgtm
@@ -231,6 +231,9 @@ const (
	PipelineRunReasonCancelled PipelineRunReason = "Cancelled"
	// PipelineRunReasonTimedOut is the reason set when the PipelineRun has timed out
	PipelineRunReasonTimedOut PipelineRunReason = "PipelineRunTimeout"
	// PipelineRunReasonStopping indicates that no new Tasks will be scheduled by the controller, and the
	// pipeline will stop once all running tasks complete their work
	PipelineRunReasonStopping PipelineRunReason = "PipelineRunStopping"
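For reference, a minimal sketch (assuming the Tekton v1beta1 Go API; isStopping is a hypothetical helper, not part of this diff) of how a caller could detect the new reason on a PipelineRun's Succeeded condition:

package main

import (
	"fmt"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	"knative.dev/pkg/apis"
)

// isStopping reports whether the PipelineRun is draining: its Succeeded
// condition is still Unknown and carries the stopping reason added above.
func isStopping(pr *v1beta1.PipelineRun) bool {
	c := pr.Status.GetCondition(apis.ConditionSucceeded)
	return c != nil && c.IsUnknown() && c.Reason == string(v1beta1.PipelineRunReasonStopping)
}

func main() {
	pr := &v1beta1.PipelineRun{}
	fmt.Println(isStopping(pr)) // false: no condition is set on an empty object
}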
I think it looks a little odd. I ran a simple pipeline with these changes and the state transitions from stopping to failed while a task is still running 😕
kubectl get pr
NAME SUCCEEDED REASON STARTTIME COMPLETIONTIME
pipelinerun-one-failure-two-success Unknown PipelineRunStopping 21s
kubectl get tr
NAME SUCCEEDED REASON STARTTIME COMPLETIONTIME
pipelinerun-one-failure-two-success-task-a-qdphk False Failed 25s 14s
pipelinerun-one-failure-two-success-task-b-z8ctx True Succeeded 25s 7s
pipelinerun-one-failure-two-success-task-c-pwkhv Unknown Running 25s
kubectl get pr
NAME SUCCEEDED REASON STARTTIME COMPLETIONTIME
pipelinerun-one-failure-two-success False Failed 31s 5s
Thanks for trying it out on a real cloud - I kind of relied on unit and E2E tests :)
The reason changes as follows: start / running / stopping / failed, which is what I was hoping to achieve.
When we extend the pipeline with finally, it could look something like: start / running / stopping / running-finally / failed.
The message could also help provide more details.
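Purely as an illustration, the same sequence expressed with the reason strings a watcher would see on the Succeeded condition (Started / Running / Failed are the usual Tekton reasons; the speculative finally-related step is omitted):

package main

import "fmt"

func main() {
	// Reason sequence on a failing run under this PR, as described above.
	sequence := []string{"Started", "Running", "PipelineRunStopping", "Failed"}
	fmt.Println(sequence)
}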
Force-pushed from 96e646f to f79dfc9
The following is the coverage report on the affected files.
Force-pushed from f79dfc9 to 6712c13
The following is the coverage report on the affected files.
	return false
}

c := t.TaskRun.Status.GetCondition(apis.ConditionSucceeded)
A task with a condition should be covered here too: it should help mark the pipeline as started once a condition-check container has started and the condition is executing.
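As a hedged sketch of the check being discussed (resolvedTask and hasStarted are illustrative stand-ins, not the actual structs in the pipelinerun resources package; only TaskRunStatus.GetCondition is the real API): a pipeline task counts as started if either its TaskRun or any of its condition-check TaskRuns already reports a Succeeded condition, even one that is still Unknown.

package main

import (
	"fmt"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	"knative.dev/pkg/apis"
)

// resolvedTask is a simplified stand-in for the reconciler's resolved pipeline task.
type resolvedTask struct {
	TaskRun         *v1beta1.TaskRun
	ConditionChecks []*v1beta1.TaskRun // condition checks also execute as TaskRuns
}

// hasStarted returns true as soon as any of the task's runs reports a
// Succeeded condition (Unknown while running, True/False once finished).
func hasStarted(t resolvedTask) bool {
	runs := append([]*v1beta1.TaskRun{t.TaskRun}, t.ConditionChecks...)
	for _, tr := range runs {
		if tr == nil {
			continue
		}
		if c := tr.Status.GetCondition(apis.ConditionSucceeded); c != nil {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasStarted(resolvedTask{})) // false: nothing has been created yet
}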
this looks great, thanks @afrittoli
We used to set the pipeline status to failed as soon as the first task in the pipeline failed or was cancelled. As soon as the first task in the pipeline fails or is cancelled, we stop scheduling new tasks, as we did before, but we now report status Unknown until all Tasks are complete, with reason "stopping". This allows us to:
- set the completion time at the same time that the status is set and avoid inconsistencies
- wait until all tasks are complete before we clean up the pipeline artifact storage and affinity assistant and record metrics
- report the correct number of failed / cancelled tasks, as there may be more than one; other tasks that were already running when the first failure happened may fail too
- prepare the pipeline controller for more complex workflows, where the controller may continue scheduling work after failures
Add test coverage for isSkipped and extend the pipelineresolution module unit test coverage to capture more input pipeline state. The DAG / GetNextTasks functions have not been touched at all. When the pipeline run is in stopping mode, the pipeline run reconciler stops asking for next tasks to run; once all running tasks finish, the pipeline run finally reaches the failed state. When the pipeline run is in stopping mode, tasks that have not been started yet are also counted as skipped in the status message reported to the user.
Force-pushed from 6712c13 to 4318bec
The following is the coverage report on the affected files.
// runNextSchedulableTask gets the next schedulable Tasks from the dag based on the current
// pipeline run state, and starts them
func (c *Reconciler) runNextSchedulableTask(ctx context.Context, pr *v1beta1.PipelineRun, d *dag.Graph, pipelineState resources.PipelineRunState, as artifacts.ArtifactStorageInterface) error {
NIT: how do we avoid adding yet more reconciler receiver functions while at the same time not letting the reconcile itself explode? 🤔
Good point!
Getting the recorder and the logger from the context helps with that.
This function depends on other receiver functions, so it still needs c, but perhaps it is now possible to reduce the number of receiver functions - in a separate PR.
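A minimal sketch of that context-based pattern, assuming knative.dev/pkg (logging.FromContext and controller.GetEventRecorder are the knative helpers; the function body is purely illustrative):

package main

import (
	"context"

	"go.uber.org/zap"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/logging"
)

// scheduleNext pulls its dependencies from the context instead of from the
// Reconciler receiver, shrinking what a helper like this needs from c.
func scheduleNext(ctx context.Context /*, other args elided */) {
	logger := logging.FromContext(ctx)           // *zap.SugaredLogger stored on the context
	recorder := controller.GetEventRecorder(ctx) // record.EventRecorder stored on the context (nil if unset)

	logger.Info("scheduling next tasks")
	_ = recorder // emit events here as needed
}

func main() {
	ctx := logging.WithLogger(context.Background(), zap.NewExample().Sugar())
	scheduleNext(ctx)
}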
I think it could be like that as a start; it would still work fine. We could improve on the reason in a PR on top. I would be happy to help with that.
/lgtm
This PR will impact
/meow
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: vdemeester
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Changes
We used to set the pipeline status to failed as soon as the first
task in the pipeline failed or was cancelled.
As soon as the first task in the pipeline fails or is cancelled, we
stop scheduling new tasks, as we did before, but we report
status Unknown until all Tasks are complete, with reason "stopping".
This allows us to:
- set the completion time at the same time that the status is set
and avoid inconsistencies
- wait until all tasks are complete before we clean up the pipeline
artifact storage and affinity assistant and record metrics
- report the correct number of failed / cancelled tasks, as there
may be more than one. Other tasks that were already running
when the first failure happened may fail too
- prepare the pipeline controller for more complex workflows, where
the controller may continue scheduling work after failures
Add test coverage for isSkipped and extend the pipelineresolution
module unit test coverage to capture more input pipeline state.
The DAG / GetNextTasks functions have not been touched at all.
When the pipeline run is in stopping mode, the pipeline run reconciler
stops asking for next tasks to run. Once all running tasks
finish, the pipeline run finally reaches the failed state (a rough
sketch of this control flow follows below).
When the pipeline run is in stopping mode, tasks that have not
been started yet are also counted as skipped in the status message
reported to the user.
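That rough sketch (illustrative names only, not the exact functions touched by this PR): while the run is stopping the reconciler requests no new tasks, so the run only moves to failed after the already-running tasks drain.

package main

import "fmt"

// nextTasksToSchedule stands in for the reconciler's scheduling step: in
// stopping mode nothing new is requested; running tasks finish on their own.
func nextTasksToSchedule(stopping bool, candidates []string) []string {
	if stopping {
		return nil
	}
	return candidates
}

func main() {
	fmt.Println(nextTasksToSchedule(true, []string{"task-c"}))  // []
	fmt.Println(nextTasksToSchedule(false, []string{"task-c"})) // [task-c]
}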
Fixes #1680
Submitter Checklist
These are the criteria that every PR should meet; please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd dir, please update the release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes