-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEP-0050] Add proposed design details #785
[TEP-0050] Add proposed design details #785
Conversation
5b4be2c
to
5412084
Compare
5412084
to
216d009
Compare
/cc @jerop @pritidesai |
/kind tep |
3c772aa
to
cfe56e8
Compare
/assign |
cfe56e8
to
bd99b0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @QuanZhang-William!
bd99b0d
to
751b030
Compare
751b030
to
acaad10
Compare
/assign @pritidesai |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jerop, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
teps/0050-ignore-task-failures.md
Outdated
Setting ```Retry``` and ```OnFail``` to ```continue``` at the same time is not allowed in this iteration of the TEP, as there is no point to retry a task that allows to fail. Pipeline validation will be added accordingly. We can support retries with ignored failed task in the future if needed. | ||
|
||
### Tasks with Missing Resource Dependency | ||
In the following example, the first task fails to produce a result (due to step error [details](https://github.com/tektoncd/pipeline/issues/3749)) that is going to be consumed by the second task. The second task will be skipped with reason ```Results were missing``` irrelevant of its OnFail type. This behaviour is consistent with [Guarding a Task only](https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#guarding-a-task-only). When we add support for [default Results](https://github.com/tektoncd/community/blob/main/teps/0048-task-results-without-results.md), then the resource-dependent Tasks may be executed if the default Results from the skipped parent Task are specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is little confusing and inconsistent with step.onError
.
The linked issue has been partially fixed in which pipeline results are emitted even when a task fails. Pipeline results are initialized with the task results from all successful tasks until a task has failed.
step.onError
does support emitting results and consuming them - https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#produce-a-task-result-with-onerror
As long as a task has initialized results before failing, they can be made available for the consumption. This only should be supported with task.onFail: continue
.
When a pipeline author has chosen a strategy where a task failure is expected, the author must have complete understanding of what the result
could mean and how to consume that result
if it's initialized before failing.
For example, git-clone
produces commit
as a task result. If pipeline author chooses to ignore failure from git-clone
, the author is aware that the commit
might/might not be initialized depending on the failure. Now, as a pipeline author, when a git-clone
failed for some reason but commit
is available, its guaranteed that my repo was cloned as expected. If the commit
is not initialized and git-clone
failed, the repo was not cloned for some reason.
Another example is boskos-acquire. As a pipeline author, I can choose to ignore failure from boskos-acquire
. If boskos-acquire
failed for some reason to create heartbeat pod but was able to lease resources, leased-resource
will be initialized and can be made available to the cleanup
task if its initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as a task has initialized results before failing, they can be made available for the consumption. This only should be supported with
task.onFail: continue
.When a pipeline author has chosen a strategy where a task failure is expected, the author must have complete understanding of what the
result
could mean and how to consume thatresult
if it's initialized before failing.
Thanks for your input, @pritidesai! This is a good point and the question is essentially: Should we emit results for failed tasks with onFail:continue? And this may need some discussion.
Currently we don't emit results for failed tasks. There are some discussions around it and I'm not sure why we ended up not implement it. Looking at the requirement 3 of this TEP: the task would be considered "successful" for the purposes of determining the status of the pipeline
, it seems to me we should respect the existing behaviour about emitting result, and change behaviour of emitting task is another discussion/issue/TEP? Is it a good idea that we respect the existing behaviour as default, and create another TEP/Issue specifically deal with the emitting result behaviour if we can such request from users?
In terms of consistency, I think step.OnError
and task.OnFail
are different things. step.onError
emits the result only when the overall task run status is success, but in task.OnFail
we keep the task run status as failed even when it is set to continue
Thoughts? @jerop @vdemeester ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have an opportunity here to design this right for many different use cases we have seen. I would recommend attempting to solve here rather than delegating it to a separate TEP.
task.onError
is a failure strategy and no different from step.onError
. Pipeline is not permitted to continue running after a first failure is encountered. With task.onError
, even though the task is failed (container exit code stays non-zero), the pipeline is allowed to continue executing the rest of the tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. If we want to consolidate the behaviour about emitting results from failed task results in this TEP, I think it makes more sense to make the emitting result behaviour consistent (i.e. we emit or not emit results from failed tasks regardless of the OnErrorType
), otherwise such inconsistency would be confusing to users, since OnErrorType
simply describes "should we stop executing the pipeline on task failure or not", and not about "should we emit results from such failed tasks or not".
Looking at the discussion and some feedback from this PR, it looks we do have such requests, why don't we emit result for failed tasks across the board?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a new section Emit Results from Ignored Failed Tasks and updated Task with Missing Resource Dependency section based on our discussion. Please take a look
Thanks a bunch @QuanZhang-William for taking this forward, appreciate all the efforts you have put in 🤗. @jerop @vdemeester I have left some comments on the design, please feel free to ignore them if they are not aligned, thanks 🙏 |
API WG - @QuanZhang-William working on comments |
da51a1f
to
2f7c61d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if we have any desire for this to be consistent with step onError, or if they're addressing different use cases? If so, might be worth including a short section discussing similarities/differences between the two features. Not blocking
2f7c61d
to
9227ecb
Compare
This commit proposes adding a new field "OnError" to "PipelineTask" definition to allow users define task failure strategy
9227ecb
to
1b6c71e
Compare
Thank you @QuanZhang-William once again 🤗I have one last NIT but will not block this PR. “continueAndFail” sounds very confusing to me 🙏 “continue” and “fail” are two opposites states and meant for two different runs here I.e. continue pipelineRun but fail taskRun. The simpler options are “continue” (continue execution of PipelineRun) or “ignoreFailure” (ignore taskRun failure). Please feel free to address this NIT in a separate PR or we implement as is. /lgtm |
/retest |
/test pull-community-teps-lin |
@pritidesai: No presubmit jobs available for tektoncd/community@main In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Prior to this commit, the proposal 2 valid values for `PipelineTask` `OnError` field: `continueAndFail` and `StopAndFail`. `continueAndFail` indicates to fail the `PipelineTask`, continue to execute the DAG, and DO NOT fail the `PipelineRun`. However, the name `continueAndFail` is very confusing since - 'continue' and 'Fail' sounds like a conflict (see [comment](tektoncd#785 (comment))) - 'Fail' could be intepreted as failing the whole `pipelinerun`, which is not the expected behavior This commit simplifies the value to `continue`. This is more straightforward and more consistent with `step.OnError`. /kind tep
Prior to this commit, the proposal 2 valid values for `PipelineTask` `OnError` field: `continueAndFail` and `StopAndFail`. `continueAndFail` indicates to fail the `PipelineTask`, continue to execute the DAG, and DO NOT fail the `PipelineRun`. However, the name `continueAndFail` is very confusing since - 'continue' and 'Fail' sounds like a conflict (see [comment](#785 (comment))) - 'Fail' could be intepreted as failing the whole `pipelinerun`, which is not the expected behavior This commit simplifies the value to `continue`. This is more straightforward and more consistent with `step.OnError`. /kind tep
This PR proposes adding a new field
OnError
toPipelineTask
definition to allow users define task failure strategy