[TEP-0050] Add proposed design details #785

QuanZhang-William · 2022-08-16T16:24:27Z

This PR proposes adding a new field `OnError` to the `PipelineTask` definition to allow users to define a task failure strategy.

QuanZhang-William · 2022-08-16T18:13:04Z

/cc @jerop @pritidesai

dibyom · 2022-08-17T14:56:33Z

/kind tep

jerop · 2022-08-18T19:22:18Z

/assign

jerop

Thank you @QuanZhang-William!

teps/0050-ignore-task-failures.md

jerop · 2022-08-22T16:15:11Z

/assign @pritidesai

tekton-robot · 2022-08-24T10:01:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerop, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [jerop,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

teps/0050-ignore-task-failures.md

pritidesai · 2022-08-25T17:07:32Z

teps/0050-ignore-task-failures.md

+Setting ```Retry``` and ```OnFail``` to ```continue``` at the same time is not allowed in this iteration of the TEP, as there is no point to retry a task that allows to fail. Pipeline validation will be added accordingly. We can support retries with ignored failed task in the future if needed.
+
+### Tasks with Missing Resource Dependency
+In the following example, the first task fails to produce a result (due to step error [details](https://github.com/tektoncd/pipeline/issues/3749)) that is going to be consumed by the second task. The second task will be skipped with reason ```Results were missing``` irrelevant of its OnFail type. This behaviour is consistent with [Guarding a Task only](https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#guarding-a-task-only). When we add support for [default Results](https://github.com/tektoncd/community/blob/main/teps/0048-task-results-without-results.md), then the resource-dependent Tasks may be executed if the default Results from the skipped parent Task are specified. 


This paragraph is a little confusing and inconsistent with step.onError.

The linked issue has been partially fixed: pipeline results are now emitted even when a task fails. Pipeline results are initialized with the task results from all tasks that succeeded before the failure.

step.onError does support emitting results and consuming them - https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#produce-a-task-result-with-onerror

As long as a task has initialized results before failing, they can be made available for consumption. This should only be supported with task.onFail: continue.

When a pipeline author has chosen a strategy where a task failure is expected, the author must have a complete understanding of what the result could mean and how to consume that result if it's initialized before failing.

For example, git-clone produces commit as a task result. If a pipeline author chooses to ignore a failure from git-clone, the author is aware that the commit might or might not be initialized, depending on the failure. Now, as a pipeline author, when git-clone failed for some reason but commit is available, it's guaranteed that my repo was cloned as expected. If the commit is not initialized and git-clone failed, the repo was not cloned for some reason.

Another example is boskos-acquire. As a pipeline author, I can choose to ignore a failure from boskos-acquire. If boskos-acquire failed to create the heartbeat pod for some reason but was able to lease resources, leased-resource will be initialized and can be made available to the cleanup task.
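To make the two examples concrete, here is a minimal sketch of the rule being suggested: a result from a failed task is handed to a consumer only when the task opted into `onFail: continue` and the result was initialized before the failure. The `TaskRun` class, the `resolve_result` helper, and the `"stopAndFail"` spelling are hypothetical names used for illustration; this is not Tekton's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskRun:
    """Hypothetical, simplified view of a finished task run."""
    name: str
    succeeded: bool
    on_fail: str = "stopAndFail"  # assumed values: "continue" or "stopAndFail"
    results: Dict[str, str] = field(default_factory=dict)


def resolve_result(producer: TaskRun, result_name: str) -> Optional[str]:
    """Return the result value a consumer may use, or None if it is unavailable.

    None models the case where the consuming task would be skipped
    because the result it depends on is missing.
    """
    if not producer.succeeded and producer.on_fail != "continue":
        # A failure that is not ignored never exposes results to consumers.
        return None
    # Succeeded, or failed with the failure ignored: expose the result
    # only if the task initialized it before finishing.
    return producer.results.get(result_name)


# git-clone failed late: the commit result was already written.
cloned = TaskRun("git-clone", succeeded=False, on_fail="continue",
                 results={"commit": "abc123"})
# git-clone failed early: nothing was written.
not_cloned = TaskRun("git-clone", succeeded=False, on_fail="continue")

print(resolve_result(cloned, "commit"))
print(resolve_result(not_cloned, "commit"))
```

In this sketch the consumer decides what a present or absent result means, which matches the point above that the pipeline author has to understand the result semantics of a task whose failure is ignored.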

Thanks for your input, @pritidesai! This is a good point, and the question is essentially: should we emit results for failed tasks with onFail: continue? This may need some discussion.

Currently we don't emit results for failed tasks. There have been some discussions around it, and I'm not sure why we ended up not implementing it. Looking at requirement 3 of this TEP (the task would be considered "successful" for the purposes of determining the status of the pipeline), it seems to me we should respect the existing behaviour for emitting results, and treat changing that behaviour as another discussion/issue/TEP. Is it a good idea to respect the existing behaviour as the default, and create another TEP/issue that deals specifically with emitting results if we get such requests from users?

In terms of consistency, I think step.OnError and task.OnFail are different things. step.onError emits the result only when the overall task run status is success, but with task.OnFail we keep the task run status as failed even when it is set to continue.

Thoughts? @jerop @vdemeester ?

I think we have an opportunity here to design this right for the many different use cases we have seen. I would recommend attempting to solve it here rather than delegating it to a separate TEP.

task.onError is a failure strategy and is no different from step.onError. Normally, a pipeline is not permitted to continue running after the first failure is encountered. With task.onError, even though the task has failed (the container exit code stays non-zero), the pipeline is allowed to continue executing the rest of the tasks.
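The behaviour described above can be sketched as a small simulation. It is deliberately simplified: tasks run sequentially rather than as a DAG, and the `PipelineTask` class, `run_pipeline` function, status strings, and `"stopAndFail"` spelling are assumptions made for illustration, not Tekton's actual types.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PipelineTask:
    """Hypothetical task: `fails` simulates a non-zero container exit code."""
    name: str
    fails: bool = False
    on_error: str = "stopAndFail"  # assumed values: "continue" or "stopAndFail"


def run_pipeline(tasks: List[PipelineTask]) -> Tuple[str, Dict[str, str]]:
    """Run tasks in order and return (pipeline status, per-task statuses)."""
    statuses: Dict[str, str] = {}
    for index, task in enumerate(tasks):
        if not task.fails:
            statuses[task.name] = "Succeeded"
            continue
        # The task itself always stays failed, whatever the strategy is.
        statuses[task.name] = "Failed"
        if task.on_error != "continue":
            # First non-ignored failure: the remaining tasks never start.
            for remaining in tasks[index + 1:]:
                statuses[remaining.name] = "NotRun"
            return "Failed", statuses
    # Every failure (if any) was ignored, so the pipeline is not failed.
    return "Succeeded", statuses


ignored = run_pipeline([
    PipelineTask("lint", fails=True, on_error="continue"),
    PipelineTask("build"),
])
stopped = run_pipeline([
    PipelineTask("lint", fails=True),
    PipelineTask("build"),
])
print(ignored)
print(stopped)
```

The point of the sketch is that the failure strategy changes what the pipeline does next, not the status recorded for the failed task.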

Sure. If we want to consolidate the behaviour of emitting results from failed tasks in this TEP, I think it makes more sense to make that behaviour consistent (i.e. we either emit or do not emit results from failed tasks, regardless of the OnErrorType). Otherwise the inconsistency would be confusing to users, since OnErrorType simply describes "should we stop executing the pipeline on task failure or not", not "should we emit results from such failed tasks or not".

Looking at the discussion and some feedback on this PR, it looks like we do have such requests. Why don't we emit results for failed tasks across the board?

@pritidesai @lbernick @jerop

I have added a new section, Emit Results from Ignored Failed Tasks, and updated the Task with Missing Resource Dependency section based on our discussion. Please take a look.

pritidesai · 2022-08-25T17:14:45Z

Thanks a bunch @QuanZhang-William for taking this forward, appreciate all the efforts you have put in 🤗.

@jerop @vdemeester I have left some comments on the design, please feel free to ignore them if they are not aligned, thanks 🙏

pritidesai · 2022-08-29T16:07:29Z

API WG - @QuanZhang-William working on comments

lbernick

Curious if we have any desire for this to be consistent with step onError, or if they're addressing different use cases? If so, might be worth including a short section discussing similarities/differences between the two features. Not blocking

teps/0050-ignore-task-failures.md

This commit proposes adding a new field "OnError" to the "PipelineTask" definition to allow users to define a task failure strategy.

pritidesai · 2022-09-21T23:36:22Z

Thank you @QuanZhang-William once again 🤗 I have one last NIT but will not block this PR.

“continueAndFail” sounds very confusing to me 🙏 “continue” and “fail” are two opposite states and are meant for two different runs here, i.e. continue the pipelineRun but fail the taskRun. The simpler options are “continue” (continue execution of the PipelineRun) or “ignoreFailure” (ignore the taskRun failure).

Please feel free to address this NIT in a separate PR, or we can implement it as is.

/lgtm

pritidesai · 2022-09-22T04:25:14Z

/retest

pritidesai · 2022-09-22T04:26:57Z

/test pull-community-teps-lin

tekton-robot · 2022-09-22T04:26:59Z

@pritidesai: No presubmit jobs available for tektoncd/community@main

In response to this:

/test pull-community-teps-lin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Prior to this commit, the proposal had 2 valid values for the `PipelineTask` `OnError` field: `continueAndFail` and `StopAndFail`. `continueAndFail` indicates to fail the `PipelineTask`, continue to execute the DAG, and DO NOT fail the `PipelineRun`. However, the name `continueAndFail` is very confusing since

- 'continue' and 'Fail' sound like a conflict (see [comment](tektoncd#785 (comment)))
- 'Fail' could be interpreted as failing the whole `pipelinerun`, which is not the expected behavior

This commit simplifies the value to `continue`. This is more straightforward and more consistent with `step.OnError`.

/kind tep
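As a rough illustration of what the rename means for validation, the sketch below checks an `OnError` value against the two names that remain after the change. The `validate_on_error` function and its messages are hypothetical, and the exact casing of the values (`stopAndFail` here) is an assumption. The retries check follows the draft TEP text quoted earlier in this review, which may not match the final design.

```python
from typing import List

# Assumed spellings; the text above names the values but casing may differ.
VALID_ON_ERROR = {"continue", "stopAndFail"}
# Name used before this commit, mapped to its replacement.
RETIRED_ON_ERROR = {"continueAndFail": "continue"}


def validate_on_error(on_error: str, retries: int = 0) -> List[str]:
    """Return validation errors; an empty list means the settings are accepted."""
    errors: List[str] = []
    if on_error in RETIRED_ON_ERROR:
        replacement = RETIRED_ON_ERROR[on_error]
        errors.append(f"onError value {on_error!r} was renamed to {replacement!r}")
    elif on_error not in VALID_ON_ERROR:
        errors.append(
            f"onError must be one of {sorted(VALID_ON_ERROR)}, got {on_error!r}"
        )
    # Draft rule quoted in the review: retrying a task whose failure is
    # ignored is rejected in this iteration.
    if on_error == "continue" and retries > 0:
        errors.append("retries cannot be combined with onError: continue")
    return errors


print(validate_on_error("continue"))
print(validate_on_error("continueAndFail"))
print(validate_on_error("continue", retries=2))
```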

tekton-robot added the do-not-merge/work-in-progress label Aug 16, 2022

tekton-robot requested review from pradeepitm12 and sthaha August 16, 2022 16:24

tekton-robot added the size/L label Aug 16, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch 2 times, most recently from 5b4be2c to 5412084 August 16, 2022 17:21

QuanZhang-William marked this pull request as ready for review August 16, 2022 17:22

tekton-robot removed the do-not-merge/work-in-progress label Aug 16, 2022

tekton-robot requested review from kimsterv and lbernick August 16, 2022 17:22

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from 5412084 to 216d009 August 16, 2022 17:58

tekton-robot requested review from jerop and pritidesai August 16, 2022 18:13

tekton-robot added the kind/tep label Aug 17, 2022

QuanZhang-William marked this pull request as draft August 18, 2022 13:08

tekton-robot added the do-not-merge/work-in-progress label Aug 18, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch 2 times, most recently from 3c772aa to cfe56e8 August 18, 2022 15:26

QuanZhang-William marked this pull request as ready for review August 18, 2022 15:26

tekton-robot removed the do-not-merge/work-in-progress label Aug 18, 2022

tekton-robot requested review from pratap0007 and skaegi August 18, 2022 15:26

tekton-robot assigned jerop Aug 18, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from cfe56e8 to bd99b0d August 18, 2022 22:20

jerop reviewed Aug 18, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from bd99b0d to 751b030 August 19, 2022 18:39

QuanZhang-William removed the request for review from pritidesai August 19, 2022 18:50

tekton-robot added the approved label Aug 19, 2022

jerop reviewed Aug 19, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from 751b030 to acaad10 August 19, 2022 19:10

tekton-robot assigned pritidesai Aug 22, 2022

vdemeester approved these changes Aug 24, 2022

pritidesai reviewed Aug 25, 2022

pritidesai reviewed Aug 25, 2022

pritidesai reviewed Aug 25, 2022

pritidesai reviewed Aug 25, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch 3 times, most recently from da51a1f to 2f7c61d August 30, 2022 14:05

lbernick reviewed Sep 12, 2022

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from 2f7c61d to 9227ecb September 16, 2022 20:44

[TEP-0050] Add proposed design details

1b6c71e

This commit proposes adding a new field "OnError" to the "PipelineTask" definition to allow users to define a task failure strategy.

QuanZhang-William force-pushed the tep-0050-add-proposed-design branch from 9227ecb to 1b6c71e September 16, 2022 20:47

vdemeester self-requested a review September 21, 2022 09:19

tekton-robot added the lgtm label Sep 21, 2022

tekton-robot merged commit f495d48 into tektoncd:main Sep 22, 2022

QuanZhang-William mentioned this pull request Sep 28, 2023

[TEP-0050] Simplify continueAndFail to continue #1075

Merged
