TEP-0069: Support retries for custom task in a pipeline. #441

ScrapCodes · 2021-05-31T11:35:31Z

I need feedback on the proposal. If it looks good, I can then work on a demo and other design aspects.

Summary

A pipeline task can be configured with a retries count. This is
currently supported only for TaskRuns and not for Runs (i.e. custom tasks).

This TEP proposes that a pipeline task can also be configured with a retries
count for custom tasks.

A PipelineRun already manages retries for a regular task by updating its
status. For a custom task, however, a Tekton-owned controller can signal the
custom task controller to retry. A custom task controller may optionally
support this.

Tomcli · 2021-06-01T17:46:49Z

This TEP can be part of the TEP 02 proposal if needed.

/cc @afrittoli

jerop · 2021-06-07T16:22:58Z

/kind tep

afrittoli · 2021-06-07T16:27:43Z

/assign @afrittoli
/assign @vdemeester

ScrapCodes · 2021-06-14T11:19:56Z

Thank you @afrittoli and @vdemeester for volunteering to review this TEP. Could you please take a look? 🙏

vdemeester

/cc @imjasonh

jerop · 2021-07-12T16:09:55Z

/assign

bobcatfish · 2021-07-19T21:15:51Z

teps/0069-support-retries-for-custom-task-in-a-pipeline.md

+Consider both the user's role (are they a Task author? Catalog Task user?
+Cluster Admin? etc...) and experience (what workflows or actions are enhanced
+if this problem is solved?).
+-->


Is it possible to add some use cases here describing why this is needed?

It almost makes sense to me without use cases, but the part I am missing is this: with Pipeline -> TaskRun, the Pipeline controls the retries. I'd like to understand why we want the Pipeline to pass responsibility to the Run in this case, and why having a "retry" parameter, for cases where it's important that the Custom Task do its own retrying, isn't enough.

ScrapCodes · Jul 20, 2021

Yes, I can add them.

A PipelineRun will have the retry history of a Run, just as it has for a TaskRun. So the PipelineRun is responsible for controlling the retry, i.e. the Retries value is passed to the Run for information only. The actual retry is triggered by the PipelineRun controller, by setting /spec/status for a Run. However, exactly how the retry is done is taken care of by the custom task controller. It may not respond at all, and we should be able to handle that.
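
To make the division of responsibility above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Run` class, the `retry_requested` flag (standing in for the signal set on the Run), and `reconcile` are illustrative names, not Tekton's API, and the custom task controller is modelled as a plain function called synchronously. It also assumes that a signal the custom task controller ignores still uses up one retry, which is only one possible way to handle a controller that does not respond.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    """Hypothetical stand-in for a custom task Run (not the Tekton type)."""
    retries: int                    # configured count, passed along for information only
    status: str = "Failed"
    retry_requested: bool = False   # assumed flag standing in for the retry signal
    retries_status: List[str] = field(default_factory=list)  # history kept on the PipelineRun side


def reconcile(run: Run, custom_controller: Callable[[Run], None]) -> str:
    """One pass of the retry decision made on the PipelineRun side."""
    if run.status != "Failed":
        return run.status
    if len(run.retries_status) >= run.retries:
        return run.status           # retries exhausted, the Run stays Failed
    run.retries_status.append(run.status)  # record the failed attempt
    run.retry_requested = True      # signal the custom task controller
    custom_controller(run)          # it decides how to retry, or ignores the signal
    run.retry_requested = False
    return run.status


def responsive_controller(run: Run) -> None:
    # A controller that supports retries: it reacts to the signal.
    if run.retry_requested:
        run.status = "Succeeded"


def silent_controller(run: Run) -> None:
    # A controller without retry support: it never reacts.
    pass
```

With `retries=2`, calling `reconcile` repeatedly against `silent_controller` records two failed attempts and then leaves the Run as Failed, while `responsive_controller` moves the Run to Succeeded on the first signal.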

bobcatfish · Jul 20, 2021

Thanks for the extra info. Another alternative approach could be to have the Pipeline create a new Run every time a retry is needed. Do you think there's any possibility that might work? I'm guessing it would have some downsides, in that the custom task wouldn't have any control over how the retries work? 🤔

(This reminds me of our other conversations in #422, in that I only recently realized we are re-using the same TaskRun when we retry. Part of the motivation for decoupling TaskRuns from Pipelines was that something like a retry of a TaskRun could be accomplished by simply making as many TaskRuns as we wanted, but apparently that's not the route we went, so I'm hoping we can decide conclusively which approach we want to embrace.)

ScrapCodes · Jul 21, 2021

A custom task is different from a regular task: retrying it can be very different from just retrying a Pod. So a custom task may need to examine the state between two retries and optimise.

For example, the PipelineLoop controller can retry only the failed pipelines, based on certain criteria, instead of retrying everything, as it would have to if we created a fresh Run.

bobcatfish · Jul 22, 2021

Can you explain a bit more about what kind of criteria the PipelineLoop controller might be looking at?

ScrapCodes · Jul 23, 2021

A PipelineLoop controller maintains the list of failed PipelineRuns for each Run.
For example, for a particular Run, 2 out of 5 loops were not successful and, as a result, it updates its status to Failed.

If we give it a signal to retry, it can simply retry the two failed PipelineRun loops and not everything.

If we create a fresh Run to retry, it has no information about what happened before, so it starts from scratch.
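
The difference between the two options can be sketched in a few lines of Python. This is an illustration only: `LoopRun`, `retry_in_place`, and `retry_with_fresh_run` are hypothetical names, not the PipelineLoop controller's code, and each loop iteration is reduced to a function that returns success or failure.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class LoopRun:
    """Hypothetical Run that fans out into numbered loop iterations."""
    iterations: int
    failed: Set[int] = field(default_factory=set)     # iterations that failed on the last attempt
    executed: List[int] = field(default_factory=list)  # every iteration executed, in order

    def execute(self, indices: Iterable[int], work: Callable[[int], bool]) -> bool:
        """Run the given iterations and remember which ones failed."""
        self.failed = set()
        for i in indices:
            self.executed.append(i)
            if not work(i):
                self.failed.add(i)
        return not self.failed


def retry_in_place(run: LoopRun, work: Callable[[int], bool]) -> bool:
    # Same Run object: the failure list survives, so only failed iterations rerun.
    return run.execute(sorted(run.failed), work)


def retry_with_fresh_run(run: LoopRun, work: Callable[[int], bool]) -> LoopRun:
    # New Run object: no history is carried over, so every iteration reruns.
    fresh = LoopRun(iterations=run.iterations)
    fresh.execute(range(fresh.iterations), work)
    return fresh
```

If 2 out of 5 iterations fail on the first attempt, `retry_in_place` executes only those two again, whereas `retry_with_fresh_run` executes all five.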

bobcatfish · Jul 23, 2021

Ahhh okay, that makes a lot of sense. If you're willing to add this to the use cases description, I think that would be great context to record.

Thanks!

ScrapCodes · Jul 26, 2021

Done, updated the alternatives section too!

bobcatfish · 2021-07-19T21:16:30Z

/assign

vdemeester · 2021-07-22T14:36:52Z

/assign

bobcatfish · 2021-07-23T22:33:10Z

/approve

tekton-robot · 2021-07-23T22:33:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [bobcatfish,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

afrittoli · 2021-07-26T16:13:16Z

/lgtm

tekton-robot requested review from AlanGreene and vdemeester May 31, 2021 11:35

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 31, 2021

tekton-robot requested a review from afrittoli June 1, 2021 17:46

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2021

ScrapCodes force-pushed the tep-69 branch from 2e9fb90 to 54bdb6f Compare June 7, 2021 11:18

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2021

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jun 7, 2021

tekton-robot assigned afrittoli and vdemeester Jun 7, 2021

ScrapCodes force-pushed the tep-69 branch from 54bdb6f to a605e84 Compare June 14, 2021 11:18

ScrapCodes force-pushed the tep-69 branch from a605e84 to 2e26abd Compare June 14, 2021 11:22

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2021

ScrapCodes force-pushed the tep-69 branch 2 times, most recently from 9a0a8a5 to 57c02d5 Compare June 28, 2021 13:40

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2021

ScrapCodes force-pushed the tep-69 branch 8 times, most recently from 3c1552a to 71f5f0e Compare July 6, 2021 11:49

vdemeester approved these changes Jul 8, 2021

tekton-robot requested a review from imjasonh July 8, 2021 14:33

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 8, 2021

ScrapCodes force-pushed the tep-69 branch 2 times, most recently from 62d19a4 to 99bdb82 Compare July 12, 2021 15:13

tekton-robot assigned jerop Jul 12, 2021

ScrapCodes mentioned this pull request Jul 19, 2021

[WIP][POC]TEP-0069: Support retries for custom task in a pipeline. tektoncd/pipeline#4103

Closed

5 tasks

bobcatfish reviewed Jul 19, 2021

tekton-robot assigned bobcatfish Jul 19, 2021

ScrapCodes force-pushed the tep-69 branch 2 times, most recently from 8318ca8 to 755eba6 Compare July 21, 2021 09:13

ScrapCodes force-pushed the tep-69 branch from 755eba6 to 7702fbb Compare July 26, 2021 10:29

TEP-0069: Support retries for custom task in a pipeline.

3844269

ScrapCodes force-pushed the tep-69 branch from 7702fbb to 3844269 Compare July 26, 2021 10:31

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 26, 2021

tekton-robot merged commit 285ff99 into tektoncd:main Jul 26, 2021

pritidesai mentioned this pull request Aug 9, 2021

TEP-0069: Support retries for custom task in a pipeline - design. #491

Merged
