TEP-0069: Support retries for custom task in a pipeline. #441
This TEP can be part of the TEP 02 proposal if needed. /cc @afrittoli
/kind tep
/assign @afrittoli
Thank you @afrittoli and @vdemeester for volunteering to review this TEP. Can you please take a look? 🙏
/cc @imjasonh
/assign
Consider both the user's role (are they a Task author? Catalog Task user?
Cluster Admin? etc...) and experience (what workflows or actions are enhanced
if this problem is solved?).
-->
Is it possible to add some use cases here describing why this is needed?
It almost makes sense to me without use cases, but the part I am missing is this: the way we do this with Pipeline -> TaskRun is that the Pipeline controls the retries. I'd like to understand why we want the Pipeline to pass responsibility to the Run in this case, and why having a "retry" parameter, for cases where it's important that the Custom Task do its own retrying, isn't enough.
Yes, I can add.
A PipelineRun will have the retry history of a Run, just as it has for a TaskRun. So the PipelineRun is responsible for controlling the retry, i.e. the passing of `Retries` is FYI only. The actual retry is triggered by the PipelineRun controller, by setting `/spec/status` for a Run. However, exactly how the retry is done is taken care of by the custom task controller. It may not respond at all, and we should be able to handle that.
Thanks for the extra info - another alternative approach could be to have the Pipeline create a new Run every time a retry is needed, do you think there's any possibility that might work? I'm guessing that would have some downsides in that the custom task wouldn't have any control over how the retries work? 🤔
(This reminds me of our other conversations in #422, in that I only recently realized we are re-using the same TaskRun when we retry. Part of the motivation for decoupling TaskRuns from Pipelines was that something like a retry of a TaskRun could be accomplished by simply making as many TaskRuns as we wanted, but apparently that's not the route we went, so I'm hoping we can decide conclusively which approach we want to embrace.)
A custom task is different from a regular task: retrying it can be very different from just retrying a Pod. So a custom task may need to examine the state between two retries and optimise.
e.g. the PipelineLoop controller can retry only the failed pipelines based on certain criteria, instead of retrying everything, as it would if we created a fresh Run.
Can you explain a bit more about what kind of criteria the PipelineLoop controller might be looking at?
A PipelineLoop controller maintains the list of failed pipeline runs for each Run.
e.g. for a particular Run, 2 out of 5 loops were not successful and, as a result, it updates its status to Failed.
- If we give it a signal to retry, it can simply retry the two failed PipelineRun loops and not everything.
- If we create a fresh Run to retry, it has no information about what happened before, so it starts from scratch.
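The difference between the two options above can be sketched in a few lines of Python. This is only an illustration, not the PipelineLoop controller's code: `LoopRun`, `retry_in_place`, and `retry_with_fresh_run` are hypothetical names, and the `failing` argument merely simulates which loop iterations fail on a given attempt.

```python
from dataclasses import dataclass, field


@dataclass
class LoopRun:
    """Hypothetical stand-in for a Run handled by a PipelineLoop-style controller."""
    iterations: int
    failed: set = field(default_factory=set)      # indices of iterations that failed last attempt
    executed: list = field(default_factory=list)  # every iteration actually executed, in order

    def execute(self, indices, failing):
        """Run the given iterations; `failing` simulates which ones fail."""
        self.failed = set()
        for i in indices:
            self.executed.append(i)
            if i in failing:
                self.failed.add(i)
        return not self.failed


def retry_in_place(run, failing):
    """Retry signal on the same Run: only the previously failed iterations rerun."""
    return run.execute(sorted(run.failed), failing)


def retry_with_fresh_run(iterations, failing):
    """A fresh Run has no history, so every iteration runs again."""
    fresh = LoopRun(iterations)
    ok = fresh.execute(range(iterations), failing)
    return fresh, ok


# First attempt: 2 out of 5 loops fail, so the Run as a whole fails.
run = LoopRun(iterations=5)
first_ok = run.execute(range(5), failing={1, 3})

# Option 1: signal the existing Run to retry; only loops 1 and 3 rerun.
second_ok = retry_in_place(run, failing=set())

# Option 2: create a fresh Run; all 5 loops rerun.
fresh, fresh_ok = retry_with_fresh_run(5, failing=set())
```

In this toy model the in-place retry executes two extra iterations, while the fresh Run executes all five again, which is the cost the comment above is pointing at.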
Ahhh okay, that makes a lot of sense - if you're willing to add this to the use cases description, I think that would be great context to record.
Thanks!
done, updated the alternatives section too!
/assign
/assign
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bobcatfish, vdemeester
/lgtm
I need feedback on the proposal; if it looks good, I can then work on the demo and other design aspects.
Summary
A pipeline task can be configured with a `retries` count; this is
currently only supported for `TaskRun`s and not `Run`s (i.e. custom tasks).
This TEP proposes that a pipeline task can also be configured with a `retries`
count for custom tasks.
Also, a `PipelineRun` already manages a retry for a regular task
by updating its status. For a custom task, however, a Tekton-owned controller
can signal a custom task controller to retry. A custom task controller may
optionally support it.
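As a rough illustration of the flow summarized above, the following Python sketch models a controller that keeps the retry history for a Run and signals a retry while attempts remain. It is a sketch under stated assumptions, not the design in the TEP: the `Run` class, the `reconcile` function, and the `signal_retry` callback are hypothetical, and treating an unsupported or unanswered signal as a final failure is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    """Hypothetical, simplified model of a Run created for a custom task."""
    retries: int                                             # retries count from the pipeline task
    status: str = "Running"
    retry_history: List[str] = field(default_factory=list)  # statuses of earlier attempts


def reconcile(run: Run, signal_retry: Callable[[Run], bool]) -> str:
    """One pass of an assumed PipelineRun-side reconcile for a Run.

    `signal_retry` stands in for however the retry is signalled; it returns
    True only if the custom task controller picked the retry up.
    """
    if run.status != "Failed":
        return run.status
    if len(run.retry_history) >= run.retries:
        return "Failed"  # retries exhausted
    if not signal_retry(run):
        return "Failed"  # assumption: no retry support is treated as a final failure
    run.retry_history.append(run.status)  # record the failed attempt
    run.status = "Running"
    return run.status


# A custom task controller that supports retries.
supported = Run(retries=2, status="Failed")
first = reconcile(supported, lambda r: True)
supported.status = "Failed"
second = reconcile(supported, lambda r: True)
supported.status = "Failed"
third = reconcile(supported, lambda r: True)  # both retries already used

# A custom task controller that ignores the retry signal.
unsupported = Run(retries=2, status="Failed")
ignored = reconcile(unsupported, lambda r: False)
```

The point of the sketch is the split of responsibilities: the retry count and history live with the pipeline side, while what a retry actually does is left to the custom task controller behind `signal_retry`.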