Retries for Custom Tasks #5218
Comments
The reason I stepped back from having the PipelineRun controller manage retries is that it gives custom tasks more freedom in how exactly they approach retries. In reality, custom tasks vary widely: for some, retries do not mean anything at all, and for others they work much like TaskRun retries. Also, so far we have not force-updated the status of a Run; once we get into managing retries, we will have to. In my implementation of PipelineRun-managed retries, the logic was rather complex and needed to synchronize with the custom task controller on each retry (this might impact performance at scale, not just cause timing issues). Lastly, this approach requires updating existing custom task controllers. Side question: can custom task controllers have replicas?
Thanks @ScrapCodes for the discussion!
I think it would be helpful to list out a few examples of custom tasks and how they would work if the PipelineRun controller handled retries. For example, it doesn't make sense to have retries for the wait task, so I doubt anyone would write a pipeline with retries for the wait task, but even if they did the wait task doesn't fail, so it wouldn't get retried. Are there custom tasks where having the PipelineRun retry the task would result in undesired behavior?
One other option is to have the PipelineRun controller create a new Run for each retry once the initial attempt has failed. Also it's worth noting that we already use this pattern for TaskRuns (but perhaps we should change that).
Could you elaborate a bit? What do you mean here that the PR controller had to synchronize with the custom task controller?
I agree this is an unfortunate downside. However,
Sure, I don't see why not! You could run them via a k8s deployment with multiple replicas. Curious if you see some alternate solution here related to replicas?
Plan is to first revisit how we do TaskRun retries (tracked in #5248). After this, it may be easier to explore implementing CustomRun retries in a similar way.
I think we can close this, since we're going to make TaskRun retries more like Run retries instead of the other way around. The only remaining action item here would be to make Run retries easier to implement via an SDK or similar, but this is tracked in #5157.
Spin-off from #4313.
Right now, some custom tasks support retries implemented in the custom task controller logic. Pipeline tasks pass their desired number of retries to the custom run.
I'm wondering if it would make sense to make the PipelineRun controller responsible for retries of custom runs. This would make custom tasks easier to author, and make them more consistent with how we handle TaskRuns. It would involve removing spec.retries from customRun, and the PR controller would be fully responsible for updating retriesStatus when a run has failed. This solution was discussed in tektoncd/community#491 (comment), and it looks like it was rejected to avoid having the PipelineRun controller be responsible for updating Run statuses, but that's a pattern we already use with TaskRuns here. In this solution, a run would not be able to retry on its own.
An alternative option is to provide some SDK for retries for custom task controllers. This would make authoring custom tasks easier, but not as easy as having it implemented in the PR controller.