Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move retry Pod deletions out of Server and into Controller for proper separation of duties #12538

Open
agilgur5 opened this issue Jan 17, 2024 · 1 comment · May be fixed by #12734
Open

Move retry Pod deletions out of Server and into Controller for proper separation of duties #12538

agilgur5 opened this issue Jan 17, 2024 · 1 comment · May be fixed by #12734
Assignees
Labels
area/controller Controller issues, panics area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries area/server solution/suggested A solution to the bug has been suggested. Someone needs to implement it. type/feature Feature request type/security Security related

Comments

@agilgur5
Copy link
Member

Summary

Currently, the Server and Controller are architected & intended to be independent, with the Server not being strictly necessary for any operations. Most of the Server's functionality is to be a simple CRUD wrapper where a user could replicate that functionality themselves via kubectl. When the Server has to communicate with the Controller, it typically signals to it by adding a label to a Workflow.

This separation of duties is important to keep consistent and is currently true for all but one case: the retry operation currently has the Server delete Pods of a Workflow, which is something that the Controller should do instead. The Server shouldn't need permissions to delete Pods either as it currently does.

Use Cases

I (and then others) noticed this in #12105 (comment) and #12419 (comment) and were pretty surprised when we saw this.

Removing this functionality from the Server will make it more secure by not having delete pods permissions which the Controller already has.

It will also make it possible to do a retry with just kubectl by adding a label to the Workflow CR, as is intended and as was thought as possible per #12027 (comment).

Implementation details

  1. The Server should only label the Workflow
  2. The Server should no longer need delete pods permissions
  3. The Controller should detect that label as a trigger for the retry
  4. The Controller should perform the Pod deletion and then initiate the retry
  5. The Controller should handle bugs / missing functionality such as that of fix: Clean up pods of fulfilled nodes when workflow manual retry. Fix… #12105 / Completed pods were not all cleaned when workflow is succeesed after manual retry #12028

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritize the proposals with the most 👍.

@agilgur5 agilgur5 added type/feature Feature request area/controller Controller issues, panics area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Jan 17, 2024
@agilgur5 agilgur5 added the solution/suggested A solution to the bug has been suggested. Someone needs to implement it. label Jan 17, 2024
@terrytangyuan
Copy link
Member

SGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries area/server solution/suggested A solution to the bug has been suggested. Someone needs to implement it. type/feature Feature request type/security Security related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants