Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backend] Terminate Behavior Regression between Workflow Controllers 2.x vs 3.x (HEAD) #7963

Closed
aabbccddeeffgghhii1438 opened this issue Jun 29, 2022 · 7 comments
Labels
area/backend kind/bug lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

@aabbccddeeffgghhii1438
Copy link

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
  • KFP version:
    1.8.2
  • KFP SDK version:
    1.8.2

Steps to reproduce

Currently, Kubeflow's terminate function works by setting activeDeadlineSeconds to 0. However, argo's controller loop only checks this for Pending pods, so a long running pod must run to completion before terminating. This has since been fixed on Argo's workflow controller with commit. However, Argo 3.3 is fundamentally incompatible with Kubeflow as updates to workflows are now done by WorkflowTaskResult instead of patching pods.

Expected result

Ideally, clicking terminate instantly terminates the workflow and any running pod. This behavior is honored by the above commit. However, due to other changes there is no feasible route to upgrade ATM. If Kubeflow wishes to stay on 3.2.3, perhaps it should be considered to cherrypick out the patch?

Materials and Reference

argoproj/argo-workflows#8065


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@aabbccddeeffgghhii1438
Copy link
Author

aabbccddeeffgghhii1438 commented Jun 29, 2022

Do note that the expected behavior was true for Argo < 3, as the same function would fall through. I.e https://github.com/argoproj/argo-workflows/blob/v2.12.13/workflow/controller/exec_control.go#L89.

@aabbccddeeffgghhii1438 aabbccddeeffgghhii1438 changed the title [backend] Terminate Behavior [backend] Terminate Behavior Regression between Workflow Controllers 2.x vs 3.x (HEAD) Jun 29, 2022
@Linchin Linchin self-assigned this Jun 30, 2022
@Linchin
Copy link
Contributor

Linchin commented Jun 30, 2022

This will be resolved after we upgrade argo to version 3.3.

@aabbccddeeffgghhii1438
Copy link
Author

Is there a timeline for this? For this upgrade, is it for 2.0, or 1.8.x?

@ryansteakley
Copy link
Contributor

Additionally some of the pre-existing terminate commands do not work due to a change in the way they killed containers and the signals that were sent. Looks like they fixed this in argoproj/argo-workflows#8687 what is the communities path-forward to regain this functionality?

@Sharathmk99
Copy link

@Linchin When can we expect version upgrade to 3.3?

@Linchin Linchin removed their assignment Aug 28, 2023
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Apr 29, 2024
Copy link

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend kind/bug lifecycle/stale The issue / pull request is stale, any activities remove this label.
Projects
Status: Closed
Development

No branches or pull requests

4 participants