[backend] "Signal command failed: command terminated with exit code 1" when terminating a pipeline #7361
Labels
area/backend
kind/bug
lifecycle/stale
The issue / pull request is stale, any activities remove this label.
Environment
Steps to reproduce
Expected result
In step 3 above, the currently-running pod should immediately terminate.
Materials and Reference
I have been able to determine the following so far:
activeDeadlineSeconds: 0
to the spec of the workflow being terminated. This is happening as expected.sh -c kill -s USR2 $(pidof argoexec)
in themain
container of the running pod. This causes the Argo Workflows controller to log the following error:This appears to be a bug with the Argo Workflows controller -- instead of executing
sh -c kill -s USR2 $(pidof argoexec)
in themain
container, it should execute that command in thewait
container.It appears this bug was introduced in argoproj/argo-workflows#5099, and fixed as part of an unrelated refactor in argoproj/argo-workflows#6022, which is included in Argo Workflows 3.2.0.
The specific buggy code is this block, which iterates over the main containers in a pod and sends the USR2 signal there.
Interestingly enough, I could not find an issue related to this bug in the Argo Workflows project.
Questions:
Thanks so much!
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
The text was updated successfully, but these errors were encountered: