Pods do not get deleted on success/failure #2173

Closed
3 of 4 tasks
nirav24 opened this issue Feb 5, 2020 · 17 comments

@nirav24
Contributor

nirav24 commented Feb 5, 2020

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
I am trying to implement a pod GC strategy to delete pods automatically after the workflow is done. However, the pods don't get deleted. I've tried all the strategy options (see), but the result is the same.

What you expected to happen:

Workflows should be deleted after success or failure.

How to reproduce it (as minimally and precisely as possible):
Submit the following workflow with the argo CLI.

Anything else we need to know?:

Workflow.yaml

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # Argo's Workflow custom resource
metadata:
  generateName: nirav-test-     # prefix for the generated workflow name
spec:
  podGC:
    strategy: OnPodCompletion
  entrypoint: pod-gc-strategy   # invoke the pod-gc-strategy template
  templates:
  - name: pod-gc-strategy
    steps:
    - - name: fail
        template: fail
      - name: succeed
        template: succeed

  - name: fail
    container:
      image: alpine:3.7
      command: [sh, -c]
      args: ["exit 1"]

  - name: succeed
    container:
      image: alpine:3.7
      command: [sh, -c]
      args: ["exit 0"]
```
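For reference, `podGC.strategy` accepts `OnPodCompletion`, `OnPodSuccess`, `OnWorkflowCompletion` and `OnWorkflowSuccess`. The following is a hypothetical, simplified Go model of what each strategy should decide for a single pod; it is not Argo's controller code, and the `shouldDeletePod` function and the `Phase` values are assumptions made for illustration.

```go
package main

import "fmt"

// Strategy holds a podGC.strategy value as written in the Workflow spec.
type Strategy string

const (
	OnPodCompletion      Strategy = "OnPodCompletion"
	OnPodSuccess         Strategy = "OnPodSuccess"
	OnWorkflowCompletion Strategy = "OnWorkflowCompletion"
	OnWorkflowSuccess    Strategy = "OnWorkflowSuccess"
)

// Phase is a simplified phase shared by pods and workflows in this sketch.
type Phase string

const (
	Running   Phase = "Running"
	Succeeded Phase = "Succeeded"
	Failed    Phase = "Failed"
)

// completed reports whether a phase is terminal (succeeded or failed).
func completed(p Phase) bool { return p == Succeeded || p == Failed }

// shouldDeletePod is a hypothetical model of the podGC decision for one pod,
// given the pod's phase and its workflow's phase. It is not Argo's code.
func shouldDeletePod(s Strategy, pod, workflow Phase) bool {
	switch s {
	case OnPodCompletion:
		return completed(pod)
	case OnPodSuccess:
		return pod == Succeeded
	case OnWorkflowCompletion:
		return completed(workflow)
	case OnWorkflowSuccess:
		return workflow == Succeeded
	default:
		return false // no (or unknown) strategy: keep the pod
	}
}

func main() {
	// Final state of the example workflow: the "fail" step fails and the
	// "succeed" step succeeds, so the workflow as a whole fails.
	pods := []struct {
		name  string
		phase Phase
	}{{"fail", Failed}, {"succeed", Succeeded}}
	workflow := Failed

	for _, s := range []Strategy{OnPodCompletion, OnPodSuccess, OnWorkflowCompletion, OnWorkflowSuccess} {
		for _, p := range pods {
			fmt.Printf("%-20s %-8s delete=%t\n", s, p.name, shouldDeletePod(s, p.phase, workflow))
		}
	}
}
```

Under this model, the result should not be the same for every strategy: `OnWorkflowSuccess` would never remove the pods of this particular workflow (the `fail` step makes the workflow fail), and `OnPodSuccess` would remove only the `succeed` pod, while `OnPodCompletion` and `OnWorkflowCompletion` should remove both.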

Environment:

  • Argo version:
```
argo: v2.4.2
  BuildDate: 2019-10-21T18:39:52Z
  GitCommit: 675c66267f0c916de0f233d8101aa0646acb46d4
  GitTreeState: clean
  GitTag: v2.4.2
  GoVersion: go1.11.5
  Compiler: gc
  Platform: darwin/amd64
```
  • Kubernetes version:
```
clientVersion:
  buildDate: "2019-10-15T12:11:03Z"
  compiler: gc
  gitCommit: 211047e9a1922595eaa3a1127ed365e9299a6c23
  gitTreeState: clean
  gitVersion: v1.14.8
  goVersion: go1.12.10
  major: "1"
  minor: "14"
  platform: darwin/amd64
serverVersion:
  buildDate: "2019-11-07T19:12:22Z"
  compiler: gc
  gitCommit: 56d89863d1033f9668ddd6e1c1aea81cd846ef88
  gitTreeState: clean
  gitVersion: v1.13.11-gke.14
  goVersion: go1.12.11b4
  major: "1"
  minor: 13+
  platform: linux/amd64
```
@alexec alexec self-assigned this Feb 5, 2020
@alexec alexec added this to the v2.5 milestone Feb 5, 2020
@alexec
Contributor

alexec commented Feb 5, 2020

I feel like I have also seen this unexpected behaviour.

@alexec
Contributor

alexec commented Feb 5, 2020

Introduced in #1234

@alexec
Contributor

alexec commented Feb 5, 2020

Unable to repro on master (v2.5)

@nirav24 this is a new feature. Can I ask you to upgrade to v2.4.3 and see if the issue still occurs? This may (with very low probability) fix the issue. If not, I'd like to ask for more data. Specifically, can you include the output of:

```
kubectl get wf ${WORKFLOW_NAME}
kubectl get pod -l workflows.argoproj.io/workflow=${WORKFLOW_NAME}
```

@jackywu do you want to chime in on this feature as you authored it originally?

Is it possible for a workflow to complete before a pod is ready to be GC'd? I'd noticed that you can delete a workflow, but PodGC does not happen, so the pod can become orphaned. Additionally, if the workflow completes before the pod, then I'm not sure the pod will get GC'd.

I wonder if we need to tweak the implementation:

  • Whenever we start a pod, we add an annotation indicating which GC strategy to use.
  • We run a watch on workflow pods; when a pod completes, we delete it.
  • Bonus 1: what about a global setting for PodGC?
  • Bonus 2: do we need to set up an owner reference? Pods can get orphaned on workflow deletion, since we don't currently support cascade deletion AFAIK. @jessesuen, is this intentional?
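The first two points can be sketched as a small, stdlib-only Go model. This is hypothetical and not Argo's controller: the annotation key `example.com/pod-gc-strategy`, the `Pod` type, and the `startPod`/`watchPods` functions are assumptions, and a buffered channel stands in for a real watch on pods through the Kubernetes API. The point it illustrates is that each pod carries its own GC strategy, so it can be deleted when it completes even if the workflow has already finished or been deleted.

```go
package main

import "fmt"

// gcAnnotation is a hypothetical annotation key, chosen for this sketch only.
const gcAnnotation = "example.com/pod-gc-strategy"

// Pod is a minimal stand-in for a Kubernetes pod.
type Pod struct {
	Name        string
	Phase       string // "Running", "Succeeded" or "Failed"
	Annotations map[string]string
}

// startPod records the GC strategy on the pod when it is created, so the
// later decision does not depend on the workflow object still existing.
func startPod(name, strategy string) Pod {
	return Pod{
		Name:        name,
		Phase:       "Running",
		Annotations: map[string]string{gcAnnotation: strategy},
	}
}

// watchPods consumes pod update events (standing in for a Kubernetes watch)
// and deletes each pod as soon as its own annotation says it is collectable.
// Only the pod-level strategies are modelled here.
func watchPods(events <-chan Pod, deletePod func(name string)) {
	for pod := range events {
		switch pod.Annotations[gcAnnotation] {
		case "OnPodCompletion":
			if pod.Phase == "Succeeded" || pod.Phase == "Failed" {
				deletePod(pod.Name)
			}
		case "OnPodSuccess":
			if pod.Phase == "Succeeded" {
				deletePod(pod.Name)
			}
		}
	}
}

func main() {
	events := make(chan Pod, 3)

	failPod := startPod("fail", "OnPodCompletion")
	okPod := startPod("succeed", "OnPodCompletion")

	events <- failPod // still running: ignored by the watcher
	failPod.Phase = "Failed"
	events <- failPod
	okPod.Phase = "Succeeded"
	events <- okPod
	close(events)

	var deleted []string
	watchPods(events, func(name string) { deleted = append(deleted, name) })
	fmt.Println(deleted) // [fail succeed]
}
```

The workflow-level strategies (`OnWorkflowCompletion`, `OnWorkflowSuccess`) would additionally need the workflow's phase, which this sketch leaves out.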

@alexec alexec modified the milestones: v2.5, v2.6 Feb 5, 2020
alexec added a commit to alexec/argo-workflows that referenced this issue Feb 5, 2020
@alexec
Contributor

alexec commented Feb 5, 2020

I've linked a PoC fix for the hypothetical issue I described above.

@nirav24
Contributor Author

nirav24 commented Feb 6, 2020

Thanks @alexec for the quick reply. I will upgrade Argo, try again, and post the logs as well.

@alexec alexec removed this from the v2.6 milestone Feb 6, 2020
@nirav24
Contributor Author

nirav24 commented Feb 7, 2020

Here is the output you requested:

```
$ kubectl get wf nirav-test-wfrdn -n test-nirav
NAME               AGE
nirav-test-wfrdn   2m
$ kubectl get pod -l workflows.argoproj.io/workflow=nirav-test-wfrdn -n test-nirav
NAME                          READY   STATUS      RESTARTS   AGE
nirav-test-wfrdn-1910956530   0/2     Error       0          2m46s
nirav-test-wfrdn-2175938526   0/2     Completed   0          2m46s
```

Upgrading to v2.4.3 gave the same behaviour.

@alexec
Contributor

alexec commented Feb 7, 2020

Thank you @nirav24, great! @jackywu and @jessesuen, what do you think of my comments above?

@alexec
Contributor

alexec commented Feb 7, 2020

@nirav24 I've come back to this and I need to clarify something really important.

PodGC only applies to pods, not to workflows; there is no GC for workflows. It is actually not clear from the title of this issue whether you are referring to pods, to workflows, or to both.

@nirav24
Contributor Author

nirav24 commented Feb 7, 2020

@alexec I am referring to both. I can see the pods with `kubectl get pods` and the workflows with `argo list`.

@jackywu
Contributor

jackywu commented Feb 7, 2020

@nirav24 @alexec Yes, there is no GC for workflows, and I'm not sure whether GC for workflows is necessary.

@nirav24 nirav24 changed the title Workflow does not get deleted on success/failure Pods do not get deleted on success/failure Feb 7, 2020
@nirav24
Contributor Author

nirav24 commented Feb 7, 2020

Sorry for the wrong title. I've changed it.

@alexec
Contributor

alexec commented Feb 7, 2020

OK. You're running an older version. Have you tried v2.4.3, or v2.5.0-rc9 (if you're on a test environment; this is a big change!)?

@nirav24
Contributor Author

nirav24 commented Feb 7, 2020

I've tried v2.4.3, and I can try v2.5.0-rc9 as well.

Running v2.4.3 did not delete the pods either.

@nirav24
Contributor Author

nirav24 commented Feb 21, 2020

@alexec sorry for the long delay on this. I've tried v2.5.0-rc9, and Argo deletes the pods according to the given strategy. Thanks for all your help.

I am not sure whether to keep this issue open, since I don't know whether it failed on v2.4.3 because of a bug or because it was not supposed to work there.

@alexec
Contributor

alexec commented Feb 21, 2020

@jackywu it does sound like there is a bug with this feature - is this something you'd want to own investigating or fixing?

@jackywu
Contributor

jackywu commented Feb 21, 2020

@nirav24 @alexec I'm so sorry, I don't have enough time to investigate it right now. @jessesuen, do you have time to check it?

@alexec alexec removed their assignment Mar 16, 2020
@stale

stale bot commented Jul 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 1, 2020
@stale stale bot closed this as completed Jul 9, 2020