
Difficulty destroying #35

Open
schue opened this issue Apr 9, 2021 · 9 comments

Comments


schue commented Apr 9, 2021

I'm using your tool with https://github.com/roboll/helmfile and can successfully stand up my Terraform from a Git repo. When I try to update or destroy, however, I seem to be having some problems: it doesn't seem to check the Git sources for updates when I do an apply, and the runner pod seems to stay around when I do a destroy. I'm running the service in its own namespace, and trying to delete that namespace hangs in a "Terminating" state. The result is that I basically have to recreate the cluster to get it to see updates. Any thoughts?
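
For reference, a namespace stuck in "Terminating" usually means some namespaced resource with a finalizer has not been cleaned up. A rough way to see what is left behind (the namespace name below is just a placeholder, not taken from this thread):

  # List whatever is still left in the stuck namespace.
  kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -n 1 kubectl get -n my-terraform-ns --ignore-not-found --show-kind

  # Check for finalizers on the terraform resources that may be blocking deletion.
  kubectl get terraform -n my-terraform-ns \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.metadata.finalizers}{"\n"}{end}'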


schue commented Apr 9, 2021

I did try editing the config map: changing the action to "abort" makes the runner shut down, but switching it back to "apply" doesn't seem to do anything.

isaaguilar (Collaborator) commented

Helmfile is a great tool, but I'm not sure it's related to the problem. I'm going to assume this would also happen when running helm delete on the terraform resource directly.

To make sure I have a clear picture of the behavior you described:

  1. you run helmfile apply,
  2. a terraform-apply job/pod gets triggered, and
  3. the "apply" pod is still running when you run a destroy.

Questions for you:

  1. Does a destroy pod start?
  2. Does the terraform resource in k8s get deleted?
  3. I should have started with this question: what are the applyOnCreate, applyOnUpdate, applyOnDelete, and ignoreDelete options set to?

isaaguilar (Collaborator) commented

Also, what version of tfo are you running?


schue commented Apr 9, 2021

It does run a destroy pod, which runs to completion and stays there as "complete". Destroying leaves the runner around, and doing a "kubectl delete pod" on it just respawns it without a new config. The apply settings are:

applyOnCreate: true
applyOnUpdate: true
applyOnDelete: true
ignoreDelete: false
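
For reference, a sketch of toggling these flags on the resource, assuming they sit at the top level of the Terraform resource's spec (the resource and namespace names below are placeholders, not from this thread):

  # Sketch: setting the apply/delete flags with a merge patch.
  kubectl patch terraform example -n my-terraform-ns --type merge \
    -p '{"spec":{"applyOnCreate":true,"applyOnUpdate":true,"applyOnDelete":true,"ignoreDelete":false}}'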


schue commented Apr 9, 2021

This is on K3D and K3S on Linux, by the way.

isaaguilar (Collaborator) commented

Does kubectl get terraform still show the resource?

Just in case you missed my previous comment, what version of terraform-operator are you running?
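
In case it helps, one way to check both things (the namespace and deployment names here are guesses, not from this thread; adjust them to your install):

  # Is the Terraform resource still there?
  kubectl get terraform -n my-terraform-ns

  # Which operator image/version is actually deployed?
  kubectl get deploy terraform-operator -n tf-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}'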


schue commented Apr 12, 2021

After a destroy? It does not go away immediately, but it does after a bit. The pod remains running.

I'm running the latest, 0.3.8.

My Terraform originally used an SSH agent, but I'm now migrating it to use private key secrets, and a lot of my sessions end with a stuck terraform run waiting for an SSH agent that is never going to show up. Does your tool rely on terraform runs always finishing?


schue commented Apr 13, 2021

I finished updating my Terraform to not use local subdirectories. I can confirm that things are much better behaved when apply runs to completion. Trying to destroy a stuck terraform run seems problematic.

isaaguilar (Collaborator) commented

Thank you for keeping this ticket updated with your findings. I appreciate the feedback.

Does your tool rely on terraform runs always finishing?

There is a "soft" ordering of apply and delete. In general, it is important to wait until the terraform apply completes before running destroy. This is especially important when the user is not using a "locking" backend.

If the user is using a "locking" backend and deletes the terraform resource while apply is still running, the terraform destroy command will be blocked until it can obtain the backend lock. In essence, the terraform apply must release the lock first. It is not up to this project, terraform-operator, to handle locks.

There is a caveat, though: both terraform apply and terraform destroy only retry 10 times to get started, and 10 is an arbitrary number in run.sh. I'm wondering whether the destroy pod gets stuck in this loop in your case. In any case, I don't think the "destroy" pod should remain running, so I will look into that.
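
To illustrate the kind of loop meant here, a rough sketch (this is not the project's actual run.sh, just the general shape of a bounded retry while a state lock is still held by a running apply):

  # Rough sketch only, not the operator's run.sh.
  attempts=0
  max_attempts=10
  until terraform destroy -auto-approve; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_attempts" ]; then
      echo "giving up after $max_attempts attempts" >&2
      exit 1
    fi
    sleep 30   # e.g. wait for the backend lock held by a still-running apply
  done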
