Question about updating existing cluster vs creating new one in new vcn #182
Comments
In general I've been seeing so many issues that it's hard for me to pin them down. I assume that I should be able to change the number of workers in each AD all day long (adding or removing nodes), run terraform, and have everything work, but that's not the case. For example, after 4 successful runs in a row where I changed the number of worker nodes in each AD between runs, I just went from 1 worker in each of three ADs to 0 in one of them and I got this error. My intent is to have 200 different tfvars files with different settings, copy each in turn, run terraform, and not see a single error. Is this reasonable? I won't feel comfortable using Oracle Cloud/Terraform in production until I can run terraform over and over with different configs without a single error. |
Hi @jferr, Firstly, sorry if you've had any slow responses. Terraform stores all state about what it has created in a terraform.tfstate file. Generally this needs to be persisted and reused for each run against the same resources so that Terraform can keep track of what it has created (https://www.terraform.io/docs/state/). We notice that this isn't being persisted between runs, which at a first guess is likely the cause of many of these issues. Manually deleting and editing things outside of Terraform will likely cause problems. The un-deletable subnet issue happens when a subnet is referenced by another resource, so it is technically unsafe to delete it. This is most likely caused by the manual deleting and editing of things outside of Terraform's control. In short, if the state file and the state of the world are inconsistent across consecutive runs, then it's likely Terraform will get confused. This is generally more a property of Terraform itself than of the OCI-specific implementation. The terraform refresh command reconciles the state Terraform knows about (via its state file) with the real-world infrastructure, so the state file is needed for Terraform to do the right thing. Let us know if that helps and please ask if you have any further questions on this. |
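To illustrate the point about persisting state, here is a minimal toy model in Python. It is not Terraform's or the OCI provider's actual code: `Cloud` and `apply` are hypothetical names invented for this sketch. It assumes only the documented behaviour that refresh updates resources already tracked in the state file and that anything in the config but absent from state is planned for creation.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """Hypothetical stand-in for the provider API: maps resource IDs to names."""
    resources: dict = field(default_factory=dict)
    next_id: int = 1

    def create(self, name: str) -> str:
        rid = f"id-{self.next_id}"
        self.next_id += 1
        self.resources[rid] = name
        return rid


def apply(config: list, state: dict, cloud: Cloud) -> dict:
    """Toy 'refresh + apply': returns the new state (config name -> resource ID)."""
    # Refresh: drop tracked entries whose resource no longer exists in the cloud.
    refreshed = {name: rid for name, rid in state.items() if rid in cloud.resources}
    # Apply: create every configured resource that the state does not track.
    for name in config:
        if name not in refreshed:
            refreshed[name] = cloud.create(name)
    return refreshed


config = ["vcn", "cluster"]

# Persisted state: the second run finds both resources in state and creates nothing.
cloud = Cloud()
state = apply(config, {}, cloud)
state = apply(config, state, cloud)
persisted_count = len(cloud.resources)

# Discarded state (e.g. a fresh container per run): the second run starts from {}.
cloud2 = Cloud()
apply(config, {}, cloud2)
apply(config, {}, cloud2)
discarded_count = len(cloud2.resources)

print(persisted_count, discarded_count)
```

In this model the run with persisted state leaves two resources in place, while the run that discards state ends up with a second VCN and a second cluster, because nothing told the second run that the first pair already existed.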
Thanks @owainlewis My thought was that refreshing with each run is what will allow us to run this via a docker container and should also allow terraform to see an accurate view of the infrastructure as-is (e.g. even if something is modified via the OCI console, terraform will accurately see it "as is" and will reconcile). Is this not true? In our case we are a small group and we can ensure a single terraform run at a time by only running terraform via a Jenkins job. |
@owainlewis shouldn't this work? I started with an s3 backend for terraform but had a number of issues, plus we've had cases where at some point terraform starts failing and we need to delete everything from the Oracle console (compute/lb/vcn) and start over. I figure refreshing each time should be the most stable, though less performant. Stability is the important thing here for us. |
Hi @jferr I think for stability you'll want to
|
Thanks @owainlewis I will try this out. I started that way, with a persistent tfstate file. I tried both a local file and an Amazon s3 backend, but I still had lots of issues and failures which required me to manually delete resources via the console. It seems to me that the most stable approach should be having terraform read the state with every run without persisting the tfstate between runs, though for some organizations this might not be practical. Can you explain why this wouldn't be the most stable way to go? |
@owainlewis in my testing I often saw that when terraform was refreshing right after I deleted resources via the console (because I couldn't get a successful follow-up terraform run), I would see references to entities (e.g. subnets) that had already been deleted. Is that the reason why? Is there some bug on Oracle's side where perhaps console changes aren't reflected immediately in the API? We are very uncomfortable with Oracle Kubernetes Terraform stability at the moment (we are not live yet) and are escalating this via other channels, so any info that you can give us is appreciated. |
Hi @jferr To clarify the "un-deletable" resources issue when you destroy a cluster: this happens because the resource cannot be safely deleted (i.e. someone or something else is using or referencing the resource). This is a deliberate feature.
The docs are helpful here when discussing why we need to persist the terraform.tfstate file.
|
Terraform Version
Latest and greatest from dockerhub hashicorp/terraform
See https://github.com/jferr/oci_tf_kuber/blob/master/dockerfile
OCI Provider Version
2.0.7
Terraform Installer for Kubernetes Version
Latest and greatest. My docker container, which runs terraform, clones the repo at docker build time and runs "git pull --prune" at runtime to get the latest master.
FYI...I'm not a contributor to the project or a "go" developer, so looking at the code is not my preferred first option here. We are a paying customer of Oracle OCI. We are refreshing state with each run (see https://github.com/jferr/oci_tf_kuber/blob/master/doit.sh).
If I manually delete all compute/load balancers/subnets/VCNs in my compartment and then run terraform, it appears to still "see" these deleted entities. For example, the screenshot below is partial output from a run immediately after a deletion. It shows entities in the log which didn't exist in my compartment because I had manually deleted them a few minutes before the run.
I've been working with Oracle support on many related issues with limited success and a very slow response time, so I am moving here for now. Is this the correct avenue for such questions or is there a better place?
The issue above, along with another where we sometimes end up with an undeletable subnet after a run, makes troubleshooting difficult.
A second question: what does this terraform provider use to determine whether to create a new VCN and a new cluster, or to update an existing cluster in an existing VCN in the compartment?
I'm seeing both things happen. Usually if I create the cluster and then re-run terraform (this is done via docker, so it's a brand new container doing a refresh of state), it is smart enough to see that there's an existing cluster which matches my desired variables, but sometimes it'll just create a new VCN and a new cluster.
Thanks in advance