Make it harder to destroy resources #24658

danawillow · 2020-04-14T00:03:45Z

(I'm sorry for not filling out the template exactly, but my thoughts didn't really follow that format).

Destroying resources is too easy

tl;dr: Terraform makes it easy to destroy infrastructure and hard to ensure it sticks around. For production systems, the reverse would be preferable.

Background

Each field in a resource has two options to describe what to do if it changes:

ForceNew: false if it can be updated in place (usually by calling a PUT or PATCH method on the resource API)
ForceNew: true if it cannot be updated in place

If the field cannot be updated in place (it is marked ForceNew), any change to it will destroy and recreate the resource.

Resources are also destroyed if they are removed from the config but still present in the state file.

Problems

This can fail users in several ways:

A field was added to Terraform without update support, and update support is added later. This is a fairly minor issue- the user can just upgrade to the version that has that support. Still, sometimes the person performing the upgrade and the person making the change are not the same person, adding some organizational complexity. Likewise, major version upgrades might take an organization a long time to complete safely, causing them to be stuck at a lower version for longer.
A change was made on the API side (such as a default value changing), which will cause users to have to update their plans to the new value to avoid diffs. Again, this is doable, but it's a scary plan to see with no context (vs something like an error message), especially if it happens at a time where the user modified a different resource and then ran Terraform.
Human error may cause resources to be removed from the config and present in state. https://www.youtube.com/watch?v=ix0Tw8uinWs (the first half) shows an example of two PRs being merged out of order, causing infrastructure to get destroyed. In this case, they didn't quite get the order of their PRs right, so they imported a resource into state before merging the config that included it, and then made a change to other resources in that same config shortly after, which caused Terraform to destroy the newly-imported resource.
Values of interpolated, computed fields could change. See https://coderbook.com/@marcus/prevent-terraform-from-recreating-or-deleting-resource/ for an example.

This behavior becomes devastating in cases where Terraform is being run in a loop for continuous actuation. Although the official recommendation is to never run Terraform without looking at a plan, there are many cases why someone might want to have an auto-approve loop, with plan only being run when a user actually makes a change to their config. For example, users may want to ensure they're detecting and managing drift quickly, and waiting for the next terraform plan to run wouldn't be fast enough.

All of these situations can be avoided when people use Terraform perfectly, making sure they look carefully at all plans before they're applied, and that they understand the correct order to perform non-plan/apply actions (like import). Regardless of the exact reasons why people end up in these situations, though, we've heard from our customers that they'd like more safeguards around accidentally destroying infrastructure, especially pieces that should be long-lived, like databases.

Potential Solutions

The following are a few ideas I have (ones I like and ones I don't), mainly put out here for discussion and consolidation.

Encourage lifecycle.prevent_destroy on everything

This is basically where we're at right now, but it's not great. It requires users to know about the feature at all, and to put it on every single resource (since there's no way to specify it to default to true globally). It has several flaws in its current implementation that make it hard and/or dangerous to use (#16392, #3874, #17599).

Let providers handle it

There's a few options for this:

The provider decides on a per-resource basis whether it can be destroy/recreated, finds out from the SDK whether the destroy was explicit or implicit, and returns an error if it was implicit (this is the proposal in https://github.com/hashicorp/terraform/issues/23330). This requires some decision making on the provider side and is less flexible for our users who expect that behavior on a different set of resources than we have it on.
We remove the ForceNew: true from fields that can't be updated, and add CustomizeDiff behavior to the resource to fail if there is a diff on those fields. This requires no changes to Terraform core/SDK, but requires there to be a special customizediff on basically every resource in the provider. It also means that different providers will behave inconsistently, which isn't a great UX.
Add a field to each resource (or a top-level field in the provider block), "allow_destroy", which would fail the destroy step if set to false. Once again, it's not a consistent experience between providers. This one also would have to be done at apply-time only, meaning users' plans would succeed while the apply fails, which isn't great.

Deprecate lifecycle.prevent_destroy, make that the default behavior, and add lifecycle.allow_recreate

This would make deleting resources something that can only happen by removing it from the config or running terraform destroy, or by opting into an allow_recreate. It would be a breaking change, so have to be rolled out carefully, but it adds safeguards.

This makes it much more explicit that removing a resource from your config will destroy it. It doesn't prevent the human error case of running terraform apply when the resource isn't in the config but is in state, but it at least prevents unintentional destroy/recreates.

This also could be extended to have several different destroy options: in addition to allow_recreate, users could also have abandon (destroy removes from state but does not call Destroy()) or a stricter prevent_all_destroy (which wouldn't allow a destroy/create, a removal from state, or a terraform destroy to succeed).

Add the option to specify default destroy behavior in a `terraform{ }` or `provider{ }` block

This lets users specify what they want their default to be across all resources in their configs, so it doesn't need to be specified on each one. This could be combined with the previous solution.

Add a command-line flag `--allow-destroy`

This would make plans that include destroying a resource fail if the flag is not set. This is similar to making prevent_destroy the default, but with making the destroy allowed on a run of terraform rather than as a property on the resource. This makes it much more explicit what the behavior is when removing a resource from config, and you don't have to worry about comparing with a previous version of state. Users would have to adjust any CI systems to adapt to the new flag, but other than that wouldn't have to make any config changes. However, it may be tedious for users that need to frequently, but not always, replace resources that cannot be updated in-place.

The text was updated successfully, but these errors were encountered:

paultyng · 2020-04-27T19:28:50Z

An additional option here for letting the providers handle it: resources could specify their destroy behavior as part of schema (ie. default to prevention or allowing), basically controlling a default for the lifecycle flag, but preserving the existing lifecycle syntax for overriding it.

erSitzt · 2020-05-12T08:38:37Z

I like @danawillow CLI Flag... default behavior should never destroy anything except stated explicitly

danawillow added the enhancement label Apr 14, 2020

pkolyvas added the lifecycle label Apr 21, 2020

thcp mentioned this issue Jun 30, 2020

Resource block to protect and control any harmfull action on the infrastructure. #25434

Open

jbardin mentioned this issue Oct 8, 2020

Preventing accidental deletes #26511

Closed

jbardin mentioned this issue Nov 19, 2020

Prevent Terraform from destroying specific resource #26972

Closed

rileykarson mentioned this issue Nov 23, 2020

Add deletion_protection to google_storage_bucket hashicorp/terraform-provider-google#7868

Closed

rileykarson mentioned this issue Jan 29, 2021

Terraform plan permits renaming BigQuery column hashicorp/terraform-provider-google#1144

Closed

rileykarson mentioned this issue Mar 5, 2021

Disk downsize triggers recreate hashicorp/terraform-provider-google#8597

Closed

rileykarson mentioned this issue May 17, 2021

google_pubsub_topic and google_pubsub_subscription -> provide deletion_protection argument hashicorp/terraform-provider-google#9134

Closed

jbardin mentioned this issue Jun 17, 2021

Bug or Feature: Dynamic or Variable-controlled Lifecycle blocks do not work #28972

Closed

PatMyron mentioned this issue Feb 2, 2022

[proposal] Warn google_bigtable_instance without prevent_destroy terraform-linters/tflint-ruleset-google#23

Open

fho mentioned this issue Jun 14, 2022

feat(bunny_storagezone): add a storagezone resource simplesurance/terraform-provider-bunny#64

Merged

rileykarson mentioned this issue Feb 6, 2023

Add deletion_protection argument to google_secret_manager_secret hashicorp/terraform-provider-google#13661

Closed

This was referenced Apr 13, 2023

Unable to 👍 - Allow interpolations in lifecycle attributes #33031

Closed

"Variables may not be used here" for prevent_destroy #22544

Open

crw mentioned this issue Apr 13, 2023

Cannot use interpolations in lifecycle attributes #3116

Closed

flebotany mentioned this issue Apr 21, 2023

terraform lifecycle ignore_changes should not be static or should otherwise support conditional ignore lists #33063

Closed

jbardin mentioned this issue Aug 8, 2023

GCP connector - Allow force_destroy on resources in apply stage #33649

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it harder to destroy resources #24658

Make it harder to destroy resources #24658

danawillow commented Apr 14, 2020 •

edited

Loading

paultyng commented Apr 27, 2020

erSitzt commented May 12, 2020

Make it harder to destroy resources #24658

Make it harder to destroy resources #24658

Comments

danawillow commented Apr 14, 2020 • edited Loading

Destroying resources is too easy

Background

Problems

Potential Solutions

Encourage lifecycle.prevent_destroy on everything

Let providers handle it

Deprecate lifecycle.prevent_destroy, make that the default behavior, and add lifecycle.allow_recreate

Add the option to specify default destroy behavior in a terraform{ } or provider{ } block

Add a command-line flag --allow-destroy

paultyng commented Apr 27, 2020

erSitzt commented May 12, 2020

danawillow commented Apr 14, 2020 •

edited

Loading

Add the option to specify default destroy behavior in a `terraform{ }` or `provider{ }` block

Add a command-line flag `--allow-destroy`