
prevent_destroy does not prevent destruction when removing a resource from the configuration #17599

samjgalbraith opened this issue Mar 15, 2018 · 53 comments · May be fixed by #33229

@samjgalbraith

samjgalbraith commented Mar 15, 2018

The documentation for prevent_destroy says

This flag provides extra protection against the destruction of a given resource. When this is set to true, any plan that includes a destroy of this resource will return an error message

However, removing or commenting out a resource still destroys it with no warning, even though that is technically a plan that includes the destruction of the resource. What I expected was that the state file would remember that the resource was created with prevent_destroy, and prevent its destruction even if it was no longer part of the parsed configuration. It would be reasonable to force the user to remove the prevent_destroy flag, run terraform apply, and only then remove the resource from the configuration if they were absolutely sure they wanted to go ahead.

Terraform Version

Terraform v0.11.3
+ provider.aws v1.11.0

Terraform Configuration Files

locals {
  dynamo_db_mutext_attribute = "LockID"  # The name must be exactly this
}

provider "aws" {
  profile = "test"
  region = "ap-southeast-2"
}

resource "aws_s3_bucket" "test_environment_terraform_backend_storage" {
  region = "ap-southeast-2"
  bucket_prefix = "SOME-BUCKET-NAME_PREFIX-"
  versioning {
    enabled = true
  }
  tags {
    Environment = "test"
  }
  # We explicitly prevent destruction using terraform. Remove this only if you really know what you're doing.
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_dynamodb_table" "test_environment_terraform_backend_mutex" {
  "attribute" {
    name = "${local.dynamo_db_mutext_attribute}"
    type = "S"
  }
  hash_key = "${local.dynamo_db_mutext_attribute}"
  name = "test_environment_terraform_backend_mutex"
  read_capacity = 5
  write_capacity = 5
  # We explicitly prevent destruction using terraform. Remove this only if you really know what you're doing.
  lifecycle {
    prevent_destroy = true
  }
}

Expected Behavior

Resource is prevented from being destroyed because it was created with lifecycle { prevent_destroy = true }

Actual Behavior

A resource with prevent_destroy set to true is destroyed when it is commented out or removed from the configuration.

Steps to Reproduce

  1. Enter the above Terraform configuration into a new module.
  2. terraform apply and say yes.
  3. Now, comment out the S3 bucket resource.
  4. terraform apply again, and say yes

Additional Context

References

@apparentlymart
Contributor

Hi @theflyingnerd! Thanks for pointing this out.

Currently, removing a resource from the configuration is treated as intent to destroy it, since that's an explicit action on your part rather than something that happens as part of some other action, such as resource replacement or a whole-configuration destroy.

However, your idea of retaining this in the state and requiring multiple steps to actually destroy is an interesting one:

  1. Leave the resource in config but remove prevent_destroy (see the sketch after this list)
  2. Run terraform apply to update the state to no longer have that flag set.
  3. Remove the resource from configuration entirely.
  4. Run terraform apply again to actually destroy it.
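A rough sketch of what step 1 would look like in configuration, reusing the S3 bucket from this issue (the resource name and prefix here are illustrative):

resource "aws_s3_bucket" "test_environment_terraform_backend_storage" {
  bucket_prefix = "some-bucket-name-prefix-"

  # Step 1: keep the resource, but drop the protection.
  lifecycle {
    prevent_destroy = false
  }
}

Once an apply has recorded that the flag is no longer set, the whole block could be deleted and a second apply would perform the actual destroy.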

I'd like to think about this some more, because while this does seem safer, the flow could also be quite frustrating if, for example, you've already committed a reorganization to your version control that removes a resource, and Terraform then refuses to apply it until you revert and do steps 1 and 2 from the above.

In the meantime, we should at least update the documentation to be more explicit about how prevent_destroy behaves in this scenario.

@samjgalbraith
Author

samjgalbraith commented Mar 16, 2018

Great, thanks for your quick and thorough reply @apparentlymart. I agree that a documentation update would be sufficient for now and that the proposed functionality could be left open for debate. My comments about this process were a mixture of how I might expect it to work if I didn't read any more documentation, as well as a proposal. It's influenced by the idea of AWS instance termination protection: that protection must be explicitly removed before the instance may be terminated, even if the instance is explicitly terminated by ID.

@a-nldisr

Please see: #16392

@dguisinger

Any further thoughts on this?
I used Terraform to set up build pipelines, including AWS CodeCommit for storage of code.
A mistake in Terraform will wipe out the entire contents of the Git repo. I too want a way to protect resources in case they get removed from a script.

@tdmalone

tdmalone commented Jul 2, 2018

Potential duplicate of #3468

@roidelapluie
Contributor

Strange that #17599 closes #3468 and not the other way around, but this is still a very much wanted feature.

@JustinGrote

JustinGrote commented Jan 5, 2019

This was (and is) a major risk factor for accidental deletion and was unacceptable for us when using Terraform in production.

This is our workaround, which runs in our Azure DevOps pipeline on every commit before the apply. I'll see if I can get the code (a PowerShell Pester test) approved to be shared.

  1. Run terraform plan and get the list of resources to be destroyed.
  2. Review the previous state (retrieved from the previous commit) for any to-be-destroyed resources that had prevent_destroy set; if any are found, fail the test and stop the pipeline.
  3. To actually remove such a resource, two commits are required: one to remove prevent_destroy (which doesn't destroy the resource and thus doesn't fail the test), and then one to remove the resource from the config.

In addition to this, all production changes have an approval step that presents the pending terraform apply changes very clearly and has to be approved by all the team members.

@Zeal0us

Zeal0us commented Feb 9, 2019

I currently don't intend to manage certain things with Terraform due to this. As part of deployment of new code (e.g. Lambda, ECS containers, API Gateway, etc.) my Terraform files are built dynamically, so although it seems like removing a resource from the .tf files is explicit approval, it could just as easily be a coding error.

As such, managing any kind of DynamoDB table or RDS instance is out of the question, and those will just have to be managed manually. While I'm kind of OK with this, it certainly would be nice to just manage the entire application and still be able to sleep.

@simlu

simlu commented Mar 5, 2019

Any update on this? It seems like quite an important feature...

@JustinGrote

JustinGrote commented Mar 5, 2019 via email

@evmin02

evmin02 commented Apr 17, 2019

+2

@CodeSwimBikeRun

CodeSwimBikeRun commented May 6, 2019

omg. I'm looking at how to just prevent the helm provider from destroying a chart every time I deploy. I came here to see this amazing prevent_destroy feature that I want to try out, but judging from this, it doesn't even work?

You are willing to wipe out a whole database created with the prevent_destroy flag on? What the heck?

@tdmalone

tdmalone commented May 6, 2019

@CodeSwimBikeRun Your comment before the edit was more correct - it works, but not if you remove the resource from your .tf file, because by doing so you're also no longer setting prevent_destroy. This issue is about potentially storing that flag in the state so that the instruction can still remain in effect after the block is removed (I'm not sure if that is exactly what you were after or not).

You can, however, set flags on some resources (such as AWS EC2 and RDS instances) to prevent deletion at the AWS API level, which will help you no matter what.
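For example, a minimal sketch for EC2 (the AMI and resource names are placeholders): the AWS provider exposes this as an ordinary argument, so the protection lives at the cloud API rather than only in Terraform.

resource "aws_instance" "example" {
  ami           = "ami-12345678"   # placeholder
  instance_type = "t3.micro"

  # Blocks TerminateInstances at the EC2 API level, no matter who (or what) asks.
  disable_api_termination = true
}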

Not sure the language is warranted.

@nbering

nbering commented May 6, 2019

@CodeSwimBikeRun This issue is about a specific case, in which the configuration block for the resource no longer exists. In that case, Terraform is following instructions in the strict sense that the resource is no longer configured with prevent_destroy.

Terraform interprets the configuration files as your desired state, and the state file (and/or refreshed in-memory state) as the current state. Destruction becomes a complex edge-case in this way because the absence of a resource implicitly means that you desired it to be destroyed, but in practice this may not always be true. There are other cases to watch out for which are similar, but are consequences of the basic design. For example, #13549.

The feature works - in the general sense - and is worth using. But it's still pretty important to examine the plan before applying changes. Declarative configuration systems in general are very powerful, but also come with an inherent risk of accidentally automating failure (see: Automation: Enabling Failure at Scale for an example outside Terraform). I personally find that Terraform's plan feature is actually superior to some other declarative configuration tools I've used in that regard. It will tell you, before changing anything at all, that it wants to destroy a resource.

@rusik69

rusik69 commented Sep 30, 2019

we had part of our production deleted because of this :(

@beanaroo

Storing the last-set prevent_destroy value in state would be beneficial, especially when generating tf plans programmatically.

@bionicles

Seems like a no-brainer safety-critical feature... prevent_destroy should work.

@aaronmell

It would be nice to have a feature like CloudFormation's Retain: if the resource is removed from the file, don't try to destroy the actual resource, just remove it from the state file.
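For what it's worth, later Terraform releases (1.7 and up) added a removed block that behaves much like CloudFormation's Retain: it drops the resource from state without destroying the real object. A minimal sketch, with an illustrative resource address:

# After deleting the original resource block from configuration:
removed {
  from = aws_s3_bucket.example

  lifecycle {
    destroy = false   # forget the object instead of destroying it
  }
}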

@penfold

penfold commented Jan 26, 2022

I'm looking at configuring Azure FrontDoor, which means I'll need to reference an Azure DNS zone. As I currently use the same domain (different sub-domains) for live and test environments, I run the risk of destroying the live DNS zone when I clean down a test environment.

Is there any thought on implementing the change that @apparentlymart suggested at the start of the thread?

@nbering

nbering commented Feb 15, 2022

@penfold Your case sounds like terraform_remote_state would be a suitable workaround, or possibly a better implementation all around, depending on your constraints.

https://www.terraform.io/language/state/remote-state-data

You could put your shared resources in a separate state project to protect them from being clobbered by the dependent Terraform runs, while still benefiting from referencing values from those resources across environments.
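A minimal sketch of that pattern for the DNS-zone case above, assuming the shared zone lives in its own state stored in an Azure storage backend (names, keys, and output names are illustrative):

data "terraform_remote_state" "shared_dns" {
  backend = "azurerm"

  config = {
    resource_group_name  = "rg-shared-state"
    storage_account_name = "examplestatestorage"
    container_name       = "tfstate"
    key                  = "shared-dns.tfstate"
  }
}

# The disposable environment only reads the zone via outputs and never manages it,
# so cleaning the environment down cannot destroy the zone.
resource "azurerm_dns_cname_record" "test_env" {
  name                = "test"
  zone_name           = data.terraform_remote_state.shared_dns.outputs.dns_zone_name
  resource_group_name = data.terraform_remote_state.shared_dns.outputs.dns_zone_resource_group
  ttl                 = 300
  record              = "example.azurefd.net"
}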

@HariSekhon

HariSekhon commented Mar 21, 2022

This is still such a huge problem for everything containing state/data - from databases to GitHub repos.

Terraform is simply not production-grade for these use cases until it records prevent_destroy in the state file.

@aton-elvisb

Hi @apparentlymart, as an alternative to the following workflow to remove a protected resource:

  1. Leave resource in config but remove prevent_destroy
  2. Run terraform apply to update the state to no longer have that flag set.
  3. Remove the resource from configuration entirely.
  4. Run terraform apply again to actually destroy it.

we could also think about an alternative workflow:

  1. Remove the resource from configuration entirely.
  2. Run terraform state allow-destroy <resource name> (a new command)
  3. Run terraform apply to actually destroy it.

The order of steps 1 and 2 does not matter. Step 2 removes the destroy protection from the state.

If step 1 is not executed, so that the resource remains in the configuration, but step 2 is executed, then the terraform apply in step 3 simply reintroduces the protection.

@m98

m98 commented Jul 4, 2022

I ended up resolving this by just protecting my resources with IAM on AWS. In my case, I wanted to prevent our Cognito user pool from being deleted.

resource "aws_iam_policy" "prevent_user_pool_deletion" {
  name = "prevent_user_pool_deletion"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
        "Effect": "Deny",
        "Action": [
          "cognito-idp:DeleteUserPool",
          "cognito-idp:DeleteUserPoolClient",
          "cognito-idp:DeleteUserPoolDomain"
        ],
        "Resource": [
            "arn:aws:cognito-idp:${var.region}:${local.account_number}:userpool/eu-west-3_xxxx"
        ]
    }
  ]
}
EOF
}

resource "aws_iam_group_policy_attachment" "attach" {
  group      = "admin"
  policy_arn = aws_iam_policy.prevent_user_pool_deletion.arn
}

With this in place I have less fear of accidentally deleting our user pool.

I think the problem with this solution is that if this policy attachment is accidentally removed from the .tf config, then there is a chance of destroying the Cognito user pool resource. It's very similar to the prevent_destroy lifecycle, but with more code!

I think another way is to limit the IAM user that Terraform itself is using. You can add this policy to the user:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "NotAction": [
        "cognito-idp:AdminDeleteUser",
        "cognito-idp:DeleteGroup",
        "cognito-idp:AdminDeleteUserAttributes",
        "cognito-idp:DeleteUserPool",
        "cognito-idp:DeleteIdentityProvider",
        "cognito-idp:DeleteResourceServer",
        "cognito-idp:DeleteUserPoolClient",
        "cognito-identity:UnlinkDeveloperIdentity",
        "cognito-idp:DeleteUserPoolDomain",
        "cognito-identity:DeleteIdentityPool",
        "cognito-idp:DeleteUserAttributes",
        "cognito-identity:DeleteIdentities",
        "cognito-identity:UnlinkIdentity",
        "cognito-idp:DeleteUser"
      ],
      "Resource": ["*"]
    }
  ]
}

But anyway, I think Terraform itself should not delete any user pool that has at least one user in it. It already does something similar for S3 buckets, which can't be destroyed while they still contain objects.
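(For reference, that S3 behaviour corresponds to the provider's force_destroy argument, which defaults to false; the bucket name below is illustrative.)

resource "aws_s3_bucket" "example" {
  bucket = "example-bucket-name"

  # Default is false: deleting a non-empty bucket fails.
  # Setting this to true is an explicit opt-in to data loss.
  force_destroy = false
}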

@lpil

lpil commented Jul 19, 2022

Hi gang. We got bit by this yesterday. Is there a workaround yet?

@koreyGambill

I'm trying to automate a CI/CD solution for Terraform, but am deeply concerned about auto-applying changes on merges to main because of this issue. If someone doesn't look over a plan properly, they could cause a pretty drastic outage, and we could lose a lot of data. I suppose my real concern is managing anything with Terraform that contains persistent data and doesn't have a prevent-deletion flag available to be set.

The best solution (workaround) I see in these comments is checking the previous Terraform state before doing a deploy, as #17599 (comment) suggested, or perhaps being less permissive with Terraform's permissions through IAM. But I'd much rather have a first-class solution.

There are multiple good (and seemingly easy) solutions that could be implemented here. I liked the tombstone idea, or doing something like CloudFormation's Retain, or ensuring that the lifecycle flag is negated in a previous apply, or another flag like terraform apply --allow-destroy that can destroy resources with that lifecycle set. I think we'd all love to see a native, intuitive way to prevent an accident from wiping out data.

Is HashiCorp not worried about this issue?

@HariSekhon

HariSekhon commented Jul 20, 2022

@koreyGambill surprisingly dangerous, isn't it?

I've automated Terraform CI/CD for 2 companies in 2 different ways for you to consider:

  1. Auto-apply on merge to main using my GitHub Actions reusable workflow: the plan is pasted into the pull request, but once you merge that PR, that's it, boom. This is probably the one that scares you more.
  2. Jenkins runs the plan from the main branch and then uses an interactive approval before doing the apply. This uses a saved plan, so you know exactly what Terraform is going to apply if you approve the final pipeline stage, and it only applies that plan.

Even with this second, stronger option, there is still always the possibility of human error: somebody approves without checking properly and it destroys something critical.

For such critical data resources, you might have to keep them outside of Terraform until this issue is solved.

@Harrisonbro

Just adding a +1 to this. I'm really struggling to recommend Terraform to our clients for this reason.

@simonweil

@Harrisonbro Terraform is not perfect, but what is the better alternative that has no downsides of its own?

@Harrisonbro

@Harrisonbro Terraform is not perfect, but what is the better alternative that has no downsides of its own?

Nothing has no downsides, of course. We do recommend Terraform for stateless resources that can be recreated, but for databases and such we have to recommend clients use other approaches (often just manual config) because they can't accept the risk of a Git commit wiping out data.

@whiskeysierra

The way we deal with this in our company is twofold:

  1. Locks - this depends on your cloud provider. Azure has management locks, which we put on databases and database servers to prevent any delete, not just via Terraform (see the sketch after this list).
  2. Lack of permissions - we restrict the identity that Terraform assumes so that it does not have the permissions needed to delete.
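A minimal sketch of the Azure lock approach (the lock name and the target resource are illustrative):

resource "azurerm_management_lock" "db_no_delete" {
  name       = "prevent-delete"
  scope      = azurerm_postgresql_server.example.id
  lock_level = "CanNotDelete"
  notes      = "Blocks deletion from any client, not just Terraform."
}

One caveat: if the lock itself is managed in the same configuration as the database, removing both blocks lets Terraform delete the lock first, so such locks are often created outside the run they are meant to protect.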

@maxb

maxb commented Nov 28, 2022

There are undoubtedly many possible workarounds. Indeed, where I work, we've resorted to patching terraform-provider-github to embed the feature at the provider level, specifically for github_repo resources.

However, given the outpouring of supportive sentiment, @apparentlymart, could you give us an update on your comment from 2018?

@simonweil

I use delete protection and final snapshots; this prevents the DB from being deleted by mistake...
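For RDS, that combination looks roughly like this (identifiers, sizes, and the password variable are placeholders):

resource "aws_db_instance" "example" {
  identifier        = "example-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "exampleadmin"
  password          = var.db_password

  # API-level guard: the instance cannot be deleted until this is turned off.
  deletion_protection = true

  # If it is ever deleted, keep a final snapshot rather than dropping the data.
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-db-final"
}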

@ben-z

ben-z commented May 19, 2023

There are undoubtedly many possible workarounds. Indeed, where I work, we've resorted to patching terraform-provider-github to embed the feature at the provider level, specifically for github_repo resources.

@maxb any chance the patch to terraform-provider-github is open source? My org has the same use case and we don't want to reinvent the wheel.

@maxb

maxb commented May 19, 2023

It is not (and I no longer work at that employer). However, the modern github_repository resource has a setting, archive_on_destroy, which can be turned on to provide a different safety net against accidental repository deletion.
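A minimal sketch of that setting (the repository name is illustrative):

resource "github_repository" "example" {
  name       = "example-repo"
  visibility = "private"

  # On destroy, archive the repository instead of deleting it.
  archive_on_destroy = true
}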

ben-z added a commit to WATonomous/terraform that referenced this issue May 20, 2023
ben-z added a commit to WATonomous/terraform that referenced this issue May 20, 2023
@ben-z

ben-z commented May 20, 2023

I made a proof-of-concept (#33229) to implement storing the flag in the state. You can try it using:

git clone https://github.com/WATonomous/terraform.git
git checkout prevent_removal
go install .

and using the prevent_removal lifecycle flag instead of the prevent_destroy lifecycle flag.
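Usage would mirror today's prevent_destroy, just with the proof-of-concept's flag name (a sketch based on the description above; the resource is illustrative):

resource "aws_s3_bucket" "important_data" {
  bucket_prefix = "important-data-"

  lifecycle {
    # From the PoC in #33229: recorded in state, so the resource stays
    # protected even after its block is removed from the configuration.
    prevent_removal = true
  }
}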

@thedewi

thedewi commented May 26, 2023

This is very surprising behaviour IMO. I'd have expected the policy to be stored in state and applied in all destroy contexts. Does anyone actually want it to apply only part of the time? Having a second similar policy will add confusion; maybe then the original should be deprecated?

@HariSekhon

HariSekhon commented May 31, 2023

I made a proof-of-concept (#33229) to implement the behaviour of the change requested in this issue. You can test it out by:

git clone https://github.com/WATonomous/terraform.git
git checkout prevent_removal
go install .

and using the prevent_removal lifecycle flag instead of the prevent_destroy lifecycle flag.

It would be better for HashiCorp to just fix the behaviour of prevent_destroy so that it is stored in the Terraform state.

ben-z added a commit to WATonomous/terraform that referenced this issue Jul 27, 2023
ben-z added a commit to WATonomous/terraform that referenced this issue Nov 5, 2023
ben-z added a commit to WATonomous/terraform that referenced this issue May 5, 2024