Bug: Terraform attempts to delete security groups before dependent EC2 instances #8617

hydroxide · 2016-09-01T23:36:35Z

I have a configuration of EC2 instances belonging to security groups in AWS:

resource "aws_instance" "foo" {
    # some config
    vpc_security_group_ids      = ["${aws_security_group.foo.id}", "${var.vpc-sg}"]
}

resource "aws_security_group" "foo" {
    # some config
    vpc_id = "${var.vpc-id}"
}

When running terraform destroy, Terraform attempts to destroy the security groups until timeout (5 minutes), at which point it prints the following error:

* aws_security_group.foo: DependencyViolation: resource sg-4f189d35 has a dependent object
        status code: 400, request id: 0a06f6a1-0792-42c5-9180-beac33fb9037

Indeed, the instances (which Terraform has not yet attempted to destroy) are dependent on the security group. Given that the configuration for these resources is maintained completely in Terraform, it seems to be some bug with the dependency resolution. I wonder if this may have anything to do with the VPC.

mengesb · 2016-09-02T20:26:38Z

@hydroxide I don't experience this problem with 0.7.0; do you have any other useful information such as a debug log (gist link) or other information? I use this dependency quite extensively in a lot of plans and haven't run into this unless something is modifying the SGs outside of management.

sebbonnet · 2016-09-09T16:03:23Z

I am facing a similar issue with terraform 0.7.3, when conditionally adding an extra security group.
When the extra security group exists, attempting to remove it fails due to a dependency violation.

The terraform plan shows both that the extra security group will be deleted and that it will be disassociated from the EC2 instances:

- aws_security_group.cassandra_overrides

~ module.aws_instance_zone_a.aws_instance.cassandra_node.0
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_a.aws_instance.cassandra_node.1
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_b.aws_instance.cassandra_node.0
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_b.aws_instance.cassandra_node.1
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

Yet applying the changes attempts to remove the extra security group before it is removed from the dependent EC2 instances.

aws_security_group.cassandra_overrides: Still destroying... (5m0s elapsed)
Error applying plan:

1 error(s) occurred:

* aws_security_group.cassandra_overrides: DependencyViolation: resource sg-b6e3ebd1 has a dependent object
    status code: 400, request id: b7a55106-963a-4404-aa0b-2653bc70ccf1

Here is a simplified version of my terraform config.
The idea here is to use the allow_all_internal_ips variable to define whether an extra security group should be added.

variables.tf

variable security_rule_overrides {
  type = "map"

  default = {
    allow_all_internal_ips = 1
  }
}

security_group.tf

resource "aws_security_group" "cassandra" {
  description = "Cassandra node security group"
  vpc_id      = "${data.terraform_remote_state.vpc.vpc_id}"

  ingress {
    protocol  = "tcp"
    from_port = 9042
    to_port   = 9042

    cidr_blocks = [
      "${data.terraform_remote_state.vpc.vpc_cidr_block}",
    ]
  }
}

resource "aws_security_group" "cassandra_overrides" {
  description = "Cassandra node security group overrides"
  vpc_id      = "${data.terraform_remote_state.vpc.vpc_id}"
  count       = "${lookup(var.security_rule_overrides, "allow_all_internal_ips", 0)}"

  ingress {
    protocol  = "tcp"
    from_port = 9042
    to_port   = 9042

    cidr_blocks = [
      "10.0.0.0/8",
    ]
  }
}

nodes.tf

module "aws_instance_zone_a" {
  source                 = "./zone_instances"
  zone_name              = "a"
  environment            = "${var.environment}"
  vpc_security_group_ids = "${aws_security_group.cassandra.id}, ${join(", ", compact(aws_security_group.cassandra_overrides.*.id))}"
  subnet_id              = "${data.terraform_remote_state.vpc.subnet_id_private_a}"
  instance_type          = "${var.instance_types[var.environment]}"
  instance_count         = "${lookup(var.instances, "${var.environment}_zone_a", 0)}"
  # other config
}

zone_instances/main.tf

resource "aws_instance" "cassandra_node" {
  ami                    = "${var.ami}"
  instance_type          = "${var.instance_type}"
  subnet_id              = "${var.subnet_id}"
  vpc_security_group_ids = ["${compact(split(", ", var.vpc_security_group_ids))}"]
  key_name               = "${var.key_name}"
  count                  = "${var.instance_count}"
  # other config
}

evankroske · 2017-02-21T21:22:20Z

This is happening to me, too. In my case, I'm trying to rename a security group, which requires that Terraform destroy the group and recreate it. However, all instances have to be removed from the group and all references to the group must be removed before it can be destroyed. Terraform doesn't remove instances from the group or remove references to the group before it tries to destroy it. Error message:

* aws_security_group.jump: DependencyViolation: resource sg-9691e2ee has a dependent object
	status code: 400, request id: 4de9318b-fee9-48a4-90dd-46fc6336298b

joestump · 2017-08-30T22:08:38Z

I believe the answer is in explicit dependencies:

Terraform ensures that dependencies are successfully created before a resource is created. During a destroy operation, Terraform ensures that this resource is destroyed before its dependencies.

Emphasis mine. Sadly, this doesn't seem to work for me (though I'm mentioning a few resources and only needing to destroy a couple).
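
A minimal sketch of what such an explicit dependency might look like, in the pre-0.12 syntax used elsewhere in this thread. The resource names, variables, and instance type are hypothetical and not taken from any configuration above. Note that the reference to aws_security_group.foo.id already creates an implicit dependency, so depends_on here only restates it:

```hcl
# Hypothetical inputs; not part of the original report.
variable "ami" {}
variable "vpc_id" {}
variable "subnet_id" {}

resource "aws_security_group" "foo" {
  vpc_id = "${var.vpc_id}"
}

resource "aws_instance" "foo" {
  ami                    = "${var.ami}"
  instance_type          = "t2.micro"
  subnet_id              = "${var.subnet_id}"
  vpc_security_group_ids = ["${aws_security_group.foo.id}"]

  # Explicit dependency: create the group before the instance,
  # and destroy the instance before the group.
  depends_on = ["aws_security_group.foo"]
}
```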

christophetd · 2017-10-26T14:53:58Z

Same issue here with latest version (v0.10.8).

BastianM3 · 2017-11-02T03:26:45Z

Ran into this issue on 0.10.8 as well.

b-dean · 2017-11-13T16:25:15Z

A combination of create before destroy and name_prefix (so the security groups don't have conflicting names) solves this for me:

resource "aws_security_group" "example" {
  name_prefix = "example-"
  // other stuff

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_instance" "example" {
  vpc_security_group_ids = ["${aws_security_group.example.id}"]
  // other stuff
}

Then the new SG gets created, swapped out on the ENI for the EC2 instance and then the old SG can be deleted.

peterhallen · 2017-11-29T15:36:49Z

Thanks @b-dean! This approach worked for me as well.

antgel · 2018-03-23T22:51:05Z

@hydroxide Any chance you can re-open this? It's happening for lots of us, apparently (me included). It's very simple to reproduce: remove a security group that's assigned to at least one server, and Terraform hangs until its timeout. It's possible to work around this by manually (AWS UI) removing the assignment in parallel, but that wouldn't be fun for a significant number of instances.

ndobbs · 2018-04-09T17:58:48Z

I am currently experiencing this same behavior. I renamed a security group attached to an instance resource; Terraform detects that I'm no longer using said security group and hangs until it times out in an attempt to delete it. If I remove the security group in the AWS console or destroy the instance, Terraform returns successfully.

vishwakumba · 2018-04-27T12:29:07Z

I am also experiencing a similar issue in Terraform v0.11.7. I am unable to delete existing security groups that are no longer applicable to an EC2 instance, where applicability is decided dynamically by a boolean variable for an environment. "terraform plan" seems to be OK, but "terraform apply" produces an error after 10 mins. I think Terraform is trying to delete the security groups first and then detach them from the EC2 instance. It should be the other way round. For now, I am manually detaching the security groups from the EC2 instance in the AWS Console and then running "terraform apply", which seems to work.

shargal · 2018-05-31T11:02:08Z

Exactly the same happens when the security group is in use by ECS service.
Terraform doesn't detach it first, and is thus unable to destroy the security group.

jcarrothers-sap · 2018-06-28T20:47:49Z

I have just run into this issue as well while writing tests for a custom terraform provider. I wrote a test that, as part of the same step, removed a link from resource A->B and deleted resource B. However, terraform attempted to delete B prior to the update of A finishing.

This behaviour seems very counterintuitive to me. While it probably leads to faster-running applies in some (many?) cases, it seems that without a lot more metadata being provided by either the end-user or the providers, terraform would have no way to know when this is safe and thus should default to the safer behaviour of waiting for updates to all resources which depend on a resource scheduled for removal before actually proceeding with that removal.

After some searching around I've found the following issues that I think are either this same issue, or are closely related. In some cases, create_before_destroy successfully works around this issue, but I don't believe it would help the case I've described above.
Similar/identical: hashicorp/terraform-provider-aws#4852
Related: #17614 #532

davivcgarcia · 2018-08-06T21:48:37Z

I'm facing this issue when destroying my cloud lab which is completely deployed using Terraform (VPC, Route Tables, Subnets, EIP, EC2, EBS, Route53). Looks like Terraform is trying to remove Route Table before removing the EC2 instances or Subnets.

afalko · 2018-10-04T02:03:36Z

I just hit this with * provider.aws: version = "~> 1.26" and terraform 0.11.8

namliz · 2018-10-15T12:12:16Z

Also experiencing this on:
Terraform v0.11.8
provider.aws v1.39.0

In my case there are leftover network interfaces (not in use), so terraform hangs trying to delete the associated security groups.

nodomain · 2018-10-26T06:57:26Z

+1

made2591 · 2018-10-27T12:18:46Z

+1

danihodovic · 2018-10-30T12:43:33Z

I'm experiencing this issue with Terraform v0.11.8 and provider.AWS v1.41.0.

rastakajakwanna · 2018-11-08T16:11:35Z

Also in

Terraform v0.11.10
+ provider.aws v1.41.0

mihkelparna1 · 2018-11-13T14:59:58Z

Any chance this will be fixed in future releases? It's been an issue for a while now and even lifecycle hooks with 'create_before_destroy' won't work.

apparentlymart · 2018-11-13T17:11:48Z

Hi all,

From reading over the comment thread here it seems like there are a few subtly different problems being described:

The original issue was that during terraform destroy (presumably with a plan to destroy both the instance and the security group) Terraform attempted to destroy the security group first, even though the EC2 instance depends on it. This does seem like a Terraform core bug, since there's nothing the provider could do to help with this.
It seems like some of you, on the other hand, are talking about the situation where both an instance and a security group are already present in state and the security group has been removed from config while the instance remains. In this case, Terraform should be processing the update of the EC2 instance security groups first, before attempting to destroy the security group. This could either be a Terraform Core problem (dependency edges inverted) or it could be an AWS provider bug: I know from previous experience that detaching objects from a security group does not have an immediate effect in EC2, and so there can be a delay before the group becomes deleteable.
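
The second situation can be sketched as a before/after pair of configurations. This is only an illustration under assumed names (the variables, resource names, and instance type are hypothetical), written in 0.11-style syntax:

```hcl
# Hypothetical inputs; not taken from any reporter's configuration.
variable "ami" {}
variable "vpc_id" {}
variable "subnet_id" {}

# Step 1: both groups exist in config and state, and the instance uses both.
resource "aws_security_group" "base" {
  vpc_id = "${var.vpc_id}"
}

resource "aws_security_group" "extra" {
  vpc_id = "${var.vpc_id}"
}

resource "aws_instance" "app" {
  ami                    = "${var.ami}"
  instance_type          = "t2.micro"
  subnet_id              = "${var.subnet_id}"
  vpc_security_group_ids = ["${aws_security_group.base.id}", "${aws_security_group.extra.id}"]
}

# Step 2: delete the "extra" resource block and its reference, leaving
#   vpc_security_group_ids = ["${aws_security_group.base.id}"]
# and run terraform apply again.
# Expected order: update aws_instance.app first, then destroy
# aws_security_group.extra. The reports above describe the reverse.
```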

In order to understand better what's going on here, it would be helpful if at least one of the participants in this thread for each of those situations could capture a trace log during a failing operation. To do that:

Run terraform apply with the environment variable TF_LOG=trace set.
Capture all of the output (the log output and Terraform's usual CLI output) and paste it into a gist (since log output is too verbose for GitHub comments)
Share the link to the gist here along with an indication of which of the above two bugs you are seeing (or, if you're seeing some other situation with similar symptoms, a description of that situation.)

The log includes detailed information about how Terraform is constructing the graph, which will allow us to see whether there is indeed a bug in the graph construction (dependencies in the wrong order, or missing) or if something more subtle and resource-type-specific is going on here.

Thanks to everyone for sharing descriptions of the problem above, and sorry for the delay in responding here.

rastakajakwanna · 2018-11-13T19:21:36Z

My use case (simplified):

File my_ec2.tf

resource ec2_instance from slightly adjusted terraform ec2_instance module, therefore calling this module locally from modules folder.
- input vpc_security_group_ids = ["${aws_security_group.my_sg.id}"]
resource ec2_security_group "my_sg"

terraform plan creates resources correctly in proper order.
mv my_ec2.tf my_ec2.tf.removed
terraform plan -out remove.plan selects all missing resources successfully and marks them for removal.
terraform apply remove.plan Terraform tries to remove the security group first.

I've checked the terraform.tfstate file to see whether dependencies were correctly detected; however, the only dependencies identified for module ec2_cluster are resources internal to the module:

            "path": [
                "root",
                "ec2_cluster"
            ],
            "outputs": {},
            "resources": {
                "aws_instance.this_t2": {
                    "type": "aws_instance",
                    "depends_on": [
                        "local.is_t2_instance_type"
                    ],

For the security group, the only dependency detected is on another security group referenced in "my_sg".

My intention is to reuse my_ec2.tf as module later because it is a complete stack of ALB, EC2, EBS, RT53 for, let's say, standalone Apache httpd server.
In case this simplified explanation doesn't help, I will see what I can do about the traces requested above. Unfortunately, I have no time available for further debugging, so I cannot promise anything right now.

CyrilDevOps · 2018-11-30T15:39:01Z

I hit the same problem with a security group list in an RDS vpc_security_group_ids.
I wanted to add an extra group based on a flag, to give developers in the dev network access.
When I set that flag to false, Terraform tries to destroy the security group before removing it from the RDS object, and it fails because of the dependency.

rgangopadhya · 2018-12-11T22:30:03Z

+1 -- running into this problem when I have an RDS instance with two security groups, and I would like to remove one. So, @apparentlymart this is in essence the same as the latter case you mentioned: terraform tries to delete the security group before removing the security group from the RDS instance.

I am a bit hesitant to add the log output from TF_LOG=trace -- I need to audit it first to make sure no undesired information is included (e.g., from connecting to the state S3 bucket, in my case). Ideally there would be some way to grep for exactly the lines that are relevant to seeing if there is a bug in building the dependency graph. I would feel more comfortable providing just that, if it is possible.

apparentlymart · 2018-12-11T22:51:21Z

In the logs you should find some lines containing the string TransitiveReductionTransformer. This will appear once for each graph Terraform built during the operation. I'm particularly interested in the lines that state something like "Graph after step *TransitiveReductionTransformer" (I don't have the exact wording to hand, but grepping for that type name should reduce it down), which will be followed by a list of graph nodes followed by the names of nodes they are connected to (indented).

If you can share the entire list of nodes after that initial log line, that will at least allow me to see whether the expected dependency edges between the resources are present, though if it turns out that they are then we may need additional detail to fully explain it.

If the configuration has other objects in it aside from the RDS instance and security groups, it would help also to know the addresses of the resources in question (so we can easily identify them in the list) and, ideally, the full configuration sources for those resources so we can see how the dependencies between them are declared.

Thanks!

schley2103 · 2019-01-24T23:57:59Z

Terraform v0.11.11 + provider.aws v1.45.0

My use case was to modify my security group and then apply. I'm hitting the same DependencyViolation while trying to destroy the security group. The trace log extract after TransitiveReductionTransformer is at https://gist.github.com/schley2103/5834f2f0b7c590352c2be4f7cb717594.js. It built 5 graphs.

Thanks!

mschuchard · 2019-02-20T13:50:58Z

Thanks @b-dean. This issue is the number one Google result for "terraform aws_security_group create before destroy", which I searched to see whether that setting would resolve the problem where Terraform hangs indefinitely when you re-create a security group attached to running instances. Seems like a valid workaround in the interim.

AndreiDumaSys · 2019-05-29T16:24:44Z

I think @b-dean's solution also works with name, not just with name_prefix.
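
A sketch of that variant, assuming the change being applied is the rename itself. Since AWS requires security group names to be unique within a VPC, a replacement that keeps the same name would presumably collide with the old group while both exist, which is what name_prefix avoids. The names and variable here are hypothetical:

```hcl
# Hypothetical input; not part of the original comment.
variable "vpc_id" {}

resource "aws_security_group" "example" {
  # Renamed from a previous value such as "example-old";
  # changing name forces Terraform to replace the group.
  name   = "example-new"
  vpc_id = "${var.vpc_id}"

  lifecycle {
    # Create the replacement group before destroying the old one.
    create_before_destroy = true
  }
}
```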

lfventura · 2019-08-09T01:41:26Z

Also happens with aws_rds_cluster. Terraform 0.12

krestivo-kdinfotech · 2019-08-27T00:10:15Z

Happening to me too on 0.12. If an RDS is using a security group, that group cannot ever be destroyed. It's immortal.

joaquinmoreira · 2019-09-01T00:49:28Z

Happening to me with this config:

terraform -v
Terraform v0.12.7
+ provider.aws v2.25.0

In my use case, I already have an EC2 instance and a security group attached to it; it's failing while trying to destroy the SG when I made changes to it, without detaching it from the instance first.

B3QL · 2020-03-16T22:37:01Z

@jbardin can you reopen the issue? The bug is still present in:

Terraform v0.12.23
+ provider.aws v2.53.0

Example:

resource "aws_instance" "test" {
  ami                    = "ami-077a5b1762a2dde35" # Ubuntu 18.04 Bionic
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.test_sg.id]
}

resource "aws_eip" "ip" {
  vpc      = true
  instance = aws_instance.test.id
}

resource "aws_vpc" "main" {
  cidr_block = "172.31.0.0/16"

  tags = {
    Name = "main"
  }
}

resource "aws_security_group" "test_sg" {
  name        = "rules"
  description = "Traffic rules"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"

    cidr_blocks = [
      "0.0.0.0/0"
    ]
    ipv6_cidr_blocks = [
      "::/0",
    ]
  }
}

ghost · 2020-03-17T01:50:19Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

stack72 added bug core labels Sep 2, 2016

hydroxide closed this as completed Sep 28, 2016

hydroxide reopened this Mar 31, 2018

tombuildsstuff mentioned this issue May 10, 2018

Deleting route tables or security groups fails when still associated with a subnet hashicorp/terraform-provider-azurerm#114

Closed

jcarrothers-sap mentioned this issue Jul 7, 2018

Incorrect order of execution when deleting resource that participates in dependency, together with the dependency #18408

Closed

apparentlymart added the provider/aws label Nov 7, 2018

lfventura mentioned this issue Aug 9, 2019

Security Group tries to be destroyed before detaching from RDS Cluster hashicorp/terraform-provider-aws#9692

Open

hashibot added the v0.11 Issues (primarily bugs) reported against v0.11 releases label Aug 22, 2019

Chancebair mentioned this issue Aug 23, 2019

Fix Anubis Setup Destroy awslabs/benchmark-ai#768

Merged

robtayl0r mentioned this issue Aug 27, 2019

terraform 0.12 tries to remove busy aws_security_group yet ignoring HTTP 400 hashicorp/terraform-provider-aws#8809

Closed

nitzanm mentioned this issue Oct 24, 2019

Terraform deletes/modifies resource and its dependency simultaneously #23169

Closed

jbardin mentioned this issue Nov 1, 2019

store absolute addresses for resource dependencies in the state #23252

Merged

jbardin closed this as completed in #23252 Nov 8, 2019

ghost locked and limited conversation to collaborators Mar 17, 2020
