Bug: Terraform attempts to delete security groups before dependent EC2 instances #8617

Closed
hydroxide opened this issue Sep 1, 2016 · 34 comments · Fixed by #23252
Labels
bug core provider/aws v0.11 Issues (primarily bugs) reported against v0.11 releases

Comments

@hydroxide

I have a configuration of EC2 instances belonging to security groups in AWS:

resource "aws_instance" "foo" {
    # some config
    vpc_security_group_ids      = ["${aws_security_group.foo.id}", "${var.vpc-sg}"]
}

resource "aws_security_group" "foo" {
    # some config
    vpc_id = "${var.vpc-id}"
}

When running terraform destroy, Terraform attempts to destroy the security groups until timeout (5 minutes), at which point it prints the following error:

* aws_security_group.foo: DependencyViolation: resource sg-4f189d35 has a dependent object
        status code: 400, request id: 0a06f6a1-0792-42c5-9180-beac33fb9037

Indeed, the instances (which Terraform has not yet attempted to destroy) are dependent on the security group. Given that the configuration for these resources is maintained completely in Terraform, it seems to be some bug with the dependency resolution. I wonder if this may have anything to do with the VPC.

@mengesb
Contributor

mengesb commented Sep 2, 2016

@hydroxide I don't experience this problem with 0.7.0; do you have any other useful information such as a debug log (gist link) or other information? I use this dependency quite extensively in a lot of plans and haven't run into this unless something is modifying the SGs outside of management.

@sebbonnet

I am facing a similar issue with terraform 0.7.3, when conditionally adding an extra security group.
When the extra security group exists, attempting to remove it fails due to a dependency violation.

The terraform plan shows that the extra security group will be both deleted and disassociated from the EC2 instances:

- aws_security_group.cassandra_overrides

~ module.aws_instance_zone_a.aws_instance.cassandra_node.0
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_a.aws_instance.cassandra_node.1
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_b.aws_instance.cassandra_node.0
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

~ module.aws_instance_zone_b.aws_instance.cassandra_node.1
    vpc_security_group_ids.#:          "2" => "1"
    vpc_security_group_ids.2978748095: "sg-4ccbcf2b" => "sg-4ccbcf2b"
    vpc_security_group_ids.3169002004: "sg-b6e3ebd1" => ""

Yet applying the changes attempts to remove the extra security group before it is removed from the dependent EC2 instances.

aws_security_group.cassandra_overrides: Still destroying... (5m0s elapsed)
Error applying plan:

1 error(s) occurred:

* aws_security_group.cassandra_overrides: DependencyViolation: resource sg-b6e3ebd1 has a dependent object
    status code: 400, request id: b7a55106-963a-4404-aa0b-2653bc70ccf1

Here is a simplified version of my terraform config.
The idea here is to use the allow_all_internal_ips variable to define whether an extra security group should be added.

variables.tf

variable security_rule_overrides {
  type = "map"

  default = {
    allow_all_internal_ips = 1
  }
}

security_group.tf

resource "aws_security_group" "cassandra" {
  description = "Cassandra node security group"
  vpc_id      = "${data.terraform_remote_state.vpc.vpc_id}"

  ingress {
    protocol  = "tcp"
    from_port = 9042
    to_port   = 9042

    cidr_blocks = [
      "${data.terraform_remote_state.vpc.vpc_cidr_block}",
    ]
  }
}

resource "aws_security_group" "cassandra_overrides" {
  description = "Cassandra node security group overrides"
  vpc_id      = "${data.terraform_remote_state.vpc.vpc_id}"
  count       = "${lookup(var.security_rule_overrides, "allow_all_internal_ips", 0)}"

  ingress {
    protocol  = "tcp"
    from_port = 9042
    to_port   = 9042

    cidr_blocks = [
      "10.0.0.0/8",
    ]
  }
}

nodes.tf

module "aws_instance_zone_a" {
  source                 = "./zone_instances"
  zone_name              = "a"
  environment            = "${var.environment}"
  vpc_security_group_ids = "${aws_security_group.cassandra.id}, ${join(", ", compact(aws_security_group.cassandra_overrides.*.id))}"
  subnet_id              = "${data.terraform_remote_state.vpc.subnet_id_private_a}"
  instance_type          = "${var.instance_types[var.environment]}"
  instance_count         = "${lookup(var.instances, "${var.environment}_zone_a", 0)}"
  # other config
}

zone_instances/main.tf

resource "aws_instance" "cassandra_node" {
  ami                    = "${var.ami}"
  instance_type          = "${var.instance_type}"
  subnet_id              = "${var.subnet_id}"
  vpc_security_group_ids = ["${compact(split(", ", var.vpc_security_group_ids))}"]
  key_name               = "${var.key_name}"
  count                  = "${var.instance_count}"
  # other config
}

@evankroske

evankroske commented Feb 21, 2017

This is happening to me, too. In my case, I'm trying to rename a security group, which requires that Terraform destroy the group and recreate it. However, all instances have to be removed from the group and all references to the group must be removed before it can be destroyed. Terraform doesn't remove instances from the group or remove references to the group before it tries to destroy it. Error message:

* aws_security_group.jump: DependencyViolation: resource sg-9691e2ee has a dependent object
	status code: 400, request id: 4de9318b-fee9-48a4-90dd-46fc6336298b

@joestump

I believe the answer is in explicit dependencies:

Terraform ensures that dependencies are successfully created before a resource is created. During a destroy operation, Terraform ensures that this resource is destroyed before its dependencies.

Emphasis mine. Sadly, this doesn't seem to work for me (though I'm mentioning a few resources and only needing to destroy a couple).
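
A minimal sketch of what such an explicit dependency might look like, reusing the resource names from the original report. The `ami` variable and the `t2.micro` instance type are placeholders added only so the fragment is complete; they are assumptions, not part of anyone's actual configuration. The quoted form of `depends_on` matches the 0.11-and-earlier syntax used in this thread; Terraform 0.12 and later take the bare reference `aws_security_group.foo` instead. The interpolated reference in `vpc_security_group_ids` already creates an implicit dependency, so the explicit one is presumably redundant here, which may be why it makes no difference.

```hcl
# Hypothetical inputs; the variable names follow the original report.
variable "vpc-id" {}
variable "vpc-sg" {}
variable "ami" {}

resource "aws_security_group" "foo" {
  vpc_id = "${var.vpc-id}"
}

resource "aws_instance" "foo" {
  ami                    = "${var.ami}"   # placeholder
  instance_type          = "t2.micro"     # placeholder
  vpc_security_group_ids = ["${aws_security_group.foo.id}", "${var.vpc-sg}"]

  # Explicit dependency: create the group before the instance and,
  # per the documentation quoted above, destroy the instance before the group.
  depends_on = ["aws_security_group.foo"]
}
```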

@christophetd

christophetd commented Oct 26, 2017

Same issue here with latest version (v0.10.8).

@BastianM3

Ran into this issue on 0.10.8 as well.

@b-dean

b-dean commented Nov 13, 2017

A combination of create before destroy and name_prefix (so the security groups don't have conflicting names) solves this for me:

resource "aws_security_group" "example" {
  name_prefix = "example-"
  // other stuff

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_instance" "example" {
  vpc_security_group_ids = ["${aws_security_group.example.id}"]
  // other stuff
}

Then the new SG gets created, swapped out on the ENI for the EC2 instance and then the old SG can be deleted.

@peterhallen

Thanks @b-dean! This approach worked for me as well.

@antgel

antgel commented Mar 23, 2018

@hydroxide Any chance you can re-open this? It's happening for lots of us, apparently (me included). It's very simple to reproduce: remove a security group that's assigned to at least one server, and Terraform hangs until its timeout. It's possible to work around this by manually removing the assignment (in the AWS UI) in parallel, but that wouldn't be fun for a significant number of instances.

@hydroxide hydroxide reopened this Mar 31, 2018
@ndobbs

ndobbs commented Apr 9, 2018

I am currently experiencing this same behavior. I renamed a security group attached to an instance resource; Terraform detects that I'm no longer using said security group and hangs until it times out in an attempt to delete it. If I remove the security group in the AWS console or destroy the instance, Terraform returns successfully.

@vishwakumba

I am also experiencing a similar issue in Terraform v0.11.7. I am unable to delete existing security groups that are no longer applicable to an EC2 instance, where applicability is decided dynamically by a boolean variable for an environment. "terraform plan" seems to be OK, but "terraform apply" produces an error after 10 minutes. I think Terraform is trying to delete the security groups first and then detach them from the EC2 instance. It should be the other way round. For now, I am manually detaching the security groups from the EC2 instance in the AWS Console and then running "terraform apply", which seems to work.

@shargal

shargal commented May 31, 2018

Exactly the same happens when the security group is in use by an ECS service.
Terraform doesn't detach it first, and is thus unable to destroy the security group.

@jcarrothers-sap

jcarrothers-sap commented Jun 28, 2018

I have just run into this issue as well while writing tests for a custom terraform provider. I wrote a test that, as part of the same step, removed a link from resource A->B and deleted resource B. However, terraform attempted to delete B prior to the update of A finishing.

This behaviour seems very counterintuitive to me. While it probably leads to faster-running applies in some (many?) cases, it seems that without a lot more metadata being provided by either the end-user or the providers, Terraform would have no way to know when this is safe. It should therefore default to the safer behaviour: wait for updates to all resources that depend on a resource scheduled for removal before actually proceeding with that removal.

After some searching around I've found the following issues that I think are either this same issue, or are closely related. In some cases, create_before_destroy successfully works around this issue, but I don't believe it would help the case I've described above.
Similar/identical: hashicorp/terraform-provider-aws#4852
Related: #17614 #532

@davivcgarcia

I'm facing this issue when destroying my cloud lab, which is completely deployed using Terraform (VPC, Route Tables, Subnets, EIP, EC2, EBS, Route53). It looks like Terraform is trying to remove the Route Table before removing the EC2 instances or Subnets.

@afalko

afalko commented Oct 4, 2018

I just hit this with * provider.aws: version = "~> 1.26" and terraform 0.11.8

@namliz

namliz commented Oct 15, 2018

Also experiencing this on:
Terraform v0.11.8
provider.aws v1.39.0

In my case there are left over network interfaces (not in-use), so terraform hangs trying to delete associated security groups.

@nodomain

+1

@made2591

+1

@danihodovic

I'm experiencing this issue with Terraform v0.11.8 and provider.aws v1.41.0.

@rastakajakwanna

Also in

Terraform v0.11.10
+ provider.aws v1.41.0

@mihkelparna1

Any chance this will be fixed in future releases? It's been an issue for a while now and even lifecycle hooks with 'create_before_destroy' won't work.

@apparentlymart
Contributor

Hi all,

From reading over the comment thread here it seems like there are a few subtly different problems being described:

  • The original issue was that during terraform destroy (presumably with a plan to destroy both the instance and the security group) Terraform attempted to destroy the security group first, even though the EC2 instance depends on it. This does seem like a Terraform core bug, since there's nothing the provider could do to help with this.
  • It seems like some of you, on the other hand, are talking about the situation where both an instance and a security group are already present in state and the security group has been removed from config while the instance remains. In this case, Terraform should be processing the update of the EC2 instance security groups first, before attempting to destroy the security group. This could either be a Terraform Core problem (dependency edges inverted) or it could be an AWS provider bug: I know from previous experience that detaching objects from a security group does not have an immediate effect in EC2, and so there can be a delay before the group becomes deletable.
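
To make the second situation concrete, here is a minimal sketch of the kind of configuration change that would trigger it. All resource and variable names (`base`, `extra`, `app`, `vpc_id`, `subnet_id`, `ami`) are hypothetical and not taken from any participant's configuration; the syntax is the 0.11-style interpolation used elsewhere in this thread.

```hcl
# Hypothetical inputs, for illustration only.
variable "vpc_id" {}
variable "subnet_id" {}
variable "ami" {}

resource "aws_security_group" "base" {
  vpc_id = "${var.vpc_id}"
}

# This block existed when the state was last applied and has since been
# removed from the configuration, so Terraform plans to destroy the group:
#
# resource "aws_security_group" "extra" {
#   vpc_id = "${var.vpc_id}"
# }

resource "aws_instance" "app" {
  ami           = "${var.ami}"
  instance_type = "t2.micro"
  subnet_id     = "${var.subnet_id}"

  # Previously: ["${aws_security_group.base.id}", "${aws_security_group.extra.id}"]
  vpc_security_group_ids = ["${aws_security_group.base.id}"]
}

# Expected order: update aws_instance.app in place (detach "extra"),
# then destroy aws_security_group.extra.
```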

In order to understand better what's going on here, it would be helpful if at least one of the participants in this thread for each of those situations could capture a trace log during a failing operation. To do that:

  • Run terraform apply with the environment variable TF_LOG=trace set.
  • Capture all of the output (the log output and Terraform's usual CLI output) and paste it into a gist (since log output is too verbose for GitHub comments)
  • Share the link to the gist here along with an indication of which of the above two bugs you are seeing (or, if you're seeing some other situation with similar symptoms, a description of that situation.)

The log includes detailed information about how Terraform is constructing the graph, which will allow us to see whether there is indeed a bug in the graph construction (dependencies in the wrong order, or missing) or if something more subtle and resource-type-specific is going on here.

Thanks to everyone for sharing descriptions of the problem above, and sorry for the delay in responding here.

@rastakajakwanna

My use case (simplified):

File my_ec2.tf

  • resource ec2_instance from slightly adjusted terraform ec2_instance module, therefore calling this module locally from modules folder.
    • input vpc_security_group_ids = ["${aws_security_group.my_sg.id}"]
  • resource ec2_security_group "my_sg"

terraform plan: creates resources correctly, in the proper order.
mv my_ec2.tf my_ec2.tf.removed
terraform plan -out remove.plan: selects all missing resources successfully and marks them for removal.
terraform apply remove.plan: Terraform tries to remove the security group first.

I've checked the terraform.tfstate file to see whether the dependencies were correctly detected; however, the only dependencies identified for module ec2_cluster are internal module resources:

            "path": [
                "root",
                "ec2_cluster"
            ],
            "outputs": {},
            "resources": {
                "aws_instance.this_t2": {
                    "type": "aws_instance",
                    "depends_on": [
                        "local.is_t2_instance_type"
                     ],

For the security group, the only dependency detected is on another security group referenced in "my_sg".

My intention is to reuse my_ec2.tf as module later because it is a complete stack of ALB, EC2, EBS, RT53 for, let's say, standalone Apache httpd server.
If this simplified explanation doesn't help, I will see what I can do about capturing the traces described above. Unfortunately, I have no time available for further debugging, so I cannot promise anything right now.

@CyrilDevOps

I hit the same problem with a security group list in an RDS vpc_security_group_ids.
I wanted to add an extra group, based on a flag, to give developers access from the dev network.
When I set that flag to false, Terraform tries to destroy the security group before removing it from the RDS object, and it fails because of the dependency.

@rgangopadhya

rgangopadhya commented Dec 11, 2018

+1 -- running into this problem when I have an RDS instance with two security groups, and I would like to remove one. So, @apparentlymart this is in essence the same as the latter case you mentioned: terraform tries to delete the security group before removing the security group from the RDS instance.

I am a bit hesitant to add the log output from TF_LOG=trace -- I need to audit it first to make sure no undesired information is included (e.g., from connecting to the state S3 bucket, in my case). Ideally there would be some way to grep for exactly the lines that are relevant to seeing if there is a bug in building the dependency graph. I would feel more comfortable providing just that, if it is possible.

@apparentlymart
Contributor

In the logs you should find some lines containing the string TransitiveReductionTransformer. This will appear once for each graph Terraform built during the operation. I'm particularly interested in the lines that state something like "Graph after step *TransitiveReductionTransformer" (I don't have the exact wording to hand, but grepping for that type name should reduce it down), which will be followed by a list of graph nodes followed by the names of nodes they are connected to (indented).

If you can share the entire list of nodes after that initial log line, that will at least allow me to see whether the expected dependency edges between the resources are present, though if it turns out that they are then we may need additional detail to fully explain it.

If the configuration has other objects in it aside from the RDS instance and security groups, it would help also to know the addresses of the resources in question (so we can easily identify them in the list) and, ideally, the full configuration sources for those resources so we can see how the dependencies between them are declared.

Thanks!

@schley2103

Terraform v0.11.11 + provider.aws v1.45.0

My use case was to modify my security group and then apply. I'm hitting the same DependencyViolation while trying to destroy the security group. The trace log extract after TransitiveReductionTransformer is at https://gist.github.com/schley2103/5834f2f0b7c590352c2be4f7cb717594.js. It built 5 graphs.

Thanks!

@mschuchard
Contributor

Thanks @b-dean. This issue is the number one Google result for "terraform aws_security_group create before destroy" to see if that would resolve the problem where Terraform hangs indefinitely when you re-create a security group attached to running instances. Seems like a valid workaround in the interim.

@AndreiDumaSys

I think @b-dean's solution also works with name, not just with name_prefix.

@lfventura

Also happens with aws_rds_cluster. Terraform 0.12

@krestivo-kdinfotech

Happening to me too on 0.12. If an RDS is using a security group, that group cannot ever be destroyed. It's immortal.

@joaquinmoreira

Happening to me with this config:

terraform -v
Terraform v0.12.7
+ provider.aws v2.25.0

In my use case, I already have an EC2 instance with a security group attached to it; when I make changes to the security group, Terraform fails trying to destroy it without first detaching it from the instance.

@B3QL

B3QL commented Mar 16, 2020

@jbardin can you reopen the issue? The bug is still present in:

Terraform v0.12.23
+ provider.aws v2.53.0

Example:

resource "aws_instance" "test" {
  ami                    = "ami-077a5b1762a2dde35" # Ubuntu 18.04 Bionic
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.test_sg.id]
}

resource "aws_eip" "ip" {
  vpc      = true
  instance = aws_instance.test.id
}

resource "aws_vpc" "main" {
  cidr_block = "172.31.0.0/16"

  tags = {
    Name = "main"
  }
}

resource "aws_security_group" "test_sg" {
  name        = "rules"
  description = "Traffic rules"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"

    cidr_blocks = [
      "0.0.0.0/0"
    ]
    ipv6_cidr_blocks = [
      "::/0",
    ]
  }
}

@ghost

ghost commented Mar 17, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Mar 17, 2020