Subnet chained to AWS ELB cannot be destroyed (DependencyViolation) #934

Closed
radeksimko opened this issue Feb 5, 2015 · 11 comments · Fixed by #1252

Comments

@radeksimko (Member)

If you run the following template:

provider "aws" {
  region = "eu-west-1"
}

resource "aws_vpc" "default" {
  cidr_block = "10.12.0.0/16"
}

resource "aws_subnet" "private" {
  availability_zone = "eu-west-1a"
  cidr_block = "10.12.0.0/24"
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_security_group" "sample" {
  description = "Sample sg"
  vpc_id = "${aws_vpc.default.id}"
  name = "sample"
}

resource "aws_launch_configuration" "sample" {
  name = "sample"
  image_id = "ami-8f0087f8"
  instance_type = "t2.micro"
  associate_public_ip_address = false
  key_name = "coreos-test"
  security_groups = ["${aws_security_group.sample.id}"]
}

resource "aws_elb" "sample" {
  name = "sample"
  cross_zone_load_balancing = false
  internal = true
  subnets = ["${aws_subnet.private.id}"]
  security_groups = ["${aws_security_group.sample.id}"]

  listener {
    instance_port = 80
    instance_protocol = "tcp"
    lb_port = 80
    lb_protocol = "tcp"
  }

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 5
    timeout = 3
    interval = 10
    target = "TCP:80"
  }
}

resource "aws_autoscaling_group" "sample" {
  name = "sample"
  availability_zones = ["eu-west-1a"]
  launch_configuration = "${aws_launch_configuration.sample.name}"
  min_size = 1
  max_size = 3
  desired_capacity = 1
  load_balancers = ["${aws_elb.sample.name}"]
  vpc_zone_identifier = ["${aws_subnet.private.id}"]
}

you'll see the following error:

Error applying plan:

1 error(s) occurred:

* Error deleting subnet: The subnet 'subnet-4821f611' has dependencies and cannot be deleted. (DependencyViolation)

I think it could be similar to #357, except the missing dependency here is between aws_subnet and AWS instances that come up through aws_autoscaling_group, not an aws_instance you run directly.

@radeksimko (Member, Author)

The graph here seems to be all right:

[dependency graph image]

unless I'm missing something obvious?

Does this need a similar "nasty hack" like the one in network_acl (https://github.com/hashicorp/terraform/blob/c7e536680d4cf71c895a218c42c11e06cca0ebee/builtin/providers/aws/resource_aws_network_acl.go#L259-271), or is this something that should be solved at a higher level (graph generation/processing)?

@willmcg commented Mar 11, 2015

Running into this same problem on 0.3.7 and master, and I agree that it seems to be related to the instances launched by the autoscaling group not being fully terminated before Terraform tries to destroy the subnet in which they reside.

In case it provides more clues: this only seems to occur with my public subnets, which have autoscaled instances with automatically assigned public IPs and routes to an internet gateway. My private subnets, whose autoscaled instances have only private IPs and routes to NAT instances, always seem to destroy without any issues.
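To make that distinction concrete, the public-subnet case looks roughly like this. This is a sketch only; none of these resources appear in the template at the top of this issue, so the names aws_subnet.public, aws_internet_gateway.gw and aws_route_table.public are hypothetical:

resource "aws_subnet" "public" {
  availability_zone = "eu-west-1a"
  cidr_block = "10.12.1.0/24"
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_internet_gateway" "gw" {
  vpc_id = "${aws_vpc.default.id}"
}

# The public subnet routes 0.0.0.0/0 through the internet gateway;
# the private variant would route through a NAT instance instead.
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.default.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.gw.id}"
  }
}

resource "aws_route_table_association" "public" {
  subnet_id = "${aws_subnet.public.id}"
  route_table_id = "${aws_route_table.public.id}"
}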

@radeksimko (Member, Author)

> launched by the autoscaling group not being fully terminated before Terraform tries to destroy the subnet in which they reside.

That is a good catch! Maybe we could make Terraform wait until all instances in that ASG are fully terminated?

@willmcg commented Mar 11, 2015

I thought that force_delete=false on the aws_autoscaling_group resource would ensure that the instances were terminated before the API call returns, but apparently there is still some dependency hanging around that causes the subnet delete to fail. I'm still working through debugging exactly what the dependency problem is, and it's somewhat annoying that the AWS API calls cannot identify the dependencies as part of the error response.
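For reference, here is roughly what that looks like on the group from the template above (a sketch; force_delete defaults to false, so setting it explicitly only documents the intent):

resource "aws_autoscaling_group" "sample" {
  name = "sample"
  availability_zones = ["eu-west-1a"]
  launch_configuration = "${aws_launch_configuration.sample.name}"
  min_size = 1
  max_size = 3
  desired_capacity = 1
  load_balancers = ["${aws_elb.sample.name}"]
  vpc_zone_identifier = ["${aws_subnet.private.id}"]

  # false (the default) means the group is drained and its instances
  # terminated before the group itself is deleted; true would delete
  # the group without waiting for the instances.
  force_delete = false
}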

@radeksimko (Member, Author)

Apparently the current implementation is already trying to drain the ASG before deleting it:
https://github.com/hashicorp/terraform/blob/master/builtin/providers/aws/resource_aws_autoscaling_group.go#L251

It waits up to 10 minutes for the ASG to report instances == 0, but I can easily imagine that even though the ASG API endpoint says 0, it takes some extra time until all the instances are actually fully terminated and have released their IP addresses, so that the subnet can be destroyed too.

It should be pretty easy to confirm that theory by collecting the instance IDs first, letting the existing code drain the ASG, and once it reports 0, explicitly asking the EC2 API about each instance separately.

@willmcg commented Mar 11, 2015

Huzzah! I found the problem and a solution that seems to work.

The problem in my case was security groups. The instances launched into the subnets are made members of security groups defined in the autoscaling launch configuration. Looking at the Terraform destroy pipelining, the security groups were being destroyed in parallel with the subnets, and this was causing the subnet destroy failure. I'm not entirely sure what AWS thinks the dependency is, because security groups have no direct association with a subnet except through the instances, load balancers, etc. that use them, but that was the problem in my case.

Once I added an explicit depends_on to the security group configs (for security groups referenced in autoscaling launch configurations) to make them depend on the subnets, everything got torn down in the correct order successfully. A sketch of the change is below.
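Applied to the template at the top of this issue, the workaround would look roughly like this (a sketch, using the list form of depends_on; the extra edge forces Terraform to destroy the security group before the subnet):

resource "aws_security_group" "sample" {
  description = "Sample sg"
  vpc_id = "${aws_vpc.default.id}"
  name = "sample"

  # Explicit ordering edge: the group now depends on the subnet, so on
  # destroy Terraform deletes the group first and the subnet afterwards.
  depends_on = ["aws_subnet.private"]
}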

@willmcg commented Mar 11, 2015

There is a relevant comment from AWS support regarding the order of deletion of VPC resources, which implies that security groups need to be deleted before subnets:

https://forums.aws.amazon.com/thread.jspa?threadID=92407

  • Terminate all instances in your VPC
  • Delete all ENIs associated with subnets within your VPC
  • Detach all Internet and Virtual Private Gateways (you can then delete them and any VPN connections, but that's not required to delete the VPC object)
  • Disassociate all route tables from all the subnets in your VPC
  • Delete all route tables other than the "Main" table
  • Disassociate all Network ACLs from all the subnets in your VPC
  • Delete all Network ACLs other than the default one
  • Delete all security groups other than the default one (note: if one group has a rule that references another, you have to delete that rule before you can delete the other security group)
  • Delete all subnets
  • Delete your VPC
  • Delete any DHCP Option Sets that had been used by the VPC

There's some flexibility in that list. For example, instances, gateways, route tables, NACLs and security groups can be deleted in any order.

@willmcg commented Mar 11, 2015

After more experimentation with different dependencies in my configuration, I'm convinced that the security groups need to be deleted before the subnets. I can consistently reproduce subnet delete failures due to dependency violations when the only remaining non-destroyed resources are the subnets, the security groups and the VPC itself.

catsby added a commit that referenced this issue Mar 19, 2015
Though not directly connected, trying to delete a subnet and security group in
parallel can cause a dependency violation from the subnet, claiming there are
dependencies.

This commit fixes that by allowing subnet deletion to tolerate failure with a
retry / refresh function.

Fixes #934
@catsby (Contributor) commented Mar 19, 2015

I just opened #1252 to fix this.

Alternatively, you can add a simple depends_on = "aws_subnet.private" to "aws_security_group" "sample", which will produce this destroy graph:

[destroy graph image]

which works as well. #1252 should have you covered though, please check it out!

@catsby (Contributor) commented Mar 19, 2015

I guess merging #1252 auto-closed this because the commit message says "Fixes", so, yeah, this is closed. Let us know if you're still seeing this behavior on master.
Thanks! And thanks for doing all the investigation for me 😄

@ghost locked and limited conversation to collaborators May 4, 2020