Subnet chained to AWS ELB cannot be destroyed (DependencyViolation) #934

Closed
radeksimko opened this issue Feb 5, 2015 · 11 comments · Fixed by #1252

Comments

@radeksimko (Member)

If you run the following template:

provider "aws" {
  region = "eu-west-1"
}

resource "aws_vpc" "default" {
  cidr_block = "10.12.0.0/16"
}

resource "aws_subnet" "private" {
  availability_zone = "eu-west-1a"
  cidr_block = "10.12.0.0/24"
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_security_group" "sample" {
  description = "Sample sg"
  vpc_id = "${aws_vpc.default.id}"
  name = "sample"
}

resource "aws_launch_configuration" "sample" {
  name = "sample"
  image_id = "ami-8f0087f8"
  instance_type = "t2.micro"
  associate_public_ip_address = false
  key_name = "coreos-test"
  security_groups = ["${aws_security_group.sample.id}"]
}

resource "aws_elb" "sample" {
  name = "sample"
  cross_zone_load_balancing = false
  internal = true
  subnets = ["${aws_subnet.private.id}"]
  security_groups = ["${aws_security_group.sample.id}"]

  listener {
    instance_port = 80
    instance_protocol = "tcp"
    lb_port = 80
    lb_protocol = "tcp"
  }

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 5
    timeout = 3
    interval = 10
    target = "TCP:80"
  }
}

resource "aws_autoscaling_group" "sample" {
  name = "sample"
  availability_zones = ["eu-west-1a"]
  launch_configuration = "${aws_launch_configuration.sample.name}"
  min_size = 1
  max_size = 3
  desired_capacity = 1
  load_balancers = ["${aws_elb.sample.name}"]
  vpc_zone_identifier = ["${aws_subnet.private.id}"]
}

you'll see the following error:

Error applying plan:

1 error(s) occurred:

* Error deleting subnet: The subnet 'subnet-4821f611' has dependencies and cannot be deleted. (DependencyViolation)

I think it could be similar to #357, except the missing dependency here is between aws_subnet and AWS instances that come up through aws_autoscaling_group, not an aws_instance you run directly.

@radeksimko (Member, Author)

The graph here seems to be all right:

[dependency graph image]

unless I'm missing something obvious?

Does this need a similar "nasty hack" like the one in network_acl (https://github.com/hashicorp/terraform/blob/c7e536680d4cf71c895a218c42c11e06cca0ebee/builtin/providers/aws/resource_aws_network_acl.go#L259-271), or is this something that should be solved at a higher level (graph generation/processing)?

@willmcg commented Mar 11, 2015

Running into this same problem on 0.3.7 and master, and I agree that it seems to be related to the instances launched by the autoscaling group not being fully terminated before Terraform tries to destroy the subnet in which they reside.

In case it provides more clues: this only seems to occur with my public subnets, which have autoscaled instances with automatically assigned public IPs and routes to an internet gateway. My private subnets, whose autoscaled instances have only private IPs and routes to NAT instances, always seem to destroy without any issues.
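To make that distinction concrete, the public-subnet case looks roughly like this. This is a sketch only; none of these resources appear in the template at the top of this issue, so the names aws_subnet.public, aws_internet_gateway.gw and aws_route_table.public are hypothetical:

resource "aws_subnet" "public" {
  availability_zone = "eu-west-1a"
  cidr_block = "10.12.1.0/24"
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_internet_gateway" "gw" {
  vpc_id = "${aws_vpc.default.id}"
}

# The public subnet routes 0.0.0.0/0 through the internet gateway;
# the private variant would route through a NAT instance instead.
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.default.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.gw.id}"
  }
}

resource "aws_route_table_association" "public" {
  subnet_id = "${aws_subnet.public.id}"
  route_table_id = "${aws_route_table.public.id}"
}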

@radeksimko (Member, Author)

> launched by the autoscaling group not being fully terminated before Terraform tries to destroy the subnet in which they reside.

That is a good catch! Maybe we could make Terraform wait until all instances in that ASG are fully terminated?

@willmcg commented Mar 11, 2015

I thought that force_delete=false on the aws_autoscaling_group resource would ensure that the instances were terminated before the API call returns, but apparently there is still some dependency hanging around that causes the subnet delete to fail. I'm still working through debugging exactly what the dependency problem is, and it's somewhat annoying that the AWS API calls cannot identify the dependencies as part of the error response.
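For reference, here is roughly what that looks like on the group from the template above (a sketch; force_delete defaults to false, so setting it explicitly only documents the intent):

resource "aws_autoscaling_group" "sample" {
  name = "sample"
  availability_zones = ["eu-west-1a"]
  launch_configuration = "${aws_launch_configuration.sample.name}"
  min_size = 1
  max_size = 3
  desired_capacity = 1
  load_balancers = ["${aws_elb.sample.name}"]
  vpc_zone_identifier = ["${aws_subnet.private.id}"]

  # false (the default) means the group is drained and its instances
  # terminated before the group itself is deleted; true would delete
  # the group without waiting for the instances.
  force_delete = false
}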

@radeksimko (Member, Author)

Apparently the current implementation is already trying to drain the ASG before deleting it:
https://github.com/hashicorp/terraform/blob/master/builtin/providers/aws/resource_aws_autoscaling_group.go#L251

It waits up to 10 minutes for the ASG to report instances == 0, but I can easily imagine that even though the ASG API endpoint says 0, it takes some extra time until all the instances are actually fully terminated and have released their IP addresses, so that the subnet can be destroyed too.

It should be pretty easy to confirm that theory by collecting the instance IDs first, letting the existing code drain the ASG, and once it reports 0, explicitly asking the EC2 API about each instance separately.

@willmcg commented Mar 11, 2015

Huzzah! I found the problem and a solution that seems to work.

The problem in my case was security groups. The instances launched into the subnets are made members of security groups defined in the autoscaling launch configuration. Looking at the Terraform destroy pipelining, the security groups were being destroyed in parallel with the subnets, and this was causing the subnet destroy failure. I'm not entirely sure what AWS thinks the dependency is, because security groups have no direct association with a subnet except through the instances, load balancers, etc. that use them, but that was the problem in my case.

Once I added an explicit depends_on to the security group configs (for security groups referenced in autoscaling launch configurations) to make them depend on the subnets, everything got torn down in the correct order successfully. A sketch of the change is below.
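Applied to the template at the top of this issue, the workaround would look roughly like this (a sketch, using the list form of depends_on; the extra edge forces Terraform to destroy the security group before the subnet):

resource "aws_security_group" "sample" {
  description = "Sample sg"
  vpc_id = "${aws_vpc.default.id}"
  name = "sample"

  # Explicit ordering edge: the group now depends on the subnet, so on
  # destroy Terraform deletes the group first and the subnet afterwards.
  depends_on = ["aws_subnet.private"]
}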

@willmcg commented Mar 11, 2015

There is a relevant comment from AWS support regarding the order of deletion of VPC resources, which implies that security groups need to be deleted before subnets:

https://forums.aws.amazon.com/thread.jspa?threadID=92407

  • Terminate all instances in your VPC
  • Delete all ENIs associated with subnets within your VPC
  • Detach all Internet and Virtual Private Gateways (you can then delete them and any VPN connections, but that's not required to delete the VPC object)
  • Disassociate all route tables from all the subnets in your VPC
  • Delete all route tables other than the "Main" table
  • Disassociate all Network ACLs from all the subnets in your VPC
  • Delete all Network ACLs other than the default one
  • Delete all security groups other than the default one (note: if one group has a rule that references another, you have to delete that rule before you can delete the other security group)
  • Delete all subnets
  • Delete your VPC
  • Delete any DHCP Option Sets that had been used by the VPC

There's some flexibility in that list. For example, instances, gateways, route tables, NACLs and security groups can be deleted in any order.

@willmcg commented Mar 11, 2015

After more experimentation with different dependencies in my configuration, I'm convinced that the security groups need to be deleted before the subnets. I can consistently reproduce subnet delete failures due to dependency violations when the only remaining non-destroyed resources are the subnets, the security groups and the VPC itself.

catsby added a commit that referenced this issue Mar 19, 2015
Though not directly connected, trying to delete a subnet and security group in
parallel can cause a dependency violation from the subnet, claiming there are
dependencies.

This commit fixes that by allowing subnet deletion to tolerate failure with a
retry / refresh function.

Fixes #934
@catsby (Contributor) commented Mar 19, 2015

I just opened #1252 to fix this.

Alternatively, you can add a simple depends_on = "aws_subnet.private" to "aws_security_group" "sample", which will produce this destroy graph:

[destroy graph image]

which works as well. #1252 should have you covered though, please check it out!

@catsby (Contributor) commented Mar 19, 2015

I guess merging #1252 auto-closed this because the commit message says "Fixes", so, yeah, this is closed. Let us know if you're still seeing this behavior on master.
Thanks! And thanks for doing all the investigation for me 😄

@ghost locked and limited conversation to collaborators May 4, 2020