aws_ecs_cluster with capacity_providers cannot be destroyed #11409
Meanwhile, here is a nasty workaround using a destroy provisioner that worked for me:

```hcl
resource "aws_ecs_cluster" "cluster" {
  name               = local.cluster_name
  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }

  # We need to terminate all instances before the cluster can be destroyed.
  # (Terraform would handle this automatically if the autoscaling group depended
  # on the cluster, but we need to have the dependency in the reverse
  # direction due to the capacity_providers field above.)
  provisioner "local-exec" {
    when    = destroy
    command = <<-CMD
      # Get the list of capacity providers associated with this cluster
      CAP_PROVS="$(aws ecs describe-clusters --clusters "${self.arn}" \
        --query 'clusters[*].capacityProviders[*]' --output text)"

      # Now get the list of autoscaling groups from those capacity providers
      ASG_ARNS="$(aws ecs describe-capacity-providers \
        --capacity-providers "$CAP_PROVS" \
        --query 'capacityProviders[*].autoScalingGroupProvider.autoScalingGroupArn' \
        --output text)"

      if [ -n "$ASG_ARNS" ] && [ "$ASG_ARNS" != "None" ]
      then
        for ASG_ARN in $ASG_ARNS
        do
          ASG_NAME=$(echo $ASG_ARN | cut -d/ -f2-)

          # Set the autoscaling group size to zero
          aws autoscaling update-auto-scaling-group \
            --auto-scaling-group-name "$ASG_NAME" \
            --min-size 0 --max-size 0 --desired-capacity 0

          # Remove scale-in protection from all instances in the ASG
          INSTANCES="$(aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names "$ASG_NAME" \
            --query 'AutoScalingGroups[*].Instances[*].InstanceId' \
            --output text)"
          aws autoscaling set-instance-protection --instance-ids $INSTANCES \
            --auto-scaling-group-name "$ASG_NAME" \
            --no-protected-from-scale-in
        done
      fi
    CMD
  }
}
```
Any updates here? This is terribly annoying to deal with. (The workaround does not work in my particular case.)
Any news? I'm still waiting for this issue to be fixed.
If you don't link the cluster to the capacity provider as a dependency and just use the name as a string, does that fix the issue? It's not great, but as long as you can delete a capacity provider while the ASG it's linked to has instances, that would work.
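A minimal sketch of that suggestion (names here are assumed for illustration): the capacity provider is referenced by a literal string, so Terraform records no dependency edge between the cluster and the capacity provider.

```hcl
# Sketch only: "my-cluster" and "my-capacity-provider" are placeholder names.
resource "aws_ecs_cluster" "cluster" {
  name               = "my-cluster"
  capacity_providers = ["my-capacity-provider"] # literal, not aws_ecs_capacity_provider.cp.name

  default_capacity_provider_strategy {
    capacity_provider = "my-capacity-provider"
  }
}
```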
@tomelliff Unfortunately this has another issue during construction:
Any updates here?
Having this issue too. This started around Terraform 0.12, and we added retries to work around it. We're now upgrading to 0.15, and the retries no longer seem to help, so this is a blocker.
Ah, turns out this is precisely the issue described in #11531. In short, the design of capacity providers is broken in Terraform right now, as it creates an invalid dependency chain: the cluster depends on the capacity provider, which depends on the autoscaling group, even though the autoscaling group must be torn down before the cluster can be destroyed.
For a less invasive update, the aws_ecs_cluster resource could just implement the process provided by Luke in his provisioner shell script when the cluster has capacity providers. Perhaps not quite as elegant as introducing a new attachment kind of resource, but it would fix the problem. It seems that the introduction of capacity providers has created all sorts of issues for AWS CloudFormation as well as Terraform engineers.
Same as #4852. Someone should consolidate all these; this is really noisy.
The workaround does not work if you're using terraform-aws-modules, since we cannot add provisioners to modules.
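For module users, one possible (untested) variation is to hang the destroy provisioner off a null_resource placed next to the module call. The module output name and the drain script below are assumptions; the script would wrap the AWS CLI steps from the workaround above.

```hcl
# Sketch only: provisioners cannot be injected into a third-party module,
# but they can live on a sibling null_resource.
resource "null_resource" "drain_cluster" {
  triggers = {
    cluster_arn = module.ecs.cluster_arn # assumed output name of the ECS module
  }

  provisioner "local-exec" {
    when = destroy
    # Hypothetical script wrapping the cluster-draining AWS CLI commands
    command = "./drain-ecs-cluster.sh ${self.triggers.cluster_arn}"
  }
}
```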
Thank you for the input on this issue! We are carefully considering work on this in the near future. (No guarantees on an exact date.) In order to facilitate the implementation, I've outlined some thoughts below. After looking through this, I agree with the suggested way forward. Please provide any feedback, yay or nay.
Hi all 👋 Just letting you know that this issue is featured on this quarter's roadmap. If a PR exists to close the issue, a maintainer will review and either make changes directly, or work with the original author to get the contribution merged. If you have written a PR to resolve the issue, please ensure the "Allow edits from maintainers" box is checked. Thanks for your patience, and we are looking forward to getting this merged soon!
provider "aws" {}
locals {
// cluster_name is a local to avoid the cyclical dependency:
// cluster -> capacity provider -> asg -> launch template -> user data -> cluster.
cluster_name = random_pet.name.id
}
data "aws_availability_zones" "current" {
state = "available"
filter {
name = "opt-in-status"
values = ["opt-in-not-required"]
}
}
resource "random_pet" "name" {}
data "aws_ami" "test" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-ecs-hvm-2.0.*-x86_64-ebs"]
}
}
resource "aws_vpc" "test" {
cidr_block = "10.0.0.0/16"
tags = {
Name = random_pet.name.id
}
}
resource "aws_subnet" "test" {
vpc_id = aws_vpc.test.id
cidr_block = "10.0.0.0/24"
map_public_ip_on_launch = true
tags = {
Name = random_pet.name.id
}
}
resource "aws_route_table" "main" {
vpc_id = aws_vpc.test.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.test.id
}
resource "aws_route_table_association" "main" {
route_table_id = aws_route_table.main.id
subnet_id = aws_subnet.test.id
}
resource "aws_security_group" "test" {
vpc_id = aws_vpc.test.id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = random_pet.name.id
}
}
resource "aws_ecs_cluster" "test" {
name = local.cluster_name
capacity_providers = [aws_ecs_capacity_provider.test.name]
default_capacity_provider_strategy {
capacity_provider = aws_ecs_capacity_provider.test.name
}
}
resource "aws_ecs_capacity_provider" "test" {
name = random_pet.name.id
auto_scaling_group_provider {
auto_scaling_group_arn = aws_autoscaling_group.test.arn
}
}
resource "aws_iam_role" "test" {
name = random_pet.name.id
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = {
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
Action = "sts:AssumeRole"
}
})
}
resource "aws_iam_role_policy_attachment" "test" {
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
role = aws_iam_role.test.id
}
resource "aws_iam_instance_profile" "test" {
depends_on = [aws_iam_role_policy_attachment.test]
role = aws_iam_role.test.name
}
resource "aws_launch_template" "test" {
image_id = data.aws_ami.test.id
instance_type = "t3.micro"
instance_initiated_shutdown_behavior = "terminate"
vpc_security_group_ids = [aws_security_group.test.id]
iam_instance_profile {
name = aws_iam_instance_profile.test.name
}
user_data = base64encode(<<EOL
#!/bin/bash
echo "ECS_CLUSTER=${local.cluster_name}" >> /etc/ecs/ecs.config
EOL
)
}
resource "aws_autoscaling_group" "test" {
desired_capacity = 1
max_size = 2
min_size = 1
name = random_pet.name.id
vpc_zone_identifier = [aws_subnet.test.id]
instance_refresh {
strategy = "Rolling"
}
launch_template {
id = aws_launch_template.test.id
version = aws_launch_template.test.latest_version
}
tags = [{
key = "foo"
value = "bar"
propagate_at_launch = true
}]
}
|
Is it possible that your test did not hit the issue because the EC2 instances were not actually registering with the ECS cluster?
@edmundcraske-bjss Yes, you are absolutely correct! Thank you. At this point, the best way forward looks like #22672. That will address the OP's recommended solution of an attachment resource (though under a different name).
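For anyone landing here later, a minimal usage sketch of the association resource that eventually shipped, aws_ecs_cluster_capacity_providers (per the provider documentation; resource labels are illustrative):

```hcl
# Sketch: a standalone resource associates providers with the cluster,
# so the cluster itself carries no dependency on the capacity provider.
resource "aws_ecs_cluster" "example" {
  name = "example"
}

resource "aws_ecs_cluster_capacity_providers" "example" {
  cluster_name       = aws_ecs_cluster.example.name
  capacity_providers = [aws_ecs_capacity_provider.example.name]

  default_capacity_provider_strategy {
    base              = 0
    weight            = 1
    capacity_provider = aws_ecs_capacity_provider.example.name
  }
}
```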
This functionality has been released in v3.74.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Relates #5278
Relates #11351
Relates #11531
Relates #22672
Relates #22754
Maintainer Note

The intended path forward is to:

- Deprecate the `capacity_providers` and `default_capacity_provider_strategy` arguments of `aws_ecs_cluster` (`aws_ecs_cluster`: Deprecate `capacity_providers` and `default_capacity_provider_strategy` #22754)
- Later remove the `capacity_providers` and `default_capacity_provider_strategy` arguments from `aws_ecs_cluster`, which is a breaking change

Terraform Version
Terraform v0.12.18
Affected Resource(s)

- `aws_ecs_cluster`
Terraform Configuration Files
Debug Output
Panic Output
Expected Behavior

`terraform destroy` should be able to destroy an `aws_ecs_cluster` which has `capacity_providers` set.

Actual Behavior
The problem is that this new `capacity_providers` property on `aws_ecs_cluster` introduces a new dependency: `aws_ecs_cluster` depends on `aws_ecs_capacity_provider`, which depends on `aws_autoscaling_group`.

This causes Terraform to destroy the ECS cluster before the autoscaling group, which is the wrong way around: the autoscaling group must be destroyed first, because the cluster must contain zero instances before it can be destroyed.
A possible solution may be to introduce a new resource type representing the attachment of a capacity provider to a cluster (inspired by `aws_iam_role_policy_attachment`, which is the attachment of an IAM policy to a role). This would allow the following dependency graph, which would work beautifully (see the sketch after this list):

- `aws_ecs_capacity_provider_cluster_attachment` depends on `aws_ecs_cluster` and `aws_ecs_capacity_provider`;
- `aws_ecs_capacity_provider` depends on `aws_autoscaling_group`, which depends on `aws_launch_template`, which depends on `aws_ecs_cluster` (e.g. via the `user_data` property, which needs to set the `ECS_CLUSTER` environment variable to the name of the cluster).
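A hypothetical sketch of what such an attachment resource could look like; neither the resource type nor its argument names exist in the provider as proposed here:

```hcl
# Hypothetical only: illustrates the proposed attachment, which depends on
# both the cluster and the capacity provider, so destroy ordering is correct.
resource "aws_ecs_capacity_provider_cluster_attachment" "example" {
  cluster           = aws_ecs_cluster.example.name
  capacity_provider = aws_ecs_capacity_provider.example.name
}
```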
Steps to Reproduce

1. `terraform apply`
2. `terraform destroy`
Important Factoids
References
- The problematic `capacity_providers` field on `aws_ecs_cluster` was added recently in "Add support for ECS capacity providers" #11150
- Using `aws_ecs_capacity_provider` with `managed_termination_protection = "ENABLED"` requires that the `aws_autoscaling_group` has `protect_from_scale_in` enabled, which has a separate issue with destroy: "Terraform fails to destroy autoscaling group if scale in protection is enabled" #5278