
terraform attempts to destroy AWS ECS cluster before Deleting ECS Service #4852

Closed
ghost opened this issue Jun 16, 2018 · 33 comments
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.

Comments

@ghost

ghost commented Jun 16, 2018

This issue was originally opened by @jaloren as hashicorp/terraform#18263. It was migrated here as a result of the provider split. The original body of the issue is below.


I am using the aws_cloudformation_stack resource to provision an AWS Elastic Container Service (ECS) cluster and one or more services in that cluster. I used terraform graph -type=plan-destroy to verify that I successfully set up a dependency relationship in Terraform between the TF resource that creates the service and the TF resource that creates the ECS cluster.

According to graphviz, the service is a child node of the ECS cluster node. Given that, I am expecting TF to delete the service and then delete the cluster. However, this seems to happen out of order, which causes the deletion of the ECS cluster to fail, since you can't delete a cluster that still has services in it.

Terraform Version

Terraform v0.11.8

Expected Behavior

Terraform successfully deletes the AWS ECS cluster and its associated services.

Actual Behavior

Terraform successfully deleted the service in the ECS cluster but failed to delete the ECS cluster itself with the following error:

* aws_cloudformation_stack.ecs-cluster: DELETE_FAILED: ["The following resource(s) failed to delete: [ECSCluster]. " "The Cluster cannot be deleted while Services are active. (Service: AmazonECS; Status Code: 400; Error Code: ClusterContainsServicesException; Request ID: 7bcbeae4-70ab-11e8-bd0b-3d3254c7f7d3)"]

Steps to Reproduce


  1. terraform init
  2. terraform plan
  3. terraform apply
@avengers009

This should work if we stop the ECS services and then try deleting the ECS cluster.

@radeksimko
Member

@avengers009 you're right, but ideally Terraform should be able to schedule these actions accordingly where possible or, if that isn't possible, the user should be able to hint Terraform via depends_on. TL;DR: users shouldn't need to manually touch the infrastructure in order to run apply or destroy successfully.

@jaloren Do you mind sharing the configs with us so we can understand the relationships between resources and reproduce the problem?

Thanks.
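To illustrate the kind of hint being described, here is a minimal sketch (the resource names are made up, and the task definition is assumed to be defined elsewhere):

resource "aws_ecs_cluster" "example" {
  name = "example"
}

resource "aws_ecs_service" "example" {
  name            = "example"
  cluster         = "${aws_ecs_cluster.example.id}"          # implicit dependency: the service is destroyed before the cluster
  task_definition = "${aws_ecs_task_definition.example.arn}" # assumed to exist elsewhere
  desired_count   = 1

  # explicit hint; normally redundant when the cluster is referenced above
  depends_on = ["aws_ecs_cluster.example"]
}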

@radeksimko radeksimko added bug Addresses a defect in current functionality. waiting-response Maintainers are waiting on response from community or contributor. service/ecs Issues and PRs that pertain to the ecs service. labels Jun 21, 2018
@Kartstig

I am also seeing this issue:

Error: Error applying plan:

1 error(s) occurred:

* aws_ecs_cluster.ecs (destroy): 1 error(s) occurred:

* aws_ecs_cluster.ecs: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
	status code: 400, request id: 30e1e812-854c-11e8-bec1-397064633d2b

Here is my configuration:

ecs_service

resource "aws_ecs_service" "authenticator" {
  name            = "authenticator"
  cluster         = "${aws_ecs_cluster.ecs.id}"
  task_definition = "${aws_ecs_task_definition.authenticator.arn}"
  desired_count   = 2

  load_balancer {
    target_group_arn = "${aws_lb_target_group.authenticator.arn}"
    container_name   = "authenticator"
    container_port   = 3030
  }
}

ecs_cluster

resource "aws_ecs_cluster" "ecs" {
  name = "${local.safe_name_prefix}"
}

@bflad
Contributor

bflad commented Jul 12, 2018

@Kartstig is that error occurring for you after 10 minutes or so of trying?

@Kartstig

Yes, it does. I usually make an attempt to destroy twice to account for any timeouts.

@shusak

shusak commented Jul 19, 2018

I'm seeing very similar behavior with Terraform 0.11.7/AWS provider 1.19. I frequently (but not every time) hit this:

00:12:27.512 aws_ecs_cluster.ecs_cluster: Still destroying... (ID: arn:aws:ecs:us-east-1:<MYACCOUNT>:cluster/my-service, 9m50s elapsed)
00:12:36.041 
00:12:36.042 Error: Error applying plan:
00:12:36.043 
00:12:36.044 1 error(s) occurred:
00:12:36.045 
00:12:36.045 * aws_ecs_cluster.ecs_cluster (destroy): 1 error(s) occurred:
00:12:36.046 
00:12:36.046 * aws_ecs_cluster.ecs_cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
00:12:36.047 	status code: 400, request id: b920a9e3-8b45-11e8-8e1a-0751c6fe0d1a

@jaloren

jaloren commented Aug 18, 2018

@radeksimko I am not sure how much of the configs you would like to see. It's a little bit involved, but here's the key part of the main.tf in the root module.

Each module is nothing but a wrapper around a CloudFormation template, so by referring to the output of one module as input to another, I am establishing a dependency between the resources encapsulated in each module. Ergo, on a destroy I am expecting the cluster to be deleted after the service, since the service depends on the cluster.

module "public_load_balancer" {
  source         = "../../modules/aws/network/load_balancer/alb"
  environment    = "${var.environment}"
  security_group = "${module.network_acls.load_balancer_security_group}"
  vpc            = "${module.network.vpc}"
  subnets        = "${module.network.public_one_subnet},${module.network.public_two_subnet}"
}

module "ecs_cluster" {
  source      = "../../modules/aws/ecs/cluster"
  environment = "${var.environment}"
}

module "log_group" {
  source        = "../../modules/aws/logs/log_group"
  environment   = "${var.environment}"
  log_retention = 3
}

module "ecs_application" {
  source                   = "../../modules/aws/ecs/services/ecsapp"
  subnets                  = "${module.network.ecs_traffic_one},${module.network.ecs_traffic_two}"
  target_group             = "${module.public_load_balancer.enrollment_api_target_group}"
  environment              = "${var.environment}"
  security_group           = "${module.network_acls.container_security_group}"
  vpc                      = "${module.network.vpc}"
  tag                      = "v1.0.0"
  log_group                = "${module.log_group.id}"
  cluster_name             = "${module.ecs_cluster.name}"
}

@bflad bflad removed the waiting-response Maintainers are waiting on response from community or contributor. label Oct 9, 2018
@swagatata

Any update on this issue? Is there a plan to fix this? Or at least provide/output a machine readable list of services to be destroyed before destroying the instances?

@orlando

orlando commented Oct 26, 2018

I think Terraform should stop/terminate the instances as part of the destroy process; right now you have to manually terminate the instances in order for the destroy action to finish.

@swagatata

swagatata commented Oct 31, 2018

Hey, we are trying to automate this destruction of instances instead of doing it manually. Is there a recommended way to automate this? Our application code is in Java.

One way to do this could be to parse the plan generated by the terraform destroy command. Can you help us find a way to parse the Terraform plan to identify which instances/clusters need to be destroyed?

@sozay

sozay commented Apr 25, 2019

You can prevent that situation by splitting your Terraform project into at least two projects, using remote_state to connect them. If you put the ECS cluster and the service creation into two different projects, then when you want to destroy, you can first run the destroy for the service project; the ECS cluster can then be destroyed without any problem.
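A minimal sketch of that split, assuming the cluster project exposes a cluster_arn output and keeps its state in an S3 bucket (bucket, key, and output names here are illustrative; Terraform 0.12+ syntax):

# In the service project: read the cluster project's state and reference its output.
data "terraform_remote_state" "ecs_cluster" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "ecs-cluster/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_ecs_service" "app" {
  name    = "app"
  cluster = data.terraform_remote_state.ecs_cluster.outputs.cluster_arn
  # ...
}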

@aaronjhoffman

Is there any solution here? Terraform was working great for me and now I'm having the same error "The Cluster cannot be deleted while Services are active" and don't understand why I need to manually stop/terminate the instances...

@neXussT

neXussT commented Aug 27, 2019

I am seeing this with 0.12.7 in my company's production environment intermittently. Is there any way to specify a "depends_on" or "teardown_first" which works for teardown?

@archenroot

I am seeing this still on latest version...

@aaronsteers

aaronsteers commented Feb 5, 2020

I'm here for the same issue - has anyone found a workaround? Or can anyone confirm that this sometimes works (even after n retries)? Otherwise, it seems the aws_ecs_service resource is broken. The core promise is that terraform apply followed by terraform destroy will just work.

Hoping to better understand if this never works or if it's just a retry/interim issue or an issue particular to a set of configs.

UPDATE: In my particular instance, I can confirm that upon retry terraform destroy does not list the ECS cluster as something to be destroyed - meaning the destroy of the ECS service failed at some point but was logged as destroyed anyway. (Or conversely, I guess, it could have been created and not correctly confirmed as created.) I will post back here if I have additional test results.

@fadhlirahim

+1 having the same issue here. Latest version on Terraform Cloud

@soumialeghzaoui

I have the same issue with terraform 0.12.19

@yingw787

yingw787 commented May 6, 2020

Hey everyone, I'm using AWS CloudFormation and I'm experiencing this issue as well. I currently suspect that it's not an issue with either CloudFormation or Terraform, but possibly with the underlying EC2 AMI. I'm using the Amazon Linux 2 AMI, while an example I'm referencing uses Amazon Linux 1; the Amazon Linux 1 example deletes fine while mine does not (even with an explicit DependsOn and Refs sprinkled throughout). There were a good number of changes in Amazon Linux 2, which I'm guessing may have included a change to cfn-bootstrap that might impact /opt/aws/cfn-signal behavior. I haven't tested this out though.

@mikalai-t

Not sure if this is the right place to complain, but probably the same issue here:

Error: Error draining autoscaling group: Group still has 1 instances

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining

Surprisingly, two things stand out:

  • I logged into the AWS console, noticed that the ECS container instance was in the "Active" state, but was able to remove the ECS cluster immediately, without any warning/error! That EC2 instance kept running until I terminated it manually.
  • somehow, sometimes, it worked before!

Terraform v0.12.20 is being used; the relevant code (abbreviated):

data "aws_ami" "amazon2_ecs_optimized" {}

resource "aws_launch_template" "this" {}

resource "aws_autoscaling_group" "this" {}

resource "aws_ecs_task_definition" "this" {}

resource "aws_ecs_service" "default" {
  #  ...
  depends_on = [
    # consider note at https://www.terraform.io/docs/providers/aws/r/ecs_service.html
    aws_iam_role_policy.ecs_service
  ]
  # ...
}

resource "aws_ecs_cluster" "application" {}

p.s. I will try to build a workaround with a null_resource and a local-exec provisioner using the when = destroy strategy, running the AWS CLI to find and deregister the ECS EC2 instances... but it's sad in terms of "reliable" cloud services.
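A rough, untested sketch of that workaround (it assumes the AWS CLI is on the PATH, and it passes the cluster name through triggers because destroy-time provisioners can only reference self):

resource "null_resource" "deregister_container_instances" {
  triggers = {
    cluster_name = aws_ecs_cluster.application.name
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<EOT
for arn in $(aws ecs list-container-instances --cluster ${self.triggers.cluster_name} --query 'containerInstanceArns[]' --output text); do
  aws ecs deregister-container-instance --cluster ${self.triggers.cluster_name} --container-instance "$arn" --force
done
EOT
  }
}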

@Amit30891

I have also faced the exact same issue as raised by @mikalai-t.
@mikalai-t, would you mind sharing what steps you followed as a workaround?

@mikalai-t

I still haven't implemented a workaround, but... I noticed that sometimes even the termination process took a while, so I assumed our application became unresponsive and consumed too much CPU, and therefore the EC2 instance failed to respond in time.
I just configured t3a.small instead of t3a.micro and the issue hasn't appeared since then. Not sure if this is a final solution, but you can start by analyzing your application's behavior on a different instance type.
Also, I would recommend checking the current instance's "protect from scale-in" setting (see the sketch below). I had a similar issue when I stopped using the ECS capacity provider and forgot to set this back to false.
btw... Even with a capacity provider configured in the cluster I faced timeouts when destroying the ASG, but after a couple of repeated attempts it was always successful.
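Illustrative only: the scale-in protection mentioned above maps to this Auto Scaling group argument (the resource name is a placeholder):

resource "aws_autoscaling_group" "this" {
  # ...
  # Per the note above: if you stop using an ECS capacity provider, set this back
  # to false so the instances can drain and the group can be destroyed.
  protect_from_scale_in = false
}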

@anGie44 anGie44 removed this from the v2.69.0 milestone Jun 30, 2020
emileswarts added a commit to ministryofjustice/staff-device-dns-dhcp-infrastructure that referenced this issue Sep 14, 2020
Forcing this has no effect, and it is a known bug in Terraform.

hashicorp/terraform-provider-aws#4852
emileswarts added a commit to ministryofjustice/staff-device-dns-dhcp-infrastructure that referenced this issue Sep 14, 2020
* Upgrade Terraform to version 0.13

We are seeing dependency issues when running `terraform destroy`
Two issues are preventing a clean destroy:

1. Terraform attempts to destroy network resources before other
resources. This fails because you cannot destroy a VPC when you have
services running in it.

2. Terraform attempts to destroy the ECS cluster before the auto scaling
group that serves as the compute for the capacity provider.

This PR addresses the first issue, by leveraging the module `depends_on`
feature in Terraform 0.13.

The second issue still needs to be addressed by extracting the auto
scaling group into its own module and having the ECS cluster depend on
it. hashicorp/terraform-provider-aws#4852

To use this for local development, run `make init`, which will
reconfigure the state to use the new version of Terraform.

A PR following this will remove the `-reconfigure` flag from the
Makefile once everyone has upgraded.

* Manually remove auto scaling groups before destroy

Due to a bug in Terraform, ECS is unable to delete before the auto
scaling group has been removed.

Use the aws command line in combination with your current workspace to
delete the auto scaling group as a separate step before running
terraform destroy.

This is wrapped up in `make destroy`, and `terraform destroy` should not
be used.

Because calling aws from the command line is unable to assume a role
unless the arn is known, the `aws-vault` commands need to be hardcoded
within the Makefile.
@Zogoo

Zogoo commented Dec 3, 2020

It's still happening on Terraform 0.12.26 with AWS provider 3.19.
Error:

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
Error: Error waiting for internet gateway (igw-0cab*******25) to detach: timeout while waiting for state to become 'detached' (last state: 'detaching', timeout: 15m0s)

Reason:
In my case I was using an AWS capacity provider for my ECS cluster with the target capacity set to 90% (anything below 100% behaves the same). As a result, the instances were still running even though the ECS services and tasks had already been deleted.

Workaround:
Set the desired size and minimum size of the Auto Scaling group to 0 via the AWS CLI, then run terraform destroy:

aws autoscaling update-auto-scaling-group --auto-scaling-group-name "my-auto-scaling-group-name" --min-size 0 --desired-capacity 0

But I think this action should be handled by the AWS provider during terraform destroy. (The capacity-provider arguments involved are sketched below.)
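For context, the capacity-provider settings being discussed correspond to these provider arguments; this is a sketch with placeholder names, not a confirmed fix:

resource "aws_ecs_capacity_provider" "this" {
  name = "example"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.this.arn

    # "ENABLED" requires scale-in protection on the ASG instances and can leave
    # them registered during destroy; "DISABLED" avoids that coupling.
    managed_termination_protection = "DISABLED"

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 100 # the 90% vs. 100% value discussed above
    }
  }
}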

@jm4games

jm4games commented Dec 4, 2020

This issue still repros on Terraform v0.14.0 and AWS provider >= 3.16. Something I have noticed is that it spins on deleting the capacity provider. If I manually delete the capacity provider (from the AWS console), the deletion completes right away. Maybe Terraform is making an improper call to the AWS API?

@deeco

deeco commented Dec 9, 2020

Same issue in v0.14.0 for me also; I get it when more than one service and task definition is defined and created.

mslipets added a commit to mslipets/terraform-aws-ecs that referenced this issue Jan 15, 2021
(The Cluster cannot be deleted/renamed while Container Instances are active or draining. )
+ attempt to inverse dependencies on efs_sg_ids and efs_id for ASG aws_launch_configuration
@tiberiu89

Any updates on this? I'm having one of the issues mentioned above: Terraform cannot delete the ECS cluster while it has active container instances. I'm using the ECS-managed ASG setup. I think the order of destruction is correct: the ASG is created before the ECS cluster, and the cluster depends on the ASG ARN. When running destroy, Terraform tries to destroy the ECS cluster first. Are there any means to bypass this check when destroying, or maybe force the cluster to be removed so that the ASG removal can kick in? Right now I have to manually delete the ASG when Terraform tries to remove the cluster.

lazzurs added a commit to lazzurs/terraform-aws-ecs that referenced this issue Feb 2, 2021
@Axent96

Axent96 commented Feb 5, 2021

I have the same problem...

@matt-brewster

We intermittently get this error too when destroying our infrastructure. We have a retry built into our wrapper scripts and on Friday our failure looked like this:

2021-02-12 18:52:56 Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.

<< retry the destroy>>
<< 20 minutes of module.ecs_cluster.aws_ecs_capacity_provider.this: Still destroying... >>
<< then >>

2021-02-12 19:13:15 Error: error waiting for ECS Capacity Provider (arn:aws:ecs:eu-west-2:XXXXXXXX:capacity-provider/my-asg) to delete: timeout while waiting for state to become 'INACTIVE' (last state: 'ACTIVE', timeout: 20m0s)

@jybaek

jybaek commented Feb 17, 2021

same issue in v0.14.6 😭

@baztian

baztian commented Apr 8, 2021

For me, the workaround from @Zogoo did the trick.

aws autoscaling update-auto-scaling-group --auto-scaling-group-name "my-auto-scaling-group-name" --min-size 0 --desired-capacity 0

The other workaround from @jm4games also works. To do it from the AWS CLI:

aws ecs put-cluster-capacity-providers --cluster my-cluster --capacity-providers [] --default-capacity-provider-strategy []

@brikis98
Contributor

Having this issue too. On destroy, I get the error:

Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.

This started around Terraform 0.12, and we added retries to work around it. We're now upgrading to 0.15, and the retries no longer seem to help, so this is a blocker.

@bharti8085

I also have the same issue and received the error below.

Error: error waiting for ECS Capacity Provider (arn:aws:ecs:eu-west-1:account-id:capacity-provider/asg-ec2-cp) to delete: timeout while waiting for resource to be gone (last state: 'ACTIVE', timeout: 20m0s)

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.

Do we have any fix for this?

@justinretzolk
Member

Hi all 👋 Thanks for taking the time to submit this issue and for the ongoing discussion. It looks like this is a duplicate of #11409. We like to try to keep discussions consolidated, and while this issue was filed first, the other one has more reactions (something we use to help gauge community interest in an issue/PR), and a suggested workaround. With that in mind, we’re going to close this new issue in favor of #11409.

lucamilanesio pushed a commit to GerritCodeReview/aws-gerrit that referenced this issue Nov 30, 2021
Deleting stacks using ECS clusters having capacityProviders (i.e.
dual-primary and primary-replica recipes), fails with:

```
The Cluster cannot be deleted while Container Instances are active
or draining.
```

This is an issue that manifests itself as well via terraform [1] or CDK
[2].

Explicitly deleting the Autoscaling Groups _before_ the ECS cluster
deletion fixes the problem, since it ensures that no instances are
active or draining, as the error suggests.

This is safe to do, because prior to deleting the Autoscaling Groups,
every ECS service has already been destroyed, thus no instance is
actually running.

[1] hashicorp/terraform-provider-aws#4852
[2] aws/aws-cdk#14732
Bug: Issue 14698
Change-Id: I216307ef88bd7b7317706d2dc0a6a6e6fb367bd4

Change-Id: I27ece0f6971b157a474d91d7f3d9243dcff596e6
@github-actions

github-actions bot commented Jun 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2022