terraform attempts to destroy AWS ECS cluster before Deleting ECS Service #4852
Comments
This should work if we stop the ECS services before trying to delete the ECS cluster.
@avengers009 you're right, but ideally Terraform should be able to schedule these actions accordingly where possible, or, if that's not possible, the user should be able to hint Terraform via ...

@jaloren Do you mind sharing the configs with us, to help us understand the relationships between the resources and allow us to reproduce the problem? Thanks.
I am also seeing this issue:
Here is my configuration:

```hcl
resource "aws_ecs_service" "authenticator" {
  name            = "authenticator"
  cluster         = "${aws_ecs_cluster.ecs.id}"
  task_definition = "${aws_ecs_task_definition.authenticator.arn}"
  desired_count   = 2

  load_balancer {
    target_group_arn = "${aws_lb_target_group.authenticator.arn}"
    container_name   = "authenticator"
    container_port   = 3030
  }
}

resource "aws_ecs_cluster" "ecs" {
  name = "${local.safe_name_prefix}"
}
```
@Kartstig is that error occurring for you after 10 minutes or so of trying?

Yes, it does. I usually attempt the destroy twice to account for any timeouts.
I'm seeing very similar behavior with Terraform 0.11.7/AWS provider 1.19. I am frequently (but not every time) seeing this behavior:
@radeksimko I am not sure how much of the configs you would like to see; it's a little bit involved. But here's the key part of the main.tf in the root module. Each module is nothing but a wrapper for a CloudFormation template, so by referring to the output of one module as input to another, I am establishing a dependency between the two resources encapsulated in each module. Ergo, on a destroy I am expecting the cluster to be deleted after the service, since the service depends on the cluster.
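A minimal sketch of the module wiring being described, with hypothetical module and output names (the real modules wrap CloudFormation templates):

```hcl
# Hypothetical layout: the service module consumes an output of the cluster
# module, which gives Terraform a service -> cluster dependency edge.
module "ecs_cluster" {
  source = "./modules/ecs-cluster" # wraps the cluster CloudFormation template
}

module "ecs_service" {
  source      = "./modules/ecs-service" # wraps the service CloudFormation template
  cluster_arn = "${module.ecs_cluster.cluster_arn}"
}
```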
Any update on this issue? Is there a plan to fix it? Or, at the least, could Terraform output a machine-readable list of services to be destroyed before destroying the instances?
I think Terraform should stop/terminate the instances as part of the destroy process; right now you have to terminate the instances manually in order for the destroy action to finish.
Hey, we are trying to automate this destruction of instances instead of doing it manually. Is there a recommended way to automate this? Our application code is in Java. One way to do this could be to parse the plan generated by the `terraform destroy` command. Can you help us find a way to parse the Terraform plan to identify which instances/clusters need to be destroyed?
You can prevent that situation by splitting your Terraform project into at least two projects; you can use remote state (the `terraform_remote_state` data source) for that. If you put the ECS cluster and the service creation in two different projects, then when you want to destroy, you first run the destroy for the service project, after which the ECS cluster can be destroyed without any problem.
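A minimal sketch of that split (Terraform 0.12+ syntax), assuming an S3 backend and a `cluster_id` output exposed by the cluster project; the bucket, key, and output names are illustrative:

```hcl
# In the service project: read the cluster project's outputs via remote state.
data "terraform_remote_state" "cluster" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"            # illustrative bucket name
    key    = "ecs-cluster/terraform.tfstate" # illustrative state key
    region = "us-east-1"
  }
}

resource "aws_ecs_service" "app" {
  name    = "app"
  cluster = data.terraform_remote_state.cluster.outputs.cluster_id
  # task_definition, desired_count, etc. as usual
}
```

Destroying the service project first removes the service; a subsequent destroy of the cluster project then finds an empty cluster.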
Is there any solution here? Terraform was working great for me, and now I'm hitting the same error, "The Cluster cannot be deleted while Services are active", and I don't understand why I need to manually stop/terminate the instances...
I am seeing this with 0.12.7 in my company's production environment, intermittently. Is there any way to specify a `depends_on` or "teardown_first" that works for teardown?
I am still seeing this on the latest version...
I'm here for the same issue. Has anyone found a workaround? Or can anyone confirm that this sometimes works (even after n retries)? Otherwise, it seems the ... Hoping to better understand whether this never works, whether it's just a retry/interim issue, or whether it's particular to a set of configs. UPDATE: In my particular instance, I can confirm upon retry that ...
+1 having the same issue here. Latest version on Terraform Cloud |
I have the same issue with terraform 0.12.19 |
Hey everyone, I'm using AWS CloudFormation and I'm experiencing this issue as well. I currently suspect that it's not an issue with either CloudFormation or Terraform, but possibly with the underlying EC2 AMI. I'm using the Amazon Linux 2 AMI, while an example I'm referencing uses Amazon Linux 1, and the latter deletes fine while mine does not (even with an explicit DependsOn and Refs sprinkled throughout). There were a good number of changes in Amazon Linux 2, which I'm guessing may have included a change to ...
Not sure if this is the right place to complain, but probably the same issue here:
Surprisingly, two things stand out:
data "aws_ami" "amazon2_ecs_optimized" {}
resource "aws_launch_template" "this" {}
resource "aws_autoscaling_group" "this" {}
resource "aws_ecs_task_definition" "this" {}
resource "aws_ecs_service" "default" {
# ...
depends_on = [
# consider note at https://www.terraform.io/docs/providers/aws/r/ecs_service.html
aws_iam_role_policy.ecs_service
]
# ...
}
resource "aws_ecs_cluster" "application" {} p.s. will try to build workaround with |
I have also faced exactly the same issue as raised by mikalai-t.
I still haven't implemented a workaround, but... I noticed that sometimes even the termination process took a while, so I assumed our application becomes unresponsive and consumes too much CPU, and therefore the EC2 instance fails to respond in time.
Forcing this has no effect, and it is a known bug in Terraform. hashicorp/terraform-provider-aws#4852
* Upgrade Terraform to version 0.13

We are seeing dependency issues when running `terraform destroy`. Two issues are preventing a clean destroy:

1. Terraform attempts to destroy network resources before other resources. This fails because you cannot destroy a VPC when you have services running in it.
2. Terraform attempts to destroy the ECS cluster before the auto scaling group that serves as the compute for the capacity provider.

This PR addresses the first issue by leveraging the module `depends_on` feature in Terraform 0.13. The second issue still needs to be addressed by extracting the auto scaling group into its own module and having the ECS cluster depend on it. hashicorp/terraform-provider-aws#4852

To use this for local development, run `make init`, which will reconfigure the state to use the new version of Terraform. A PR following this will remove the `-reconfigure` flag from the Makefile once everyone has upgraded.

* Manually remove auto scaling groups before destroy

Due to a bug in Terraform, ECS is unable to delete before the auto scaling group has been removed. Use the aws command line in combination with your current workspace to delete the auto scaling group as a separate step before running `terraform destroy`. This is wrapped up in `make destroy`, and `terraform destroy` should not be used directly. Because calling aws from the command line is unable to assume a role unless the ARN is known, the `aws-vault` commands need to be hardcoded within the Makefile.
It's still happening with Terraform 0.12.26 and AWS provider 3.19:

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
Error: Error waiting for internet gateway (igw-0cab*******25) to detach: timeout while waiting for state to become 'detached' (last state: 'detaching', timeout: 15m0s)

Reason: ...

Workaround:

`aws autoscaling update-auto-scaling-group --auto-scaling-group-name "my-auto-scaling-group-name" --min-size 0 --desired-capacity 0`

But I think this action should be handled by the AWS provider when doing a terraform destroy.
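A hedged sketch of wiring that scale-down into the configuration itself, using a destroy-time `local-exec` provisioner on a `null_resource` (resource names are illustrative, and the `aws` CLI with suitable credentials must be available wherever Terraform runs):

```hcl
resource "null_resource" "drain_asg_on_destroy" {
  # Captured in triggers so the destroy-time provisioner can read it via self.
  triggers = {
    asg_name = aws_autoscaling_group.this.name
  }

  # Because this resource depends on the cluster, Terraform destroys it first,
  # scaling the ASG to zero before attempting the cluster delete.
  depends_on = [aws_ecs_cluster.application]

  provisioner "local-exec" {
    when    = destroy
    command = "aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${self.triggers.asg_name} --min-size 0 --desired-capacity 0"
  }
}
```

Instance termination is asynchronous, so the cluster delete may still need the provider's retry window to succeed.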
This issue still reproduces on Terraform v0.14.0 and AWS provider >= 3.16. Something I have noticed is that it spins on deleting the capacity provider. If I manually delete the capacity provider (from the AWS UI), it completes right away. Maybe Terraform is making an improper call to the AWS API?
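For reference, a hedged sketch of the capacity provider settings that are often involved in this hang; with managed termination protection enabled, scale-in-protected instances can keep the capacity provider (and therefore the cluster) from deleting. The names and values here are illustrative, not a confirmed fix:

```hcl
resource "aws_ecs_capacity_provider" "this" {
  name = "asg-ec2-cp"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.this.arn

    # "ENABLED" requires protect_from_scale_in on the ASG; protected instances
    # can block deletion of the capacity provider and the cluster.
    managed_termination_protection = "DISABLED"

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 100
    }
  }
}
```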
Same issue in v0.14.0 for me as well; I get it when more than one service and task definition is defined and created.
(The Cluster cannot be deleted/renamed while Container Instances are active or draining.) + attempt to invert dependencies on efs_sg_ids and efs_id for the ASG aws_launch_configuration
Any updates on this? I'm having one of the issues mentioned above: Terraform cannot delete the ECS cluster while it has active container instances. I'm using the ECS managed ASG setup. I think the order of destruction is correct: the ASG is created before ECS, ECS depends on the ASG ARN, and when running destroy it acts on ECS first. Are there any means to bypass this check when destroying, e.g. force the cluster to be removed so that the ASG removal can kick in? Right now I have to manually delete the ASG when Terraform tries to remove the cluster.
I have the same problem... |
We intermittently get this error too when destroying our infrastructure. We have a retry built into our wrapper scripts, and on Friday our failure looked like this: ...
Same issue in v0.14.6 😭
For me, the workaround from @Zogoo did the trick.

The other workaround from @jm4games also works. To do it from the aws cli: ...
Having this issue too. On ...

This started around Terraform 0.12, and we added retries to work around it. We're now upgrading to 0.15, and the retries no longer seem to help, so this is a blocker.
I also have the same issue and received the errors below:

Error: error waiting for ECS Capacity Provider (arn:aws:ecs:eu-west-1:account-id:capacity-provider/asg-ec2-cp) to delete: timeout while waiting for resource to be gone (last state: 'ACTIVE', timeout: 20m0s)

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.

Do we have any fix for this?
Hi all 👋 Thanks for taking the time to submit this issue and for the ongoing discussion. It looks like this is a duplicate of #11409. We like to try to keep discussions consolidated, and while this issue was filed first, the other one has more reactions (something we use to help gauge community interest in an issue/PR) and a suggested workaround. With that in mind, we're going to close this issue in favor of #11409.
Deleting stacks using ECS clusters having capacityProviders (i.e. dual-primary and primary-replica recipes) fails with:

```
The Cluster cannot be deleted while Container Instances are active or draining.
```

This is an issue that manifests itself as well via terraform [1] or CDK [2]. Explicitly deleting the Autoscaling Groups _before_ the ECS cluster deletion fixes the problem, since it ensures that no instances are active or draining, as the error suggests. This is safe to do, because prior to deleting the Autoscaling Groups, every ECS service has already been destroyed, thus no instance is actually running.

[1] hashicorp/terraform-provider-aws#4852
[2] aws/aws-cdk#14732

Bug: Issue 14698
Change-Id: I216307ef88bd7b7317706d2dc0a6a6e6fb367bd4
Change-Id: I27ece0f6971b157a474d91d7f3d9243dcff596e6
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
This issue was originally opened by @jaloren as hashicorp/terraform#18263. It was migrated here as a result of the provider split. The original body of the issue is below.
I am using the `aws_cloudformation_stack` resource to provision an AWS Elastic Container Service (ECS) cluster and one or more services in that cluster. I used `terraform graph -type=plan-destroy` to verify that I successfully set up a dependency relationship in Terraform between the resource that creates the service and the resource that creates the ECS cluster.
According to graphviz, the service is a child node of the ecs cluster node. Given that, I am expecting TF to delete the service and then delete the cluster. However, this seems to happen out of order, which causes the delete of the ECS cluster to fail since you can't delete a cluster that has services in it.
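A hypothetical sketch of the kind of setup being described, where the service stack takes an output of the cluster stack as a parameter and thereby depends on it (template files and parameter names are illustrative):

```hcl
resource "aws_cloudformation_stack" "ecs_cluster" {
  name          = "ecs-cluster"
  template_body = file("${path.module}/cfn/ecs-cluster.yaml")
}

resource "aws_cloudformation_stack" "ecs_service" {
  name          = "ecs-service"
  template_body = file("${path.module}/cfn/ecs-service.yaml")

  # Consuming the cluster stack's output makes the service stack a dependent,
  # so Terraform should destroy the service stack before the cluster stack.
  parameters = {
    ClusterName = aws_cloudformation_stack.ecs_cluster.outputs["ClusterName"]
  }
}
```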
Terraform Version
Expected Behavior
Terraform successfully deletes the AWS ECS cluster and its associated services.
Actual Behavior
Terraform successfully deleted the service in the ECS cluster but failed to delete the ECS cluster itself with the following error:
Steps to Reproduce
Please list the full steps required to reproduce the issue, for example:
1. `terraform init`
2. `terraform plan`
3. `terraform apply`