ECS plan fails if cluster has been deleted outside Terraform #15917

Closed
adam-tylr opened this issue Oct 29, 2020 · 4 comments · Fixed by #15927
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.

@adam-tylr
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v0.13.2
+ provider registry.terraform.io/hashicorp/aws v3.12.0

Affected Resource(s)

  • aws_ecs_service
  • aws_ecs_cluster

Terraform Configuration Files

resource "aws_ecs_cluster" "foo" {
  name = "my-cluster"
}

resource "aws_ecs_task_definition" "task" {
  family                = "service"
  container_definitions = file("service.json")
}

resource "aws_ecs_service" "service" {
  name            = "my-service"
  cluster         = aws_ecs_cluster.foo.id
  task_definition = aws_ecs_task_definition.task.arn
}

service.json is taken straight from the example in the docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_task_definition

Debug Output

I cannot provide the full debug output because of security restrictions at my employer, but this is the relevant section:

2020-10-29T14:11:51.026-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: 2020/10/29 14:11:51 [DEBUG] [aws-sdk-go] DEBUG: Response ecs/DescribeServices Details:
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: ---[ RESPONSE ]--------------------------------------
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: HTTP/1.1 400
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: Connection: close
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: Content-Length: 68
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: Content-Type: application/x-amz-json-1.1
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: Date: Thu, 29 Oct 2020 18:11:50 GMT
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: X-Amzn-Requestid: 
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe:
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe:
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: -----------------------------------------------------
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: 2020/10/29 14:11:51 [DEBUG] [aws-sdk-go] {"__type":"ClusterNotFoundException","message":"Cluster not found."}
2020-10-29T14:11:51.027-0400 [DEBUG] plugin.terraform-provider-aws_v3.12.0_x5.exe: 2020/10/29 14:11:51 [DEBUG] [aws-sdk-go] DEBUG: Validate Response ecs/DescribeServices failed, attempt 0/25, error ClusterNotFoundException: Cluster not found.
2020/10/29 14:11:51 [ERROR] eval: *terraform.EvalRefresh, err: Error reading ECS service: ClusterNotFoundException: Cluster not found.
2020/10/29 14:11:51 [ERROR] eval: *terraform.EvalSequence, err: Error reading ECS service: ClusterNotFoundException: Cluster not found.

Panic Output

Expected Behavior

I created an ECS cluster with an associated service, then deleted the cluster and the service outside Terraform (manually in the console, or as part of a regular cleanup script). When I run a plan again, I expect it to produce a valid plan that re-creates the cluster and the service.

Actual Behavior

I created an ECS cluster with an associated service, then manually deleted the cluster and the service in the console. When I run a plan again, Terraform fails with Error: Error reading ECS service: ClusterNotFoundException: Cluster not found.

Steps to Reproduce

  1. terraform apply to create the cluster and service
  2. Manually delete the cluster in the AWS console
  3. Wait an undetermined amount of time for the cluster to actually be removed from the account (see Important Factoids), or simulate this by manually editing the Terraform state so the service points to a non-existent cluster
  4. terraform plan

Important Factoids

When an ECS cluster and service are deleted, they are put in an inactive state and disappear from the UI, but they are not actually removed from the account. Described Here. As long as they exist in an inactive state, there is no issue. What we've seen happen is the cluster being removed completely, such that aws ecs describe-clusters --clusters <cluster-arn> produces an error instead of returning an inactive cluster. During the failed plan I see a sequence of events like this:

  1. Call ecs/DescribeClusters with the expected cluster ARN from state
  2. Return code of 200 but with message saying cluster is missing
  3. Terraform outputs [WARN] ECS Cluster (arn:aws:ecs:us-east-1::cluster/my-cluster) not found, removing from state
  4. Call ecs/DescribeServices with the expected service and cluster ARN from state
  5. Return code of 400 with message saying ClusterNotFoundException
  6. Plan fails
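
Steps 1 and 2 above can be illustrated with a small Go sketch. It is not provider code: it uses only the standard library, and it assumes the DescribeClusters JSON body carries a `clusters` list (each entry with `clusterArn` and `status`) and a `failures` list (each entry with `arn` and `reason`). The `clusterState` helper and the sample ARN are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeClustersOutput models only the fields this sketch needs.
// The field names are an assumption about the response body shape.
type describeClustersOutput struct {
	Clusters []struct {
		ClusterArn string `json:"clusterArn"`
		Status     string `json:"status"`
	} `json:"clusters"`
	Failures []struct {
		Arn    string `json:"arn"`
		Reason string `json:"reason"`
	} `json:"failures"`
}

// clusterState (hypothetical helper) returns the cluster status if the
// cluster is listed, the failure reason if it is reported as a failure,
// and "UNKNOWN" if the ARN appears in neither list.
func clusterState(body []byte, arn string) (string, error) {
	var out describeClustersOutput
	if err := json.Unmarshal(body, &out); err != nil {
		return "", err
	}
	for _, c := range out.Clusters {
		if c.ClusterArn == arn {
			return c.Status, nil
		}
	}
	for _, f := range out.Failures {
		if f.Arn == arn {
			return f.Reason, nil
		}
	}
	return "UNKNOWN", nil
}

func main() {
	// Hypothetical ARN and response body: HTTP 200, but the cluster is
	// reported under failures rather than returned as INACTIVE.
	arn := "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster"
	body := []byte(`{"clusters":[],"failures":[{"arn":"` + arn + `","reason":"MISSING"}]}`)

	state, err := clusterState(body, arn)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("cluster state:", state)
}
```

The point of the sketch is that a fully removed cluster does not surface as an HTTP error on DescribeClusters, so the cluster read can handle it, while DescribeServices (step 5) fails outright with a 400.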

So it seems Terraform needs to interpret a ClusterNotFoundException from ecs/DescribeServices as a sign that the service no longer exists and must be re-created, rather than as a fatal read error.
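
A minimal Go sketch of that idea follows. It is not the provider's actual implementation: `apiError`, `isServiceGone`, and `readService` are hypothetical names, and `apiError` is a stand-in for an SDK error that exposes an error code. The only assumption taken from the logs above is the `ClusterNotFoundException` code; `ServiceNotFoundException` is included as the analogous code for a missing service.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical stand-in for an AWS SDK error with a code.
type apiError struct {
	Code    string
	Message string
}

func (e *apiError) Error() string { return e.Code + ": " + e.Message }

// isServiceGone reports whether err means the service cannot exist any
// more, either because the service or its parent cluster is gone.
func isServiceGone(err error) bool {
	var ae *apiError
	if !errors.As(err, &ae) {
		return false
	}
	switch ae.Code {
	case "ClusterNotFoundException", "ServiceNotFoundException":
		return true
	}
	return false
}

// readService mimics a resource read. It returns found=false with a nil
// error when the service should be dropped from state and re-planned,
// and a wrapped error for any other failure.
func readService(describe func() error) (found bool, err error) {
	err = describe()
	if err == nil {
		return true, nil
	}
	if isServiceGone(err) {
		return false, nil
	}
	return false, fmt.Errorf("error reading ECS service: %w", err)
}

func main() {
	// Simulate the DescribeServices failure seen in the debug output.
	describe := func() error {
		return &apiError{Code: "ClusterNotFoundException", Message: "Cluster not found."}
	}
	found, err := readService(describe)
	fmt.Println("found:", found, "err:", err)
}
```

With this shape, a missing cluster leads to the service being treated as absent, so the next plan proposes creating it instead of failing during refresh.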

It's difficult to fully reproduce the issue because it depends on the cluster being removed from the account, and I'm not sure how long that takes. I've had two internal customers come to me with this issue within 2 weeks of an account cleanup. I was able to reproduce it with my simple example by editing the state of the service to point to a cluster that never existed.

References

@ghost ghost added the service/ecs Issues and PRs that pertain to the ecs service. label Oct 29, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Oct 29, 2020
@adam-tylr
Contributor Author

I should also add that our current workaround is to run terraform state rm aws_ecs_service.service after seeing this failure, then run the plan again.

@anGie44 anGie44 added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Oct 30, 2020
@bflad bflad added this to the v3.15.0 milestone Nov 9, 2020
@bflad
Contributor

bflad commented Nov 9, 2020

The fix for this has been merged and will be released in version 3.15.0 of the Terraform AWS Provider later this week. Thanks to @adam-tylr for the implementation. 👍

@ghost

ghost commented Nov 12, 2020

This has been released in version 3.15.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost

ghost commented Dec 9, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020