Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS ECS service is replaced when changing adding a capacity provider #22408

Open
josiegd-verrency opened this issue Jan 5, 2022 · 5 comments
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.

Comments

@josiegd-verrency
Copy link

josiegd-verrency commented Jan 5, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v0.12.31

  • provider.aws v3.70.0
  • provider.template v2.2.0

Affected Resource(s)

  • aws_ecs_service

Expected Behavior

The ECS service should have been updated/modified in place.

Actual Behavior

The ECS service was scheduled for replacement, which would cause an outage.

Plan output:

  # module.service.aws_ecs_service.service must be replaced
-/+ resource "aws_ecs_service" "service" {
        cluster                            = <cluster_arn>
        deployment_maximum_percent         = 200
        deployment_minimum_healthy_percent = 100
      ~ desired_count                      = 1 -> 10
        enable_ecs_managed_tags            = false
        enable_execute_command             = false
        health_check_grace_period_seconds  = 67
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = <id> -> (known after apply)
      ~ launch_type                        = "FARGATE" -> (known after apply)
        name                               = "my-service"
      ~ platform_version                   = "LATEST" -> (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ tags_all                           = {} -> (known after apply)
        task_definition                    = <task_definition>
        wait_for_steady_state              = false

      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "FARGATE"
          + weight            = 1
        }
      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "FARGATE_SPOT"
          + weight            = 1
        }
...

Steps to Reproduce

  1. Create a vanilla aws_ecs_service without a capacity provider specified, launch_type = "FARGATE"
  2. terraform apply
  3. Change aws_ecs_service to specify a capacity_provider_strategy as follows:
      capacity_provider_strategy {
          capacity_provider = "FARGATE"
          weight            = 1
        }
      capacity_provider_strategy {
          capacity_provider = "FARGATE_SPOT"
          weight            = 1
        }
  1. terraform apply

Important Factoids

References

AWS docs: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_UpdateService.html

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ecs Issues and PRs that pertain to the ecs service. labels Jan 5, 2022
@anGie44 anGie44 added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Jan 5, 2022
@mkielar
Copy link
Contributor

mkielar commented Feb 3, 2022

Observing the same thing. with:

Terraform v1.1.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.74.0

I can successfully switch from launchType to capacityProviders using AWS CLI:

aws ecs update-service --cluster dev --service hello-world \
    --capacity-provider-strategy=capacityProvider=FARGATE,base=0,weight=100 \
    --force-new-deployment

but if I try doing this with terraform, I get service recreation warning.

We were just about to introduce FARGATE_SPOT in our services, and now we need to push these plans back a bit, because we cannot possible downtime like this on our production environments.

@mkielar
Copy link
Contributor

mkielar commented Feb 3, 2022

Looks like the bug is in this line:

if (ol == 0 && nl > 0) || (ol > 0 && nl == 0) {

The (ol == 0 && nl > 0) part of the if statement condition is unnecessary - AWS is capable of applying change from launchType to capacityProviders. The only additional requirement is to stop passing explicit launchType = "FARGATE" from terraform.

Also, the whole thing requires force_new_deployment flag to be set to true, otherwise terraform will recreate the service on any change to capacity providers:

// to be backward compatible, should ForceNew almost always (previous behavior), unless:
// force_new_deployment is true and
// neither the old set nor new set is 0 length
if v := d.Get("force_new_deployment").(bool); !v {
return capacityProviderStrategyForceNew(d)
}

I'm not entirely sure what kind of "backward compatibility" this commend mentions, but if that's something that terraform offered, and now would not, I think that the previous behaviour was a bug, and persisting it like this is not the greatest idea. In general, you don't want your ECS Service to get recreated when changes are applied, because that causes downtime.

@djakielski
Copy link

I see this behavior also in one of the latest versions. Is it possible to fix it in the provider?

$: terraform version
Terraform v1.5.4
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.13.1
+ provider registry.terraform.io/hashicorp/http v3.4.0
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

@ash5gfrankie
Copy link

I'm wondering if it's worth adding a new option, something like the below. Pretty much the same as the code block for force_new_deployment but limited to just allowing capacity provider updates without destroying the service if there are no other destructive changes

if v := d.Get("non_destructive_capacity_provider_update").(bool); !v { return capacityProviderStrategyForceNew(d) }

@dejanzele
Copy link

dejanzele commented Sep 24, 2024

Hi all,

I am interested in submiting a fix for this issue as it is impacting our internal usage also.

Is the community in agreement what are the latest requirements on how the update should work, as in the comments a couple of ideas are mentioned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.
Projects
None yet
Development

No branches or pull requests

6 participants