intermittent issue with aws_appautoscaling_target and aws_appautoscaling_policy creation #427

hashibot · 2017-06-13T19:20:40Z

This issue was originally opened by @brikis98 as hashicorp/terraform#10737. It was migrated here as part of the provider split. The original body of the issue is below.

Terraform Version

Terraform v0.7.7

Affected Resource(s)

aws_appautoscaling_target
aws_appautoscaling_policy

Terraform Configuration Files

resource "aws_appautoscaling_target" "target" {
  name = "MyService"
  resource_id = "service/MyCluster/MyService"
  role_arn = "${aws_iam_role.role.arn}"
  min_capacity = 1
  max_capacity = 10
}

resource "aws_appautoscaling_policy" "scale_out" {
  name = "MyServiceScaleOut"
  resource_id = "service/MyCluster/MyService"

  adjustment_type = "ChangeInCapacity"
  cooldown = 60
  metric_aggregation_type = "Average"

  step_adjustment {
    metric_interval_lower_bound = 0
    scaling_adjustment = 1
  }

  depends_on = ["aws_appautoscaling_target.target"]
}

Expected Behavior

The auto scaling policy and target are both created successfully.

Actual Behavior

Intermittently, you get an error like this:

Error putting scaling policy: ObjectNotFoundException: No scalable target registered for service namespace: ecs (... sorry, I forgot to copy/paste the rest of the error ...)

If you re-run terraform apply, the error usually goes away.

Steps to Reproduce

terraform apply

Important Factoids

If you re-run terraform apply, sometimes multiple times, it works. I suspect this is the classic problem that AWS is asynchronous and eventually consistent, and Terraform is not properly waiting for the aws_appautoscaling_target resource to be created.

aleerizw-zz · 2017-08-22T08:27:07Z

I am still experiencing the issue in Terraform v0.10.2. Creating an aws_appautoscaling_policy fails with Error putting scaling policy: ObjectNotFoundException: No scalable target registered for service namespace: ecs, resource ID: service/signin-service/signin-service-test, scalable dimension: ecs:service:DesiredCount

As can be seen by the following log:

module.signin-service-test.aws_appautoscaling_target.target: Creating...
  max_capacity:       "" => "3"
  min_capacity:       "" => "1"
  resource_id:        "" => "service/signin-service/signin-service-test"
  role_arn:           "" => "arn:aws:iam::489198589229:role/ecsAutoscaleRole"
  scalable_dimension: "" => "ecs:service:DesiredCount"
  service_namespace:  "" => "ecs"
module.signin-service-test-auto-scale-down.aws_appautoscaling_policy.policy: Creating...
  adjustment_type:                                        "" => "ChangeInCapacity"
  arn:                                                    "" => "<computed>"
  cooldown:                                               "" => "60"
  metric_aggregation_type:                                "" => "Average"
  name:                                                   "" => "signin-service-test-CPU_util_low-scale-down"
  policy_type:                                            "" => "StepScaling"
  resource_id:                                            "" => "service/signin-service/signin-service-test"
  scalable_dimension:                                     "" => "ecs:service:DesiredCount"
  service_namespace:                                      "" => "ecs"
  step_adjustment.#:                                      "" => "1"
  step_adjustment.2173517692.metric_interval_lower_bound: "" => ""
  step_adjustment.2173517692.metric_interval_upper_bound: "" => "0"
  step_adjustment.2173517692.scaling_adjustment:          "" => "-1"
module.signin-service-test-auto-scale-up.aws_appautoscaling_policy.policy: Creating...
  adjustment_type:                                        "" => "ChangeInCapacity"
  arn:                                                    "" => "<computed>"
  cooldown:                                               "" => "60"
  metric_aggregation_type:                                "" => "Average"
  name:                                                   "" => "signin-service-test-CPU_util_high-scale-up"
  policy_type:                                            "" => "StepScaling"
  resource_id:                                            "" => "service/signin-service/signin-service-test"
  scalable_dimension:                                     "" => "ecs:service:DesiredCount"
  service_namespace:                                      "" => "ecs"
  step_adjustment.#:                                      "" => "1"
  step_adjustment.2087484785.metric_interval_lower_bound: "" => "0"
  step_adjustment.2087484785.metric_interval_upper_bound: "" => ""
  step_adjustment.2087484785.scaling_adjustment:          "" => "1"
module.signin-service-test.aws_appautoscaling_target.target: Creation complete (ID: service/signin-service/signin-service-test)
Error applying plan:

2 error(s) occurred:

* module.signin-service-test-auto-scale-down.aws_appautoscaling_policy.policy: 1 error(s) occurred:

* aws_appautoscaling_policy.policy: Error putting scaling policy: ObjectNotFoundException: No scalable target registered for service namespace: ecs, resource ID: service/signin-service/signin-service-test, scalable dimension: ecs:service:DesiredCount
	status code: 400, request id: fe5e66a1-8712-11e7-af2f-f9c0a5cbf2c7
* module.signin-service-test-auto-scale-up.aws_appautoscaling_policy.policy: 1 error(s) occurred:

* aws_appautoscaling_policy.policy: Error putting scaling policy: ObjectNotFoundException: No scalable target registered for service namespace: ecs, resource ID: service/signin-service/signin-service-test, scalable dimension: ecs:service:DesiredCount
	status code: 400, request id: fe5f9ee6-8712-11e7-92e4-8b76b81521b8

The aws_appautoscaling_target is created before the policy, but the policy creation still fails. Re-running terraform apply solves the problem.

spanktar · 2017-11-30T23:27:44Z

I can confirm this in 0.10.7.

I added a depends_on block, to no avail:

depends_on = ["aws_alb_target_group.alb_tg_external"]

spanktar · 2017-12-04T21:13:59Z

I have an absolutely horrifying hack that fixes this for now. Read it and weep:
/cc @apparentlymart (thought you'd appreciate this)

resource "aws_appautoscaling_target" "ecs_asp_target" {
  max_capacity       = "${var.globals["container_max"]}"
  min_capacity       = "${var.globals["container_min"]}"
  resource_id        = "service/${var.globals["ecs_name"]}/${aws_ecs_service.ecs_service.name}"
  role_arn           = "${var.globals["iamrole_ecs_autoscaling_arn"]}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  lifecycle {
    ignore_changes = ["max_capacity", "min_capacity"]
  }
}

resource "aws_appautoscaling_policy" "ecs_asp_down" {
  count = "${local.enabled}"

  name               = "tf_${var.globals["environment"]}_${var.globals["app_name"]}_scale_down"
  resource_id        = "service/${var.globals["ecs_name"]}/${aws_ecs_service.ecs_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 120
    metric_aggregation_type = "Maximum"

    step_adjustment {
      metric_interval_upper_bound = 0
      scaling_adjustment          = -1
    }
  }

  depends_on = ["aws_alb_target_group.alb_tg_external", "null_resource.timer"]
}

resource "null_resource" "timer" {
  provisioner "local-exec" {
    command = "sleep 9"
  }
  triggers {
    trigger = "${aws_appautoscaling_target.ecs_asp_target.resource_id}"
  }
}

luc-tielen · 2018-04-30T12:16:35Z

Any updates on this? I'm also running into it.

indranil272 · 2018-09-07T20:36:29Z

I'm running into this issue with Terraform v0.11.0.

Edit: I had a typo in my policy.

tpoindessous · 2018-11-20T15:24:19Z

Hi

I just got hit by this issue with

Terraform v0.11.5

provider.aws v1.46.0

And just like you, I re-ran terraform apply and it fixed the problem.

Thomas

…d` updates and ignore `ObjectNotFoundException` on deletion

References:

* #7963
* #5747
* #538
* #486
* #427
* #404

Previously the documentation recommended an ECS setup that used `depends_on` combined with an updateable `resource_id` attribute, which could introduce very subtle bugs in the operation of the `aws_appautoscaling_policy` resource when either the underlying Application AutoScaling Target or the target resource (e.g. ECS service) was updated or recreated.

Given the scenario with an `aws_appautoscaling_policy` configuration:

* No direct attribute references to its `aws_appautoscaling_target` parent (usage with or without `depends_on` is inconsequential, except that without it in this case, it would generate errors that the target does not exist due to lack of proper ordering)
* `resource_id` directly references the target resource (e.g. an ECS service)
* The underlying `resource_id` target resource (e.g. an ECS service) is pointed to a new location or the resource is recreated

The `aws_appautoscaling_policy` resource would plan as a resource update of just the `resource_id` attribute instead of resource recreation.

Several consequences could occur in this situation depending on the exact ordering and Terraform configuration:

* Since the Application AutoScaling Policy API only supports a `PUT` type operation for creation and update, a new policy would create successfully (given the Application AutoScaling Target was already in place), hiding any coding errors that might have been found if it was attempting to update a non-created policy
* Usage of only `depends_on` to reference the Application AutoScaling Target could miss creating the Application AutoScaling Policy in a single apply since `depends_on` is purely for ordering
* The lack of Application AutoScaling Policy deletion could leave dangling policies on the previous Application AutoScaling Target unless it was updated (in which case Terraform correctly recreates the resource) or otherwise deleted
* The Terraform resource would not know to properly update the value of other computed attributes during plan, such as `arn`, potentially only noticing these attribute values as a new applied value different from the planned value

These situations could surface as Terraform bugs in multiple ways:

* In friendlier cases, a second apply would be required to create the missing policy or update downstream computed references
* In worse cases, Terraform would report errors (depending on the Terraform version) such as `Resource 'aws_appautoscaling_policy.example' does not have attribute 'arn'` and `diffs didn't match during apply` for downstream attribute references to those computed attributes

To prevent these situations, the `ResourceId` of the Application AutoScaling Policy needs to be treated as part of the API object ID, similar to Application AutoScaling Targets, and marked `ForceNew: true` in the Terraform resource schema. We also ensure the documentation examples always recommend direct references to the upstream `aws_appautoscaling_target` instead of using `depends_on` so Terraform properly handles recreations when necessary, e.g.

```hcl
resource "aws_appautoscaling_target" "example" {
  # ... other configuration ...
}

resource "aws_appautoscaling_policy" "example" {
  # ... other configuration ...

  resource_id        = "${aws_appautoscaling_target.example.resource_id}"
  scalable_dimension = "${aws_appautoscaling_target.example.scalable_dimension}"
  service_namespace  = "${aws_appautoscaling_target.example.service_namespace}"
}
```

During research of this bug, it was also discovered that the `aws_appautoscaling_policy` resource did not gracefully handle external deletions of the Application AutoScaling Policy without a refresh, or potential deletion race conditions, failing with the following error:

```
ObjectNotFoundException: No scaling policy found for service namespace: ecs, resource ID: service/tf-acc-test-9190521664283069857/tf-acc-test-9190521664283069857, scalable dimension: ecs:service:DesiredCount, policy name: tf-acc-test-9190521664283069857
```

We include ignoring this potential error on deletion as part of the comprehensive solution to ensuring resource recreations are successful.

Output from acceptance testing before code update:

```
--- FAIL: TestAccAWSAppautoScalingPolicy_ResourceId_ForceNew (54.69s)
    testing.go:538: Step 1 error: After applying this step, the plan was not empty:

        DIFF:

        UPDATE: aws_cloudwatch_metric_alarm.test
          alarm_actions.3359603714: "arn:aws:autoscaling:us-west-2:--OMITTED--:scalingPolicy:065d03ea-a7a4-4047-9a43-c92ec1871170:resource/ecs/service/tf-acc-test-2456603151506624334/tf-acc-test-2456603151506624334-1:policyName/tf-acc-test-2456603151506624334" => ""
          alarm_actions.4257611624: "" => "arn:aws:autoscaling:us-west-2:--OMITTED--:scalingPolicy:cdc6d280-8a93-4c67-9790-abb47fd167c6:resource/ecs/service/tf-acc-test-2456603151506624334/tf-acc-test-2456603151506624334-2:policyName/tf-acc-test-2456603151506624334"
```

Output from acceptance testing:

```
--- PASS: TestAccAWSAppautoScalingPolicy_disappears (26.48s)
--- PASS: TestAccAWSAppautoScalingPolicy_scaleOutAndIn (28.53s)
--- PASS: TestAccAWSAppautoScalingPolicy_ResourceId_ForceNew (43.25s)
--- PASS: TestAccAWSAppautoScalingPolicy_basic (46.47s)
--- PASS: TestAccAWSAppautoScalingPolicy_spotFleetRequest (61.26s)
--- PASS: TestAccAWSAppautoScalingPolicy_dynamoDb (115.02s)
--- PASS: TestAccAWSAppautoScalingPolicy_multiplePoliciesSameResource (116.06s)
--- PASS: TestAccAWSAppautoScalingPolicy_multiplePoliciesSameName (116.80s)
```

bflad · 2019-03-20T16:32:20Z

Hi folks 👋 Sorry for the unexpected behavior here. It turns out there was a very subtle bug that prevented proper recreation of Application AutoScaling Policies when only the resource_id argument changed. We also now properly ignore ObjectNotFoundException errors during Terraform resource destroy. A full writeup of these changes can be found in #7982. These fixes will be released in version 2.3.0 of the Terraform AWS Provider in the next day or two.

Even without upgrading your Terraform AWS Provider to the newer version, you may be able to work around the original issue with a simple configuration update. Using direct references to the relevant aws_appautoscaling_target resource for the resource_id, scalable_dimension, and service_namespace arguments in the aws_appautoscaling_policy resource configuration should be enough in many cases to ensure Terraform has proper ordering information to handle recreations of the Application AutoScaling Policy resource when the underlying Application AutoScaling Target or resource associated with the target changes:

resource "aws_appautoscaling_target" "example" {
  # ... other configuration ...
}

resource "aws_appautoscaling_policy" "example" {
  # ... other configuration ...

  resource_id        = "${aws_appautoscaling_target.example.resource_id}"
  scalable_dimension = "${aws_appautoscaling_target.example.scalable_dimension}"
  service_namespace  = "${aws_appautoscaling_target.example.service_namespace}"
}
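
For illustration, here is a rough sketch of how the configuration from the original report might look with this approach applied. It is not taken from the provider documentation. The `service_namespace` and `scalable_dimension` values are copied from the logs earlier in this thread, the `step_scaling_policy_configuration` block form follows the configuration posted above by spanktar, and the `name` argument on the target and the `depends_on` line are dropped on the assumption that they are not needed once direct references are in place:

```hcl
resource "aws_appautoscaling_target" "target" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/MyCluster/MyService"
  role_arn           = "${aws_iam_role.role.arn}"
  min_capacity       = 1
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "scale_out" {
  name = "MyServiceScaleOut"

  # Direct references give Terraform the ordering information, so no
  # depends_on is needed, and a change to the target propagates here.
  resource_id        = "${aws_appautoscaling_target.target.resource_id}"
  scalable_dimension = "${aws_appautoscaling_target.target.scalable_dimension}"
  service_namespace  = "${aws_appautoscaling_target.target.service_namespace}"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}
```

Note that this only addresses ordering and recreation. It does not by itself remove the eventual-consistency window described in the original report.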

If you are still having trouble after upgrading to version 2.3.0 of the Terraform AWS Provider (when it's released) and with a configuration looking similar to the above, please create a new GitHub issue with the relevant details from the issue template and we can further triage. Thanks!

bflad · 2019-03-21T20:58:19Z

This has been released in version 2.3.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

tomelliff · 2019-04-10T14:15:46Z

Oddly I've just hit this issue in one of our temporary environments when the target group changed.

We're using my fork from #6314 which is based off 2.4.0 (with my change applied over the top) and the above change in https://github.com/terraform-providers/terraform-provider-aws/pull/7982/files looks good to me so I'm not too sure why this has failed.

Partial output from the failed apply:

module.frontend_service.aws_appautoscaling_policy.ecs_web_policy: Modifying... (ID: service/test/frontend-feat-website-ALBRequestCountPerTarget)
  target_tracking_scaling_policy_configuration.0.predefined_metric_specification.0.resource_label: "app/ecs-external-test/REDACTED/targetgroup/feat-website-frontend-ext/REDACTED" => "app/ecs-internal-test/REDACTED/targetgroup/feat-website-frontend-int/REDACTED"
module.frontend_service.aws_lb_listener_rule.host_based_routing: Creating...
  action.#:                      "" => "1"
  action.0.order:                "" => "<computed>"
  action.0.target_group_arn:     "" => "arn:aws:elasticloadbalancing:eu-west-1:REDACTED:targetgroup/feat-website-frontend-int/REDACTED"
  action.0.type:                 "" => "forward"
  arn:                           "" => "<computed>"
  condition.#:                   "" => "1"
  condition.3210165143.field:    "" => "host-header"
  condition.3210165143.values.#: "" => "1"
  condition.3210165143.values.0: "" => "REDACTED"
  listener_arn:                  "" => "arn:aws:elasticloadbalancing:eu-west-1:REDACTED:listener/app/ecs-internal-test/REDACTED/REDACTED"
  priority:                      "" => "<computed>"
module.frontend_service.aws_lb_listener_rule.host_based_routing: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:eu-west-1:...ca27/REDACTED/REDACTED)
module.frontend_service.aws_ecs_service.web_service: Creation complete after 0s (ID: arn:aws:ecs:eu-west-1:REDACTED:service/test/frontend-feat-website)
module.frontend_service.null_resource.wait_for_service_deploy: Creating...
  triggers.%:               "" => "1"
  triggers.task_definition: "" => "arn:aws:ecs:eu-west-1:REDACTED:task-definition/frontend-feat-website:17"
module.frontend_service.null_resource.wait_for_service_deploy: Provisioning with 'local-exec'...
module.frontend_service.null_resource.wait_for_service_deploy (local-exec): Executing: ["/bin/sh" "-c" "aws ecs wait services-stable --services frontend-feat-website --cluster test --region eu-west-1"]
module.frontend_service.null_resource.wait_for_service_deploy: Still creating... (10s elapsed)
module.frontend_service.null_resource.wait_for_service_deploy: Still creating... (20s elapsed)
module.frontend_service.null_resource.wait_for_service_deploy: Still creating... (30s elapsed)
module.frontend_service.null_resource.wait_for_service_deploy: Creation complete after 31s (ID: 5160621590471942553)

Error: Error applying plan:

1 error(s) occurred:

* module.frontend_service.aws_appautoscaling_policy.ecs_web_policy: 1 error(s) occurred:

* aws_appautoscaling_policy.ecs_web_policy: Failed to update scaling policy: ObjectNotFoundException: No scalable target registered for service namespace: ecs, resource ID: service/test/frontend-feat-website, scalable dimension: ecs:service:DesiredCount
	status code: 400, request id: c377d376-5b97-11e9-9fd1-07a87949a6fb

tomelliff · 2019-04-10T14:24:18Z

Oh, derp, misread the change.

I guess we need to catch the ObjectNotFoundException during update and if so retry it?

Closes hashicorp#427.

bflad · 2019-04-10T16:40:42Z

Thanks for the additional fix here @tomelliff, which looks more appropriate for this particular issue. I've updated the milestone here to version 2.6.0 since it will likely be more helpful. Version 2.6.0 is planned to ship later today.
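
For anyone pinning provider versions, a minimal sketch of a constraint that should pick up both changes mentioned in this thread, assuming the retry fix ships in 2.6.0 as planned. It uses the `version` argument inside the `provider` block, as was common in Terraform 0.11-era configurations. The region is a placeholder assumption, not something from this thread:

```hcl
provider "aws" {
  # "~> 2.6" allows any 2.x release at or above 2.6.0, but not 3.0.
  version = "~> 2.6"

  # Placeholder region; replace with your own.
  region = "us-east-1"
}
```

After changing the constraint, running terraform init -upgrade should fetch a matching provider release.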

ghost · 2020-03-30T17:47:26Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

hashibot added the bug (Addresses a defect in current functionality.) label Jun 13, 2017

radeksimko added the service/applicationautoscaling label Jan 25, 2018

bflad mentioned this issue Mar 17, 2019

resource/aws_appautoscaling_policy: Recreate resource for resource_id updates and ignore ObjectNotFoundException on deletion #7982

Merged

bflad added this to the v2.3.0 milestone Mar 20, 2019

bflad closed this as completed in #7982 Mar 20, 2019

tomelliff added a commit to tomelliff/terraform-provider-aws that referenced this issue Apr 10, 2019

Retry app autoscaling policy create/update on ObjectNotFound error

d082965

Closes hashicorp#427.

tomelliff mentioned this issue Apr 10, 2019

Retry app autoscaling policy create/update on ObjectNotFound error #8273

Merged

bflad modified the milestones: v2.3.0, v2.6.0 Apr 10, 2019

ghost locked and limited conversation to collaborators Mar 30, 2020
