Preserve current desired counts on deploy of auto scaled resources #5215

kkelk · 2019-11-27T14:11:20Z

When configuring an EC2 AutoScalingGroup or ECS service autoscaling (and likely other resources), the CDK requires a desired count value or sets it to a fixed default (1 for ECS, more complex behaviour for EC2). This is a problem for autoscaling, because a cdk deploy will trample over any actions taken by the scaling policy, possibly causing an outage or overspend, depending on the value set.

I propose that an option is added to these constructs to set the desired count to the current value for the resource if it exists. e.g. if there are 20 instances registered in my autoscaling group at the start of the cdk deploy, it should preserve this value.

Implementing this feature would also provide a solution for this bug report.

Use Case

We perform a cdk deploy for each of our tens of ECS services in pipelines, triggered by a code push of our CDK app package. This means that we're doing many frequent deployments, and it's either dangerous or costly to have our resources scale far away from where the scaling policy placed them.

Proposed Solution

We're working around this issue by querying the current value using the AWS APIs, and explicitly setting the desired count to this value in an attempt to leave it unchanged. My proposal (unless there is a better way) is for the CDK to do this work at deploy time.

Rough Python ECS example:

import boto3
# CDK v1-style imports (assumed; the thread dates from 2019).
from aws_cdk import core, aws_cloudwatch as cloudwatch, aws_ecs as ecs

# Inside the Stack's __init__: look up the live service before defining it.
ecs_client = boto3.client('ecs', region_name=self.region)
existing_services = ecs_client.describe_services(cluster=cluster_name,
                                                 services=[service_name])['services']
if not existing_services:
    # First deploy: the service doesn't exist yet, so use an initial count.
    desired_task_count = 3
else:
    assert len(existing_services) == 1
    # Reuse whatever count the scaling policy last set.
    desired_task_count = existing_services[0]['desiredCount']

service = ecs.Ec2Service(self, 'EcsService', service_name=service_name,
                         task_definition=task_definition, cluster=ecs_cluster,
                         desired_count=desired_task_count)
scaling = service.auto_scale_task_count(min_capacity=1, max_capacity=100)

scaling.scale_to_track_custom_metric('ServiceAutoScaling',
                                     metric=cloudwatch.Metric(metric_name='Utilization', namespace='ModelServing',
                                                              dimensions={'service': service_name},
                                                              period=core.Duration.minutes(1)),
                                     target_value=0.5)
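
One caveat with a lookup like the one above: as far as I know, `describe_services` can also return a recently deleted service with status `INACTIVE`, whose `desiredCount` is not worth preserving. A minimal sketch of a more defensive helper follows. The function name `current_desired_count` and its parameters are hypothetical, and it assumes the cluster already exists (a missing cluster raises an error that is not handled here).

```python
def current_desired_count(ecs_client, cluster_name, service_name,
                          default, min_capacity, max_capacity):
    """Return the live desiredCount of an ACTIVE service, else `default`.

    The result is clamped to [min_capacity, max_capacity] so it never
    conflicts with the bounds given to the scaling policy.
    """
    response = ecs_client.describe_services(cluster=cluster_name,
                                            services=[service_name])
    # Ignore INACTIVE (deleted) or DRAINING services; treat them as absent.
    active = [s for s in response.get('services', [])
              if s.get('status') == 'ACTIVE']
    count = active[0]['desiredCount'] if active else default
    return max(min_capacity, min(max_capacity, count))
```

It would be called as `current_desired_count(ecs_client, cluster_name, service_name, default=3, min_capacity=1, max_capacity=100)` in place of the `if`/`else` block.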

We have similar logic for EC2 autoscaling, but it's less generalizable because the physical name of the ASG isn't known at execution time. Happy to discuss if it's helpful.
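
For illustration only, one way around the unknown physical name is to find the group by tag instead. The sketch below is not the logic referred to above. It assumes CloudFormation has tagged the group with `aws:cloudformation:stack-name` and `aws:cloudformation:logical-id`, that the caller knows both values, and that multiple filters are combined with AND. The function name `find_asg_desired_capacity` is hypothetical.

```python
def find_asg_desired_capacity(asg_client, stack_name, logical_id, default=None):
    """Return the DesiredCapacity of the ASG tagged with the given
    CloudFormation stack name and logical ID, or `default` if none exists."""
    kwargs = {'Filters': [
        {'Name': 'tag:aws:cloudformation:stack-name', 'Values': [stack_name]},
        {'Name': 'tag:aws:cloudformation:logical-id', 'Values': [logical_id]},
    ]}
    groups = []
    while True:
        page = asg_client.describe_auto_scaling_groups(**kwargs)
        groups.extend(page.get('AutoScalingGroups', []))
        token = page.get('NextToken')
        if not token:
            break
        kwargs['NextToken'] = token  # fetch the next page of results
    if not groups:
        return default  # first deploy: nothing to preserve yet
    if len(groups) > 1:
        raise ValueError('expected one ASG, found %d' % len(groups))
    return groups[0]['DesiredCapacity']
```

Here `asg_client` would be `boto3.client('autoscaling', region_name=...)`. Because CDK generates logical IDs with a hash suffix, the value has to be read from the synthesized template, which is part of why this approach generalizes poorly.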

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

rix0rrr · 2019-11-28T09:49:09Z

What happens if we leave DesiredCapacity undefined in the generated CloudFormation template? Would that do it?

kkelk · 2019-11-28T12:32:33Z

According to AWS::AutoScaling::AutoScalingGroup:

DesiredCapacity
... If you do not specify a desired capacity, the default is the minimum size of the group.

A thread from a couple of years ago claims that not setting it actually means CloudFormation won't change it for an existing group — but that sounds like relying on undefined behaviour to me.

For AWS::ECS::Service, DesiredCount is a required property for the REPLICA schedulingStrategy, so it wouldn't work there anyway, and a general solution for all auto-scaled resources would be nice.

Arguably, this feature request is better pushed upstream to CloudFormation, but I figured that it's much easier to implement a solution in code in the CDK than to fight that battle — what are your thoughts?

I note that I've found discussions on the problem in CloudFormation in at least a couple of places - and in both the suggested workaround is to get the current value from the AWS APIs as I do in my example above.

rix0rrr · 2019-11-28T19:59:38Z

The problem here is that CDK doesn't really have a mechanism to query the API on every deploy invocation, so we'd have to make one.

In addition, I'm concerned about CI/CD pipelines, where the moment of querying is potentially far removed in time from the moment of actual deployment.

kkelk · 2019-11-29T10:49:14Z

I see. I hadn't thought of pipelines working that way — good point on that one — although I think it being optional would help, and I'd argue querying at the wrong time is still better than the behaviour today.

Do you see any sensible path through this? Do you think it's at all likely that a solution could be implemented in CloudFormation? e.g. if all resources with desired count/capacity allowed omitting the property and defined it to mean leaving the value untouched, that would be ideal, and require only a minimal change in the CDK.

rix0rrr · 2019-11-29T16:48:54Z

I think the current CloudFormation behavior for ASGs is actually intended, though underdocumented. The note about "missing defaults to minimum size" probably only applies to creation of the resource, not to updates.

For resources in which leaving `DesiredCount` out does not work, we should treat that as a bug in the CloudFormation behavior, and report it upstream.

kkelk · 2019-12-02T12:37:52Z

Sounds like a good solution, thanks. I've opened a ticket against CloudFormation requesting a change to AWS::ECS::Service, as well as documenting this behaviour clearly in all cases.

For now, this feature request on the CDK is limited to having some way of omitting DesiredCapacity in the generated CloudFormation templates for AWS::AutoScaling::AutoScalingGroup. That could be a breaking change, and is perhaps related to this pull request.

If `DesiredCapacity` is specified in the CloudFormation template, every deployment resets the capacity of the AutoScalingGroup to that number, even if the group had been scaled out at that point. The solution is to leave `DesiredCapacity` empty, in which case it will remain untouched during a deployment.

Previously, CDK would use some logic to always calculate a `DesiredCapacity` for you, even if you left the `desiredCapacity` property unset, leading to the undesirable behavior, which frankly represents an availability risk.

Now, if you don't specify `desiredCapacity`, we won't set `DesiredCapacity` either, avoiding the availability risk that we introduced beforehand. In fact, if you *do* set `desiredCapacity`, we will emit a construct warning telling you that you probably shouldn't.

Fixes #5215, closes #5208.

BREAKING CHANGE: AutoScalingGroups without `desiredCapacity` are now initially scaled to their minimum capacity (instead of their maximum capacity).
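
The behaviour described in this change can be sketched in plain Python. This is an illustration under stated assumptions, not the CDK's actual implementation: the function `asg_properties` is hypothetical, and the warning text is the one quoted later in this thread.

```python
import warnings

def asg_properties(min_capacity, max_capacity, desired_capacity=None):
    """Build the size-related properties of an AutoScalingGroup resource.

    DesiredCapacity is emitted only when the caller sets it explicitly, so
    an unset value leaves the live group size untouched on updates.
    """
    props = {'MinSize': str(min_capacity), 'MaxSize': str(max_capacity)}
    if desired_capacity is not None:
        warnings.warn('desiredCapacity has been configured. Be aware this '
                      'will reset the size of your AutoScalingGroup on '
                      'every deployment')
        props['DesiredCapacity'] = str(desired_capacity)
    return props
```

With this shape, `asg_properties(1, 10)` produces no `DesiredCapacity` key at all, while `asg_properties(1, 10, desired_capacity=5)` both sets it and warns.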

As described in #5215, `desiredCapacity` is not the recommended way to configure an auto scaling group since it will cause the ASG to reset the number of nodes in every CloudFormation deployment. Since EKS's default capacity uses `desiredCapacity` instead of `minCapacity`, as of #5507 this would emit a warning: "desiredCapacity has been configured. Be aware this will reset the size of your AutoScalingGroup on every deployment". This change modifies the behavior of the default capacity such that it will configure the ASG using `minCapacity` instead of `desiredCapacity` as recommended by ASG. Fixes #5650

andreprawira · 2024-01-11T02:21:25Z

so any updates here? how do we preserve the desired capacity?

kkelk added the feature-request and needs-triage labels Nov 27, 2019

SomayaB added the @aws-cdk/aws-autoscaling label Nov 27, 2019

SomayaB assigned rix0rrr Nov 27, 2019

rix0rrr assigned shivlaks Nov 28, 2019

SomayaB removed the needs-triage label Dec 17, 2019

rix0rrr mentioned this issue Dec 20, 2019

fix(autoscaling): every deployment resets capacity #5507

Merged

SomayaB added the in-progress label Dec 20, 2019

mergify bot closed this as completed in #5507 Dec 23, 2019

eladb mentioned this issue Jan 5, 2020

EKS: default capacity uses "desiredCapacity" instead of "minCapacity" #5650

Closed

eladb mentioned this issue Jan 5, 2020

fix(eks): default capacity uses desiredCapacity which is an anti-pattern #5651

Merged

alvyn279 mentioned this issue Jan 8, 2021

Road Map alvyn279/discord-events#1

Open

64 tasks

moltar mentioned this issue Jan 19, 2024

Warning: desiredCapacity has been configured. Be aware this will reset the size of your AutoScalingGroup on every deployment AndrewGuenther/cdk-fck-nat#286

Closed

dmac-cloud mentioned this issue Apr 18, 2024

CDK Deploy Error aws-samples/cost-effective-aws-deployment-of-comfyui#2

Closed

phuhung273 mentioned this issue Nov 5, 2024

(ec2): Unable to run integ.machine-image test #32017

Closed

1 task
