Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve current desired counts on deploy of auto scaled resources #5215

Closed
2 tasks
kkelk opened this issue Nov 27, 2019 · 7 comments · Fixed by #5507
Closed
2 tasks

Preserve current desired counts on deploy of auto scaled resources #5215

kkelk opened this issue Nov 27, 2019 · 7 comments · Fixed by #5507
Assignees
Labels
@aws-cdk/aws-autoscaling Related to Amazon EC2 Auto Scaling feature-request A feature should be added or improved. in-progress This issue is being actively worked on.

Comments

@kkelk
Copy link

kkelk commented Nov 27, 2019

When configuring an ec2 AutoScalingGroup or ECS service autoscaling (and likely other examples), the CDK requires a desired count value or will set it to a fixed default (1 for ECS, more complex behaviour for EC2). In general, this is annoying for autoscaling, as it means that a cdk deploy will trample over any actions taken by the scaling policy - possibly causing an outage or overspend, depending on the value set.

I propose that an option is added to these constructs to set the desired count to the current value for the resource if it exists. e.g. if there are 20 instances registered in my autoscaling group at the start of the cdk deploy, it should preserve this value.

Implementing this feature would also provide a solution for this bug report.

Use Case

We perform a cdk deploy for each of our 10s of ECS services in pipelines, triggered by a code push of our CDK app package. This means that we're doing many frequent deployments, and it's either dangerous or costly to have our resources scale far away from where the scaling policy placed them.

Proposed Solution

We're working around this issue by querying the current value using the AWS APIs, and explicitly setting the desired count to this value in an attempt to leave it unchanged. My proposal (unless there is a better way) is for the CDK to do this work at deploy time.

Rough Python ECS example:

ecs_client = boto3.client('ecs', region_name=self.region)
existing_services = ecs_client.describe_services(cluster=cluster_name,
                                                 services=[service_name])['services']
if len(existing_services) == 0:
    desired_task_count = 3
else:
    assert len(existing_services) == 1
    desired_task_count = existing_services[0]['desiredCount']

service = ecs.Ec2Service(self, 'EcsService', service_name=service_name, task_definition=task_definition, cluster=ecs_cluster, desired_count=desired_task_count)
scaling = service.auto_scale_task_count(min_capacity=1, max_capacity=100)

scaling.scale_to_track_custom_metric('ServiceAutoScaling',
                                     metric=cloudwatch.Metric(metric_name='Utilization', namespace='ModelServing',
                                                              dimensions={'service': service_name},
                                                              period=core.Duration.minutes(1)),
                                     target_value=0.5)

We have similar logic for EC2 autoscaling, but it's less generalizable due to not knowing the physical name of the ASG at execution time, but happy to discuss if it's helpful.

Other

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@kkelk kkelk added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 27, 2019
@SomayaB SomayaB added the @aws-cdk/aws-autoscaling Related to Amazon EC2 Auto Scaling label Nov 27, 2019
@rix0rrr
Copy link
Contributor

rix0rrr commented Nov 28, 2019

What happens if we leave DesiredCapacity undefined in the generated CloudFormation template? Would that do it?

@kkelk
Copy link
Author

kkelk commented Nov 28, 2019

According to AWS::AutoScaling::AutoScalingGroup:

DesiredCapacity
... If you do not specify a desired capacity, the default is the minimum size of the group.

A thread from a couple of years ago claims that not setting it actually means CloudFormation won't change it for an existing group — but that sounds like relying on undefined behaviour to me.

For AWS::ECS::Service, DesiredCount is a required property for the REPLICA schedulingStrategy, so it wouldn't work there anyway, and a general solution for all auto-scaled resources would be nice.

Arguably, this feature request is better pushed upstream to CloudFormation, but I figured that it's much easier to implement a solution in code in the CDK than to fight that battle — what are your thoughts?

I note that I've found discussions on the problem in CloudFormation in at least a couple of places - and in both the suggested workaround is to get the current value from the AWS APIs as I do in my example above.

@rix0rrr
Copy link
Contributor

rix0rrr commented Nov 28, 2019

The problem here is that CDK doesn't really have a mechanism to query the API on every deploy invocation, so we'd have to make one.

In addition, I'm concerned about CI/CD pipelines, where the moment of querying is potentially far removed in time from the moment of actual deployment.

@kkelk
Copy link
Author

kkelk commented Nov 29, 2019

I see. I hadn't thought of pipelines working that way — good point on that one — although I think it being optional would help, and I'd argue querying at the wrong time is still better than the behaviour today.

Do you see any sensible path through this? Do you think it's at all likely that a solution could be implemented in CloudFormation? e.g. if all resources with desired count/capacity allowed omitting the property and defined it to mean leaving the value untouched, that would be ideal, and require only a minimal change in the CDK.

@rix0rrr
Copy link
Contributor

rix0rrr commented Nov 29, 2019

I think the current CloudFormation behavior for ASGs is actually intended, though underdocumented. The note about "missing defaults to minimum size" probably only applies to creation of the resource, not to updates.

For resources in which leaving desiredcount out does not work, we should treat that as a bug in the CloudFormation behavior, and report it upstream.

@kkelk
Copy link
Author

kkelk commented Dec 2, 2019

Sounds like a good solution, thanks. I've opened a ticket against CloudFormation requesting a change to AWS::ECS::Service, as well as documenting this behaviour clearly in all cases.

For now, this feature request on the CDK is limited to having some way of omitting DesiredCapacity in the generated CloudFormation templates for AWS::AutoScaling::AutoScalingGroup. That could be a breaking change, and is perhaps related to this pull request.

@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Dec 17, 2019
rix0rrr added a commit that referenced this issue Dec 20, 2019
If `DesiredCapacity` is specified in the CloudFormation template, on every
deployment the capacity of the AutoScalingGroup is reset to that number,
even if the group had been scaled out at that point. The solution is to
leave DesiredCapacity empty, in which case it will remain untouched
during a deployment.

Previously, CDK would use some logic to always calculate a
DesiredCapacity for you, even if you left the `desiredCapacity` property
unset, leading to the undesirable behavior--which frankly represents
an availability risk.

Now, if you don't specify `desiredCapacity`, we won't set
`DesiredCapacity` either, avoiding the availability risk that we
introduced beforehand. In fact, if you *do* set `desiredCapacity`, we
will warn you that you probably shouldn't using a construct warning.

Fixes #5215, closes #5208.

BREAKING CHANGE: AutoScalingGroups without `desiredCapacity` are now
initially scaled to their minimum capacity (instead of their maximum
capaciety).
@SomayaB SomayaB added the in-progress This issue is being actively worked on. label Dec 20, 2019
@mergify mergify bot closed this as completed in #5507 Dec 23, 2019
mergify bot added a commit that referenced this issue Dec 23, 2019
* fix(autoscaling): every deployment resets capacity

If `DesiredCapacity` is specified in the CloudFormation template, on every
deployment the capacity of the AutoScalingGroup is reset to that number,
even if the group had been scaled out at that point. The solution is to
leave DesiredCapacity empty, in which case it will remain untouched
during a deployment.

Previously, CDK would use some logic to always calculate a
DesiredCapacity for you, even if you left the `desiredCapacity` property
unset, leading to the undesirable behavior--which frankly represents
an availability risk.

Now, if you don't specify `desiredCapacity`, we won't set
`DesiredCapacity` either, avoiding the availability risk that we
introduced beforehand. In fact, if you *do* set `desiredCapacity`, we
will warn you that you probably shouldn't using a construct warning.

Fixes #5215, closes #5208.

BREAKING CHANGE: AutoScalingGroups without `desiredCapacity` are now
initially scaled to their minimum capacity (instead of their maximum
capaciety).

* Add links

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
eladb pushed a commit that referenced this issue Dec 24, 2019
* fix(autoscaling): every deployment resets capacity

If `DesiredCapacity` is specified in the CloudFormation template, on every
deployment the capacity of the AutoScalingGroup is reset to that number,
even if the group had been scaled out at that point. The solution is to
leave DesiredCapacity empty, in which case it will remain untouched
during a deployment.

Previously, CDK would use some logic to always calculate a
DesiredCapacity for you, even if you left the `desiredCapacity` property
unset, leading to the undesirable behavior--which frankly represents
an availability risk.

Now, if you don't specify `desiredCapacity`, we won't set
`DesiredCapacity` either, avoiding the availability risk that we
introduced beforehand. In fact, if you *do* set `desiredCapacity`, we
will warn you that you probably shouldn't using a construct warning.

Fixes #5215, closes #5208.

BREAKING CHANGE: AutoScalingGroups without `desiredCapacity` are now
initially scaled to their minimum capacity (instead of their maximum
capaciety).

* Add links

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
eladb pushed a commit that referenced this issue Jan 5, 2020
As described in #5215, `desiredCapacity` is not the recommended way to configure an auto scaling group since it will cause the ASG to reset the number of nodes in every CloudFormation deployment. Since EKS's default capacity uses `desiredCapacity` instead of `minCapacity`, as of #5507 this would emit a warning: "desiredCapacity has been configured. Be aware this will reset the size of your AutoScalingGroup on every deployment".

This change modifies the behavior of the default capacity such that it will configure the ASG using `minCapacity` instead of `desiredCapacity` as recommended by ASG.

Fixes #5650
mergify bot added a commit that referenced this issue Jan 6, 2020
…ern (#5651)

* fix(eks): default capacity uses desiredCapacity which is an anti-pattern

As described in #5215, `desiredCapacity` is not the recommended way to configure an auto scaling group since it will cause the ASG to reset the number of nodes in every CloudFormation deployment. Since EKS's default capacity uses `desiredCapacity` instead of `minCapacity`, as of #5507 this would emit a warning: "desiredCapacity has been configured. Be aware this will reset the size of your AutoScalingGroup on every deployment".

This change modifies the behavior of the default capacity such that it will configure the ASG using `minCapacity` instead of `desiredCapacity` as recommended by ASG.

Fixes #5650

* Update integ.eks-cluster.defaults.expected.json

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@andreprawira
Copy link

so any updates here? how do we preserve the desired capacity?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-autoscaling Related to Amazon EC2 Auto Scaling feature-request A feature should be added or improved. in-progress This issue is being actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants