Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ecs-patterns): remove default desiredCount to align with cfn behaviour (under feature flag) #13130

Merged
merged 5 commits into from
Feb 25, 2021

Conversation

piradeepk
Copy link
Contributor

@piradeepk piradeepk commented Feb 18, 2021

AWS CloudFormation previously required a desiredCount to be specified to create an Amazon ECS Service. They have recently removed that required field, and now provide the following default behaviour, if the desired count is not specified.

  • when creating a service, the service is created and starts up with a desired count of 1.
  • when updating a service, the service is updated and starts up with the same desired count as it had prior to the deployment.

The AWS-CDK currently adds a default desiredCount of 1 when creating or updating an Amazon ECS service, and this could lead to issues when a service has scaled up due to autoscaling. When updating a service using the AWS-CDK, the service would then be scaled down to the original desiredCount, leading to potential outages.

This change introduces a new feature flag REMOVE_DEFAULT_DESIRED_COUNT to flip the default behaviour of setting the desiredCount for any service created using any of the following ECS Patterns constructs. Without the feature flag set, the desiredCount of any service created is automatically defaulted to 1 (which is the current behaviour). With the feature flag set (to true), the default will be removed and the desiredCount will only be passed to the service if it is passed in as input.

  • ApplicationLoadBalancedEc2Service
  • ApplicationLoadBalancedFargateService
  • NetworkLoadBalancedEc2Service
  • NetworkLoadBalancedFargateService
  • QueueProcessingEc2Service
  • QueueProcessingFargateService

BREAKING CHANGE: the desiredCount property stored on the above constructs will be optional, allowing them to be undefined. This is enabled through the @aws-cdk/aws-ecs-patterns:removeDefaultDesiredCount feature flag. We would recommend all aws-cdk users to set the REMOVE_DEFAULT_DESIRED_COUNT flag to true for all of their existing applications.

Fixes: #12990


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@piradeepk piradeepk added the pr/do-not-merge This PR should not be merged at this time. label Feb 18, 2021
@piradeepk piradeepk self-assigned this Feb 18, 2021
@gitpod-io
Copy link

gitpod-io bot commented Feb 18, 2021

@SoManyHs SoManyHs added the @aws-cdk/aws-ecs-patterns Related to ecs-patterns library label Feb 18, 2021
@piradeepk
Copy link
Contributor Author

piradeepk commented Feb 18, 2021

@MrArnoldPalmer as we discussed yesterday, the build is failing since the desiredCount property is now optional instead of required. Any thoughts on how to get around it?

Copy link
Contributor

@SoManyHs SoManyHs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of mostly refactoring comments. Also, when I run the tests locally I'm getting changes in the expected integ test outputs (DesiredCount is removed). Is that not happening for you?

@MrArnoldPalmer
Copy link
Contributor

After more discussion with @piradeepk and @SoManyHs, it seems like the best solution here is to do the following.

  1. Mark the desiredCount property (on the construct not on input props) as deprecated in favor of a new property serviceDesiredCount with type number | undefined.
  2. Create a feature flag that when enabled sets this.serviceDesiredCount = props.desiredCount, which is possibly undefined. When the flag is disabled it should set it as this.serviveDesiredCount = this.desiredCount, essentially keeping the old behavior but just with a type that is possibly undefined.
  3. Mark the desiredCount input property on the QueueProcessingService as deprecated as only (min|max)ScalingCapacity should be used for this construct.

Tell me if I'm wrong anywhere or if you think there are better alternatives.

@rix0rrr
Copy link
Contributor

rix0rrr commented Feb 22, 2021

Without having looked into this much (apologies, bit stretched for time :) ), isn't this more or less the exact same issue as 0adf6c7 ?

There, we DID have a breaking change because the initial capacity would be different (our default vs CFN's default)... but in this case it seems our default behavior and CFN's default behavior are the same, are they not?

Also, a docstring that says "*if (blabla, flag) is true, then 1, else no default is defined" is lying. There's always a default. Whether it's determined by CDK or CFN or the service is immaterial to the user. Describe the end result, not just your tiny little slice of the world. The user doesn't care, you will only frustrate them if they have go looking somewhere else for the answer.

@MrArnoldPalmer
Copy link
Contributor

MrArnoldPalmer commented Feb 22, 2021

@rix0rrr my understanding is that it IS in fact the same issue. When we specify a desiredCount of 1 that resets scaling during deploy even if a service has been scaled up. So our default behavior and CFN's default behavior are indeed different, even though CFN docs say that 'desiredCount defaults to 1 when not specified', which if that were the case we would be able to just continue passing the default. @piradeepk correct me if I'm wrong.

@rix0rrr Is your opinion that we should break because we did for the same issue in autoscaling? I would say that this feature-flag + property deprecation is still preffered in my mind.

@piradeepk
Copy link
Contributor Author

Without having looked into this much (apologies, bit stretched for time :) ), isn't this more or less the exact same issue as 0adf6c7 ?

There, we DID have a breaking change because the initial capacity would be different (our default vs CFN's default)... but in this case it seems our default behavior and CFN's default behavior are the same, are they not?

In the case of this PR the behaviour that cfn expects and the behaviour of the CDK are different. In the case where it's a brand new service being created, cfn will take in a null (or cdk can pass in) and reset it to 1, but in the case of an existing service if cfn is passed in a null, it will look at the current service and use that count, where as the cdk is always resetting it to 1 (similar to the PR fix for auto scaling).

Also, a docstring that says "*if (blabla, flag) is true, then 1, else no default is defined" is lying. There's always a default. Whether it's determined by CDK or CFN or the service is immaterial to the user. Describe the end result, not just your tiny little slice of the world. The user doesn't care, you will only frustrate them if they have go looking somewhere else for the answer.

Fair point, I'll update that once we settle on an approach.

@rix0rrr my understanding is that it IS in fact the same issue. When we specify a desiredCount of 1 that resets scaling during deploy even if a service has been scaled up. So our default behavior and CFN's default behavior are indeed different, even though CFN docs say that 'desiredCount defaults to 1 when not specified', which if that were the case we would be able to just continue passing the default. @piradeepk correct me if I'm wrong.

This is correct.

@rix0rrr Is your opinion that we should break because we did for the same issue in autoscaling? I would say that this feature-flag + property deprecation is still preffered in my mind.

@MrArnoldPalmer, in this case, I would agree with @rix0rrr. Since there's already a precedent for adding a breaking change to fix cfn behaviour, we wouldn't be reinventing the wheel (process). Also, from my original perspective, adding all of these new properties that are used for the same purpose, but used/determined differently adds a lot more confusion for customers. And my opinion would be to add a feature flag, but to move the original desiredCount property on the construct from a required property to an optional one, thus allowing us to fix the behaviour to what is correct/expected.

@piradeepk
Copy link
Contributor Author

In general though, @rix0rrr @MrArnoldPalmer it's not quite clear in the contribution docs in which scenarios a breaking change would be allowed vs when we should try to work around it. Can we add something to the docs, to clarify the known scenarios in which we're okay with accepting a breaking change (this will continue to be updated/modified as things change, but it'll provide guidance to future contributors)?

@MrArnoldPalmer
Copy link
Contributor

@piradeepk I think things have changed a bit since that autoscaling PR was merged. Notably all BREAKING CHANGE: notations in PRs are assumed to be to experimental modules. Meaning that the above annotation will display in the changelog under the breaking changes to experimental modules: section which is not appropriate so we would need to manually update the changelog during release to account for this. This is not a big deal (assuming we remember to do it), but the motivation to do this was "we should take all possible steps to avoid breaking stable modules" so I would argue the feature flag is the right solution.

As @piradeepk mentions, if we do allow breaking changes to stable code under certain scenarios, we should define/document those scenarios and make sure the release tooling supports it. @eladb would be curious to get your opinion here.

@piradeepk
Copy link
Contributor Author

piradeepk commented Feb 23, 2021

@MrArnoldPalmer that makes sense, I think the first thing we need to settle on, before I change this PR, is what is the teams stance on breaking changes. My initial understanding was that breaking changes were not permitted for stable modules (no matter the circumstance), but if there are any scenarios at all, in which they'd be allowed, then I believe this fix should be considered one of those scenarios (fixing a broken experience/incorrect user experience created by a difference in the expected behaviour btwn CFN and CDK).

@rix0rrr @eladb thoughts?

@piradeepk
Copy link
Contributor Author

In addition, one thing we should also consider for future high level constructs (not specific to the ECS module), is are we adding a default because it's an opinionated value that we would like users to use, or are we adding a default for the sake of convenience. In this case, adding a default was for convenience, and is actually negatively impacting the user experience, potentially leading to services scaling down on deployment, causing outages.

@MrArnoldPalmer
Copy link
Contributor

Had a side conversation with @rix0rrr about this. He is onboard with the new property + feature flag. Specifically because of the change in property type that is required which can break existing users. His point about this not being a breaking change from an actual behavior standpoint is correct, but the way we have the code structured, specifically having this value as an output property, is what makes it a breaking change.

@piradeepk piradeepk added contribution/core This is a PR that came from AWS. and removed pr/do-not-merge This PR should not be merged at this time. labels Feb 23, 2021
@piradeepk
Copy link
Contributor Author

As per my discussion with @nija-at, updated all ecs integ tests to remove the desired count, since the feature flags are enabled for all integ tests.

@piradeepk piradeepk added the pr/do-not-merge This PR should not be merged at this time. label Feb 24, 2021
Copy link
Contributor

@MrArnoldPalmer MrArnoldPalmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the couple small changes related to manually mutating L1 properties in the constructors. Otherwise looks pretty good.

This was referenced Mar 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ecs-patterns Related to ecs-patterns library contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(aws-ecs-patterns): Some service patterns still inline desired count to 1 when not present
6 participants