-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(cli): cannot hotswap ECS task definitions containing certain intrinsics #26404
Conversation
This reverts commit 652489e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.
A comment requesting an exemption should contain the text Exemption Request
. Additionally, if clarification is needed add Clarification Request
to a comment.
Exemption Request: hotswap code does not support integration tests. fwiw I tested it manually by deploying a stack with the below code: import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import { StringParameter } from 'aws-cdk-lib/aws-ssm';
const app = new cdk.App();
const stack = new cdk.Stack(app, 'my-test-stack');
const vpc = new ec2.Vpc(stack, 'Vpc', {
subnetConfiguration: [
{
name: 'Public',
subnetType: ec2.SubnetType.PUBLIC,
cidrMask: 18,
},
{
name: 'Isolated',
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
cidrMask: 18,
},
],
});
const cluster = new ecs.Cluster(stack, 'EcsCluster', {
vpc,
});
const taskDefinition = new ecs.FargateTaskDefinition(stack, 'Task', {});
taskDefinition.addContainer('EcsApp', {
// change latest to stable and run hotswap
image: ecs.ContainerImage.fromRegistry('nginx:latest'),
environment: {
// Previously we couldn't hotswap the task definition referencing properties like vpcCidrBlock.
VPC_ID: vpc.vpcCidrBlock,
},
});
cdk.Tags.of(taskDefinition).add('Key', 'Value');
new ecs.FargateService(stack, 'Service', {
cluster,
taskDefinition,
assignPublicIp: true,
}); Place the file to somewhere like |
Running the cli integ tests right now, just so I can exempt you from the linter |
@kaizencc Thank you very much! By the way, how about removing the cli integ test check from the PR linter? I believe the PR linter is for a contributor to validate if the PR is ready for review, but as for the cli integ check, there is nothing they can do to make the linter pass, which is a bit frustrating. I think the need to run the cli integ test can be expressed with another label. |
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
@rix0rrr I added/fixed unit tests and now it's ready for review :) Hope you can check this, thanks. |
I like this a lot, thanks! Ready to ship it, but final check: did you do manual testing on the latest revision with all the requested changes in it? Cheers! |
@rix0rrr Thanks :) Yes, I confirmed that I can successfully hotswap this CDK code: #26404 (comment) |
➡️ PR build request submitted to A maintainer must now check the pipeline and add the |
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
@rix0rrr mergify was failing for some reason:
I guess merging with main manually will fix this issue (because updating workflow files from mergify is prohibited?). Can you approve it again? Thanks 🙏 |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
…tain intrinsics" (#27358) Closes #27343 From v2.93.0 (#26404), we hotswap ECS task definition in the following steps: 1. Calculate patch from old/new CFn template 2. Apply the patch to the task definition fetched from describeTaskDefinition API 3. register the new task definition and update services The root cause of the issue #27343 is the order of containerDefinitions array is somehow shuffled when deployed to ECS, so the patch calculated from CFn template becomes invalid. For example, when the containerDefinitions in a CFn template is like below: ```json "ContainerDefinitions": [ { "Name": "main", "Image": "imageA" }, { "Name": "sidecar", "Image": "imageB" } ], ``` the deployed task definition can sometimes become like this: ```json "ContainerDefinitions": [ { "Name": "sidecar", "Image": "imageB" }, { "Name": "main", "Image": "imageA" } ], ``` This makes a patch calculated from CFn template diff completely invalid. We can sort both CFn template and the response of describeTaskDefinition API in a deterministic order, but it is still unreliable because there can be more arrays whose order will be shuffled. [The response of describeTaskDefinition](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_DescribeTaskDefinition.html#API_DescribeTaskDefinition_ResponseSyntax) has many array fields, and it is not documented if they may be shuffled or not. I guess we should completely abandon this approach, because it cannot be reliable enough. I have an idea for more reliable approach, but at least it should be reverted asap as it's breaking the ECS hotswap feature. I'm really sorry for me not being aware with this behavior 🙏 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Reason for this change
Currently an ECS task definition cannot be hotswapped in many cases, for example when it contains references to certain AWS resources as environment variables. This PR removes the limitation and make hotswaps possible in much more various situations.
Specifically, if there is any token that is not part of the following, we cannot hotswap a task definition:
Although it is not supported much, using
GetAtt
function in a task definition is very common (imagine you reference other resource's property as an environment variable). This PR allows to hotswap a task definition even if it contains these tokens.Solution
To hotswap a task definition, we need to construct a task definition to call
registerTaskDefinition
API. For this, we have to evaluate CloudFormation template locally to resolve all the intrinsics in the template. However, some intrinsics such asFn::GetAtt
is not fully supported by CDK CLI because we have to manually support them for each AWS resource type.The reason why some task definitions are unhotswappable is that there are such intrinsics in the template and the CDK fails to evaluate it locally. So the basic idea to overcome this limitation in this PR is that we don't try to evaluate it locally, but we fetch the latest task definition already deployed and get the required values from it.
Here's how we can implement the idea.
How we determine if changes to a task definition can be hotswapped
In the hotswap process, we have to decide whether the change can be hotswapped or not. Now we can hotswap the task definition if
ContainerDefinitions
field, and all the fields in the task definition can be evaluated locally. (original behavior) OR,ContainerDefinitions
field, and all the updated field can be evaluated locally (added in this PR).The first condition can actually be included in the second condition, but for now I keep it as-is to avoid the possibility of breaking the existing behavior.
If the second condition is true, we fetch the latest task definition from AWS account, override the updated fields, and register a new task definition to update a service. By this way, we don't have to evaluate tokens in unchanged fields locally, allowing to use hotswap in much more situations.
How we compare the old and new task definition
Here is an example task definition:
We compare the old and new task definition in the following steps:
ContainerDefinitions
field. If not, we cannot hotswap.evaluateCfnExpression
on the containerDefinitons. If it can be evaluated, proceed to hotswap. If not, proceed to step 3.ContainerDefinitions
is the same. If not, we cannot hotswap.Environment
,Image
,Name
, etc)evaluateCfnExpression
on the value. If the evaluation fails, we cannot hotswap.Imagine if there is a change only in
Image
field (container image tag) butEnvironment
field contains unsupported intrinsics (e.g."Fn::GetAtt": ["Vpc8378EB38", "CidrBlock"]
). In the previous CDK CLI we cannot hotswap it due to an evaluation error. We can now hotswap it because we don't have to evaluate theEnvironment
field when it has no diffs.Closes #25563
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license