fix(lambda-layer-awscli): s3-deployment, stepfunctions-tasks, eks constructs create multiple identical AwsCliLayers (#32907) #33085

Open
wants to merge 2 commits into
base: main

@wimlewis-amazon wimlewis-amazon commented Jan 23, 2025

Fixes #32907

Reason for this change

Multiple identical and possibly unused AwsCliLayers in a single stack are wasteful, and can cause deployments to fail if Lambda throttles layer creation requests.

Description of changes

  1. Created a utility method AwsCliLayer.getOrCreate() for getting or creating the singleton layer construct
  2. Updated callers to use it
  • aws-s3-deployment (the motivating caller)
  • aws-eks and aws-eks-v2-alpha
  • aws-stepfunctions-tasks
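
A minimal sketch of the get-or-create idea, for readers following along. It uses a toy construct tree rather than the real aws-cdk-lib classes; ToyConstruct, ToyLayer, the default id and the getOrCreate signature are all assumptions for illustration, not the code in this PR.

```typescript
// Toy stand-in for a construct tree; NOT the real aws-cdk-lib classes.
class ToyConstruct {
  readonly children = new Map<string, ToyConstruct>();

  constructor(readonly scope: ToyConstruct | undefined, readonly id: string) {
    if (scope) {
      if (scope.children.has(id)) {
        throw new Error(`Duplicate child id '${id}' under '${scope.path}'`);
      }
      scope.children.set(id, this);
    }
  }

  // Root of the tree; it plays the role of the stack in this sketch.
  get root(): ToyConstruct {
    return this.scope ? this.scope.root : this;
  }

  get path(): string {
    return this.scope ? `${this.scope.path}/${this.id}` : this.id;
  }
}

class ToyLayer extends ToyConstruct {
  // Hypothetical signature modeled on the AwsCliLayer.getOrCreate() named above.
  static getOrCreate(scope: ToyConstruct, id: string = 'AwsCliLayer'): ToyLayer {
    const root = scope.root;
    const existing = root.children.get(id);
    if (existing instanceof ToyLayer) {
      return existing; // reuse the layer already attached to the root
    }
    return new ToyLayer(root, id); // first caller creates it at the root
  }
}

const stack = new ToyConstruct(undefined, 'Stack');
const deployA = new ToyConstruct(stack, 'DeployA');
const deployB = new ToyConstruct(stack, 'DeployB');

const layerA = ToyLayer.getOrCreate(deployA);
const layerB = ToyLayer.getOrCreate(deployB);

console.log(layerA === layerB, layerA.path); // prints: true Stack/AwsCliLayer
```

Both callers receive the same object, and it hangs off the root instead of either caller.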

As a result of this change, the single AwsCliLayer is now created at the root of the stack, instead of each one being created near the construct that uses it. This requires changes to the test expectations which depend on the logical ID of the AwsCliLayer resource.
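
To illustrate why the test expectations change: CDK derives a resource's logical ID from its construct path, so moving the layer to the root yields a different ID. The function below is a deliberately simplified toy (the real algorithm also appends a hash of the path), and the path components are hypothetical.

```typescript
// Toy logical-ID function: joins path components and drops non-alphanumerics.
// Simplification: the real CDK algorithm also appends a hash of the path.
function toyLogicalId(pathParts: string[]): string {
  return pathParts.map((part) => part.replace(/[^A-Za-z0-9]/g, '')).join('');
}

// Hypothetical paths, relative to the stack.
const idBefore = toyLogicalId(['DeployA', 'AwsCliLayer']); // layer nested under its consumer
const idAfter = toyLogicalId(['AwsCliLayer']);             // layer at the root of the stack

console.log(idBefore); // prints: DeployAAwsCliLayer
console.log(idAfter);  // prints: AwsCliLayer
```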

Describe any new or updated permissions being added

None

Description of how you validated changes

Added unit tests to cover new cases in aws-s3-deployment. Updated unit tests in other CDK modules.

The snapshot tests also need to be updated. I've run yarn integ and verified that the changes are all expected, but I'm not sure what the proper way to update the snapshots in git is.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/medium Medium work item – several days of effort p2 labels Jan 23, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team January 23, 2025 03:22
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter fails with the following errors:

❌ Fixes must contain a change to an integration test file and the resulting snapshot.

If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.

✅ An exemption request has been requested. Please wait for a maintainer's review.

@wimlewis-amazon wimlewis-amazon changed the title fix(s3-deployment,stepfunctions-tasks,eks): cdk creates multiple identical, unused AwsCliLayers (#32907) fix(lambda-layer-awscli): s3-deployment, stepfunctions-tasks, eks constructs create multiple identical AwsCliLayers (#32907) Jan 24, 2025
@wimlewis-amazon wimlewis-amazon marked this pull request as ready for review January 24, 2025 18:05
@wimlewis-amazon
Author

Clarification Request: I'm not sure what the proper way to update the snapshots in git is, or maybe my build environment isn't set up properly?

@aws-cdk-automation aws-cdk-automation added the pr/reviewer-clarification-requested The contributor has requested clarification on feedback, a failing build, or a failing PR Linter run label Jan 27, 2025
@aaythapa
Contributor

aaythapa commented Feb 7, 2025

Thank you for contributing!

To update snapshots you can use the --update-on-failed flag, so something like yarn integ-runner --directory packages/@aws-cdk --update-on-failed. This doc on integration testing could be helpful. Hope this helps.

As for the change: while I agree that creating extra AwsCliLayer resources is wasteful, this looks like a backwards-incompatible change, since it changes the template significantly (I'll double check with someone else on the team about this). Was backwards compatibility taken into account while making this change?

@aws-cdk-automation aws-cdk-automation added pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. and removed pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. labels Feb 7, 2025
@aaythapa aaythapa removed the pr/reviewer-clarification-requested The contributor has requested clarification on feedback, a failing build, or a failing PR Linter run label Feb 7, 2025
@wimlewis-amazon
Author

Re integ tests: Thanks, that gets me in the right direction. So --update-on-failed is the correct way to update the integ snapshots?

Re backwards compatibility: It does move the AwsCliLayer to a different location in the synthesized CloudFormation stack, and removes a bunch of extraneous AwsCliLayers (that's the whole point of the fix, after all). My assumption is that this is not a compatibility problem, because the location (and even existence) of that resource is an implementation detail that presumably no users of the L2 constructs are making use of. But perhaps the new behavior should be gated behind a feature flag?

…entical AwsCliLayers (aws#32907)

Created a utility method AwsCliLayer.getOrCreate() for getting or creating
the singleton layer construct; updated s3-deployment to use it. As a result, all
S3 deployments needing this layer will share a single construct at the
root of the stack.

Fixes aws#32907
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 6c06f47
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aaythapa
Contributor

Re integ tests: Thanks, that gets me in the right direction. So --update-on-failed is the correct way to update the integ snapshots?

Yep! Something like yarn integ-runner --directory packages/@aws-cdk --update-on-failed will update the failed snapshot tests.

Re backwards compatibility: It does move the AwsCliLayer to a different location in the synthesized CloudFormation stack, and removes a bunch of extraneous AwsCliLayers (that's the whole point of the fix, after all). My assumption is that this is not a compatibility problem, because the location (and even existence) of that resource is an implementation detail that presumably no users of the L2 constructs are making use of. But perhaps the new behavior should be gated behind a feature flag?

I agree that the resource is not being used by the user, but presumably it will change existing templates without users changing anything in their CDK app, which will cause their stacks to re-deploy. I think in this case it's better to be safe than sorry, so we should gate this behind a feature flag. I'll double check with the team (tomorrow) to get a second opinion.

@wimlewis-amazon
Author

Thanks. It will definitely change existing stacks, because it's fixing a bug. But it will also change some stacks which are using BucketDeployment in a way that doesn't evince the bug, so maybe that's an argument for a feature flag?

From CONTRIBUTING.md, the usual reasons for a feature flag:

  1. Resources replacement leading to service disruption; or
  2. Users could have taken assumptions on the old setup and the change will break them.

For 1, a resource is replaced, but it's the lambda layer used by the lambda which implements a custom resource within the stack, so it doesn't seem like there would be a service disruption — but I get your point that an unexpected stack deployment might worry users. For 2, it's technically possible for any change of course, but it's not something I would expect for this change.

LMK if I should rework this under a feature flag.

@aaythapa
Contributor

aaythapa commented Feb 13, 2025

Sorry, forgot to update you. I had someone from the team look at this to get a second opinion, and we agree that using a feature flag should be the path forward.

Maybe we should update the contributing guide, but IMO changes that alter templates can only be merged without a feature flag if they fix a broken feature. E.g., if a property was not working and we fix it, then it's technically not breaking, since it wasn't working in the first place. In this case the constructs and custom resource are working correctly (correct me if I'm wrong), but they create unnecessary resources, which is why we need to put this behind a feature flag.

If needed, this PR is a good example of how to use feature flags, and this section of the contributing guide should help.
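
A rough sketch of what gating the new placement behind a flag could look like. This is a self-contained toy, not CDK's feature-flag API: the flag name, the FlagContext type and the layerPath helper are all hypothetical, and the thread does not name an actual flag.

```typescript
// Hypothetical flag name; the thread does not name one.
const SHARED_LAYER_FLAG = 'example:sharedAwsCliLayer';

// Toy stand-in for an app's context values.
type FlagContext = Record<string, unknown>;

function isEnabled(context: FlagContext, flag: string): boolean {
  // An unset flag counts as disabled, so existing apps keep the old template.
  return context[flag] === true;
}

// Returns the construct path at which the layer would be created.
function layerPath(context: FlagContext, stackName: string, consumerId: string): string {
  return isEnabled(context, SHARED_LAYER_FLAG)
    ? `${stackName}/AwsCliLayer`                // new: one shared layer at the stack root
    : `${stackName}/${consumerId}/AwsCliLayer`; // old: one layer next to each consumer
}

const legacyPath = layerPath({}, 'Stack', 'DeployA');
const flaggedPath = layerPath({ [SHARED_LAYER_FLAG]: true }, 'Stack', 'DeployA');

console.log(legacyPath);  // prints: Stack/DeployA/AwsCliLayer
console.log(flaggedPath); // prints: Stack/AwsCliLayer
```

With the flag off by default, existing stacks synthesize the same template as before, and only apps that opt in see the layer move.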

@aws-cdk-automation
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. Note that PRs with failing linting checks or builds are not reviewed; please ensure your build is passing.

To prevent automatic closure:

  • Resume work on the PR
  • OR request an exemption by adding a comment containing 'Exemption Request' with justification, e.g. "Exemption Request: "
  • OR request clarification by adding a comment containing 'Clarification Request' with a question, e.g. "Clarification Request: "

This PR will automatically close in 7 days if no action is taken.

@wimlewis-amazon
Author

Exemption Request: Hey, @aws-cdk-automation bot, this PR is obviously active.

@aws-cdk-automation aws-cdk-automation added the pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. label Feb 14, 2025
@aaythapa aaythapa added pr-linter/do-not-close The PR linter will not close this PR while this label is present and removed pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. labels Feb 18, 2025
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/medium Medium work item – several days of effort p2 pr-linter/do-not-close The PR linter will not close this PR while this label is present
Development

Successfully merging this pull request may close these issues.

BucketDeployment creates multiple identical, unused AwsCliLayers
3 participants