- Original Author(s):: @madeline-k
- Tracking Issue: #39
- API Bar Raiser: @otaviomacedo
The AWS CDK v2 framework library, aws-cdk-lib
contains all framework packages
from AWS CDK v1 in one monolithic package. While this is a great value for
customers in making dependency management much easier, it has also resulted in
increasing the unpacked npm package size to 234 MB, at the time of writing. The
goal of this project is to significantly reduce this size.
Aside from the change in package size, the only customer-facing changes
resulting from this RFC will be the changes made to dynamically load dependencies
for the lambda-layer
submodules. Since these changes may cause a breaking
change for customers depending on their environment, we will release in two
steps.
- lambda-layer-awscli, lambda-layer-kubectl, lambda-layer-node-proxy-agent: synthesis might fail in certain environments, see #22470 for more information
- feat(lambda-layer-awscli, lambda-layer-kubectl, lambda-layer-node-proxy-agent): reduce aws-cdk-lib package size, by removing layer.zip files
- feat(lambda-layer-awscli): Dynamically load asset for AwsCliLayer, with bundled fallback
- feat(lambda-layer-kubectl): Dynamically load asset for KubectLayer, with bundled fallback
- feat(lambda-layer-node-proxy-agent): Dynamically load asset for AwsCliLayer, with bundled fallback
Starting with 2.X.0, we will deliver a notice to customers based on if their
application is using the layer.zip that was bundled into aws-cdk-lib
as a
fallback. This will happen if their application is using KubectlLayer
,
AwsCliLayer
, or NodeProxyAgentLayer
and their environment is not able to
download packages from npm. This could happen if the environment is network
restricted, does not have npm installed, or has npm configured to use a private
registry that does not include @aws-cdk/asset-
packages. Customers in this
situation will see:
NOTICES
22470 (lambda-layer-awscli): [ACTION REQUIRED] Your CDK application is using AwsCliLayer
Overview: Your app is using the AwsCliLayer construct in an environment
where npm is inaccessible. Add a dependency on @aws-cdk/asset-awscli-v1,
or the equivalent in your language, to remove this notice.
Affected versions: aws-cdk-lib.lambda_layer_awscli.Notice: ^2.X.0
More information at: https://github.com/aws/aws-cdk/issues/22470
Repeat the same for lambda-layer-kubectl
, and lambda-layer-node-proxy-agent
.
Ticking the box below indicates that the public API of this RFC has been
signed-off by the API bar raiser (the status/api-approved
label was applied to the
RFC pull request):
- Signed-off by API Bar Raiser @otaviomacedo
Today, we are launching a smaller version of the AWS CDK v2 framework library,
aws-cdk-lib
. Before today’s release, the unpacked size of this package was 234
MB. This large size prevented many customers from being able to use it, and
caused performance problems for others. It also prevented the CDK team from
releasing certain new features. Now, the size of the package is less than 100
MB. This smaller version also has a mechanism for adding new Lambda Layer
dependencies. Because of this launch, the AWS CDK framework will be able to add
support for more versions of kubectl, and awscliv2.
You should upgrade your dependencies on aws-cdk-lib
to get the benefits of a
smaller package size. If you use aws-cdk-lib
in a Lambda function, now you can
include more code and dependencies before hitting the Lambda function code size
limit of 250 MB. If you use aws-cdk-lib
in your CI/CD pipelines and are
frequently downloading it with a fresh npm install, you will see performance
improvements in the download times.
There are many reasons this is an issue for customers. The following items need to be addressed.
- The package managers that
aws-cdk-lib
is published on have real or recommended filesize limits. For example:- The v2 release currently gets this warning when publishing for Go.
Go: remote: warning: File awscdk/jsii/aws-cdk-lib-2.17.0.tgz is 55.42 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
- NPM itself does not have a package size limit, but other services that host NPM packages do. For example, Azure Artifacts has a limit of 500 MB.
- PyPi has an unofficially documented limit of 60 MB.
- The v2 release currently gets this warning when publishing for Go.
- A large download size might be problematic for customers with weak internet connections or limited bandwidth.
- Customers who download and install
aws-cdk-lib
on every CI/CD pipeline run may be frustrated to spend resources frequently downloading a large package. - AWS Lambda limits the size of your function code to 250 MB.
- The AWS Lambda limit for all uploaded assets for a Function is 75 GB. If a
customer is using Lambda Versions, and wants to maintain a Version for every
version they have published, they might end up hitting this limit because of
the large size of
aws-cdk-lib
. - The CDK project is currently blocked from including more 'large dependencies', such as multiple versions of kubectl, multiple versions of the AWS CLI, and dedicated versions of the AWS SDK for use in custom resources. This is a decision we’ve made to avoid increasing the size of the package further until a more scalable solution for these kinds of dependencies is implemented.
Most of these above issues have workarounds. For running CDK apps in Lambda, you
can use a Docker image for your Lambda code, and the limit becomes 10GB. For
package manager size limits, we can request
increases, and those have
historically been approved. However, publishing a large package is a bad
practice in the open-source software ecosystem. Even if a customer has no
technical limitations for adding a 234 MB dependency to their package, it is a
red flag to them when considering using aws-cdk-lib
. We should be
customer-obsessed, and solve the problem of aws-cdk-lib
being a huge
dependency, before customers start complaining about it.
There are some large assets contributing to the size of aws-cdk-lib
, like
.jsii files, and zip files of dependencies used in custom resources. The library
also contains a huge volume of source code now that aws-cdk-lib
combines 231
(and increasing) CDK modules. The table below breaks down the percentage that
each category of files contributes to the size of aws-cdk-lib.
Note: The percentages do not add up to 100. Some files, e.g. various .json files, are excluded and contribute very little to the size.
Category | Percentage of aws-cdk-lib size |
---|---|
.jsii files | 41.64% |
Source Maps (.js.map) | 21.93% |
Lambda Layer zip files | 15.48% |
Javascript code (.js) | 9.17% |
Type Declaration files (.d.ts) | 9.15% |
README, etc (.md) | 0.76% |
bundled npm dependencies | 0.69% |
.ts-fixture | 0.37% |
.jsiirc | 0.36% |
The CDK team is currently blocked on adding more Lambda Layer zip files to the
AWS CDK framework, because adding them would dramatically increase the size of
the package. Each of these zips are in a module whose only purpose is to bundle
a dependency (or two) into a Lambda Layer, which is then used in custom
resources that are part of the AWS CDK framework. We need a solution that is
scalable, so that more Lambda Layer dependencies can be added without bundling
them into aws-cdk-lib
and significantly impacting its size. See section
Lambda Layer Zip Files for the proposed solution.
There are more dependencies like these that need to be added to the framework, but we are currently blocked on adding them with the current design because they would increase the size too much. The current known list is:
- Additional versions of kubectl. This is a pressing problem for customers using the EKS module, who are limited to a single version of the kubectl CLI today.
- Additional versions of awscli. See issue.
- Boto3, and aws-sdk. All custom resources in the AWS CDK framework currently rely on whatever version of the AWS SDK that is available in the Lambda runtime by default. To make the custom resources more robust, and allow us to choose the exact version of the AWS SDK used by them, we should bundle these dependencies into each custom resource’s Lambda code.
If we add all of these with the current design, that would increase the size of
aws-cdk-lib
by approximately 130 MiB*, and make the Lambda Layers the biggest
contributor of size at 39%.
*See Appendix A to see where this estimate came from.
Category | Estimated future percentage of aws-cdk-lib size |
---|---|
Lambda Layer zip files | 38.83% |
.jsii files | 30.14% |
Source Maps (.js.map) | 15.87% |
Javascript code (.js) | 6.63% |
Type Declaration files (.d.ts) | 6.62% |
README, etc (.md) | 0.55% |
bundled npm dependencies | 0.50% |
.ts-fixture | 0.27% |
.jsiirc | 0.26% |
The main reason to not do this is that it will inevitably break up the
monolothic aws-cdk-lib
package in some manner, and we will lose some of the
wonderful simplicity of releasing and using a single package. But, we can't keep
everything in one place, and reduce the size enough to have a significant
positive impact.
There are different categories of files contributing to the size of
aws-cdk-lib
, and each category will have its own solution. The following
sections cover the top five categories in size contribution in order from
largest to smallest potential size.
Use separate npm packages for each lambda-layer-X
module. aws-cdk-lib
will
dynamically load in these packages using require()
, and handle installing them
if they are not available. With this solution, customers will not have to figure
out when they do or do not need to include these dependencies, and they will not
be required to access a public endpoint besides npm during runtime of their CDK
apps. Practically, this solution breaks down into the following steps.
- Publish
@aws-cdk/asset-aws-cli-v1
,@aws-cdk/asset-kubectl-v20
,@aws-cdk/lambda-layer-node-proxy-agent
as separate v2-compatible Construct libraries. Each package will expose a Construct such asAwsCliAsset
which is ans3_assets.Asset
and bundles the relevant dependency as an Asset construct. This publishing process will be done in cdklabs repositories. Each package will get its own repository. See this issue for a detailed discussion of the options, and why we have decided to use separate repositories. - Modify the
lambda-layer-X
submodules inaws-cdk-lib
to dynamically load the appropriate@aws-cdk/asset-X
package from step 1. This implementation should usenpm
commands to download and cache the package's archive in the CDK home directory, and then install the package into the current process'snode_modules
. This logic should be implemented once, and then reused in eachlambda-layer-X
submodule.- I have implemented a POC that shows the proposed dynamic loading mechanism will work with node languages and jsii target languages. See this repo for more details.
- We need to keep the dependencies up-to-date to get security updates and new
features. Each asset package will need to be treated slightly differently.
- Kubectl - This one has two dependencies to keep
updated. All updates described below will result in a minor version bump
to this package.
- kubectl
- Kubernetes users will need to be able to choose the minor version of kubectl that they want to use. And the aws-eks module will need to as well.
- We will release multiple packages, one for each currently
supported minor version of kubernetes, e.g.
@aws-cdk/lambda-layer-kubectl-v20
. Each one will expose a Construct calledKubectlLayer
. - A weekly automated task will check the Kubernetes API for a new patch version of each currently supported minor version, and automatically update the version of kubectl used in each package. This update will be performed with an auto-approved PR.
- A weekly automated task will check for new supported minor
versions of kubectl. If there is one, a PR will be automatically
be created which creates a new
@aws-cdk/lambda-layer-kubectl-XYZ
package for the new minor version. A human, the on-call engineer, will need to review this one.
- helm is also included in each asset in
these packages
- Helm has a somewhat complicated version support policy documented here.
- For each
@aws-cdk/lambda-layer-kubectl-XYZ
package, we will include the latest available version of Helm which is compatible with that version of Kubernetes. - A weekly automated task will update the minor and patch versions of helm within the supported ranges.
- The automated task that checks for new supported minor versions of kubectl, will also attempt to figure out what is the latest version of helm compatible with that version of kubectl. This will be verified by a human in PR review to make sure it is correct.
- kubectl
- AWS CLI - We will release two separate packages to make major versions 1
and 2 both available, and automatically update to the latest minor or
patch version.
- Two packages:
@aws-cdk/lambda-layer-aws-cli-v1
, and@aws-cdk/lambda-layer-aws-cli-v2
. - AWS CLI v1: automatically check for new minor and patch versions from PyPi - awscli. Each update will result in a minor version bump of the corresponding package.
- AWS CLI v2: automatically check for new minor and patch versions from offical awscliv2 Dockerfile. AWS CLI v2 does not expose an official PyPi package, but Dependabot can automatically check new Dockerfile versions. Each update will result in a minor version bump of the corresponding package.
- Two packages:
- Node proxy-agent - New minor versions will
automatically be picked up and available with a minor version bump of the
library.
- This can be implemented with
npm-check-updates
, sinceproxy-agent
is an npm package.
- This can be implemented with
- AWS CDK v2 will automatically pick up new minor versions of the
@aws-cdk/asset-XYZ
packages with our current dependency upgrade automation.
- Kubectl - This one has two dependencies to keep
updated. All updates described below will result in a minor version bump
to this package.
After the above is implemented, this mechanism can be extended to include additional Lambda Layers with different dependencies. The solution should be implemented in such a way that new dependencies can be added with the below steps.
- Create a cdklabs repo and publish a Construct library called
@aws-cdk/asset-MyNewDependency
. - Add a new directory in the
aws-cdk
monorepo:@aws-cdk/lambda-layer-MyNewDependency
. - Implement the MyNewDependencyLayer API.
- If the dependency is from npm, then updates will happen automatically. If it is from some other source, then extend the automation from the above steps to also update the new dependency.
Once this solution is implemented, it will be possible to write more concrete steps. And those will be included in the aws-cdk Contributing Guide.
This question raises an important drawback of this solution. It will result in a
breaking change for customers who are using these Lambda Layers in
network-restricted environments. If customers are able to use the CDK in their
network-restricted environments, then it is reasonable to assume they were able
to acquire the aws-cdk
package from npm somehow. This implies customers will
be able to acquire the @aws-cdk/asset-X
packages from npm. There are a
few options for these customers:
- If using TypeScript or JavaScript, add each necessary package to the dependencies of their CDK app.
- If using a target jsii language, add each necessary package as a dependency
as they normally would for other construct libraries. Each
@aws-cdk/asset-X
package will be a jsii construct library, that is also vended to the package managers for each target jsii language. For example, Python users can install aws-cdk.asset-awscli-v2 from pip. - Include them in a private artifact registry.
Unfortunately, there is no way to remove these large files and host them somewhere else without causing a breaking change for these customers. We will publish a CLI notice directly to affected customers of this change well in advance of releasing it. And, advertise the change in advance of release on the aws-cdk GitHub repository.
Today, there is a mechanism for publishing AWS CDK into ADC (Amazon Dedicated
Cloud) regions. This mechanism includes bundling the direct dependencies of the
CDK and publishing them into the ADC regions, e.g. constructs
. We will add the
@aws-cdk/asset-X
npm packages, and their corresponding non-Node
language versions to this process. When customers use the lambda-layers in ADC
regions, the right dependencies will already be available.
There are two files, .jsii.tabl.json and .jsii, bundled in aws-cdk-lib
for the
jsii runtime to work for any non-NodeJS languages. And, to enable code and
documentation generation from the published npm package. Today, they make up 42%
of the package size. And in the future, after adding more Lambda Layers, the
.jsii files would compose 30% of the package size.
We will compress the .jsii.tabl.json and .jsii files. Compressing with gzip currently creates a 4 MB and 4.7 MB file, respectively. This change will impact the jsii runtime, and Construct Hub. NodeJS jsii runtimes don't need to use either file, and non-NodeJS runtimes would decompress the .jsii file when needed. Construct Hub generates documentation from the .jsii.tabl.json file, and it will also decompress this file as needed.
These files currently makeup 22% of the total size of aws-cdk-lib
. Source map
files map from the transformed source to the original source. In the case of
aws-cdk-lib
, these are not actually useful today, since we do not ship the
original .ts
files. We will remove the source maps from the released package.
The Javascript files make up about 10% of the overall package size. Today, they are already minified in the released package. There is not much else that can be done to make them smaller.
We can minify these files, which could result in a space saving of up to 20 MB.
Minifying is the process of removing new-lines and whitespace characters from
the files. The 20 MB number was collected by removing all new-line and
whitespace characters. In reality, we can not actually remove all whitespace. We
will need to maintain whitespace in documentation comments, so that
type-checking in IDEs still has readable type references. As well as the minimum
whitespace necessary for the files to still be readable when navigating to them
in an IDE from usage of aws-cdk-lib
.
For the type declaration files, we will also pursue combining them into one file
per submodule. Currently, there is one .d.ts
file per .ts
file in the
aws-cdk-lib
source code. This could lead to better compression of the library
overall. And, would improve performance for tsc
invocations, since the
compiler would not need to read as many files. These improvements have not been
measured yet. They will be measured during the implementation to make sure the
changes are worth it.
Yes, see CHANGELOG and CLI Notice sections. The solution for the Lambda Layer zip files will have a breaking change for customers executing CDK applications that use the Lambda Layers in some network-restricted environments. Most alternative solutions will also have a breaking change for these customers. The proposed solution will have the easiest migration path for these customers.
Each category of files has different alternative solutions to reduce the size.
- Vend the Rosetta tablet file (.jsii.tabl.json) separately. On each release, we would publish this file as a separate artifact on GitHub releases, and reference the URI from the jsii assembly (.jsii). This solution will not be pursued at this time. There are several issues with de-coupling the Rosetta tablet file from the artifacts that customers install, including adding a requirement that the jsii runtime to run in an environment that has access to the public endpoint where it would be hosted.
- The .jsii.tabl.json and .jsii files are both “pretty-printed.” We can reduce their size to 42 MB (-23%) and 30 MB (-33%) by removing all the whitespace and newlines. Since we are going to be compressing these files, removing whitespace doesn't have much benefit. It will not impact either the unpacked npm package size, or the compressed artifact size for all package manager. Since the files will already be compressed in both versions.
- Minifying the files, instead of removing them. Since the source maps are not useful in their current form, it is best to just remove them. In the future, we could consider adding them back in a minified form.
- Releasing an additional 2.X.Y-debug package with every release, that will contain the source maps and source files. In addition to reducing the size of the aws-cdk-lib package, we want to reduce the volume of artifacts we publish to npm and other package managers in total. This would be in direct conflict with that goal by doubling the amount of content that is published on each release. And, there is currently no demonstrated customer need for a debug package like this.
- Host the zip files in a static website. This would be implemented as a CDK
app in a new GitHub repository. The CDK CLI will download and cache these zip
files at synth time. The existing
lambda-layer-X
modules will be modified to reference the large dependencies from a local cache. The CDK CLI will be modified to update that cache at synth time, if needed. Then, the rest of their usage will remain the same. The URLs for each dependency would follow a predictable format, including each dependency’s version. Some automated mechanism would be required to add new artifacts to the website as new versions of each dependency are released. And, to add new versions of each dependency to aws-cdk-lib. The problem with this solution is that it requires access to a public endpoint during synthesis of CDK applications and does not lend itself to a straightforward workaround for customers who run CDK apps in network-restricted environments (Construct Hub, ADC region customers, etc). - Download the the zip files from a static website in the framework
implementation of each
Layer
class, instead of in the CLI. We should not do this, because the CLI already has logic for managing credentials and making network calls. This type of work should remain in the CLI, and not overflow into the framework. - Use regionalized S3 buckets to host the zip files. With this solution, the customer does not need to download the zip file and then re-upload it to their deployment environment. Instead, the Lambda Layer or Lambda Function resource definition in their CloudFormation template references the zip hosted by us in the same region as their deployment environment. The maintenance burden of keeping up with deploying these artifacts and keeping them up to date in every AWS region is too high. This would effectively make our team double as a hosted artifact repository for artifacts that we don't own. This solution is the same as the one described in this comment.
- Use public Lambda Layers. This would essentially be the same as alternative #3, because the code for each public Lambda Layer would need to reside in S3. This solution adds additional complexity for the naming and versioning convention of the artifacts, so we would likely choose S3 over this.
- Separate packages that
aws-cdk-lib
peer depends on. With this solution, customers do not need to include the largelambda-layer-X
packages in their dependencies unless they are actually using one of the Constructs that requires them. This would be confusing for customers to know when they need these peer dependencies, and when they do not. It would also be difficult for customers to manage versions and to know which versions of each package are compatible with each other. - Upstream the custom resources where the zip files are used. Since all of
these zip files are used in custom resources, another way to remove them from
aws-cdk-lib
is to remove the custom resources themselves. We will create AWS resources that are available in the CloudFormation public registry under the namespaceAWS::CDK
. This option would not require customers in network-restricted to perform a workaround to get these dependencies. In addition to reducing the size of the package, this solution also has another benefit of addressing pain points with CDK publishing custom resources into customer accounts. TheXXLayer
APIs on their own are also valuable to customers. If we are able to remove their dependencies withinaws-cdk-lib
by removing the custom resources, then we could vend these libraries separately for customers who use them explicitly. We are not going to pursue this solution right now, since this will increase the operational burden on our team by an unknown amount. The proposed solution does not prevent us from also pursuing this in the future.
No alternatives were considered yet.
Remove these files. We can’t remove the type declaration files, because that would break the amazing type-checking features of CDK!
The auto-generated L1 code overlaps with the Javascript, Source Map, and Type Declaration categories already mentioned, and it composes about 25% of the overall package. The size of this code grows linearly as the volume of CloudFormation resources and features grows. Alternative solutions focusing on this category:
- Separate out the L1 constructs to their own library. This would compromise one of the major values of AWS CDK v2, which is that all AWS CDK constructs are in a single library, and customers no longer have to manage a complicated dependency tree for their CDK apps and libraries. We might consider this in the future as a logical way of breaking up the library into multiple components, if we are not able to keep the size under control with other solutions described in this document.
- Invent a new way of generating this code that has a smaller footprint. This could be a huge undertaking, and needs a lot more investigation into whether there are possible solutions for this.
-
This solution does not address the fact that the size of aws-cdk-lib increases with each weekly release, because we are adding code and new features at a rapid pace, and AWS is always adding new services and features. It is possible that after all of the proposed solutions are implemented, that the package size will still continue to increase past a reasonable size. At that point, we might need to invent more creative solutions to reduce the size, or implement some of the alternative solutions outlined here.
-
This solution has a breaking change for some customers. Until we get some feedback from the community on this RFC, we don't know for sure if the proposed migration for those customers is reasonable.
Each category of files can be worked on in parallel and addressed separately.
The highest priority category to start with will be the Lambda Layer zip files. Even though the Lambda Layers are not the biggest contributor to size today, they are the biggest potential contributor to size. And, there is customer need to add more Lambda Layers, which we are not able to do until the large dependencies are not required to be bundled in aws-cdk-lib.
The next category will be the .jsii files. After both of these solutions are implemented, the the aws-cdk-lib unpacked npm package size will be about 100 MB.
The lowest priority category will be removing the Source Map files and minifying the Type Declaration files. The final size of the package after this work is still to be determined.
Work will be tracked on this project board.
-
All size calculations in this document were done based on
aws-cdk-lib@2.19.0
. -
du -k
was used to get sizes in Kb (Kibibytes), for calculating the percentages. -
The total size of the package according to
du
that was used to calculate percentages is 247008 Kb. -
The total size of 234 MB that is referenced in the doc, is the unpacked size of version 2.19.0 according to npm.
-
The estimate for the 130 MiB increase in size for future Lambda Layer zips was calculated from these estimates:
Size (KiB) Source kubectl 23740 actual size of zip in released package awscli 13088 '' node-proxy-agent 1400 '' 3 minor versions of kubectl 71220 = 3 * 23740 boto3 133 pypi awscliv2 13088 estimate same size as awscli aws-sdk 9876 zipped size of aws-sdk after a fresh npm install --------------------------- ---------- --------------------------------------------- Total KiB 132545 Total MiB 129.44 = 132545 / 1024