-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(core): assets are duplicated between nested Cloud Assemblies #11008
Conversation
We stage assets into the Cloud Assembly directory. If there are multiple nested Cloud Assemblies, the same asset will be staged multiple times. This leads to an N-fold increase in size of the Cloud Assembly when used in combination with CDK Pipelines (where N is the number of stages deployed), and may even lead the Cloud Assembly to exceed CodePipeline's maximum artifact size of 250MB. Add the concept of an `assetOutdir` next to a regular Cloud Assembly `outDir`, so that multiple Cloud Assemblies can share an asset directory. As an initial implementation, the `assetOutdir` of nested Cloud Assemblies is just the regular `outdir` of the root Assembly. We are playing a bit fast and loose with the semantics of file paths across our code base; many properties just say "the path of X" without making clear whether it's absolute or relative, and if it's relative what it's relative to (`cwd()`? Or the Cloud Assembly directory?). Turns out that especially in dealing with assets, the answer is "can be anything" and things just happen to work out based on who is providing the path and who is consuming it. In order to limit the scope of the changes I needed to make I kept modifications to the `AssetStaging` class: * `stagedPath` now consistently returns an absolute path. * `relativeStagedPath()` a path relative to the Cloud Assembly or an absolute path, as appropriate. Related changes in this PR: - Refactor the *copying* vs. *bundling* logic in `AssetStaging`. I found the current maze of `if`s and member variable changes too hard to follow to convince myself the new code would be doing the right thing, so I refactored it to reduce the branching factor. - Switch the tests of `aws-ecr-assets` over to Jest using `nodeunitShim`. Fixes #10877, fixes #9627, fixes #9917.
const old = this.cache.get(cacheKey); | ||
if (old) { return old; } | ||
|
||
const noo = calcFn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing several oooooooooo
s. http://nooooooooooooooo.com/
Maybe to remove the pronunciation ambiguity:
const noo = calcFn(); | |
const calcValue = calcFn(); |
And old
can become cachedValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I was trying to be cute
* returns an absolute path if it was not. | ||
*/ | ||
public relativeStagedPath(stack: Stack) { | ||
const thisAsmDir = Stage.of(stack)?.outdir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why outdir
here and not assetOutdir
(or just this.outdir
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to make that more clear through naming.
// "Staging disabled" only applies to copying. If we've already done the work for bundling, | ||
// it's hardly any work to rename a directory afterwards. | ||
skip = this.node.tryGetContext(cxapi.DISABLE_ASSET_STAGING_CONTEXT); | ||
stageThisAsset = () => this.stageByCopying(skip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When staging is disabled and bundling is used, the staged path should be the (absolute) path to the bundling directory, is it still the case now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #10367, users using this expect the aws:asset:path
property of Metadata
to be an absolute path to the bundling directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch.
Staging disabled combined with bundling is not really a thing anymore.
It was not clear to me what the point of disabling staging was, because that wasn't explained in the docs or comments, so I assumed it was for performance reasons and concluded that if you've gone to the effort of bundling, one more or fewer mv
s doesn't really make a difference.
If the point of "disabling staging" is to get an absolute path into aws:asset:path
... well that seems a little silly. If aws:asset:path
must always be an absolute path, why don't we then always stick an absolute path into it regardless? Why would it be relative if staging is enabled (thereby making it useless, apparently) and then have a flag to make it absolute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not clear to me what the point of disabling staging was
Also not 100% clear to me, not using this feature. It was introduced to allow SAM to interact with CDK: https://docs.aws.amazon.com/cdk/latest/guide/sam.html
If the point of "disabling staging" is to get an absolute path into
aws:asset:path
... well that seems a little silly.
Agree, maybe the right solution as you say is to always output an absolute path. @eladb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe we don't want to have absolute paths in a deployed CF template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably the reason. We always insert aws:asset:path
, even though we're only interested in it if people also supply --no-staging
(which is intended to be interpreted as --integrate-with-sam-cli
).
Oh well. I'll clarify some more and fix the behavior. Thanks for the catch Jonathan!
Co-authored-by: Nick Lynch <nlynch@amazon.com>
Co-authored-by: Jonathan Goldwasser <jogold@users.noreply.github.com>
* Must be declared at the top of the file because we're going to use it statically in the | ||
* AssetStaging class. | ||
*/ | ||
class Cache<A> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move to a separate private file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I can, but I didn't because I was afraid people were going to insist I'd write unit tests for it :)
But sigh I guess I will then
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
If the asset uses OUTPUT or BUNDLE it's generally because its source is a very large directory. This is the case for the `NodejsFunction` which mounts the project root (along with the `node_modules` folder!). Use a custom hash in this case when skipping bundling. Otherwise running `cdk ls` can result in heavy fingerprinting operations (again this is the case for the `NodejsFunction`). Regression introduced in aws#11008 (https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160)
…ling is skipped (#11440) If the asset uses OUTPUT or BUNDLE it's generally because its source is a very large directory. This is the case for the `NodejsFunction` which mounts the project root (along with the `node_modules` folder!). Use a custom hash in this case when skipping bundling. Otherwise running `cdk ls` can result in heavy fingerprinting operations (again this is the case for the `NodejsFunction`) and can be much slower than running `cdk synth` or `cdk diff`, making it pointless to skip bundling. Regression introduced in #11008 (https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160) Before #11008: https://github.com/aws/aws-cdk/blob/c1453147e7c484c54f1380d2fa9ba0cdf2859002/packages/%40aws-cdk/core/lib/asset-staging.ts#L159-L160 Closes #11459 Closes #11460 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
We stage assets into the Cloud Assembly directory. If there are multiple
nested Cloud Assemblies, the same asset will be staged multiple times.
This leads to an N-fold increase in size of the Cloud Assembly when used
in combination with CDK Pipelines (where N is the number of stages
deployed), and may even lead the Cloud Assembly to exceed CodePipeline's
maximum artifact size of 250MB.
Add the concept of an
assetOutdir
next to a regular Cloud AssemblyoutDir
, so that multiple Cloud Assemblies can share an assetdirectory. As an initial implementation, the
assetOutdir
of nestedCloud Assemblies is just the regular
outdir
of the root Assembly.We are playing a bit fast and loose with the semantics of file paths
across our code base; many properties just say "the path of X" without
making clear whether it's absolute or relative, and if it's relative
what it's relative to (
cwd()
? Or the Cloud Assembly directory?).Turns out that especially in dealing with assets, the answer is
"can be anything" and things just happen to work out based on who is
providing the path and who is consuming it. In order to limit the
scope of the changes I needed to make I kept modifications to the
AssetStaging
class:stagedPath
now consistently returns an absolute path.relativeStagedPath()
a path relative to the Cloud Assembly or anabsolute path, as appropriate.
Related changes in this PR:
Refactor the copying vs. bundling logic in
AssetStaging
. I foundthe current maze of
if
s and member variable changes too hard tofollow to convince myself the new code would be doing the right thing,
so I refactored it to reduce the branching factor.
Switch the tests of
aws-ecr-assets
over to Jest usingnodeunitShim
.Fixes #10877, fixes #9627, fixes #9917.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license