Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): assets are duplicated between nested Cloud Assemblies #11008

Merged
merged 13 commits into from
Oct 26, 2020

Conversation

rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Oct 21, 2020

We stage assets into the Cloud Assembly directory. If there are multiple
nested Cloud Assemblies, the same asset will be staged multiple times.
This leads to an N-fold increase in size of the Cloud Assembly when used
in combination with CDK Pipelines (where N is the number of stages
deployed), and may even lead the Cloud Assembly to exceed CodePipeline's
maximum artifact size of 250MB.

Add the concept of an assetOutdir next to a regular Cloud Assembly
outDir, so that multiple Cloud Assemblies can share an asset
directory. As an initial implementation, the assetOutdir of nested
Cloud Assemblies is just the regular outdir of the root Assembly.

We are playing a bit fast and loose with the semantics of file paths
across our code base; many properties just say "the path of X" without
making clear whether it's absolute or relative, and if it's relative
what it's relative to (cwd()? Or the Cloud Assembly directory?).

Turns out that especially in dealing with assets, the answer is
"can be anything" and things just happen to work out based on who is
providing the path and who is consuming it. In order to limit the
scope of the changes I needed to make I kept modifications to the
AssetStaging class:

  • stagedPath now consistently returns an absolute path.
  • relativeStagedPath() a path relative to the Cloud Assembly or an
    absolute path, as appropriate.

Related changes in this PR:

  • Refactor the copying vs. bundling logic in AssetStaging. I found
    the current maze of ifs and member variable changes too hard to
    follow to convince myself the new code would be doing the right thing,
    so I refactored it to reduce the branching factor.

  • Switch the tests of aws-ecr-assets over to Jest using
    nodeunitShim.

Fixes #10877, fixes #9627, fixes #9917.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

We stage assets into the Cloud Assembly directory. If there are multiple
nested Cloud Assemblies, the same asset will be staged multiple times.
This leads to an N-fold increase in size of the Cloud Assembly when used
in combination with CDK Pipelines (where N is the number of stages
deployed), and may even lead the Cloud Assembly to exceed CodePipeline's
maximum artifact size of 250MB.

Add the concept of an `assetOutdir` next to a regular Cloud Assembly
`outDir`, so that multiple Cloud Assemblies can share an asset
directory. As an initial implementation, the `assetOutdir` of nested
Cloud Assemblies is just the regular `outdir` of the root Assembly.

We are playing a bit fast and loose with the semantics of file paths
across our code base; many properties just say "the path of X" without
making clear whether it's absolute or relative, and if it's relative
what it's relative to (`cwd()`? Or the Cloud Assembly directory?).

Turns out that especially in dealing with assets, the answer is
"can be anything" and things just happen to work out based on who is
providing the path and who is consuming it. In order to limit the
scope of the changes I needed to make I kept modifications to the
`AssetStaging` class:

* `stagedPath` now consistently returns an absolute path.
* `relativeStagedPath()` a path relative to the Cloud Assembly or an
  absolute path, as appropriate.

Related changes in this PR:

- Refactor the *copying* vs. *bundling* logic in `AssetStaging`. I found
  the current maze of `if`s and member variable changes too hard to
  follow to convince myself the new code would be doing the right thing,
  so I refactored it to reduce the branching factor.

- Switch the tests of `aws-ecr-assets` over to Jest using
  `nodeunitShim`.

Fixes #10877, fixes #9627, fixes #9917.
@rix0rrr rix0rrr requested a review from a team October 21, 2020 12:18
@rix0rrr rix0rrr self-assigned this Oct 21, 2020
@gitpod-io
Copy link

gitpod-io bot commented Oct 21, 2020

@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Oct 21, 2020
packages/@aws-cdk/core/lib/asset-staging.ts Outdated Show resolved Hide resolved
const old = this.cache.get(cacheKey);
if (old) { return old; }

const noo = calcFn();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing several oooooooooos. http://nooooooooooooooo.com/

Maybe to remove the pronunciation ambiguity:

Suggested change
const noo = calcFn();
const calcValue = calcFn();

And old can become cachedValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I was trying to be cute

* returns an absolute path if it was not.
*/
public relativeStagedPath(stack: Stack) {
const thisAsmDir = Stage.of(stack)?.outdir;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why outdir here and not assetOutdir (or just this.outdir)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to make that more clear through naming.

packages/@aws-cdk/core/lib/asset-staging.ts Outdated Show resolved Hide resolved
Comment on lines 177 to 180
// "Staging disabled" only applies to copying. If we've already done the work for bundling,
// it's hardly any work to rename a directory afterwards.
skip = this.node.tryGetContext(cxapi.DISABLE_ASSET_STAGING_CONTEXT);
stageThisAsset = () => this.stageByCopying(skip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When staging is disabled and bundling is used, the staged path should be the (absolute) path to the bundling directory, is it still the case now?

Copy link
Contributor

@jogold jogold Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #10367, users using this expect the aws:asset:path property of Metadata to be an absolute path to the bundling directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch.

Staging disabled combined with bundling is not really a thing anymore.

It was not clear to me what the point of disabling staging was, because that wasn't explained in the docs or comments, so I assumed it was for performance reasons and concluded that if you've gone to the effort of bundling, one more or fewer mvs doesn't really make a difference.

If the point of "disabling staging" is to get an absolute path into aws:asset:path... well that seems a little silly. If aws:asset:path must always be an absolute path, why don't we then always stick an absolute path into it regardless? Why would it be relative if staging is enabled (thereby making it useless, apparently) and then have a flag to make it absolute?

Copy link
Contributor

@jogold jogold Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not clear to me what the point of disabling staging was

Also not 100% clear to me, not using this feature. It was introduced to allow SAM to interact with CDK: https://docs.aws.amazon.com/cdk/latest/guide/sam.html

If the point of "disabling staging" is to get an absolute path into aws:asset:path... well that seems a little silly.

Agree, maybe the right solution as you say is to always output an absolute path. @eladb?

Copy link
Contributor

@jogold jogold Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe we don't want to have absolute paths in a deployed CF template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably the reason. We always insert aws:asset:path, even though we're only interested in it if people also supply --no-staging (which is intended to be interpreted as --integrate-with-sam-cli).

Oh well. I'll clarify some more and fix the behavior. Thanks for the catch Jonathan!

rix0rrr and others added 4 commits October 22, 2020 13:49
Co-authored-by: Nick Lynch <nlynch@amazon.com>
Co-authored-by: Jonathan Goldwasser <jogold@users.noreply.github.com>
* Must be declared at the top of the file because we're going to use it statically in the
* AssetStaging class.
*/
class Cache<A> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move to a separate private file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I can, but I didn't because I was afraid people were going to insist I'd write unit tests for it :)

But sigh I guess I will then

@rix0rrr rix0rrr requested review from eladb, njlynch and a team October 23, 2020 08:06
@mergify
Copy link
Contributor

mergify bot commented Oct 26, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: ae4cc94
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Oct 26, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit c84217f into master Oct 26, 2020
@mergify mergify bot deleted the huijbers/shared-assets branch October 26, 2020 12:14
jogold added a commit to jogold/aws-cdk that referenced this pull request Nov 12, 2020
If the asset uses OUTPUT or BUNDLE it's generally because its source is
a very large directory. This is the case for the `NodejsFunction` which
mounts the project root (along with the `node_modules` folder!).

Use a custom hash in this case when skipping bundling. Otherwise running
`cdk ls` can result in heavy fingerprinting operations (again this is
the case for the `NodejsFunction`).

Regression introduced in aws#11008
(https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160)
mergify bot pushed a commit that referenced this pull request Dec 1, 2020
…ling is skipped (#11440)

If the asset uses OUTPUT or BUNDLE it's generally because its source is
a very large directory. This is the case for the `NodejsFunction` which
mounts the project root (along with the `node_modules` folder!).

Use a custom hash in this case when skipping bundling. Otherwise running
`cdk ls` can result in heavy fingerprinting operations (again this is
the case for the `NodejsFunction`) and can be much slower than running
`cdk synth` or `cdk diff`, making it pointless to skip bundling.

Regression introduced in #11008
(https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160)

Before #11008:
https://github.com/aws/aws-cdk/blob/c1453147e7c484c54f1380d2fa9ba0cdf2859002/packages/%40aws-cdk/core/lib/asset-staging.ts#L159-L160

Closes #11459 
Closes #11460 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution/core This is a PR that came from AWS.
Projects
None yet
5 participants