[pipelines] 1.67.0 breaks custom asset bundling when using multiple pipelines #10877

Closed
danielfrankcom opened this issue Oct 14, 2020 · 13 comments · Fixed by #11008

@danielfrankcom

After updating to CDK 1.67.0, my custom asset bundling logic no longer works when using multiple pipelines. The bundling logic uses BundlingOptions to move the Lambda function handler into a nested folder structure. When executing cdk synth on the new CDK version, only the first pipeline defined in app.py applies the bundling logic; the other copies the code from the asset path unmodified.
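For reference, the bundling is of this general shape (a minimal TypeScript sketch; the actual repro project is Python, and the image, paths, and commands here are illustrative assumptions, not code from the linked repo):

```ts
import * as path from 'path';
import * as lambda from '@aws-cdk/aws-lambda';
import * as cdk from '@aws-cdk/core';

export class DevelopLayerStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new lambda.Function(this, 'Layer1', {
      runtime: lambda.Runtime.PYTHON_3_8,
      handler: 'entry.handler',
      // Bundling rearranges the source into lib/function/ and generates
      // an entry.py shim at the asset root.
      code: lambda.Code.fromAsset(path.join(__dirname, 'function'), {
        bundling: {
          image: cdk.BundlingDockerImage.fromRegistry('python:3.8'),
          command: ['bash', '-c', [
            'mkdir -p /asset-output/lib/function',
            'cp /asset-input/function.py /asset-output/lib/function/',
            'echo "from lib.function.function import handler" > /asset-output/entry.py',
          ].join(' && ')],
        },
      }),
    });
  }
}
```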

If I revert to 1.66.0, with no other code changes, this behavior disappears. Both pipelines produce the correct artifacts using the older CDK version.

Reproduction Steps

I have stripped down my code to provide a minimal example, and have posted it here.

Running cdk synth against this code produces two assemblies in cdk.out, each containing an asset that represents the Lambda function. The cdk synth output can also be inspected by running the pipeline on AWS, although it is easier to test locally.

What did you expect to happen?

I expected both assets in the cdk.out directory to contain the bundled directory structures.

Both assets should contain a directory structure that looks like the following:

asset.1234
  - entry.py
  - lib
    - function
      - function.py

What actually happened?

Only the first pipeline declared in app.py has the bundled directory structure.

The second asset is staged with the default copy behavior, which skips the custom bundling logic:

asset.1234
  - function.py

Environment

  • CLI Version: 1.67.0 (build 2b4dd71)
  • Framework Version: 1.67.0
  • Node.js Version: v12.18.4
  • OS: WSL2 on Windows 10 (same behavior occurs in CodeBuild)
  • Language (Version): Python 3.8

This is a 🐛 Bug Report

@danielfrankcom danielfrankcom added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 14, 2020
@github-actions github-actions bot added the @aws-cdk/pipelines CDK Pipelines library label Oct 14, 2020
@jogold
Contributor

jogold commented Oct 16, 2020

@danielfrankcom can you share the exact folder structure/content of your cdk.out after running synth here?

@danielfrankcom
Author

@jogold absolutely!

To generate this, I deleted the cdk.out folder entirely and ran cdk synth on the project that I linked above. This is the file tree for version 1.66.0, which looks the way I expect:

cdk.out
├── Develop1.assets.json
├── Develop1.template.json
├── Develop2.assets.json
├── Develop2.template.json
├── assembly-Develop1-DevelopLayer
│   ├── Develop1DevelopLayerLayer1283E4BED.assets.json
│   ├── Develop1DevelopLayerLayer1283E4BED.template.json
│   ├── asset.88d670728dd9800639080b11a0a9600df1d1f0c270a30a21d0c684b503fe1b26
│   │   ├── entry.py
│   │   └── lib
│   │       └── function
│   │           ├── __pycache__
│   │           │   └── function.cpython-38.pyc
│   │           └── function.py
│   ├── cdk.out
│   └── manifest.json
├── assembly-Develop2-DevelopLayer
│   ├── Develop2DevelopLayerLayer17829D95B.assets.json
│   ├── Develop2DevelopLayerLayer17829D95B.template.json
│   ├── asset.88d670728dd9800639080b11a0a9600df1d1f0c270a30a21d0c684b503fe1b26
│   │   ├── entry.py
│   │   └── lib
│   │       └── function
│   │           ├── __pycache__
│   │           │   └── function.cpython-38.pyc
│   │           └── function.py
│   ├── cdk.out
│   └── manifest.json
├── cdk.out
├── manifest.json
└── tree.json

After generating this tree, I changed the CDK dependencies in requirements.txt to 1.67.0 and executed pip install -r requirements.txt to update the installed dependencies.

I deleted the cdk.out folder again, and ran cdk synth to generate this tree for 1.67.0:

cdk.out
├── Develop1.assets.json
├── Develop1.template.json
├── Develop2.assets.json
├── Develop2.template.json
├── assembly-Develop1-DevelopLayer
│   ├── Develop1DevelopLayerLayer1283E4BED.assets.json
│   ├── Develop1DevelopLayerLayer1283E4BED.template.json
│   ├── asset.88d670728dd9800639080b11a0a9600df1d1f0c270a30a21d0c684b503fe1b26
│   │   ├── entry.py
│   │   └── lib
│   │       └── function
│   │           ├── __pycache__
│   │           │   └── function.cpython-38.pyc
│   │           └── function.py
│   ├── cdk.out
│   └── manifest.json
├── assembly-Develop2-DevelopLayer
│   ├── Develop2DevelopLayerLayer17829D95B.assets.json
│   ├── Develop2DevelopLayerLayer17829D95B.template.json
│   ├── asset.88d670728dd9800639080b11a0a9600df1d1f0c270a30a21d0c684b503fe1b26
│   │   ├── __pycache__
│   │   │   └── function.cpython-38.pyc
│   │   └── function.py
│   ├── cdk.out
│   └── manifest.json
├── cdk.out
├── manifest.json
└── tree.json

You can see in the second tree that the second asset contains only the function.py file, and is not modified by the bundling logic that is supposed to create the lib/function directory structure and add the entry.py file.

@jogold
Contributor

jogold commented Oct 16, 2020

@rix0rrr this is due to #10478: the cache logic fails here because Stages do not share the same outdir (it works with multiple Stacks in the same app). But this means that identical assets used in multiple Stages are copied/bundled as many times as there are Stages. What do you suggest here?

@hoegertn
Contributor

@rix0rrr this is also what I was talking about on Wednesday regarding building assets multiple times.

@rix0rrr
Contributor

rix0rrr commented Oct 19, 2020

Yep, I figured as much.

Feels like we can do two things:

  • Add more deduplication logic to the asset bundling so that it knows to do the bundling (or whatever the new operation is, I didn't follow this too closely) only once. We already have the cacheKey, which depends only on sourcePath, so that should match across multiple assets.
  • Introduce the concept of an asset outdir that is shared between all (nested) assemblies in the root assembly; then all the old logic applies again (see the sketch after this list).
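
A rough sketch of what that second option could mean structurally (a sketch only; `assetOutdir` follows the naming in this thread, and the class shape is an assumption, not actual cx-api code):

```ts
import * as path from 'path';

// Sketch: every nested assembly gets its own manifest outdir but shares
// the root's asset directory, so each unique asset is staged only once.
class AssemblyBuilder {
  constructor(
    public readonly outdir: string,      // per-assembly manifest directory
    public readonly assetOutdir: string, // shared across the whole app
  ) {}

  createNestedAssembly(id: string): AssemblyBuilder {
    return new AssemblyBuilder(
      path.join(this.outdir, `assembly-${id}`),
      this.assetOutdir, // inherited from the root, not nested
    );
  }
}
```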

I feel like the 2nd one is better.

Thoughts?

@hoegertn
Contributor

Second option sounds great.

@jogold
Contributor

jogold commented Oct 19, 2020

I feel like the 2nd one is better.

Agree

@jogold
Contributor

jogold commented Oct 19, 2020

Not sure we need an extra directory for this; leaving all the assets in the top-level outdir is fine, and that is already the behavior when there are no Stages in an App. @rix0rrr is there a simple way to get to the top-level outdir in AssetStaging? Then it's fixed.

A note on the origin of this issue: caching works, but the second time we come across the asset there is no bundleDir here:

// Staging the bundling asset.
if (this.bundleDir) {

because bundling was skipped. But since the targetPath is different (inside a nested outdir), the asset is not considered already staged:

// Already staged
if (fs.existsSync(targetPath)) {
  return;
}

so it falls back to a normal asset copy:

// Copy file/directory to staging directory
const stat = fs.statSync(this.sourcePath);
if (stat.isFile()) {
  fs.copyFileSync(this.sourcePath, targetPath);
} else if (stat.isDirectory()) {
  fs.mkdirSync(targetPath);
  FileSystem.copyDirectory(this.sourcePath, targetPath, this.fingerprintOptions);
} else {
  throw new Error(`Unknown file type: ${this.sourcePath}`);
}
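
Condensing the three fragments above, the failure path on the second Stage is roughly this (a simplified sketch, not the actual AssetStaging code):

```ts
import * as fs from 'fs';

function stageAsset(bundleDir: string | undefined, targetPath: string, copyRawSource: () => void) {
  if (bundleDir) {
    return; // only the first Stage gets here: bundling actually ran
  }
  if (fs.existsSync(targetPath)) {
    return; // would only catch a re-stage into the *same* outdir
  }
  // Second Stage: the cache hit skipped bundling (no bundleDir), and the
  // nested outdir holds no staged copy, so the unbundled source is copied.
  copyRawSource();
}
```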

jogold added a commit to jogold/aws-cdk that referenced this issue Oct 19, 2020
Always stage assets in the outdir of the top level Stage/App.

Closes aws#10877
@rix0rrr
Contributor

rix0rrr commented Oct 19, 2020

because bundling was skipped. But since the targetPath is different (inside a nested outdir), the asset is not considered already staged:

Is it fair to consider this case as a logic bug? Seems that the fallback behavior here is an accident and not actually intended, right?

@jogold
Contributor

jogold commented Oct 19, 2020

Seems that the fallback behavior here is an accident and not actually intended, right?

Correct.

In fact, the code should properly throw if bundling is used and we have a cache hit and we cannot find the already staged asset.
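
Something along these lines, presumably (a hypothetical guard, not code from the eventual PR):

```ts
import * as fs from 'fs';

// Hypothetical guard: a cache hit for a bundled asset must resolve to an
// already-staged copy; silently falling back to a raw source copy is
// exactly the bug reported in this issue.
function assertStagedBundleExists(sourcePath: string, targetPath: string, bundleDir?: string) {
  if (bundleDir === undefined && !fs.existsSync(targetPath)) {
    throw new Error(
      `Bundling for ${sourcePath} hit the cache but no staged output was found at ${targetPath}`,
    );
  }
}
```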

@rix0rrr
Contributor

rix0rrr commented Oct 19, 2020

I was thinking that too. Kk, good that we agree :).

rix0rrr added a commit that referenced this issue Oct 21, 2020
We stage assets into the Cloud Assembly directory. If there are multiple
nested Cloud Assemblies, the same asset will be staged multiple times.
This leads to an N-fold increase in size of the Cloud Assembly when used
in combination with CDK Pipelines (where N is the number of stages
deployed), and may even lead the Cloud Assembly to exceed CodePipeline's
maximum artifact size of 250MB.

Add the concept of an `assetOutdir` next to a regular Cloud Assembly
`outDir`, so that multiple Cloud Assemblies can share an asset
directory. As an initial implementation, the `assetOutdir` of nested
Cloud Assemblies is just the regular `outdir` of the root Assembly.

We are playing a bit fast and loose with the semantics of file paths
across our code base; many properties just say "the path of X" without
making clear whether it's absolute or relative, and if it's relative
what it's relative to (`cwd()`? Or the Cloud Assembly directory?).

Turns out that especially in dealing with assets, the answer is
"can be anything" and things just happen to work out based on who is
providing the path and who is consuming it. In order to limit the
scope of the changes I needed to make I kept modifications to the
`AssetStaging` class:

* `stagedPath` now consistently returns an absolute path.
* `relativeStagedPath()` returns a path relative to the Cloud Assembly,
  or an absolute path, as appropriate.

Related changes in this PR:

- Refactor the *copying* vs. *bundling* logic in `AssetStaging`. I found
  the current maze of `if`s and member variable changes too hard to
  follow to convince myself the new code would be doing the right thing,
  so I refactored it to reduce the branching factor.

- Switch the tests of `aws-ecr-assets` over to Jest using
  `nodeunitShim`.

Fixes #10877, fixes #9627, fixes #9917.
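
The path contract described by those two bullets boils down to something like this (signature names taken from the commit message; the body is an assumption, not the merged implementation):

```ts
import * as path from 'path';

// stagedPath is always absolute; the relative variant yields a path
// relative to the consuming Cloud Assembly when the asset lives inside
// it, and falls back to the absolute path otherwise.
function relativeStagedPath(stagedPath: string, cloudAssemblyDir?: string): string {
  if (!cloudAssemblyDir) {
    return stagedPath; // no Cloud Assembly context: keep it absolute
  }
  const rel = path.relative(cloudAssemblyDir, stagedPath);
  return rel.startsWith('..') ? stagedPath : rel;
}
```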
@mergify mergify bot closed this as completed in #11008 Oct 26, 2020
mergify bot pushed a commit that referenced this issue Oct 26, 2020