Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

‼️ NOTICE: cdk deploy fails when stacks share assets in v2.80.0 #25714

Closed
liamoneill opened this issue May 24, 2023 · 2 comments · Fixed by #25719
Closed

‼️ NOTICE: cdk deploy fails when stacks share assets in v2.80.0 #25714

liamoneill opened this issue May 24, 2023 · 2 comments · Fixed by #25719
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort p0 package/tools Related to AWS CDK Tools or CLI

Comments

@liamoneill
Copy link
Contributor

liamoneill commented May 24, 2023

What is the issue?

In version 2.80.0 of the CDK cli, released May 19, 2023, a WorkGraph was introduced to handle tasks for cdk deploy. However, shared assets between CDK stacks resulted in a dependency cycle and the WorkGraph would be unable to determine what work to do next.

A further issue was discovered later, which could result in a deployment to fail with Error: Duplicate use of node id.
when Assets were shared between Stages.

What is the impact?

If a CDK app contains two stacks that have a dependency relationship between them, and both use the same asset, the cdk deploy command would exit early. The logs from cdk deploy abruptly end after the assets are published, without attempting the actual CloudFormation deploy. Notably the command exited with exit code 0 and without any error message indicating the issue.

Who is affected?

You are affected if you are using the CDK cli version >= 2.80.0,<2.83.1 and have shared assets between stacks or stages.

How do I resolve this?

Please upgrade to the latest version of the CDK cli.
Version 2.81.0 was released with a fix on May 25, 2023.
Version 2.83.1 was released with a fix on June 9, 2023.


Original issue below:

Describe the bug

Since upgrading from v2.76.0 to v2.80.0 apps with dependant stacks that share assets fail to deploy. The logs from cdk deploy abruptly end after the assets are published, without attempting the actual CloudFormation deploy. Notably the command exited with exit code 0 and without any error message indicating the issue.

Visualising the deploy's WorkGraph, noticed that there was a cycle between the first stack and the shared asset.

The stacks share the same asset because they both deploy a Lambda function bundled from the same directory.

graph-with-cycle

This cycle is explained by the publish step depending on its parent stack's dependencies:

dependencies: new Set([
buildId,
// The asset publish step also depends on the stacks that the parent depends on.
// This is purely cosmetic: if we don't do this, the progress printing of asset publishing
// is going to interfere with the progress bar of the stack deployment. We could remove this
// for overall faster deployments if we ever have a better method of progress displaying.
...this.getDepIds(parentStack.dependencies),
]),

Why does the deploy command exit abruptly without an error...

During my debugging I noticed that when an exception is thrown from inside the recursed graph traversal code it's possible for active[x.type]--; to be ran twice for the same node: once before call start() and again after an error is caught.

void fn(x).then(() => {
active[x.type]--;
graph.deployed(x);
start();
}).catch((err) => {
active[x.type]--;
// By recording the failure immediately as the queued task exits, we prevent the next
// queued task from starting.
graph.failed(x, err);
start();
});

This means that in the final start() call totalActive() === -1 and therefore the final error check is skipped and the graph traversal exits without throwing

if (totalActive() === 0) {
if (graph.done()) {
ok();
}
// wait for other active deploys to finish before failing
if (graph.hasFailed()) {
fail(graph.error);
}
}

Expected Behavior

Running cdk deploy should create/update CloudFormation stacks when there are changes.

Current Behavior

The following is the output from a cdk deploy run that should have triggered cdk to create the app's stacks in CloudFormation.

$ npm run -- cdk deploy --require-approval=never --all

> cdk-app@0.1.0 cdk
> cdk deploy --require-approval=never --all


✨  Synthesis time: 9.19s

StackA:  start: Building 59d8ff92f2e05057ab76296224335641e49a49de5340ba4eaadc1db1bb98ec13:current_account-current_region
StackA:  success: Built 59d8ff92f2e05057ab76296224335641e49a49de5340ba4eaadc1db1bb98ec13:current_account-current_region
StackA:  start: Building 2c3f6b7194aed42363a8a33f4d46c307e237fef82efea8f1862a5ffd6041529a:current_account-current_region
StackA:  success: Built 2c3f6b7194aed42363a8a33f4d46c307e237fef82efea8f1862a5ffd6041529a:current_account-current_region
StackB:  start: Building 3990b68e52ea41bffd9515f2a18e073b879243875a8b83e1bd658fc056a0ce6f:current_account-current_region
StackB:  success: Built 3990b68e52ea41bffd9515f2a18e073b879243875a8b83e1bd658fc056a0ce6f:current_account-current_region
StackA:  start: Publishing 2c3f6b7194aed42363a8a33f4d46c307e237fef82efea8f1862a5ffd6041529a:current_account-current_region
StackA:  success: Published 2c3f6b7194aed42363a8a33f4d46c307e237fef82efea8f1862a5ffd6041529a:current_account-current_region

# Exit code is 0 despite nothing being deployed.
$ echo $?
0

Reproduction Steps

#!/usr/bin/env node

import 'source-map-support/register';
import path = require('path');
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Code, Function, Runtime } from 'aws-cdk-lib/aws-lambda';

class StackA extends cdk.Stack {
  functionA: NodejsFunction;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.functionA = new Function(this, 'Function', {
      functionName: 'function-a',
      runtime: Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: Code.fromAsset(path.join(__dirname, '../lambda-handler')),
    });
  }
}

class StackB extends cdk.Stack {
  constructor(scope: Construct, id: string, props: cdk.StackProps, functionA: Function) {
    super(scope, id, props);

    new Function(this, 'Function', {
      functionName: 'function-b',
      runtime: Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: Code.fromAsset(path.join(__dirname, '../lambda-handler')),
      environment: {
        FUNCTION_A: functionA.functionArn
      }
    });
  }
}

const app = new cdk.App();
const stackA = new StackA(app, 'StackA', {});
new StackB(app, 'StackB', {}, stackA.functionA);

app.synth()

See https://github.com/liamoneill/cdk-deploy-bug-reproduction

Possible Solution

  • Linearise asset building by only adding dependencies on other assets and not stacks.
  • Add a check to prevent active[x.type]--; from being called twice for a single node.
  • Would it be a worthwhile consideration to detect any loops in the deployment graph before starting the actual deploy? Unless that's already implemented when stack references are resolved.

Additional Information/Context

No response

CDK CLI Version

2.80.0 (build bbdb16a)

Framework Version

No response

Node.js Version

v16.20.0

OS

OS X & Ubuntu

Language

Typescript

Language Version

TypeScript (5.0.4)

Other information

Output from adding in a console.log(workGraph) on this line https://github.com/aws/aws-cdk/blob/v2.80.0/packages/aws-cdk/lib/cdk-toolkit.ts#L364

StackA := pending stack (3a72cef490a57e917476cc38c54239b36905a9145dd616ca3aa27f76289c6ab9:current_account-current_region-publish,2678fe7bc8cabbec40a701a87bc751c527801d42499067236a234a6d49fd46af:current_account-current_region-publish)
3a72cef490a57e917476cc38c54239b36905a9145dd616ca3aa27f76289c6ab9:current_account-current_region-build := pending asset-build
3a72cef490a57e917476cc38c54239b36905a9145dd616ca3aa27f76289c6ab9:current_account-current_region-publish := pending asset-publish (3a72cef490a57e917476cc38c54239b36905a9145dd616ca3aa27f76289c6ab9:current_account-current_region-build,StackA)
2678fe7bc8cabbec40a701a87bc751c527801d42499067236a234a6d49fd46af:current_account-current_region-build := pending asset-build
2678fe7bc8cabbec40a701a87bc751c527801d42499067236a234a6d49fd46af:current_account-current_region-publish := pending asset-publish (2678fe7bc8cabbec40a701a87bc751c527801d42499067236a234a6d49fd46af:current_account-current_region-build)
StackB := pending stack (StackA,3a72cef490a57e917476cc38c54239b36905a9145dd616ca3aa27f76289c6ab9:current_account-current_region-publish,f360c023dafc7a0c56111a8b7dee3f8a3e9ec7c69beb27897209cc6dab868aac:current_account-current_region-publish)
f360c023dafc7a0c56111a8b7dee3f8a3e9ec7c69beb27897209cc6dab868aac:current_account-current_region-build := pending asset-build
f360c023dafc7a0c56111a8b7dee3f8a3e9ec7c69beb27897209cc6dab868aac:current_account-current_region-publish := pending asset-publish (f360c023dafc7a0c56111a8b7dee3f8a3e9ec7c69beb27897209cc6dab868aac:current_account-current_region-build,StackA)

Thanks for the hard work on CDK! I'm a happy user.

@liamoneill liamoneill added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 24, 2023
@github-actions github-actions bot added the @aws-cdk/core Related to core CDK functionality label May 24, 2023
@mergify mergify bot closed this as completed in #25719 May 25, 2023
mergify bot pushed a commit that referenced this issue May 25, 2023
…en them share an asset (#25719)

Fixes cdk deploy not completing for some apps by removing cycles from core's worker graph between asset publishing steps & stacks. These cycles may be introduced when there dependencies between stacks and these stacks share assets with the same contents.

Additionally fixes a bug in the worker graph's error handling for cycles. 

Closes #25714

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@kaizencc kaizencc changed the title core: cdk deploy does not run when dependant stacks share assets ‼️ NOTICE: cdk deploy does not run when dependant stacks share assets in v2.80.0 May 26, 2023
@kaizencc kaizencc added p0 effort/small Small work item – less than a day of effort package/tools Related to AWS CDK Tools or CLI and removed needs-triage This issue or PR still needs to be triaged. @aws-cdk/core Related to core CDK functionality labels May 26, 2023
@kaizencc kaizencc pinned this issue May 26, 2023
@kaizencc kaizencc changed the title ‼️ NOTICE: cdk deploy does not run when dependant stacks share assets in v2.80.0 ‼️ NOTICE: cdk deploy fails when stacks share assets in v2.80.0 May 26, 2023
kaizencc added a commit to cdklabs/aws-cdk-notices that referenced this issue May 26, 2023
mergify bot pushed a commit to cdklabs/aws-cdk-notices that referenced this issue May 26, 2023
only affects cli version 2.80.0
issue: aws/aws-cdk#25714
@garethjudson
Copy link

I get an issue in v1.81.0 (typescript), which I suspect is related to dependant stacks.

In my environment we have many stacks which are dependant. This was making a linear dependency where each stack was dependant on an immediately prior stack.

Found this easiest way to manage dependency order, where you individual devs may not have the context to determine this easily. This way we don't have to think about dependency order too hard, earlier means before anything later...easy.
If this were causing the issue, could change it, however, before expending development effort, would like to understand the cause. This issue looks suspiciously similar, given the issue is visible in 1.81.0, but not 1.76.0

Mid way through our deployment we get an error:

❌ Deployment failed: Error: Unable to make progress anymore among: ...

Where '...' is a printed list of every stack, simliar to:
wwwStack := completed stack, ... yyyStack:= pending asset-publish, wwwStack := pending stack

Downgrading to version 1.76.0 resolves the issue.
Happy to help with any additional information.

Thanks a bunch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort p0 package/tools Related to AWS CDK Tools or CLI
Projects
None yet
3 participants