Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-cdk-lib): asynchronous build with unhandled errors cause deployment malfunctions #17944

Closed
ChristopheBougere opened this issue Dec 10, 2021 · 10 comments
Assignees
Labels
aws-cdk-lib Related to the aws-cdk-lib package bug This issue is a bug. closing-soon This issue will automatically close in 4 days unless further comments are made. effort/small Small work item – less than a day of effort needs-reproduction This issue needs reproduction. p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@ChristopheBougere
Copy link
Contributor

What is the problem?

I have a CDK project deployed with an OpenSearch Domain with availability zone awareness.
To reduce costs, I wanted to reduce the number of nodes in my dev env.
I changed the dataNodes to 1, as well as the availabilityZoneCount.
This was a mistake, because availabilityZoneCount should be at least 2 (or we should simply set zoneAwareness to undefined to disable it).
However, it resulted in an unhandled error (see log below) and in the complete removal of the construct that contained the OpenSearch domain.

(node:5224) UnhandledPromiseRejectionWarning: Error: Invalid zone awareness configuration; availabilityZoneCount must be 2 or 3
    at new Domain (/Users/pathToMyProject/node_modules/aws-cdk-lib/aws-opensearchservice/lib/domain.ts:648:13)
    at new Search (/Users/pathToMyProject/api/cdk/search.ts:38:23)
    at new ApiStack (/Users/pathToMyProject/api/cdk/index.ts:75:24)
    at new MyProjectStack (/Users/pathToMyProject/cdk/lib/index.ts:51:21)
    at build (/Users/pathToMyProject/cdk/bin/my-project.ts:43:5)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:5224) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 13)
(node:5224) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Reproduction Steps

  1. Create a project with an OpenSearch domain set in a vpc with 3 data nodes, and zone awareness in 3 AZs
  2. Deploy
  3. Change the data nodes to 1, and availabilityZoneCount to 1 as well
  4. Deploy. You will see an unhandled error in the logs, but the deployment will continue and the construct that use to contain the open search domain will be totally deleted.

What did you expect to happen?

The deployment should have just failed, instead of deleting one of my constructs.

What actually happened?

The error was unhandled, and one of my construct (the one in which an error happened) was totally suppressed.

CDK CLI Version

2.1.0

Framework Version

2.1.0

Node.js Version

v14.16.0

OS

macOS Monterey 12.0.1

Language

Typescript

Language Version

3.9.7

Other information

This is, indeed, a very critical bug 😬

@ChristopheBougere ChristopheBougere added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 10, 2021
@github-actions github-actions bot added the @aws-cdk/aws-opensearch Related to the @aws-cdk/aws-opensearchservice package label Dec 10, 2021
@ryparker ryparker added the p1 label Dec 10, 2021
@kaizencc
Copy link
Contributor

Hi @ChristopheBougere! Thanks for the bug report. I wasn't able to reproduce this bug on my end following your steps. Setting the availabilityZoneCount to 1 results in a synth time error for me, due to this check:

if (![2, 3].includes(availabilityZoneCount)) {
throw new Error('Invalid zone awareness configuration; availabilityZoneCount must be 2 or 3');
}

Is it possible that you are not using the opensearch.Domain construct - maybe you are using opensearch.CfnDomain? At any rate, could you please provide a code snippet that will help me identify exactly how you came across this deploy-time error.

@kaizencc kaizencc added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 11, 2021
@github-actions
Copy link

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Dec 13, 2021
@ChristopheBougere
Copy link
Contributor Author

Hi @kaizen3031593
Thanks for having a look! So I am using the opensearch.Domain construct. I also had the same error thrown, Invalid zone awareness configuration; availabilityZoneCount must be 2 or 3, but as you can see in the initial stack trace, the promise rejection was unhandled, which I think is the reason the deployment continued after that.
However, this is the first time I see this kind of behaviour in CDK, that's why I'm reporting (maybe related to v2?).
I'll try to provide a code snippet

@ChristopheBougere
Copy link
Contributor Author

Also, my error seems very similar to this one:
#4927 (comment)
I am also resolving config asynchronously in a very similar way, which might make async promise rejection unhandled.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Dec 13, 2021
@ChristopheBougere
Copy link
Contributor Author

I can confirm that this is not an issue with the opensearch module itself, it happens for any error thrown at deployment time.

The issue comes from the way errors are handled.

I solved it by adding a try/catch with a process.exit in my bin/my-project.ts file:

Before

#!/usr/bin/env node

import { App } from 'aws-cdk-lib';

async function build() {
    const app = new App();

    // resolving config...

    new MyProjectStack(app, 'my-project');
}

build();

After

#!/usr/bin/env node

import { App } from 'aws-cdk-lib';

async function build() {
  try {
    const app = new App();

    // resolving config...

    new MyProjectStack(app, 'my-project');

  } catch {
    console.error('CDK unhandled error:', error);
    process.exit(1);
  }
}

build();

@kaizen3031593 Feel free to close the issue. However, I feel there might have been a change in how unhandled errors are catched in cdk v2 (prior to uprade, cdk thrown errors use to fail deployment properly).

@kaizencc
Copy link
Contributor

@ChristopheBougere I am going to reroute your issue to CDK v2 and @njlynch but I am not sure what, if anything, we can do about building asynchronously.

@kaizencc kaizencc added aws-cdk-lib Related to the aws-cdk-lib package and removed p1 @aws-cdk/aws-opensearch Related to the @aws-cdk/aws-opensearchservice package labels Dec 13, 2021
@kaizencc kaizencc changed the title (aws-opensearchservice): unhandled error cause complete construct removal (aws-cdk-lib): asynchronous build with unhandled errors cause deployment malfunctions Dec 13, 2021
@kaizencc kaizencc assigned njlynch and unassigned kaizencc Dec 13, 2021
@peterwoodworth peterwoodworth removed the needs-triage This issue or PR still needs to be triaged. label Dec 14, 2021
@njlynch
Copy link
Contributor

njlynch commented Dec 14, 2021

Generally, we haven't historically supported async processing within constructs. See #8273 for a long thread on some of the rationale (and counter-arguments) to why that is.

That being said, I'm intrigued by the claim that it worked in v1 but not v2. Can you provide a reproducible example that we can synth/deploy on v1 vs v2 and see what's changed?

@njlynch njlynch added effort/small Small work item – less than a day of effort needs-reproduction This issue needs reproduction. p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Dec 14, 2021
@github-actions
Copy link

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Dec 16, 2021
@ChristopheBougere
Copy link
Contributor Author

Thanks for pointing to this thread @njlynch
I've tried to provide a repro, and it ended up in CDK v1 having the same behavior than v2 :/
I was surprised that it didn't happen to me before.
While I understand the reasons for not supporting async processing, catching this kind of exception could still be helpful to avoid some errors (maybe with a CLI warning if cdk detect async usage?).

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws-cdk-lib Related to the aws-cdk-lib package bug This issue is a bug. closing-soon This issue will automatically close in 4 days unless further comments are made. effort/small Small work item – less than a day of effort needs-reproduction This issue needs reproduction. p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
Projects
None yet
Development

No branches or pull requests

5 participants