@newlinedeveloper

Description

Fixes an issue where retrying a CloudFormation deployment that uses a custom resource with an async waiter fails with an ExecutionAlreadyExists error.

Root Cause

The custom resource provider framework uses CloudFormation's RequestId as the Step Functions execution name when starting the waiter state machine. When CloudFormation retries a failed deployment, it reuses the same RequestId. Step Functions requires an execution name to be unique per account, Region, and state machine for 90 days, so the StartExecution call made on retry is rejected with ExecutionAlreadyExists.
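The failure mode can be reproduced without AWS using a small in-memory model of the name-uniqueness rule. This is only a sketch: FakeStateMachine, simulateRetry, and the request ID value are hypothetical stand-ins, not the AWS SDK or the CDK framework code.

```typescript
// In-memory model of the execution-name uniqueness rule. NOT the AWS SDK.
class ExecutionAlreadyExists extends Error {
  constructor(executionName: string) {
    super(`Execution already exists: ${executionName}`);
    this.name = 'ExecutionAlreadyExists';
  }
}

interface FakeStartParams {
  name?: string;
  input: string;
}

class FakeStateMachine {
  private readonly usedNames = new Set<string>();
  private counter = 0;

  // Rejects a name that was already used; generates one when none is given.
  startExecution(params: FakeStartParams): string {
    const executionName = params.name ?? `auto-${++this.counter}`;
    if (this.usedNames.has(executionName)) {
      throw new ExecutionAlreadyExists(executionName);
    }
    this.usedNames.add(executionName);
    return executionName;
  }
}

// Starts the waiter twice with the same RequestId, as a retried deployment would.
function simulateRetry(passRequestIdAsName: boolean): string {
  const stateMachine = new FakeStateMachine();
  const requestId = 'req-1234'; // hypothetical RequestId reused on retry
  const params: FakeStartParams = passRequestIdAsName
    ? { name: requestId, input: '{}' }
    : { input: '{}' };

  stateMachine.startExecution(params); // original deployment attempt
  try {
    stateMachine.startExecution(params); // retry
    return 'ok';
  } catch (e) {
    return (e as Error).name;
  }
}
```

In this model, simulateRetry(true) hits the duplicate-name rejection on the second call, while simulateRetry(false) succeeds because each call gets a fresh generated name.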

Solution

Removed the name parameter from the startExecution call, allowing Step Functions to auto-generate unique execution names. This is consistent with the AWS Step Functions StartExecution API Reference, which documents name as optional and states that Step Functions automatically generates a universally unique identifier (UUID) as the execution name if none is provided.

Changes

  • Removed name: resourceEvent.RequestId from the waiter state machine execution call in framework.ts
  • Updated log statement to remove the name field
  • Added unit test to verify that name is not included in the startExecution call
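The shape of the change can be sketched as follows. This is not the actual framework.ts code: buildWaiterRequest, WaiterStartRequest, and WaiterResourceEvent are hypothetical names introduced for illustration, and only the stateMachineArn, name, and input fields mirror the StartExecution request.

```typescript
// Hypothetical types for illustration; not the CDK framework's own definitions.
interface WaiterStartRequest {
  stateMachineArn: string;
  input: string;
  name?: string; // optional in StartExecution; omitted after this change
}

interface WaiterResourceEvent {
  RequestId: string;
  [key: string]: unknown;
}

// Builds the request for starting the waiter state machine.
function buildWaiterRequest(
  stateMachineArn: string,
  resourceEvent: WaiterResourceEvent,
): WaiterStartRequest {
  return {
    stateMachineArn,
    // Before the fix: name: resourceEvent.RequestId
    // After the fix: no name, so Step Functions generates a unique one.
    input: JSON.stringify(resourceEvent),
  };
}
```

The RequestId is still carried inside the JSON input, so the waiter can read it even though it no longer serves as the execution name.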

Testing

  • Added the unit test "waiter state machine execution does not include name field (allows retries)" to verify the fix
  • All existing unit tests pass
  • Verified that the mock assertion checks for name being undefined
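The idea behind the mock assertion can be illustrated with a hand-rolled spy. This is a minimal sketch, not the test added in this PR: createStartExecutionSpy, startWaiter, and the ARN are hypothetical, and the real test presumably relies on the repository's existing mocking setup.

```typescript
interface SpiedStartInput {
  stateMachineArn: string;
  input: string;
  name?: string;
}

// Hand-rolled spy standing in for a mocked Step Functions client call.
function createStartExecutionSpy() {
  const calls: SpiedStartInput[] = [];
  const startExecution = (input: SpiedStartInput): void => {
    calls.push(input);
  };
  return { calls, startExecution };
}

// Hypothetical code under test: starts the waiter without an execution name.
function startWaiter(
  start: (input: SpiedStartInput) => void,
  event: { RequestId: string },
): void {
  start({
    stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:waiter',
    input: JSON.stringify(event),
  });
}

// Runs the code under test against the spy and returns the recorded calls.
function recordWaiterCalls(): SpiedStartInput[] {
  const spy = createStartExecutionSpy();
  startWaiter(spy.startExecution, { RequestId: 'req-1234' });
  return spy.calls;
}
```

A test would then assert that exactly one call was recorded and that its name property is undefined.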

Related Issue

Fixes #35957

Verification

The fix was verified by:

  1. Running unit tests to ensure the name field is not included
  2. Confirming that existing tests continue to pass
  3. Confirming that the change aligns with AWS Step Functions best practices for execution naming

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions bot added the beginning-contributor, bug, effort/medium, and p1 labels Nov 8, 2025
@aws-cdk-automation requested a review from a team November 8, 2025 11:14
Collaborator

@aws-cdk-automation left a comment

The pull request linter fails with the following errors:

❌ Fixes must contain a change to an integration test file and the resulting snapshot.

If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.

✅ An exemption request has been requested. Please wait for a maintainer's review.

@newlinedeveloper
Author

Exemption Request

This fix is in runtime code (Lambda function execution) and does not change CloudFormation templates or infrastructure. The existing integration tests verify infrastructure creation, which is unaffected by this change. Unit tests provide comprehensive coverage of the runtime behavior change.

@aws-cdk-automation added the pr-linter/exemption-requested label Nov 8, 2025
@newlinedeveloper force-pushed the fix/custom-resources-waiter-retry-execution-name branch from 2a0d935 to 6d329d8 November 8, 2025 13:29

Development

Successfully merging this pull request may close these issues.

CustomResource Provider: WaiterStateMachine can't start when stack deployment is retried

2 participants