@Y-JayKim (Contributor) commented May 8, 2025

Issue # (if applicable)

Closes #17515.

Reason for this change

AWS CDK-generated Step Function roles break in-flight Step Function executions when using versioned Lambda functions. During deployment, the Step Function’s IAM role is updated to include permissions for the new Lambda version but removes permissions for the previous version. This causes lambda:InvokeFunction permission failures in in-flight executions that were started before the deployment and are still trying to invoke the previous Lambda version.

This issue is particularly problematic when using Step Function Aliases with deployment preferences for traffic shaping, as a percentage of new executions are directed to the previous version of the state machine, which attempts to invoke a Lambda version it no longer has permissions for.
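To make the failure concrete, the following TypeScript sketch simulates the permission check an in-flight execution runs into after a redeploy. It is a simplified model under stated assumptions: the ARNs are made up, and `resourceMatches` is a hypothetical stand-in for IAM resource matching (a `*` matches any run of characters; `?`, conditions, and the real policy engine are ignored).

```typescript
// Hypothetical, simplified IAM resource matching: '*' matches any run of
// characters; every other character must match literally ('?' is not handled).
function resourceMatches(pattern: string, arn: string): boolean {
  const regexSource = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${regexSource}$`).test(arn);
}

// A request is allowed if any Resource entry in the statement matches it.
function isAllowed(resources: string[], arn: string): boolean {
  return resources.some((pattern) => resourceMatches(pattern, arn));
}

// Made-up unqualified function ARN used for illustration.
const fnArn = 'arn:aws:lambda:us-east-1:123456789012:function:my-fn';

// Role policy resources while the state machine was pinned to version 2...
const policyBefore = [`${fnArn}:2`];
// ...and after a deployment that pins it to version 3.
const policyAfter = [`${fnArn}:3`];

// An execution started before the deployment still invokes version 2.
const inFlightTarget = `${fnArn}:2`;

console.log(isAllowed(policyBefore, inFlightTarget)); // true
console.log(isAllowed(policyAfter, inFlightTarget)); // false: the in-flight invoke is denied
console.log(isAllowed([`${fnArn}:*`], inFlightTarget)); // true: a version wildcard still covers it
```

Any policy that lists only the currently deployed version reproduces the denial; the wildcard entry in the last line is the kind of resource the proposed change adds.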

Description of changes

Implemented a feature flag STEPFUNCTIONS_TASKS_LAMBDA_INVOKE_GRANT_ALL_VERSIONS to control IAM permissions granted when using Lambda versions with Step Functions:

  • Added a new feature flag in cx-api/lib/features.ts with detailed documentation
  • Modified the LambdaInvoke task implementation to check for this flag:
      ◦ When enabled: grants permissions to both the specific Lambda version AND all versions using a wildcard pattern (function-arn:*)
      ◦ When disabled (default behavior): maintains the current behavior of granting permission only to the specific version
  • Updated the API documentation to clearly explain the feature flag usage
  • Updated the README.md to include examples showing how to enable the feature flag

This approach maintains backward compatibility while giving users an opt-in solution to prevent in-flight executions from failing during deployments.

Describe any new or updated permissions being added

When the feature flag is enabled, the Step Function's IAM role will now include an additional IAM permission that grants access to all versions of the Lambda function using a wildcard pattern, e.g.:

  • Before: "Resource": ["arn:aws:lambda:region:account:function:name:version"]
  • After: "Resource": ["arn:aws:lambda:region:account:function:name:version", "arn:aws:lambda:region:account:function:name:*"]
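The Before/After resource lists above can be reproduced with the TypeScript sketch below. It assumes a concrete, version-qualified ARN string, and `invokeResources` is a hypothetical helper rather than the PR's code. Because CDK ARNs are often unresolved tokens at synth time, the string slicing here only illustrates the shape of the resulting policy.

```typescript
// Hypothetical helper: builds the Resource list for lambda:InvokeFunction.
// Assumes `versionArn` is a concrete string ending in ":<version>".
function invokeResources(versionArn: string, grantAllVersions: boolean): string[] {
  if (!grantAllVersions) {
    // Flag disabled: only the specific version is allowed.
    return [versionArn];
  }
  // Flag enabled: keep the specific version and add "<unqualified ARN>:*".
  const unqualifiedArn = versionArn.slice(0, versionArn.lastIndexOf(':'));
  return [versionArn, `${unqualifiedArn}:*`];
}

const versionArn = 'arn:aws:lambda:region:account:function:name:version';
console.log(invokeResources(versionArn, false));
console.log(invokeResources(versionArn, true));
```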

Description of how you validated changes

  • Added comprehensive unit tests that verify both behaviors (with feature flag enabled and disabled)
  • Updated integration tests to demonstrate both scenarios with and without the feature flag
  • Created test suites to verify behavior with both versioned Lambda functions and non-versioned Lambda functions

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation requested a review from a team May 8, 2025 22:06
@github-actions bot added the bug, effort/small, and p2 labels May 8, 2025
@mergify bot added the contribution/core label May 8, 2025
@GavinZZ added the pr/do-not-merge and needs-security-review labels May 12, 2025
@GavinZZ (Member) commented May 12, 2025

Adding the do-not-merge and needs-security-review labels while we wait for an update from the security review of this approach.

@Y-JayKim force-pushed the fix/stepfunction-allversion-permission branch from 11ed8dd to f8252a2 May 20, 2025 22:19
@Y-JayKim removed the pr/do-not-merge label May 20, 2025
@samson-keung assigned samson-keung and unassigned GavinZZ May 22, 2025
const functionArn = this.props.lambdaFunction.functionArn;
let resources: string[];
if (grantAllVersions) {
  const baseArn = functionArn.replace(/:[^:]*$/, '');
Review comment (Contributor):

I believe functionArn could be a token, so calling .replace on it will not work.

Maybe resourceArnsForGrantInvoke or the grantInvoke method can be used instead. Please verify, as I haven't looked into this in depth.
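The token concern can be demonstrated without CDK itself. The TypeScript sketch below uses a simplified stand-in for CDK string tokens; the placeholder strings and the `resolve` helper are assumptions made for illustration, not CDK's actual implementation. It shows that regex surgery on an unresolved placeholder is a no-op, while appending a literal to the unqualified function ARN survives resolution.

```typescript
// Simulated token table (assumption: illustrative placeholders only). At synth
// time an ARN is an opaque placeholder; the real value is substituted later.
const tokenValues = new Map<string, string>([
  ['${Token[VersionArn]}', 'arn:aws:lambda:us-east-1:123456789012:function:my-fn:3'],
  ['${Token[FunctionArn]}', 'arn:aws:lambda:us-east-1:123456789012:function:my-fn'],
]);

// Replace every placeholder with its deployed value, like template resolution.
function resolve(s: string): string {
  let out = s;
  tokenValues.forEach((value, placeholder) => {
    out = out.split(placeholder).join(value);
  });
  return out;
}

const versionArnToken = '${Token[VersionArn]}';
const functionArnToken = '${Token[FunctionArn]}';

// Approach under review: strip the trailing ":qualifier" with a regex.
// The placeholder has no ':', so nothing is stripped before resolution.
const viaReplace = versionArnToken.replace(/:[^:]*$/, '') + ':*';
console.log(resolve(viaReplace)); // arn:aws:lambda:us-east-1:123456789012:function:my-fn:3:*

// Token-safe alternative: append the wildcard to the unqualified ARN.
const viaConcat = `${functionArnToken}:*`;
console.log(resolve(viaConcat)); // arn:aws:lambda:us-east-1:123456789012:function:my-fn:*
```

In real CDK code, one token-safe source for the unqualified ARN would be the underlying function of a version (for example `version.lambda.functionArn` on a `lambda.IVersion`), but that should be checked against the aws-cdk-lib API reference.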

};

const grantAllVersions = cdk.FeatureFlags.of(this).isEnabled(cxapi.STEPFUNCTIONS_TASKS_LAMBDA_INVOKE_GRANT_ALL_VERSIONS);
const functionArn = this.props.lambdaFunction.functionArn;
Review comment (Contributor):

We should update the doc for the lambdaFunction property and call out what behaviour to expect when using the property and the feature flag. By doc, I mean this: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_stepfunctions_tasks.LambdaInvoke.html#lambdafunction

In particular, I think we should call out that in XYZ configuration, even if the user passes a specific version to LambdaInvoke, the permission will include ALL versions.

Reply (Contributor Author):

Updated in the new commit.

@aws-cdk-automation (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: c80ec19
  • Result: FAILED
  • Build Logs (available for 30 days)


@alvazjor added the pr/needs-maintainer-review label Aug 25, 2025
@plushie-cat commented:

Hello, no rush or pressure, but I just wanted to understand whether this PR is awaiting further review or whether the pipeline checks need to be addressed first.

@plushie-cat commented:

I don't mean to keep asking about this, sorry 😅, but if anyone is able to provide some clarity on where this PR is at, that would be greatly appreciated.

@aws-cdk-automation added the pr/needs-further-review label Oct 1, 2025
@ozelalisen removed the needs-security-review label Oct 1, 2025
@Abogical self-assigned this Oct 23, 2025

@Abogical (Member) left a comment:

Hi @Y-JayKim! Sorry for the late review. I have a couple of points:

  • I don't believe this should be implemented via a feature flag. Rather, it should be an optional property that indicates whether or not you want the policy to be broad enough to include all versions, with the default being false. Broadening the policy to include all versions of a Lambda function may be a security risk, so users should have to acknowledge that risk explicitly by setting this property.
  • There was a build failure with this PR, but the logs were deleted because more than 30 days have passed since then, so I'm not sure what failed. Regardless, you can try removing the snapshots and the feature flag, rebasing onto main, and fixing any conflicts before regenerating the snapshots. Hopefully the build will work this time.
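A minimal TypeScript sketch of the opt-in property suggested above. The interface and the `grantAllVersions` name are hypothetical and do not exist in aws-cdk-lib; the point is that omitting the property keeps the narrow, single-version policy, and only an explicit `true` broadens it.

```typescript
// Hypothetical props shape for illustration; not an existing LambdaInvoke API.
interface LambdaInvokeGrantProps {
  /** ARN of the specific version the task invokes. */
  readonly versionArn: string;
  /** Unqualified ARN of the underlying function. */
  readonly functionArn: string;
  /**
   * Also allow lambda:InvokeFunction on every version of the function.
   * This broadens the role policy, so it must be requested explicitly.
   * @default false
   */
  readonly grantAllVersions?: boolean;
}

function policyResources(props: LambdaInvokeGrantProps): string[] {
  // Opt-in: an omitted property behaves exactly like `false`.
  const grantAll = props.grantAllVersions ?? false;
  return grantAll
    ? [props.versionArn, `${props.functionArn}:*`]
    : [props.versionArn];
}

const fn = 'arn:aws:lambda:us-east-1:123456789012:function:my-fn';
console.log(policyResources({ versionArn: `${fn}:3`, functionArn: fn }));
console.log(policyResources({ versionArn: `${fn}:3`, functionArn: fn, grantAllVersions: true }));
```

Taking the unqualified function ARN as a separate input also avoids parsing the version ARN, which matters when that ARN is an unresolved token.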

@Abogical (Member) commented:

Hi @Y-JayKim, are you still working on this?

@Abogical added the response-requested label Oct 30, 2025
@github-actions bot (Contributor) commented Nov 1, 2025

This PR has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions bot added the closing-soon and closed-for-staleness labels and removed the closing-soon label Nov 1, 2025
@github-actions bot closed this Nov 6, 2025

Labels

bug, closed-for-staleness, contribution/core, effort/small, p2, pr/needs-further-review, pr/needs-maintainer-review, response-requested


Development

Successfully merging this pull request may close these issues.

(stepfunctions): CDK generated stepfunction roles breaking inflight stepfunction executions with versioned lambdas

8 participants