aws-sdk: Increase in TimeoutErrors in Amplify hotswap builds #32219
Comments
Also, our latest version is 2.168.0 now. Can you verify with this version? |
I have edited the description to provide some additional info about repro instructions, but this issue is very fickle and is hard to reproduce consistently. It does appear in the most recent cdk version. |
I can't reproduce it but this could be the potential root cause: The timeout occurs because AWS Lambda requires the function to be in an aws-cdk/packages/aws-cdk/lib/api/hotswap/lambda-functions.ts Lines 333 to 336 in 27babe6
AWS SDK v3 has a different waiter implementation, timeout handling and delay strategy that handle requests differently from v2. This could affect how timeouts are processed. I guess we should either increase the timeout somehow from here or add status checking like:
```ts
const response = await lambda.getFunctionConfiguration({
  FunctionName: functionName,
});
if (response.State === 'Active' && response.LastUpdateStatus === 'Successful') {
  // Proceed with hotswap
}
```
I'll bring this up to the core team for further investigation. |
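For readers following along, the status check suggested in the comment above could be wrapped in a small polling loop. This is only a sketch, not the CDK implementation: `FunctionStatus`, `isReadyForHotswap`, `waitUntilReady` and the injected `fetchStatus` callback are hypothetical names, and the attempt count and delay are arbitrary assumptions.

```typescript
// Hypothetical shape: only the two fields the suggested check inspects.
interface FunctionStatus {
  State?: string;            // Lambda reports Pending | Active | Inactive | Failed
  LastUpdateStatus?: string; // Lambda reports Successful | Failed | InProgress
}

// True when the function has finished updating and can be hotswapped.
function isReadyForHotswap(status: FunctionStatus): boolean {
  return status.State === 'Active' && status.LastUpdateStatus === 'Successful';
}

// Polls `fetchStatus` until the function is ready, the update fails,
// or `maxAttempts` is exhausted. `fetchStatus` stands in for a call such as
// getFunctionConfiguration (an assumption made to keep the sketch self-contained).
async function waitUntilReady(
  fetchStatus: () => Promise<FunctionStatus>,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isReadyForHotswap(status)) return true;
    if (status.LastUpdateStatus === 'Failed') {
      throw new Error('Function update failed');
    }
    // Fixed delay between polls; a real waiter would likely back off.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // gave up: equivalent to a waiter timeout
}
```

Injecting `fetchStatus` keeps the loop independent of any particular SDK call, which matters here because, as later comments show, which API the waiter calls determines the IAM permissions it needs.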
Just to be clear, this message was introduced in version 2.167.0. Previously, the same error would result in a generic error message. So, the fact that we are now seeing it in the wild, by itself, doesn't imply that there is anything wrong there. Indeed, it would be surprising if this message didn't start appearing. To establish that there is an issue, we need to find a case in which this error message is shown in version 2.167.0 or later, and no error happens at all in earlier versions. Given that it's hard to reproduce it consistently, we would need to run each version a few times and compare the error rates. |
The issue surfaced immediately in our e2e tests after we merged the CDK version bump in aws-amplify/amplify-backend#2269 (failing run: https://github.com/aws-amplify/amplify-backend/actions/runs/12036537741/job/33558355227 ). It happened in all three jobs of the same kind on the first try. I can't find or recall examples of these tests failing due to a timeout before. |
It has consistently failed in 5/6 runs.
|
I was able to establish a local repro that explains why our tests are failing.
It seems that the new waiter implementation now requires new IAM permissions to function. |
Comments on closed issues and PRs are hard for our team to see. |
…ction is not allowed (#32301)

Closes #32219

### Reason for this change

In SDKv3, the standard `waitUntilFunctionUpdated` function invokes the `GetFunctionConfiguration` API, as opposed to SDKv2, which invoked `GetFunction`. This means that consumers of SDKv3 must allow the `lambda:GetFunctionConfiguration` action in their IAM role policy.

### Description of changes

Use a different waiter function provided by the SDK, which invokes `GetFunction` instead of `GetFunctionConfiguration`, thus restoring the required IAM permissions to what they were in SDKv2.

See https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-lambda/src/waiters/waitForFunctionUpdatedV2.ts#L10

> As opposed to https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-lambda/src/waiters/waitForFunctionUpdated.ts#L13

### Description of how you validated changes

Manual test. Assumed a role with the following policies:

![Screenshot 2024-11-27 at 9 34 25](https://github.com/user-attachments/assets/69415c37-6fe8-44d3-972c-1373ec55f46e)

```console
❯ cdk deploy --hotswap [09:29:11]

✨ Synthesis time: 2.72s

⚠️ The --hotswap and --hotswap-fallback flags deliberately introduce CloudFormation drift to speed up deployments
⚠️ They should only be used for development - never use them for your production Stacks!

AwsCdkPlaygroundStack: deploying... [1/1]

✨ hotswapping resources:
   ✨ Lambda Function 'AwsCdkPlaygroundStack-Function76856677-7Rl7hiwwO5LQ'

❌ AwsCdkPlaygroundStack failed: TimeoutError: Resource is not in the expected state due to waiter status: TIMEOUT. Waiter has timed out.
```

Then, run the CLI from the PR.

```console
❯ /Users/epolon/dev/src/github.com/aws/aws-cdk/packages/aws-cdk/bin/cdk deploy --hotswap [10:03:00]

✨ Synthesis time: 3.46s

⚠️ The --hotswap and --hotswap-fallback flags deliberately introduce CloudFormation drift to speed up deployments
⚠️ They should only be used for development - never use them for your production Stacks!

AwsCdkPlaygroundStack: deploying... [1/1]

✨ hotswapping resources:
   ✨ Lambda Function 'AwsCdkPlaygroundStack-Function76856677-7Rl7hiwwO5LQ'
✨ Lambda Function 'AwsCdkPlaygroundStack-Function76856677-7Rl7hiwwO5LQ' hotswapped!

✅ AwsCdkPlaygroundStack

✨ Deployment time: 12.72s

Stack ARN:
arn:aws:cloudformation:us-east-1:01234567890:stack/AwsCdkPlaygroundStack/22f2b380-a7cd-11ef-badd-0e08a8e0b5b1

✨ Total time: 16.19s

>>> elapsed time 23s
```

### Checklist

- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
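The permission difference described in the PR text above can be summarized as a small lookup. The following is a sketch under stated assumptions, not code from the CDK or the SDK: the waiter-to-action mapping is taken from the PR description, the name `waitUntilFunctionUpdatedV2` is inferred from the linked `waitForFunctionUpdatedV2.ts` file, `usableWaiters` is a hypothetical helper, and the example role is illustrative rather than the role used in the issue.

```typescript
// Waiter name -> IAM action it needs, as described in the PR text.
const WAITER_REQUIRED_ACTION = {
  waitUntilFunctionUpdated: 'lambda:GetFunctionConfiguration',
  waitUntilFunctionUpdatedV2: 'lambda:GetFunction',
} as const;

type WaiterName = keyof typeof WAITER_REQUIRED_ACTION;

// Hypothetical helper: returns the waiters a role could use.
// Exact-match check only; real IAM evaluation also handles wildcards,
// Deny statements, conditions and resource scoping.
function usableWaiters(allowedActions: readonly string[]): WaiterName[] {
  const allowed = new Set(allowedActions);
  return (Object.keys(WAITER_REQUIRED_ACTION) as WaiterName[]).filter(
    (name) => allowed.has(WAITER_REQUIRED_ACTION[name]),
  );
}

// Illustrative role that grants GetFunction but not GetFunctionConfiguration.
const roleActions = ['lambda:GetFunction', 'lambda:UpdateFunctionCode'];
console.log(usableWaiters(roleActions)); // only the V2 waiter is usable
```

With a role like this, a waiter that calls `GetFunctionConfiguration` can never observe a successful state, which is consistent with the `TIMEOUT` waiter status reported in the issue.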
Describe the bug
Starting from 11/14, we have started seeing `TimeoutError: Resource is not in the expected state due to waiter status: TIMEOUT` with a frequency we have never seen before. This error was added here in version 2.167.0.
Regression Issue
Last Known Working CDK Version
2.166.0
Expected Behavior
Few or no `TimeoutError: Resource is not in the expected state due to waiter status: TIMEOUT` errors shown to our customers.
Current Behavior
`TimeoutError: Resource is not in the expected state due to waiter status: TIMEOUT` has become one of the most common error messages our customers are receiving.
Reproduction Steps
Possible Solution
Roll back this PR: #31702
Additional Information/Context
No response
CDK CLI Version
2.167.0
Framework Version
No response
Node.js Version
OS
Linux/Mac/Windows
Language
TypeScript
Language Version
No response
Other information
No response