custom_resources: log retention rate limit error during deploy #24485
Comments
Thanks @jsauter for the report.
Hey @jsauter, thanks for reporting this. We can't yet tell what the rate limit is about, but this screenshot is still very helpful.
Hi @jsauter, can you tell me how many custom resources or log groups with logRotation enabled you are creating in this CDK app? I am just curious where the limit comes from.
In the case of this stack, we are creating a custom resource and configuring it for log retention. So one in our code, and I guess two from CDK: the 'AWS CDK resource provider framework - onEvent' and the logRetention CR.
I'm also running into the issue that the logrotation lambda is timing out at three seconds, causing the stack to timeout/fail. Some additional tuning options would be helpful to resolve issues with this without having to use escape hatches.
I'm also hitting this issue. Here's a stack trace for reference. I think the
We've got devs running into this - all you have to do is try to deploy an app with a bunch of lambdas, all with

One workaround a team found: they manually reduced the number of lambdas in their stack, then deployed several times, introducing only a few new ones each time. It's tedious, but it did get them past the rate limit issue.
Doing a little digging, in case it helps somebody: it looks like the LogRetention function code just pulls retry options from its resource properties, which the user can also specify in

I'm not familiar with the JavaScript aws-sdk, so it's unclear to me whether these rate exceeded errors are retried and, if they are, what the SDK v3 retry strategy looks like. I don't see any logs in CloudWatch indicating that any retries were done.
If you use the "base" option in logRetentionRetryOptions (I use 200 millis), you can deploy without getting "rate exceeded". But I don't know why "base" is deprecated. Can anyone explain the reason for deprecating "base" when it still works and there is no other solution?
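For readers who want to try this, here is a minimal sketch of what that setting might look like on a single function. The stack and construct names, the runtime, the inline handler, the retention period, and the `maxRetries` value are all illustrative assumptions; only the `base` value of 200 ms comes from the comment above. Note that `base` is marked deprecated, as discussed in this thread.

```typescript
import { App, Duration, Stack } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';

const app = new App();
const stack = new Stack(app, 'ExampleStack'); // hypothetical stack name

new lambda.Function(stack, 'ExampleFunction', { // hypothetical construct id
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromInline('exports.handler = async () => {};'),
  // Setting logRetention is what causes CDK to add the LogRetention custom resource.
  logRetention: logs.RetentionDays.ONE_WEEK, // assumed retention period
  logRetentionRetryOptions: {
    base: Duration.millis(200), // value from the comment above; deprecated option
    maxRetries: 10,             // assumed value, tune for your stack
  },
});
```

Each function that sets `logRetention` would need the same options, so in an app with many lambdas it may be simpler to share one props object.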
I think it was because they migrated from SDK v2 to SDK v3 in the lambda and retry strategies work differently in v3. |
This issue got worse since
Ha, that is a good catch! You can totally configure the retry mechanism to exceed the Lambda timeout. I'll get on fixing that!
Hi @aestebance,
@jusdino Thanks for the report and sorry for this. This has been resolved in #26858 and the release for it should be out any time now. You are waiting for

The reason for this regression was also the migration to SDK v3. Basically, I missed that we now need to explicitly check for
We use a custom resource to set the log retention for log groups created by the Lambda service. This custom resource handler code has a built-in retry mechanism to avoid throttling when executing many LogRetention CRs. Users can customize the number of possible retries, potentially retrying for a long time. This can lead to a situation where further retries should be attempted, but the Lambda function timeout has already been exceeded. The change sets the Lambda execution timeout to its maximum value to allow for up to 15 minutes of retries. If the retry budget is exhausted, the handler will throw an error and exit early.

Closes #24485

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
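To illustrate why a retry budget can outlast a short function timeout, here is a rough sketch of exponential backoff with a bounded number of retries. It is not the CDK handler's actual code: the function names, the doubling schedule without jitter, and the `isRetryable` predicate are all assumptions made for illustration.

```typescript
// Hypothetical helpers; none of these names come from the CDK source.

/** Delay before retry number `attempt` (0-based), doubling each time. */
function backoffDelayMs(baseMs: number, attempt: number): number {
  return baseMs * Math.pow(2, attempt);
}

/** Total time spent sleeping if every one of `maxRetries` retries is used. */
function worstCaseSleepMs(baseMs: number, maxRetries: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += backoffDelayMs(baseMs, attempt);
  }
  return total;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `operation`, retrying while `isRetryable(err)` is true.
 * Once `maxRetries` retries have been used, the last error is rethrown
 * so the caller fails instead of waiting any longer.
 */
async function withRetries<T>(
  operation: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  baseMs: number,
  maxRetries: number,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxRetries) {
        throw err; // non-retryable error, or retry budget exhausted
      }
      await sleep(backoffDelayMs(baseMs, attempt));
    }
  }
}
```

Under these assumptions, `worstCaseSleepMs(200, 5)` is 6200 ms, which already exceeds the three-second timeout mentioned earlier in the thread, while a 15-minute timeout (900000 ms) leaves room for a much larger budget.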
Describe the bug
When deploying a stack containing a custom resource, we received several rate limit exceptions when the CDK code deployed the logRetention infrastructure. This seems to happen intermittently, and we have not received any recent reports of the issue. However, we figured you would like to know.
Expected Behavior
Not to receive a rate limit exception and the application to deploy.
Current Behavior
When deploying a CDK application that contained a custom resource and a Lambda, the logRetention infrastructure generated a rate exceeded exception.
Reproduction Steps
It has not happened recently, but: create a CDK application with a custom resource whose configuration generates a logRetention policy.
Possible Solution
No response
Additional Information/Context
We are curious if there is a way to manage these rate limit exceptions if they are outside of our code.
CDK CLI Version
2.29.1
Framework Version
No response
Node.js Version
14.18.3
OS
macOS
Language
TypeScript
Language Version
No response
Other information
No response