custom_resources: log retention rate limit error during deploy #24485

jsauter · 2023-03-06T22:41:05Z

Describe the bug

While deploying a stack containing a custom resource, we received several rate limit exceptions as the CDK code deployed the logRetention infrastructure. This seems to happen intermittently, and we have not had recent reports of it. However, we figured you would like to know.

Expected Behavior

The application deploys without a rate limit exception.

Current Behavior

When deploying a CDK application that contained a custom resource and a Lambda function, the logRetention infrastructure generated a rate exceeded exception.

Reproduction Steps

It has not happened recently, but to reproduce: create a CDK application with a custom resource whose configuration generates a logRetention policy.

Possible Solution

No response

Additional Information/Context

We are curious if there is a way to manage these rate limit exceptions if they are outside of our code.

CDK CLI Version

2.29.1

Framework Version

No response

Node.js Version

14.18.3

OS

macOS

Language

TypeScript

Language Version

No response

Other information

No response

jsauter · 2023-03-06T22:42:15Z

Console output during deploy.

cgarvis · 2023-03-07T00:57:04Z

thanks @jsauter for the report.

khushail · 2023-03-07T18:44:34Z

Hey @jsauter, thanks for reporting this. Although we can't tell what the rate limit is about, this screenshot is still very helpful.

pahud · 2023-03-07T18:53:17Z

Hi @jsauter

Can you tell us how many custom resources or log groups with logRetention enabled you are creating in this CDK app? I am just curious where the limit comes from.

jsauter · 2023-03-07T19:09:33Z

In the case of this stack, we are creating a custom resource and configuring it for log retention. So one in our code, and I guess two from CDK: the 'AWS CDK resource provider framework - onEvent' and the logRetention CR.

idwagner · 2023-03-09T15:57:46Z

I'm also running into the issue that the log retention Lambda is timing out at three seconds, causing the stack to time out and fail. Some additional tuning options would be helpful to resolve this without having to use escape hatches.

INIT_START Runtime Version: nodejs:14.v29	Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:XXXXXX
START RequestId: 8fe6a7e5-a521-48dd-8f73-07b97e0feff2 Version: $LATEST
2023-03-09T15:29:10.583Z	8fe6a7e5-a521-48dd-8f73-07b97e0feff2	INFO	
{
    "RequestType": "Create",
    "ServiceToken": "arn:aws:lambda:us-east-1:XXXX:function:XXXXX-LogRetentionaae0aa3c5b4d4f87b0-m6gZos1arCB1",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:us-east-1:XXXXX:stack/XXXXX/f4e26fa0-be8d-11ed-8fb1-0ecbd2482a13",
    "RequestId": "1b1b5d99-72d0-43ba-936d-49ab668ffe0b",
    "LogicalResourceId": "XXXXXXLogRetentionED515AF7",
    "ResourceType": "Custom::LogRetention",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:XXXXX:function:XXXXX-LogRetentionaae0aa3c5b4d4f87b0-m6gZos1arCB1",
        "RetentionInDays": "7",
        "LogGroupName": "/aws/lambda/XXXXX-c1quvH5k3SfR"
    }
}

2023-03-09T15:29:13.587Z 8fe6a7e5-a521-48dd-8f73-07b97e0feff2 Task timed out after 3.01 seconds

END RequestId: 8fe6a7e5-a521-48dd-8f73-07b97e0feff2
REPORT RequestId: 8fe6a7e5-a521-48dd-8f73-07b97e0feff2	Duration: 3006.29 ms	Billed Duration: 3000 ms	Memory Size: 128 MB	Max Memory Used: 26 MB

dylan-westbury · 2023-08-07T08:34:01Z

Also receiving rate exceeded errors for log retention in a CDK stack.

Adding logRetentionRetryOptions seems to have resolved it:

      logRetention: RetentionDays.ONE_MONTH,
      logRetentionRetryOptions: {
        base: Duration.millis(200),
        maxRetries: 10
      },

jusdino · 2023-08-15T23:04:46Z

I'm also hitting this issue. Here's a stack trace for reference. I think the LogRetention custom resource lambda code just needs to have a reasonable retry policy set:

    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'ThrottlingException',
  time: 2023-08-15T22:50:17.215Z,
  requestId: '59263e11-a284-42cb-8b4d-097c4cfe7c40',
  statusCode: 400,
  retryable: true
}

We've got devs running into this - all you have to do is try to deploy an app with a bunch of lambdas, all with logRetention set.

One work-around a team found: They manually reduced the number of lambdas in their stack then deployed several times, introducing only a few new ones each time. It's tedious, but it did get them past the rate limit issue.

jusdino · 2023-08-16T00:45:16Z

Doing a little digging, in case it helps somebody: it looks like the LogRetention function code just pulls retry options from its resource properties, which users can also specify in aws-lambda.Function via the logRetentionRetryOptions property, though you're only given a maxRetries option, which apparently defaults to 5.

I'm not familiar with the JavaScript aws-sdk, so it's unclear to me whether these rate exceeded errors are retried and, if they are, what the SDK v3 retry strategy looks like. I don't see any logs in CloudWatch indicating that any retries were done.
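For anyone trying to picture what a retry schedule driven by options like these might look like, here is a minimal sketch of exponential backoff with jitter. It is not the CDK handler's code or the SDK's retry strategy; `RetryOptions`, `backoffCeilingMs`, and `jitteredDelayMs` are hypothetical names, and the doubling schedule is an assumption made for illustration.

```typescript
// Hypothetical retry options, loosely modeled on the `base` and `maxRetries`
// settings discussed in this thread. Not the CDK handler's actual types.
interface RetryOptions {
  baseMs: number;     // delay ceiling for the first retry, in milliseconds
  maxRetries: number; // number of retries after the initial attempt
}

// Upper bound of the delay before retry number `attempt` (0-based),
// doubling each time: base, 2*base, 4*base, ...
function backoffCeilingMs(attempt: number, opts: RetryOptions): number {
  return opts.baseMs * Math.pow(2, attempt);
}

// "Full jitter": pick a random delay between 0 and the ceiling so that many
// concurrent callers do not all retry at the same instant.
function jitteredDelayMs(
  attempt: number,
  opts: RetryOptions,
  rand: () => number = Math.random,
): number {
  return Math.floor(rand() * backoffCeilingMs(attempt, opts));
}

// The values quoted earlier in the thread: base 200 ms, 10 retries.
const opts: RetryOptions = { baseMs: 200, maxRetries: 10 };
const ceilings = Array.from({ length: opts.maxRetries }, (_, i) =>
  backoffCeilingMs(i, opts),
);
console.log(ceilings); // delay ceilings grow from 200 ms up to 102400 ms
```

Passing `rand` as a parameter keeps the schedule deterministic in tests while defaulting to `Math.random` in normal use.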

aestebance · 2023-08-17T20:02:20Z

If you use the "base" option in logRetentionRetryOptions (I use 200 millis) you can deploy without getting "rate exceeded". But I don't know why "base" is deprecated. Can anyone explain the reason for deprecating "base" if it still works and there is no other solution?

jusdino · 2023-08-17T23:13:17Z

I think it was because they migrated from SDK v2 to SDK v3 in the lambda and retry strategies work differently in v3.


jaapvanblaaderen · 2023-08-22T10:17:50Z

This issue got worse since CDK 2.90: #26837

mrgrain · 2023-09-01T18:00:35Z

Ha, that is a good catch! You can totally configure the retry mechanism to exceed the lambda time out. I'll get on fixing that!

> If you use the "base" option in logRetentionRetryOptions (I use 200 millis) you can deploy without getting "rate exceeded". But I don't know why "base" is deprecated. Can anyone explain the reason for deprecating "base" if it still works and there is no other solution?

Hi @aestebance, base has been deprecated because we migrated the code from AWS SDK v2 to AWS SDK v3.
SDK v3 has a number of different retry mechanisms. The default one should be better because it is more intelligent and takes retry budgets into account. We decided that this is a better experience than simply re-implementing the old retry mechanism. I believe that maxRetries is enough to make any case work. But I'd love to hear about it if that's not the case!

> I'm also hitting this issue. Here's a stack trace for reference. I think the LogRetention custom resource lambda code just needs to have a reasonable retry policy set:
>
> We've got devs running into this - all you have to do is try to deploy an app with a bunch of lambdas, all with logRetention set.
>
> One work-around a team found: They manually reduced the number of lambdas in their stack then deployed several times, introducing only a few new ones each time. It's tedious, but it did get them past the rate limit issue.

@jusdino Thanks for the report and sorry for this. This has been resolved in #26858 and the release for it should be out any time now. You are waiting for v2.94.0. Please let me know if that resolves it for you.

The reason for this regression was also the migration to SDKv3. Basically I've missed that we now need to explicitly check for ThrottlingException.
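To illustrate what an explicit check for ThrottlingException could look like, here is a minimal sketch. It is not the code from #26858. It assumes that SDK v2 style errors carry the error code in a `code` field (as in the stack trace earlier in this thread) and that SDK v3 style errors carry it in `name`. `isThrottlingError` and `withThrottleRetry` are hypothetical helpers.

```typescript
// Returns true if the error looks like a throttling error in either SDK style.
// Assumption: v2-style errors use `code`, v3-style errors use `name`.
function isThrottlingError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { name?: unknown; code?: unknown };
  return e.name === "ThrottlingException" || e.code === "ThrottlingException";
}

// Retries `operation` only when it fails with a throttling error,
// up to `maxRetries` times, waiting `delayMs(attempt)` between attempts.
async function withThrottleRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  delayMs: (attempt: number) => number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on non-throttling errors or once the retry budget is spent.
      if (!isThrottlingError(err) || attempt >= maxRetries) throw err;
      await sleep(delayMs(attempt));
    }
  }
}
```

A caller would wrap the throttled API call, for example `withThrottleRetry(() => putRetentionPolicy(), 5, (n) => 200 * 2 ** n)`, where `putRetentionPolicy` stands in for whatever call is being throttled.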

We use a custom resource to set the log retention for log groups created by the Lambda service. This custom resource handler code has a built-in retry mechanism to avoid throttling when executing many LogRetention CRs. Users can customize the number of possible retries, potentially retrying for a long time. This can cause the situation that further retries should be attempted, but the Lambda Function timeout is exceeded. The change sets the lambda execution timeout to its maximum value to allow for up to 15 minutes of retries. If the retry budget is exhausted, the handler will throw an error and exit early. Closes #24485
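As a rough illustration of why the function timeout matters here, the sketch below adds up a worst-case retry wait and compares it with the 3-second timeout seen in the log earlier in this thread and with the 15-minute maximum. The doubling backoff schedule is an assumption made for the arithmetic, not a statement about how the handler actually spaces its retries, and `worstCaseBackoffMs` is a hypothetical helper.

```typescript
// Worst-case total wait across all retries, assuming delays double each time
// (base, 2*base, 4*base, ...). The doubling schedule is an assumption.
function worstCaseBackoffMs(baseMs: number, maxRetries: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += baseMs * Math.pow(2, attempt);
  }
  return total;
}

const LAMBDA_MAX_TIMEOUT_MS = 15 * 60 * 1000; // 15 minutes, per the description above
const OLD_TIMEOUT_MS = 3 * 1000;              // the 3-second timeout from the log

// Options quoted earlier in the thread: base 200 ms, 10 retries.
const budgetMs = worstCaseBackoffMs(200, 10);

console.log(budgetMs);                          // 204600
console.log(budgetMs > OLD_TIMEOUT_MS);         // true: retries outlast a 3 s timeout
console.log(budgetMs <= LAMBDA_MAX_TIMEOUT_MS); // true: fits within 15 minutes
```

Under these assumptions, ten retries need about 205 seconds of waiting alone, which a 3-second function timeout would cut off long before the retries are exhausted.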


jsauter added the bug and needs-triage labels Mar 6, 2023

github-actions bot added the @aws-cdk/custom-resources label Mar 6, 2023

khushail self-assigned this Mar 7, 2023

khushail added the p2, needs-reproduction, and investigating labels and removed the needs-triage label Mar 7, 2023

khushail removed the investigating and needs-reproduction labels Mar 7, 2023

khushail removed their assignment Mar 7, 2023

coderbyheart added a commit to teamstatus/aws-backend that referenced this issue Aug 21, 2023

fix(cdk): circumvent Rate limit exceeded when deploying

115a528

See aws/aws-cdk#24485

jaapvanblaaderen mentioned this issue Aug 22, 2023

CDK deploy: Lambda LogRetention resources fail with rate exceeded errors #26837

Closed

cgarvis added the node18-upgrade label Sep 1, 2023

mrgrain self-assigned this Sep 1, 2023

udaypant removed the node18-upgrade label Sep 1, 2023

mrgrain added the sdk-v3-upgrade label Sep 4, 2023

mrgrain mentioned this issue Sep 4, 2023

fix(logs): log retention custom resource timed out during deploy #26995

Merged

mergify bot closed this as completed in #26995 Sep 6, 2023

Exter-dg mentioned this issue Sep 6, 2024

aws-lambda: Log retention gives rate exceeded error #31338

Closed
