fix(core): fix rate limit errors when deploying cloudwatch log groups #822

xuetinp · 2022-09-20T06:02:55Z

Problem #807

When the RFDK creates a CloudWatch log group, it does so via the CDK's LogRetention construct. The CloudWatch Logs APIs that the LogRetention construct calls have a very low rate limit (5 TPS). As a result, deploying multiple RFDK constructs at once can trigger rate limit errors during the deployment. This shows up intermittently, most notably in the RFDK's integration tests.

Solution

CDK's LogRetention construct provides an option, logRetentionRetryOptions, to configure the JavaScript SDK client that it uses to make the CloudWatch API calls. Using this, the RFDK should set a higher maximum retry count and a longer retry delay. The current defaults are a backoff base of 100 ms and a maximum of 3 retries; the proposed values are 200 ms and 7 retries. This combination allows many more retries while keeping the total function runtime reasonably short.

Added the retry options everywhere a new LogRetention is created.
Extended the log retention function timeout. For reference, see https://github.com/aws/aws-cdk/blob/205e493e7bd6c5212f0ae374fdee28128ea49afe/packages/%40aws-cdk/aws-logs/lib/log-retention.ts#L122-L130
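
To make the trade-off between retry count and function runtime concrete, here is a rough sketch of the worst-case backoff arithmetic. It assumes plain exponential backoff in which retry i (0-based) waits at most base * 2^i; the SDK may apply jitter, so real waits would typically be shorter, and the time spent on the requests themselves is not counted. `RetrySettings` and `worstCaseBackoffMs` are hypothetical names used only for illustration and are not part of the RFDK or the CDK.

```typescript
// Hypothetical helper for illustration only; not RFDK or CDK code.
interface RetrySettings {
  baseMs: number;     // base backoff delay in milliseconds
  maxRetries: number; // number of retries after the initial attempt
}

// Upper bound on the total time spent waiting between attempts,
// ASSUMING retry i (0-based) waits at most baseMs * 2^i.
function worstCaseBackoffMs(settings: RetrySettings): number {
  let total = 0;
  for (let i = 0; i < settings.maxRetries; i++) {
    total += settings.baseMs * Math.pow(2, i);
  }
  return total;
}

const defaults: RetrySettings = { baseMs: 100, maxRetries: 3 };
const proposed: RetrySettings = { baseMs: 200, maxRetries: 7 };

console.log(worstCaseBackoffMs(defaults)); // 700   (100 + 200 + 400)
console.log(worstCaseBackoffMs(proposed)); // 25400 (200 * (2^7 - 1))
```

Under these assumptions, the proposed values allow up to roughly 25 seconds of cumulative backoff per call, compared with under one second for the defaults, which is consistent with the log retention function timeout also needing to be extended.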

Testing

Created a CloudFormation app that iteratively initializes LogRetention constructs. Wrote a script to deploy the constructs in parallel to reproduce the rate limit error. Verified that the proposed values for max retry count and retry delay prevent the error.
Added a unit test to verify that the template is modified as expected.
Build succeeds.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

ddneilson

This is perfect. Thank you!

packages/aws-rfdk/lib/core/lib/exporting-log-group.ts

packages/aws-rfdk/lib/core/lib/log-group-factory.ts

jusiskin

Great work. I have two minor suggestions to improve this.

packages/aws-rfdk/lib/core/lib/exporting-log-group.ts

jericht

LGTM!

fix(core): fix rate limit errors when deploying cloudwatch log groups

c27797b

xuetinp changed the title ~~fix(core): fix rate limit errors when deploying cloudwatch log groups #807~~ fix(core): fix rate limit errors when deploying cloudwatch log groups Sep 20, 2022

ddneilson self-requested a review September 20, 2022 16:04

ddneilson previously approved these changes Sep 20, 2022

jericht reviewed Sep 20, 2022

jusiskin requested changes Sep 20, 2022

fix(core): fix rate limit errors when deploying cloudwatch log groups

c5f4fb2

xuetinp dismissed ddneilson’s stale review via c5f4fb2 September 20, 2022 19:01

fix merge conflict

aa75e30

jusiskin requested changes Sep 20, 2022

minor fixes

320a3a8

jericht approved these changes Sep 20, 2022

ddneilson approved these changes Sep 21, 2022

jusiskin approved these changes Sep 21, 2022

jusiskin merged commit 38df77f into aws:mainline Sep 21, 2022

xuetinp mentioned this pull request Sep 27, 2022

fix(integ): fix rate limit errors when deploying cloudwatch log group… #827

Merged
