fix(core): fix rate limit errors when deploying cloudwatch log groups #822
Problem #807
When the RFDK creates a CloudWatch log group, it does so via the CDK's LogRetention construct. The CloudWatch Logs APIs that the LogRetention construct calls have a very low rate limit (5 TPS). As a result, deploying multiple RFDK constructs can trigger rate limit errors during the deployment. These errors appear intermittently and are seen most often in the RFDK's integration tests.
Solution
The CDK's LogRetention construct provides an option, logRetentionRetryOptions, to configure the JavaScript SDK client that it uses to make the CloudWatch Logs API calls. Using this option, the RFDK should set a higher maximum retry count and a longer base retry delay. The current defaults are a backoff base of 100ms and a maximum of 3 retries; the proposed values are 200ms and 7. This combination allows many more retries while keeping the total function runtime reasonably short.
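To put rough numbers on the trade-off between retry count and runtime, here is a minimal sketch of the worst-case time spent waiting between retries. It assumes exponential backoff in which the delay before retry i is at most base * 2^i, which is how the AWS SDK for JavaScript v2 computes its default delay (with random jitter below that bound). The `RetrySettings` interface and the function names are hypothetical and exist only for this illustration; they are not CDK or RFDK types.

```typescript
// Hypothetical shape for illustration only; not the CDK's own options type.
interface RetrySettings {
  baseMs: number;     // backoff base in milliseconds
  maxRetries: number; // maximum number of retries
}

// Upper bound on the delay before each retry,
// ASSUMING delay_i <= baseMs * 2^i (exponential backoff with jitter).
function maxDelaysMs(settings: RetrySettings): number[] {
  return Array.from(
    { length: settings.maxRetries },
    (_, i) => settings.baseMs * 2 ** i,
  );
}

// Worst-case total time spent sleeping between retries.
function worstCaseTotalMs(settings: RetrySettings): number {
  return maxDelaysMs(settings).reduce((sum, delay) => sum + delay, 0);
}

const currentDefaults: RetrySettings = { baseMs: 100, maxRetries: 3 };
const proposedValues: RetrySettings = { baseMs: 200, maxRetries: 7 };

console.log(worstCaseTotalMs(currentDefaults)); // 700
console.log(worstCaseTotalMs(proposedValues));  // 25400
```

Under that assumption, the defaults give up after at most 0.7 seconds of backoff, while the proposed values keep retrying for up to about 25 seconds. This counts only the waiting time, not the duration of the API calls themselves.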
Testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license