Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(logs): log retention fails with OperationAbortedException #16083

Merged
merged 4 commits into from
Sep 10, 2021
Merged

fix(logs): log retention fails with OperationAbortedException #16083

merged 4 commits into from
Sep 10, 2021

Conversation

ddl-denis-parnovskiy
Copy link
Contributor

@ddl-denis-parnovskiy ddl-denis-parnovskiy commented Aug 17, 2021

Fixes: #15709

When creating a lambda with log retention, CDK actually creates 2 lambda functions. The second lambda function alters log retention of the log group of the first lambda and the retention of its own log group.

Because log group creation is asynchronous, the log retention lambda tries to pre-create both log groups to guarantee it has an object to work on.

If a normal lambda execution also creates the related log group at the same time, an "OperationAbortedException:... Please retry" error is returned.

The existing code handles this situation for log retention lambda but not for the first lambda.

This fix adds the retry pattern to the general log group creation code.

Also existing code had a bug: if OperationAbortedException is hit, the error is hidden but the retention policy is skipped and not actually applied. This fix addresses this bug as well.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Fixes #15709

Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
@gitpod-io
Copy link

gitpod-io bot commented Aug 17, 2021

@mergify
Copy link
Contributor

mergify bot commented Aug 17, 2021

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See in the old code, if CreateLogGroupSafe throws an error, setRetentionPolicy is skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code:
CreateLogGroupSafe encounters OperationAbortedException, throws exception, setRetentionPolicy is skipped, the error is eaten by catch all.
Result: retention policy for this function is not set and no error. This is a bug in the old code I'm talking about.

New code:
CreateLogGroupSafe encounters OperationAbortedException, retries, if really fails after several attempts, then the error is floated up the stack.
Result: either log group is created with limited retention, or an exception is thrown.

@ddl-denis-parnovskiy ddl-denis-parnovskiy changed the title Fix for aws#15709 race when creating log groups fix: aws#15709 race when creating log groups Aug 17, 2021
@peterwoodworth peterwoodworth changed the title fix: aws#15709 race when creating log groups fix: race when creating log groups Aug 19, 2021
@peterwoodworth peterwoodworth added @aws-cdk/aws-logs Related to Amazon CloudWatch Logs effort/small Small work item – less than a day of effort p1 labels Aug 19, 2021
@peterwoodworth peterwoodworth changed the title fix: race when creating log groups fix(logs): race when creating log groups Aug 19, 2021
nija-at
nija-at previously requested changes Sep 6, 2021
Copy link
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this pull request. Please see some of my comments below.

try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

@nija-at nija-at assigned nija-at and unassigned rix0rrr Sep 6, 2021
@nija-at nija-at changed the title fix(logs): race when creating log groups fix(logs): log retention fails with OperationAbortedException Sep 6, 2021
Fixes #15709

Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
@mergify mergify bot dismissed nija-at’s stale review September 8, 2021 17:52

Pull request has been modified.

Copy link
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. Looks good to me.

@mergify
Copy link
Contributor

mergify bot commented Sep 10, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 64d2bb7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 3e9f04d into aws:master Sep 10, 2021
@mergify
Copy link
Contributor

mergify bot commented Sep 10, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@ddl-denis-parnovskiy ddl-denis-parnovskiy deleted the fix-15709 branch September 11, 2021 00:20
mergify bot pushed a commit that referenced this pull request Dec 10, 2021
Fixes: #17546

This adds to the fix in #16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group.

The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this pull request Feb 21, 2022
)

Fixes: aws#17546

This adds to the fix in aws#16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group.

The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-logs Related to Amazon CloudWatch Logs effort/small Small work item – less than a day of effort p1
Projects
None yet
5 participants