service/cloudwatchevents: Handling for RemoveTargets failure entries and eventual consistency #11475

Merged
merged 2 commits into from
Jan 14, 2020

Conversation

@bflad bflad commented Jan 4, 2020

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Closes #1479
Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/
Reference: https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_RemoveTargets.html

Release note for CHANGELOG:

* resource/aws_cloudwatch_event_rule: Retry deletion on CloudWatch Events Target deletion eventual consistency
* resource/aws_cloudwatch_event_target: Return failed entry error code and message if provided in RemoveTargets response

In practice, there appear to be eventual consistency issues between removing CloudWatch Events Targets and successfully deleting the associated CloudWatch Events Rules. This change introduces retry logic into the aws_cloudwatch_event_rule resource deletion to potentially mitigate the observed service behavior.

The choice of five minutes for this deletion timeout is arbitrary. The only guidance on the subject is this information in the RemoveTargets API Reference:

> When you remove a target, when the associated rule triggers, removed targets might continue to be invoked. Allow a short period of time for changes to take effect.
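The retry behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the provider's actual code: `retryDelete`, `errRuleHasTargets`, and the polling interval are hypothetical names and values chosen for this example, and the delete call is simulated rather than sent to AWS. Only the five-minute timeout comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRuleHasTargets is a hypothetical stand-in for the service error returned
// while a rule still appears to have targets attached.
var errRuleHasTargets = errors.New("rule can't be deleted since it has targets")

// retryDelete calls del until it succeeds, returns an error that is not the
// retryable one, or the timeout would be exceeded by another wait.
func retryDelete(timeout, interval time.Duration, del func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := del()
		if err == nil || !errors.Is(err, errRuleHasTargets) {
			// Success or a non-retryable failure: stop immediately.
			return err
		}
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	// Simulated delete call (assumption): fails twice, then succeeds.
	err := retryDelete(5*time.Minute, 10*time.Millisecond, func() error {
		attempts++
		if attempts < 3 {
			return errRuleHasTargets
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

Retrying only on the one error that signals "targets still present" keeps genuine failures, such as permission errors, from being hidden behind a five-minute wait.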

The RemoveTargets API reference also notes:

> This action can partially fail if too many requests are made at the same time. If that happens, `FailedEntryCount` is non-zero in the response and each entry in `FailedEntries` provides the ID of the failed target and the error code.

Since it appears failures can occur even with "successful" API calls, we now return any failures found in the API response to operators as an error. These error entries should give additional insight into whether a resource deletion failed because of operator issues or because of temporary conditions that we should further mitigate within the resource itself.
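As a rough sketch of that idea, the function below turns a list of failed entries into a single error. The `failedEntry` type and `failedEntriesError` function are hypothetical stand-ins written for this example. They are not the AWS SDK types or the provider's implementation, and the error code and message in `main` are illustrative values only.

```go
package main

import (
	"fmt"
	"strings"
)

// failedEntry is a local stand-in (assumption) for one element of the
// FailedEntries list in a RemoveTargets response.
type failedEntry struct {
	TargetID     string
	ErrorCode    string
	ErrorMessage string
}

// failedEntriesError returns nil when there are no failed entries, and
// otherwise one error that lists every failed target with its code and message.
func failedEntriesError(entries []failedEntry) error {
	if len(entries) == 0 {
		return nil
	}
	msgs := make([]string, 0, len(entries))
	for _, e := range entries {
		msgs = append(msgs, fmt.Sprintf("%s: %s: %s", e.TargetID, e.ErrorCode, e.ErrorMessage))
	}
	return fmt.Errorf("%d target(s) failed to be removed: %s", len(entries), strings.Join(msgs, "; "))
}

func main() {
	// Illustrative entry; the values are made up for the example.
	err := failedEntriesError([]failedEntry{
		{TargetID: "example-target", ErrorCode: "ConcurrentModificationException", ErrorMessage: "too many requests"},
	})
	fmt.Println(err)
}
```

Checking the entry list rather than only the returned API error is the point here: a call can return without an error and still report that some targets were not removed.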

Output from acceptance testing:

--- PASS: TestAccAWSCloudWatchEventRule_basic (31.84s)
--- PASS: TestAccAWSCloudWatchEventRule_description (30.48s)
--- PASS: TestAccAWSCloudWatchEventRule_IsEnabled (43.99s)
--- PASS: TestAccAWSCloudWatchEventRule_pattern (31.63s)
--- PASS: TestAccAWSCloudWatchEventRule_prefix (17.09s)
--- PASS: TestAccAWSCloudWatchEventRule_role (31.03s)
--- PASS: TestAccAWSCloudWatchEventRule_tags (43.22s)

--- PASS: TestAccAWSCloudWatchEventTarget_basic (34.50s)
--- PASS: TestAccAWSCloudWatchEventTarget_batch (99.43s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecs (32.80s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecsWithBlankTaskCount (33.34s)
--- PASS: TestAccAWSCloudWatchEventTarget_full (75.19s)
--- PASS: TestAccAWSCloudWatchEventTarget_input_transformer (43.14s)
--- PASS: TestAccAWSCloudWatchEventTarget_kinesis (75.03s)
--- PASS: TestAccAWSCloudWatchEventTarget_missingTargetId (19.97s)
--- PASS: TestAccAWSCloudWatchEventTarget_sqs (20.25s)
--- PASS: TestAccAWSCloudWatchEventTarget_ssmDocument (20.81s)

bflad added 2 commits January 3, 2020 19:15
…t Target deletion eventual consistency

Reference: #1479
Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/
Reference: https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_RemoveTargets.html

In practice, there appear to be eventual consistency issues between removing CloudWatch Events Targets and successfully deleting the associated CloudWatch Events Rules. This change introduces retry logic into the aws_cloudwatch_event_rule resource deletion to potentially mitigate the observed service behavior.

The choice of five minutes for this deletion timeout is arbitrary. The only guidance on the subject is this information in the RemoveTargets API Reference:

> When you remove a target, when the associated rule triggers, removed targets might continue to be invoked. Allow a short period of time for changes to take effect.

Output from acceptance testing:

```
--- PASS: TestAccAWSCloudWatchEventRule_basic (31.84s)
--- PASS: TestAccAWSCloudWatchEventRule_description (30.48s)
--- PASS: TestAccAWSCloudWatchEventRule_IsEnabled (43.99s)
--- PASS: TestAccAWSCloudWatchEventRule_pattern (31.63s)
--- PASS: TestAccAWSCloudWatchEventRule_prefix (17.09s)
--- PASS: TestAccAWSCloudWatchEventRule_role (31.03s)
--- PASS: TestAccAWSCloudWatchEventRule_tags (43.22s)

--- PASS: TestAccAWSCloudWatchEventTarget_basic (36.73s)
--- PASS: TestAccAWSCloudWatchEventTarget_batch (93.38s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecs (33.86s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecsWithBlankTaskCount (34.94s)
--- PASS: TestAccAWSCloudWatchEventTarget_full (79.49s)
--- PASS: TestAccAWSCloudWatchEventTarget_input_transformer (39.77s)
--- PASS: TestAccAWSCloudWatchEventTarget_kinesis (79.14s)
--- PASS: TestAccAWSCloudWatchEventTarget_missingTargetId (20.70s)
--- PASS: TestAccAWSCloudWatchEventTarget_sqs (21.00s)
--- PASS: TestAccAWSCloudWatchEventTarget_ssmDocument (22.21s)
```
…and message if provided in RemoveTargets response

Reference: #1479
Reference: https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_RemoveTargets.html

The RemoveTargets API notes:

> This action can partially fail if too many requests are made at the same time. If that happens, `FailedEntryCount` is non-zero in the response and each entry in `FailedEntries` provides the ID of the failed target and the error code.

Since it appears failures can occur even with "successful" API calls, we now return any failures found in the API response to operators as an error. These error entries should give additional insight into whether a resource deletion failed because of operator issues or because of temporary conditions that we should further mitigate within the resource itself.

Output from acceptance testing:

```
--- PASS: TestAccAWSCloudWatchEventTarget_basic (34.50s)
--- PASS: TestAccAWSCloudWatchEventTarget_batch (99.43s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecs (32.80s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecsWithBlankTaskCount (33.34s)
--- PASS: TestAccAWSCloudWatchEventTarget_full (75.19s)
--- PASS: TestAccAWSCloudWatchEventTarget_input_transformer (43.14s)
--- PASS: TestAccAWSCloudWatchEventTarget_kinesis (75.03s)
--- PASS: TestAccAWSCloudWatchEventTarget_missingTargetId (19.97s)
--- PASS: TestAccAWSCloudWatchEventTarget_sqs (20.25s)
--- PASS: TestAccAWSCloudWatchEventTarget_ssmDocument (20.81s)
```
@bflad bflad added the bug Addresses a defect in current functionality. label Jan 4, 2020
@bflad bflad requested a review from a team January 4, 2020 00:50
@ghost ghost added size/S Managed by automation to categorize the size of a PR. needs-triage Waiting for first response or review from a maintainer. service/cloudwatchevents labels Jan 4, 2020
@bflad bflad removed the needs-triage Waiting for first response or review from a maintainer. label Jan 4, 2020

rsareth commented Jan 7, 2020

Please merge it. I'm facing the issue right now! I can't delete the CloudWatch event because of the existing target.

@ryndaniels ryndaniels left a comment

👍

--- PASS: TestAccAWSCloudWatchEventRule_prefix (14.68s)
--- PASS: TestAccAWSCloudWatchEventTarget_sqs (15.46s)
--- PASS: TestAccAWSCloudWatchEventTarget_missingTargetId (15.88s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecs (18.23s)
--- PASS: TestAccAWSCloudWatchEventTarget_ecsWithBlankTaskCount (18.56s)
--- PASS: TestAccAWSCloudWatchEventTarget_ssmDocument (18.87s)
--- PASS: TestAccAWSCloudWatchEventRule_pattern (20.04s)
--- PASS: TestAccAWSCloudWatchEventRule_description (20.85s)
--- PASS: TestAccAWSCloudWatchEventTarget_basic (20.98s)
--- PASS: TestAccAWSCloudWatchEventRule_basic (21.30s)
--- PASS: TestAccAWSCloudWatchEventRule_IsEnabled (25.34s)
--- PASS: TestAccAWSCloudWatchEventRule_tags (26.01s)
--- PASS: TestAccAWSCloudWatchEventRule_role (29.76s)
--- PASS: TestAccAWSCloudWatchEventTarget_input_transformer (35.04s)
--- PASS: TestAccAWSCloudWatchEventTarget_kinesis (63.12s)
--- PASS: TestAccAWSCloudWatchEventTarget_full (63.76s)
--- PASS: TestAccAWSCloudWatchEventTarget_batch (80.07s)

@bflad bflad added this to the v2.45.0 milestone Jan 14, 2020
@bflad bflad merged commit 72633db into master Jan 14, 2020
@bflad bflad deleted the b-aws_cloudwatch_event_rule-deletion-retries branch January 14, 2020 15:14
bflad added a commit that referenced this pull request Jan 14, 2020
ghost commented Jan 17, 2020

This has been released in version 2.45.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

ghost commented Mar 27, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 27, 2020
Successfully merging this pull request may close these issues.

Error-deleting-CloudWatch-Rule