manually retry for requests.exceptions.Timeout exceptions when sending outgoing webhooks #3632

Merged
merged 8 commits into from
Jan 9, 2024

Conversation

joeyorlando
Contributor

@joeyorlando joeyorlando commented Jan 8, 2024

Which issue(s) this PR fixes

Fixes https://github.com/grafana/oncall-private/issues/2439

Checklist

  • Unit, integration, and e2e (if applicable) tests updated
  • Documentation added (or pr:no public docs PR label added if not required)
  • CHANGELOG.md updated (or pr:no changelog PR label added if not required)

exceptions when sending outgoing webhooks
@joeyorlando joeyorlando added the pr:no public docs Added to a PR that does not require public documentation updates label Jan 8, 2024
@joeyorlando joeyorlando requested a review from a team January 8, 2024 20:06
Comment on lines 175 to 197
autoretry_for=(Exception,), retry_backoff=True, max_retries=1 if settings.DEBUG else 3
autoretry_for=(Exception,),
# This allows to exclude some exceptions that match autoretry_for but for which you don’t want a retry.
# https://docs.celeryq.dev/en/stable/userguide/tasks.html#Task.dont_autoretry_for
dont_autoretry_for=(requests.exceptions.Timeout,),
retry_backoff=True,
max_retries=1 if settings.DEBUG else 3,
Contributor Author

main change (rest is just adding some type hints)

Contributor Author

An alternative here would be to not use dont_autoretry_for and instead manually retry the task if isinstance(exception, requests.exceptions.Timeout) (up to a max of 3, after which point we simply return). This would let us still retry these tasks without having to raise an exception (and so avoid triggering our AmixrRetriedFailedTasks alert).

Contributor

I think this would be preferred, so that we have some limited retry to cover customer network hiccups.
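For reference, the manual-retry alternative discussed in this thread could look roughly like the sketch below. It is dependency-free on purpose and is not the code merged in this PR: `FakeTimeout` is a hypothetical stand-in for `requests.exceptions.Timeout`, and `execute_webhook`, `reschedule`, `manual_retry_num`, and `MAX_MANUAL_RETRIES` are illustrative names. In a real Celery task the `reschedule` callback would presumably be replaced by the task re-enqueueing itself with an incremented counter.

```python
from typing import Callable, List

MAX_MANUAL_RETRIES = 3  # assumption: mirrors the "max of 3" mentioned in the thread


class FakeTimeout(Exception):
    """Hypothetical stand-in for requests.exceptions.Timeout."""


def execute_webhook(
    send: Callable[[], None],
    reschedule: Callable[[int], None],
    manual_retry_num: int = 0,
) -> str:
    """Run one delivery attempt; on timeout, re-enqueue instead of raising."""
    try:
        send()
    except FakeTimeout:
        if manual_retry_num < MAX_MANUAL_RETRIES:
            # Schedule another attempt with an incremented counter.
            # Nothing is raised, so the task is not recorded as failed.
            reschedule(manual_retry_num + 1)
            return "rescheduled"
        # Retries exhausted: return quietly rather than raising.
        return "gave_up"
    return "sent"


def always_times_out() -> None:
    """Simulated webhook call whose remote end never answers in time."""
    raise FakeTimeout("simulated timeout")


def run_all_attempts(send: Callable[[], None]) -> List[str]:
    """Tiny in-memory queue that drives execute_webhook until nothing is pending."""
    pending: List[int] = [0]
    outcomes: List[str] = []
    while pending:
        retry_num = pending.pop(0)
        outcomes.append(execute_webhook(send, pending.append, retry_num))
    return outcomes
```

With an endpoint that always times out, this makes one initial attempt plus three retries and then stops without raising, which is the behaviour that would keep a retried-failed-tasks alert quiet.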

@@ -52,15 +66,14 @@ def send_webhook_event(trigger_type, alert_group_id, organization_id=None, user_
).exclude(is_webhook_enabled=False)

for webhook in webhooks_qs:
print(webhook.name)
Contributor Author

presumably we don't need this?

autoretry_for=(Exception,),
# This allows to exclude some exceptions that match autoretry_for but for which you don’t want a retry.
# https://docs.celeryq.dev/en/stable/userguide/tasks.html#Task.dont_autoretry_for
dont_autoretry_for=(requests.exceptions.Timeout,),
Contributor Author

Catching this error will catch both requests.exceptions.ConnectTimeout and requests.exceptions.ReadTimeout errors (source code):

  • requests.exceptions.ConnectTimeout: "The request timed out while trying to connect to the remote server"
  • requests.exceptions.ReadTimeout: "The server did not send any data in the allotted amount of time"

both of these seem out of our control
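To illustrate why handling the parent class is enough, here is a small sketch using local stand-in classes rather than `requests` itself, so it runs without dependencies. The stand-ins mirror the relevant part of the hierarchy: in `requests`, both `ConnectTimeout` and `ReadTimeout` derive from `Timeout` (and `ConnectTimeout` additionally derives from the library's `ConnectionError`, which is omitted here). The `classify` helper is hypothetical.

```python
class Timeout(Exception):
    """Stand-in for requests.exceptions.Timeout."""


class ConnectTimeout(Timeout):
    """Stand-in for requests.exceptions.ConnectTimeout."""


class ReadTimeout(Timeout):
    """Stand-in for requests.exceptions.ReadTimeout."""


def classify(exc: Exception) -> str:
    """Report whether a single `except Timeout` clause would handle exc."""
    try:
        raise exc
    except Timeout:
        # Matches Timeout and every subclass, so both timeout flavours land here.
        return "timeout"
    except Exception:
        return "other"
```

The same subclass matching applies to an exception tuple such as the one passed to `dont_autoretry_for`: listing the parent `Timeout` class covers both subclasses.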

@joeyorlando joeyorlando changed the title dont autoretry for requests.exceptions.Timeout exceptions when sending outgoing webhooks manually retry for requests.exceptions.Timeout exceptions when sending outgoing webhooks Jan 8, 2024
@joeyorlando joeyorlando requested a review from mderynck January 8, 2024 22:06
@joeyorlando joeyorlando enabled auto-merge January 8, 2024 23:48
@joeyorlando joeyorlando disabled auto-merge January 9, 2024 00:13
@joeyorlando joeyorlando merged commit 3bcf5ef into dev Jan 9, 2024
20 of 21 checks passed
@joeyorlando joeyorlando deleted the jorlando/address-outgoing-webhook-exceptions branch January 9, 2024 00:13
iskhakov pushed a commit that referenced this pull request Feb 20, 2024
…ing outgoing webhooks (#3632)
