Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry to some abuse tasks #23098

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eviljeff
Copy link
Member

@eviljeff eviljeff commented Feb 21, 2025

Fixes: mozilla/addons#15359

Description

Adds retry to some tasks (as specified in the issue). Currently the settings are:

  • retry_jitter: True (by default) - all retry delays are variable, with the exact delay between 0 seconds and whatever the retry delay was calculated as.
  • retry_backoff: 30 - the first retry will be (a maximum of) 30 seconds after; the retry after that (a maximum of) 60 seconds, then 120, and so on.
  • retry_backoff_max: 60*60 - the maximum delay between retries will be 1 hour
  • max_retries: 79 - I calculated this with a spreadsheet, to see how many retries would be needed to retry for 72 hours. Though with retry_jitter it will be always less than 72 hours... so could need increasing.

A report of an entity with many relationships can lead to multiple requests - the patch currently excludes those failures from retry as there's no easy way to retry part of a task.

Testing

  • Prevent cinder from being accessible from your local addons-server instance somehow (turn off your wifi?)
  • Take an action that would normally communicate with Cinder, e.g. submitting an abuse report
  • see in logs from addons-server-worker container that the task failed with a request exception
  • wait 30 seconds
  • see another request exception from celery
  • reconnect your connection to Cinder
  • after some amount of time (the 1st retry would be up to 30 seconds after the original attempt; the second up to 60 seconds after the first) the task should succeed
  • verify the abuse report was submitted and we updated the AbuseReport with an fk to a CinderJob (there may not have been a new CinderJob created if that add-on was reported for abuse previously).

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@eviljeff eviljeff requested review from a team and KevinMind and removed request for a team February 21, 2025 14:03
@eviljeff eviljeff force-pushed the 15359-retry-cinder-tasks branch from 09be84c to d67e059 Compare February 21, 2025 16:18
@eviljeff eviljeff marked this pull request as ready for review February 21, 2025 16:18
@diox diox self-requested a review February 21, 2025 16:32
'autoretry_for': (requests.RequestException,),
'retry_backoff': 30, # start backoff at 30 seconds
'retry_backoff_max': 60 * 60, # Max out at 1 hour between retries
'retry_kwargs': {'max_retries': 79}, # this works out about 72 hours
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether max_retries should be directly defined in the task kw, or inside retry_kwargs was ambiguous - and I'm not sure we do it consistently in our code either. I went with this way because of the 2nd example in https://docs.celeryq.dev/en/latest/userguide/tasks.html#automatic-retry-for-known-exceptions

@diox
Copy link
Member

diox commented Feb 25, 2025

AFAICT from the logs, the problem with autoretry is that the exceptions before we give retrying are not logged as errors. They are just logged as info. (the severity - lowercase - is 200 on each attempt that is retried, 500 on the one where we give up)

That would render those exceptions completely invisible to us for 72 hours, they wouldn't be sent to Sentry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task]: Retry tasks calling Cinder API automatically
2 participants