Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of Jira's retry-after handling #1825

Merged

Conversation

matthias-bach-by
Copy link
Contributor

As reported in #1805, while 3.6.0 brought a beneficial change in evaluating Jira's retry-after header, in some scenarios it actually degraded the success rate of requests when hitting rate-limits due to two issues:

  1. Jira intends the retry-after header to be a minimum backoff but the code treats it as a maximum backoff, and
  2. Jira sometimes sends a retry-after value of 0 seconds which the existing code implicitly treated as a failed request.

To address this, this PR changes the back-off behaviour to treat the retry-after time as a minimum backoff, at least as long as our maximum back-off time does allow this, and clips the retry-after header, if specified, to a minimum of one second.

I have extended the existing tests for the retry logic to assert this behaviour. If you would prefer a separate set of tests or see other changes to the approach, let me know and I'll adjust the PR.

The time Jira sends in the Retry-After header is the minimum time Jira wants us to wait before retrying our request. However, the former implementation used this as a maximum waiting time for the next request. In result, there was a chance that we reached three retries without reaching the time that Jira expected us to wait and our request would fail.

This implementation does also affect the other retry cases, as while previously we jittered our backoff between 0 and the target backoff, we now only jitter between 50% and 100% of the target backoff. However, this should still protect us from thundering herds and safes us from introducing a new minimum backoff variable for the retry-after case.

This solves one of the issues reported in pycontribs#1805.
When rejecting request with a 429 response, Jira sometimes sends a Retry-after header asking for a backoff of 0 seconds. With the existing retry logic this would mark the request as non-retryable and thus fail the request. With this change, such requests are treated as if Jira had send a retry-after value of 1 second.

This solves one of the issues reported in pycontribs#1805.
@matthias-bach-by
Copy link
Contributor Author

The cloud tests are failing in set-up and this seems to be a wider issue unrelated to my changes as I am also seeing that on other PRs. Happy to rebase once that is fixed on main.

@adehad adehad linked an issue Mar 21, 2024 that may be closed by this pull request
4 tasks
@adehad adehad merged commit 4999a76 into pycontribs:main Mar 21, 2024
10 of 12 checks passed
@adehad
Copy link
Contributor

adehad commented Mar 21, 2024

Thanks for the excellent PR and sorry for the delay in getting in merged and released!

@matthias-bach-by matthias-bach-by deleted the improve-rate-limit-resilience branch March 21, 2024 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression of 429 response handling in 3.6.0
2 participants