
Regression of 429 response handling in 3.6.0 #1805

Closed
3 of 4 tasks
matthias-bach-by opened this issue Jan 26, 2024 · 0 comments · Fixed by #1825
Comments

@matthias-bach-by
Contributor

Bug summary

While the evaluation of the retry-after header introduced in 3.6.0 via #1713 is an important improvement, it sadly introduced two regressions in the handling of 429 responses.

  1. While the retry-after value is finally evaluated instead of just being printed, it is sadly used as an upper bound for the backoff time. Thus we are almost guaranteed to run into a second 429 response while handling the first one.
  2. The code assumes that requests with a retry-after value of 0 must not be retried. However, Jira seems to frequently provide a retry-after value of 0 on 429 responses.

With the previous plain backoff you had a good chance of backing off long enough to evade the rate limiting, as long as you weren't running parallel requests. With the new behaviour the client is a lot more aggressive, and even a single client easily runs into the case of not retrying at all.
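For illustration only (this is not the library's actual code), the difference between the two interpretations of Retry-After could be sketched like this, with MAX_RETRY_DELAY as a hypothetical jitter cap:

import random

MAX_RETRY_DELAY = 60.0  # hypothetical cap in seconds, only for this sketch


def delay_as_upper_bound(retry_after: float) -> float:
    # Roughly the 3.6.0 behaviour described above: Retry-After caps the
    # backoff, so the actual sleep can be much shorter than Jira asked for.
    return min(retry_after, MAX_RETRY_DELAY) * random.random()


def delay_as_lower_bound(retry_after: float) -> float:
    # The behaviour this issue asks for: never sleep less than Retry-After,
    # only add jitter on top of it.
    return retry_after + MAX_RETRY_DELAY * random.random()


# With Retry-After: 30 the upper-bound variant may sleep only a few seconds
# and immediately hit the next 429, while the lower-bound variant always
# waits at least the 30 seconds Jira asked for.
print(delay_as_upper_bound(30.0), delay_as_lower_bound(30.0))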

Is there an existing issue for this?

  • I have searched the existing issues

Jira Instance type

Jira Server or Data Center (Self-hosted)

Jira instance version

9.12.2

jira-python version

3.6.0

Python Interpreter version

3.12

Which operating systems have you used?

  • Linux
  • macOS
  • Windows

Reproduction steps

# 1. Given a Jira client instance
from jira import JIRA

jira: JIRA  # e.g. JIRA(server="https://jira.example.org")
# 2. Run any cheap operation often enough to trigger rate limiting
for _ in range(100):
    jira.issue('SOME-1')

Stack trace

E           jira.exceptions.JIRAError: JiraError HTTP 429 url: https://jira.example.org/rest/api/2/search?jql=issuekey+in+%28SOME-1&startAt=0&validateQuery=True&fields=issuetype&fields=resolution&fields=status&maxResults=100
E           	text: Rate limit exceeded.
E
E           	response headers = {'Content-Type': 'application/json;charset=UTF-8', …}
E           	response text = {"message":"Rate limit exceeded."}

Expected behaviour

I'd expect the value of the retry-after header to be interpreted as a lower bound for the backoff, i.e., doing something along the lines of:

delay = suggested_delay + self.max_retry_delay * random.random()

Furthermore, we should also retry requests that have a suggested_delay of 0. If that would indicate an error for 503 responses, we might need to make the decision depend on the status code, too.
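As a rough sketch of that behaviour (hypothetical helper names, not the library's API):

import random


def backoff_delay(suggested_delay: float, max_retry_delay: float) -> float:
    # Treat the server's suggestion as a lower bound and only add jitter on top.
    return suggested_delay + max_retry_delay * random.random()


def should_retry(status_code: int, suggested_delay: float) -> bool:
    # For 429 responses a suggested delay of 0 is still retryable; for other
    # retryable codes (e.g. 503) the decision may need to stay stricter.
    if status_code == 429:
        return True
    return suggested_delay > 0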

Additional Context

No response

matthias-bach-by added a commit to matthias-bach-by/jira that referenced this issue Feb 26, 2024
The time Jira sends in the Retry-After header is the minimum time Jira wants us to wait before retrying our request. However, the former implementation used it as a maximum waiting time for the next request. As a result, there was a chance that we exhausted all three retries without ever reaching the time Jira expected us to wait, and our request would fail.

This change also affects the other retry cases: while previously we jittered our backoff between 0 and the target backoff, we now only jitter between 50% and 100% of the target backoff. However, this still protects us from thundering herds and saves us from introducing a new minimum-backoff variable for the retry-after case.

This solves one of the issues reported in pycontribs#1805.
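A rough sketch of the jitter change described above (not the exact patch):

import random


def jittered_backoff(target_backoff: float) -> float:
    # Sleep between 50% and 100% of the target backoff instead of between
    # 0% and 100% of it, which still spreads out concurrent clients without
    # the delay dropping close to zero.
    return target_backoff * (0.5 + 0.5 * random.random())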
matthias-bach-by added a commit to matthias-bach-by/jira that referenced this issue Feb 26, 2024
When rejecting a request with a 429 response, Jira sometimes sends a Retry-After header asking for a backoff of 0 seconds. With the existing retry logic this would mark the request as non-retryable and thus fail it. With this change, such requests are treated as if Jira had sent a Retry-After value of 1 second.

This solves one of the issues reported in pycontribs#1805.
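A minimal sketch of that rule (hypothetical helper, not the actual patch):

def effective_retry_after(header_value: str | None) -> float | None:
    # Treat a Retry-After of 0 on a 429 as 1 second instead of marking the
    # request as non-retryable; return None when no usable header is present.
    if header_value is None:
        return None
    try:
        seconds = float(header_value)
    except ValueError:
        return None
    return max(seconds, 1.0)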
@adehad linked a pull request Mar 21, 2024 that will close this issue