
Regression of 429 response handling in 3.6.0 #1805

Closed
3 of 4 tasks
matthias-bach-by opened this issue Jan 26, 2024 · 0 comments · Fixed by #1825
Comments

@matthias-bach-by
Contributor

Bug summary

While the evaluation of the retry-after header introduced in 3.6.0 via #1713 is an important improvement, it sadly introduced two regressions in the handling of 429 responses.

  1. While the retry-after value is finally evaluated instead of just being printed, it is sadly used as an upper bound for the backoff time. Thus we are almost guaranteed to run into a second 429 response while handling the first one.
  2. The code assumes that requests with a retry-after value of 0 must not be retried. However, Jira seems to frequently provide a retry-after value of 0 on 429 responses.

With the previous plain backoff you had a good chance of backing off long enough to evade the rate limiting, as long as you weren't running parallel requests. With the new behaviour the client is a lot more aggressive, and even a single client easily runs into the case of not retrying at all.
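For illustration only (this is not the library's actual code), the difference between the two interpretations of Retry-After could be sketched like this, with MAX_RETRY_DELAY as a hypothetical jitter cap:

import random

MAX_RETRY_DELAY = 60.0  # hypothetical cap in seconds, only for this sketch


def delay_as_upper_bound(retry_after: float) -> float:
    # Roughly the 3.6.0 behaviour described above: Retry-After caps the
    # backoff, so the actual sleep can be much shorter than Jira asked for.
    return min(retry_after, MAX_RETRY_DELAY) * random.random()


def delay_as_lower_bound(retry_after: float) -> float:
    # The behaviour this issue asks for: never sleep less than Retry-After,
    # only add jitter on top of it.
    return retry_after + MAX_RETRY_DELAY * random.random()


# With Retry-After: 30 the upper-bound variant may sleep only a few seconds
# and immediately hit the next 429, while the lower-bound variant always
# waits at least the 30 seconds Jira asked for.
print(delay_as_upper_bound(30.0), delay_as_lower_bound(30.0))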

Is there an existing issue for this?

  • I have searched the existing issues

Jira Instance type

Jira Server or Data Center (Self-hosted)

Jira instance version

9.12.2

jira-python version

3.6.0

Python Interpreter version

3.12

Which operating systems have you used?

  • Linux
  • macOS
  • Windows

Reproduction steps

# 1. Given a Jira client instance
from jira import JIRA

jira: JIRA  # e.g. JIRA(server="https://jira.example.org")
# 2. Run any cheap operation often enough to trigger rate limiting
for _ in range(100):
    jira.issue('SOME-1')

Stack trace

E           jira.exceptions.JIRAError: JiraError HTTP 429 url: https://jira.example.org/rest/api/2/search?jql=issuekey+in+%28SOME-1&startAt=0&validateQuery=True&fields=issuetype&fields=resolution&fields=status&maxResults=100
E           	text: Rate limit exceeded.
E
E           	response headers = {'Content-Type': 'application/json;charset=UTF-8', …}
E           	response text = {"message":"Rate limit exceeded."}

Expected behaviour

I'd expect the value of the retry-after header to be interpreted as a lower bound for the backoff, i.e., doing something along the lines of:

delay = suggested_delay + self.max_retry_delay * random.random()

Furthermore, we should also retry requests that have a suggested_delay of 0. If that would indicate an error for 503 responses, we might need to make the decision depend on the status code, too.
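As a rough sketch of that behaviour (hypothetical helper names, not the library's API):

import random


def backoff_delay(suggested_delay: float, max_retry_delay: float) -> float:
    # Treat the server's suggestion as a lower bound and only add jitter on top.
    return suggested_delay + max_retry_delay * random.random()


def should_retry(status_code: int, suggested_delay: float) -> bool:
    # For 429 responses a suggested delay of 0 is still retryable; for other
    # retryable codes (e.g. 503) the decision may need to stay stricter.
    if status_code == 429:
        return True
    return suggested_delay > 0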

Additional Context

No response

matthias-bach-by added a commit to matthias-bach-by/jira that referenced this issue Feb 26, 2024
The time Jira sends in the Retry-After header is the minimum time Jira wants us to wait before retrying our request. However, the former implementation used it as a maximum waiting time for the next request. As a result, there was a chance that we exhausted all three retries without ever reaching the time Jira expected us to wait, and our request would fail.

This change also affects the other retry cases: while previously we jittered our backoff between 0 and the target backoff, we now only jitter between 50% and 100% of the target backoff. However, this still protects us from thundering herds and saves us from introducing a new minimum-backoff variable for the retry-after case.

This solves one of the issues reported in pycontribs#1805.
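A rough sketch of the jitter change described above (not the exact patch):

import random


def jittered_backoff(target_backoff: float) -> float:
    # Sleep between 50% and 100% of the target backoff instead of between
    # 0% and 100% of it, which still spreads out concurrent clients without
    # the delay dropping close to zero.
    return target_backoff * (0.5 + 0.5 * random.random())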
matthias-bach-by added a commit to matthias-bach-by/jira that referenced this issue Feb 26, 2024
When rejecting a request with a 429 response, Jira sometimes sends a Retry-After header asking for a backoff of 0 seconds. With the existing retry logic this would mark the request as non-retryable and thus fail it. With this change, such requests are treated as if Jira had sent a Retry-After value of 1 second.

This solves one of the issues reported in pycontribs#1805.
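A minimal sketch of that rule (hypothetical helper, not the actual patch):

def effective_retry_after(header_value: str | None) -> float | None:
    # Treat a Retry-After of 0 on a 429 as 1 second instead of marking the
    # request as non-retryable; return None when no usable header is present.
    if header_value is None:
        return None
    try:
        seconds = float(header_value)
    except ValueError:
        return None
    return max(seconds, 1.0)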
@adehad linked a pull request Mar 21, 2024 that will close this issue