DBFS API 429 retries: Fix bug on computing the time until the next retry #326
Conversation
This reverts commit 23cdb52.
… Small additional improvements.
@@ -113,7 +110,6 @@ def wrapped_function(*args, **kwargs):
                return func(*args, **kwargs)
            except HTTPError as e:
                if e.response.status_code == 429:
-                    click.echo("Rate limit exceeded. Retrying with exponential backoff.")
We already have the following logs, no need for this extra one.
Received 429 REQUEST_LIMIT_EXCEEDED for attempt 1. Retrying in 0.91 seconds.
Received 429 REQUEST_LIMIT_EXCEEDED for attempt 2. Retrying in 0.12 seconds.
Received 429 REQUEST_LIMIT_EXCEEDED for attempt 3. Retrying in 2.95 seconds.
Received 429 REQUEST_LIMIT_EXCEEDED for attempt 4. Retrying in 5.58 seconds.
Received 429 REQUEST_LIMIT_EXCEEDED for attempt 5. Retrying in 12.54 seconds.
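The per-attempt delays in these logs are jittered rather than strictly doubling, which is consistent with full-jitter exponential backoff. A minimal sketch of how such a delay could be computed (the function name and parameters are illustrative, not taken from the PR):

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter exponential backoff: a random delay drawn from
    [0, min(cap, base * 2**attempt)] for the given 0-indexed attempt."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Full jitter keeps the upper bound growing exponentially while spreading concurrent clients out, which is why a later attempt (e.g. attempt 2 above) can happen to sleep less than an earlier one.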
tests/dbfs/test_api.py
@@ -40,6 +40,7 @@
    'file_size': 1
}
TEST_FILE_INFO = api.FileInfo(TEST_DBFS_PATH, False, 1)
+MAX_RETRY_429_ATTEMPTS = 8
nit: For consistency with `api.py`, keep it as `MAX_RETRY_ATTEMPTS`.
@@ -94,13 +86,21 @@ class DbfsErrorCodes(object):
def before_sleep_on_429(retry_state):
    global time_for_last_retry
Is this still needed? This is using the `global` declaration. We can remove it since you're now initialising the variable locally every time.
Yes, we need it, since the function is called multiple times and `time_for_last_retry` needs to be incremented every time. Otherwise, on attempt 2 `time_for_last_retry` will be uninitialised and we'll get: `local variable 'time_for_last_retry' referenced before assignment`.
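The error discussed here is easy to reproduce in isolation. A minimal sketch (with illustrative names, not the PR's code) of why an augmented assignment needs the `global` declaration:

```python
time_for_last_retry = 0.0  # module-level accumulator


def bump_without_global(delta):
    """Augmented assignment makes the name local to the function, so the
    read on the right-hand side fails before anything has been assigned."""
    try:
        time_for_last_retry += delta  # raises UnboundLocalError
        return None
    except UnboundLocalError as exc:
        return str(exc)


def bump_with_global(delta):
    """With the declaration, the same statement reads and updates the
    module-level variable, preserving its value across calls."""
    global time_for_last_retry
    time_for_last_retry += delta
    return time_for_last_retry
```

The key point is that Python decides locality at compile time: any assignment to a name inside a function makes it local for the whole function body, so even the first read of `time_for_last_retry` fails without `global`.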
Ah of course! The same logic still holds. Gotcha!
Nice catch. Thanks for doing this.
… next retry (databricks#326)" This reverts commit 37d51b1.
…all databricks-cli operations (#343)

Revert retry logic on 429 responses for DBFS API requests (#319, #326, #327) and reimplement it using [urllib3.util.Retry](https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html), for all the requests made by `databricks-cli`.

* We perform a maximum of 6 retries with exponential backoff, resulting in the following delays between them: 0.5, 1, 2, 4, 8, 16 (seconds). The `Retry-After` header is respected if present.
* No message is logged for each retry (I could not find a way to do it using `urllib3.util.Retry`).
* If all retry attempts fail, the original 429 HTTP response is forwarded by the retry utility.
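The reimplementation described in that commit message can be sketched roughly as follows. This is a hedged sketch, not the PR's actual code; the variable names are illustrative, and `allowed_methods` requires urllib3 >= 1.26 (older versions call it `method_whitelist`):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry_policy = Retry(
    total=6,                 # at most 6 retries
    backoff_factor=0.5,      # exponential backoff between attempts
    status_forcelist=[429],  # only retry rate-limit responses
    allowed_methods=None,    # retry regardless of HTTP method
    respect_retry_after_header=True,  # honour Retry-After when present
    raise_on_status=False,   # when retries are exhausted, return the last
)                            # 429 response instead of raising an error

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry_policy)
session.mount("https://", adapter)
session.mount("http://", adapter)
# Every request made through `session` now retries 429s transparently.
```

Pushing the retry logic down into the transport adapter is what makes it apply to all operations at once, at the cost of per-retry logging (urllib3 sleeps internally, without a user-visible callback).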
In #319 we added 429 retries for the DBFS API.

Computing the time until the next retry as `retry_state.idle_for - time_for_last_retry` is not going to work as expected in the case where more than one call to functions wrapped with `retry_429` happens (e.g. in the case of `put_file`). The problem is that `time_for_last_retry` is only initialised once per databricks-cli command, while it should be initialised every time a function wrapped with `retry_429` is called, on the first attempt.

We fix this by initialising `time_for_last_retry` if `retry_state.attempt_number == 1`. This only works if the code is single-threaded, which is the case here.

Additional improvements:

* … `retry_state.idle_for`.