-
Notifications
You must be signed in to change notification settings - Fork 884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(url_helper): Retry on 503 error #5578
Conversation
If the server is busy, no need to fail. Add type hints to adjascent code paths. Fixes canonicalGH-5577
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the inline comment
cloudinit/url_helper.py
Outdated
"IMDS returned 503 error code. Retrying in %s", | ||
current_sleep_time, | ||
) | ||
elif isinstance(response, UrlResponse): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the old code was making some bad assumptions...particularly that response
is a UrlResponse
even though it could also be UrlError
if an error occurred. It also looks like the callers of this code aren't really doing anything to handle exceptions. Previously if we did get a UrlError, the old response.contents
would throw an error due to contents
not being a member of UrlError. It was raising the wrong error obviously, but it still wound up having the same effect as the requests' raise_for_status()
.
With this though, if we get an error that isn't a 503, we'll retry until we timeout, even if it's a non-recoverable error. I think we should instead raise the UrlError if it's found and not a 503.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks for catching that @TheRealFalcon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the old code was making some bad assumptions...particularly that response is a UrlResponse even though it could also be UrlError if an error occurred.
...
Previously if we did get a UrlError, the old response.contents would throw an error due to contents not being a member of UrlError. It was raising the wrong error obviously, but it still wound up having the same effect as the requests' raise_for_status().
This isn't correct. Previously, only UrlResponse was returned. See the change on line 755.
It also looks like the callers of this code aren't really doing anything to handle exceptions.
Exceptions are optionally handled by the function passed in via exception_cb
. Note the line in the original bug report:
2024-07-24 08:13:06,801 - DataSourceEc2.py[WARNING]: Fatal error while requesting Ec2 IMDSv2 API tokens
Looking into this a bit further, I think that we actually need to make a change to the function that caused this log to allow 503 exceptions to retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @holmanb, for fixing this.
503 responses are recommended to contain a Retry-After header containing an estimated time for the recovery of the service.
Do we want to be good citizens and build logic to respect that recommended wait-time if present and not contribute to the server overload?
Is it also possible to get unit tests showing contents return correctly on 200, retry happens on 503, and exception is raised on anything else? I'm guessing one or two of those already exist, but we should test the 503 case. |
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
If the server is busy, no need to fail.
Add type hints to adjascent code paths.
Fixes GH-5577
Proposed Commit Message
Additional Context
Fixes #5577
Test Steps
Merge type