Introduce macro for retry of downloads with additive backoff #1722
Conversation
Force-pushed from 5864385 to f73b8c2
As a basic concept it's quite sane, but before I could approve it, it'd be good to know whether there are any classes of error return where we should abandon the retry cycle and short-circuit out immediately.
Also, I'm not certain it needs to be a macro, unless there are other sites you're interested in annotating with it?
As @kinnison said, I don't think this should be a macro. It looks like it could be a function which takes a closure argument.
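A minimal sketch of that closure-based shape, assuming a free-standing helper; the name `retry_with_backoff`, the fixed three attempts, and the delay bookkeeping are illustrative, not rustup's actual API:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical helper: retries a fallible operation, growing the wait
// between attempts by the duration of each failed attempt.
fn retry_with_backoff<T, E, F>(mut operation: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut delay = Duration::from_secs(0);
    let mut attempts_left = 3; // illustrative default
    loop {
        let started = Instant::now();
        match operation() {
            Ok(value) => return Ok(value),
            Err(err) => {
                attempts_left -= 1;
                if attempts_left == 0 {
                    return Err(err); // out of attempts: surface the error
                }
                // Additive backoff: add the failed attempt's duration to
                // the running delay, then wait before retrying.
                delay += started.elapsed();
                thread::sleep(delay);
            }
        }
    }
}
```

A caller would then wrap the download call site in a closure, something like `retry_with_backoff(|| download(url, path))` (the `download` call here is illustrative), rather than annotating it with a macro.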
( $( $call:expr )+ ) => {
    {
        let mut ret;
        let mut retries = 3;
Might be good to be able to configure this somewhere or at least for it to be a const somewhere obvious.
I filed #1667 and I'd like this to be configurable. Our main use case is CI, and the number of retries that makes sense for Travis CI is probably different from the one for AppVeyor, Cirrus CI, Azure, etc.
Also, 3 retries weren't enough for Travis, and now we are using 5 retries: https://github.com/rust-lang/libc/blob/master/ci/build.sh#L31
The backoff time of our retries includes the time it takes for rustup to fail after a download fails, so it might be slightly different from what's done here (larger, I expect), so we might need an even larger number of retries here.
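As a sketch of what that configurability could look like, assuming an environment variable is the vehicle (the name `RUSTUP_MAX_RETRIES` and the default of 3 are hypothetical here, not settled behavior):

```rust
use std::env;

// Hypothetical: let CI systems tune the retry count via an environment
// variable, falling back to a default when unset or unparsable. The
// variable name and the default are illustrative.
fn max_retries() -> u32 {
    env::var("RUSTUP_MAX_RETRIES")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(3)
}
```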
I think the following errors should not be retried:
Generally I'd say it's better to be narrow about retries: retrying can only fix a problem where things are basically OK but interrupted, e.g. UDP/TCP connectivity failures, or process failures at the far end (which show up variously as name-lookup errors, or dropped or stalled connections). Retrying 'just because' is a great way to have very slow failures, rather than very robust code.
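One way to keep the retry narrow in that sense, sketched here against `std::io::Error` for illustration (the set of retryable kinds is an assumption, not rustup's actual error taxonomy):

```rust
use std::io;

// Only errors that plausibly indicate a transient network interruption
// are worth retrying; anything else should short-circuit out at once.
// These match arms are illustrative, not rustup's real classification.
fn is_retryable(err: &io::Error) -> bool {
    match err.kind() {
        io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted
        | io::ErrorKind::TimedOut
        | io::ErrorKind::Interrupted => true,
        // e.g. NotFound or PermissionDenied: retrying cannot help.
        _ => false,
    }
}
```

The retry loop would then check `is_retryable(&err)` before sleeping, and return the error straight away otherwise.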
I don't disagree. That is, I'd rather have slow failures with this PR than spurious failures, which is what we have now. One can always improve the performance of slow failures later, as long as that does not re-introduce spurious failures.
@gnzlbg Could you provide an indication of the types of failures you're having to retry to get around? E.g. are these things like premature URL closure, or things like caching proxies claiming things don't exist when they actually should?
Not really; IIRC we just get that a download failed. I recall something about the network timing out, so I don't know if that means that the download did not start or that it was interrupted, but when using …
These are the errors I see:
Fibre 1Gbps IPv6/v4 dual-stack connection; usually takes 2-3 seconds to download the artifacts, so yar.
Resolves: #1667
Previous behavior: download failures during any installation process were entirely unrecoverable and would result in a critical failure.
New behavior: downloads are retried with an additive backoff delay: on each failed attempt, the delay (initialized to zero) is increased by the duration of the failed request. Three attempts are made by default, after which the normal critical failure occurs if the download is still failing.
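To make the schedule concrete, assuming for illustration that every failed request takes about 2 seconds: the first attempt fails and the delay grows from 0 to ~2s, so the second attempt starts after a ~2s wait; if it also fails, the delay grows to ~4s before the third and final attempt; if that fails too, the usual critical failure is reported. Under these assumptions the waits grow linearly (2s, 4s, ...) rather than doubling as they would with exponential backoff.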