
Introduce macro for retry of downloads with additive backoff #1722

Closed
wants to merge 1 commit

Conversation

@das-sein (Contributor) commented Mar 20, 2019

Resolves: #1667

Previous behavior: download failures during any installation process were entirely unrecoverable and would result in a critical failure.

New behavior: downloads are retried with an additive backoff delay: after each failed attempt, the delay (initialized to zero) is increased by the duration of that failed request. Three attempts are made by default, after which the normal critical failure occurs if the download is still failing.
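The described strategy can be sketched roughly as follows; this is an illustrative stand-in, not the PR's actual macro, and the helper name and signature are assumptions:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Sketch of the additive-backoff retry described above (illustrative only).
// The delay starts at zero and, after each failed attempt, grows by the
// duration that attempt took.
fn retry_with_additive_backoff<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = Duration::from_secs(0);
    let mut attempts = 0;
    loop {
        let start = Instant::now();
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempts += 1;
                if attempts >= max_attempts {
                    // After the final attempt, surface the error as the
                    // normal critical failure.
                    return Err(e);
                }
                // Additive backoff: add the failed attempt's duration.
                delay += start.elapsed();
                thread::sleep(delay);
            }
        }
    }
}
```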

@das-sein force-pushed the retry-failed-downloads branch from 5864385 to f73b8c2 on March 20, 2019 17:53
@kinnison (Contributor) left a comment

As a basic concept it's quite sane, but before I could approve it, it'd be good to know whether there are any classes of error return where we should abandon the retry cycle and short-circuit out immediately.

@kinnison (Contributor)
Also, I'm not certain it needs to be a macro, unless there are other sites you're interested in annotating with it?

@nrc (Member) left a comment

As @kinnison said, I don't think this should be a macro. It looks like it could be a function which takes a closure argument.

( $( $call:expr )+ ) => {
    {
        let mut ret;
        let mut retries = 3;
Member

Might be good to be able to configure this somewhere or at least for it to be a const somewhere obvious.

Contributor

I filed #1667 and I'd like this to be configurable. Our main use case is CI, and the number of retries that makes sense for Travis CI is probably different from the one for AppVeyor, Cirrus CI, Azure, etc.

Also, 3 retries weren't enough for Travis, and we are now using 5 retries: https://github.com/rust-lang/libc/blob/master/ci/build.sh#L31

The backoff time of our retries includes the time it takes for rustup to fail after a download fails, so it might be slightly different from what's done here (larger, I expect), which means we might need an even larger number of retries here.
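One minimal way to make the attempt count configurable, as asked for here, would be an environment override on top of a const default. This is a sketch under assumptions: the variable name `RUSTUP_MAX_RETRIES` is illustrative, not necessarily what rustup implements.

```rust
use std::env;

// Default number of download attempts, kept as a const in one obvious place.
const DEFAULT_MAX_RETRIES: u32 = 3;

// Illustrative sketch only: RUSTUP_MAX_RETRIES is an assumed variable name.
// Unset or unparsable values fall back to the default.
fn max_retries() -> u32 {
    env::var("RUSTUP_MAX_RETRIES")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_MAX_RETRIES)
}
```

A CI environment that needs more attempts could then export the variable in its job configuration without any rustup change.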

@rbtcollins (Contributor)

I think the following errors should not retry:

  • failure to write the file (permissions, disk space)
  • 4xx-class HTTP errors: these should only be retried after appropriate client-side corrections (e.g. adding the right credentials, then retrying).

Generally I'd say it's better to be narrow about retries: retrying can only fix a problem where things are basically OK but interrupted, e.g. UDP/TCP connectivity failures, or process failures at the far end (which show up variously as name-lookup errors, or dropped or stalled connections). Retrying 'just because' is a great way to have very slow failures, rather than very robust code.
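The distinction drawn above could be expressed as a predicate over error classes. The enum and names here are illustrative assumptions, not rustup's real error types:

```rust
// Illustrative error classes for a download; rustup's actual error types
// are richer than this sketch.
enum DownloadError {
    FileWrite(std::io::Error), // permissions, disk space: retrying won't help
    HttpStatus(u16),           // 4xx needs a client-side fix before retrying
    NameLookup,                // transient infrastructure failure
    ConnectionDropped,         // interrupted but basically OK
    Timeout,                   // stalled connection
}

// Narrow retry policy: only retry errors where things were basically OK
// but got interrupted.
fn is_retryable(err: &DownloadError) -> bool {
    match err {
        DownloadError::NameLookup
        | DownloadError::ConnectionDropped
        | DownloadError::Timeout => true,
        DownloadError::HttpStatus(code) => *code >= 500, // 5xx may be transient
        DownloadError::FileWrite(_) => false,
    }
}
```

Such a predicate would let the retry loop short-circuit out immediately on the non-retryable classes instead of burning through all attempts.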

@gnzlbg (Contributor) commented May 23, 2019

Retrying 'just because' is a great way to have very slow failures, rather than very robust code.

I don't disagree, but when I use rustup I expect it to succeed and performance in the failure path is not really important for my use cases.

That is, I'd rather have slow failures with this PR than the spurious failures we have now. One can always improve the performance of slow failures later, as long as that doesn't re-introduce spurious failures.

@kinnison (Contributor)

@gnzlbg Could you give an indication of the types of failures you're having to retry around? E.g. are these things like premature connection closure, or things like caching proxies claiming things don't exist when they actually should?

@gnzlbg (Contributor) commented May 23, 2019

Not really; IIRC we just get a report that the download failed. I recall something about the network timing out; I don't know whether that means the download did not start or that it was interrupted, but when using curl directly we see both cases every now and then. We download our Android tools with curl, and I just bumped their retry count to 20 today because Travis CI was having a bad day and 10 apparently wasn't enough; that fixed the issue =/

@rbtcollins (Contributor)

These are the errors I see:

rustup toolchain install nightly
info: syncing channel updates for 'nightly-x86_64-pc-windows-msvc'
info: latest update on 2019-05-23, rust version 1.36.0-nightly (37ff5d388 2019-05-22)
info: downloading component 'rustc'
 41.1 MiB /  60.6 MiB ( 68 %)   1.9 MiB/s in  1m  1s ETA: 10s
error: component download failed for rustc-x86_64-pc-windows-msvc
info: caused by: could not download file from 'https://static.rust-lang.org/dist/2019-05-23/rustc-nightly-x86_64-pc-windows-msvc.tar.xz' to 'C:\Users\robertc\.rustup\downloads\ac379c4ba3b9faf074b7e9684c430d95ceb07053756f78b3c6f31d39d5d8934f.partial'
info: caused by: error reading from socket
info: caused by: timed out

Fibre 1 Gbps IPv6/v4 dual-stack connection; the artifacts usually take 2-3 seconds to download, so yar.

Successfully merging this pull request may close these issues.

Retry downloads
5 participants