Add retries to NetDownload intrinsic. #16798
Conversation
Thanks!
```rust
    })
};

// TODO: Allow the retry strategy to be configurable?
```
Fine to leave as a TODO I think, assuming the error is enriched.
```rust
let retry_strategy = ExponentialBackoff::from_millis(10).map(jitter).take(3);
let response = RetryIf::spawn(retry_strategy, try_download, |err: &(String, bool)| err.1)
    .await
    .map_err(|(err, _)| err)?;
```
As mentioned: I think it's totally fine to TODO making it configurable, but the error message should likely be enriched to give some information about the retries, e.g.:
```diff
-    .map_err(|(err, _)| err)?;
+    .map_err(|(err, _)| format!("After {num_attempts} attempts: {err}"))?;
```
...so that the next person who comes along has a clear hint of where to go looking if they want to add configurability.
I agree this would be nice but I'm not sure we have access to the true `num_attempts` (I think it's hidden away inside the `RetryIf::spawn` implementation) 🤔 e.g. if we hit a 4xx error we won't retry at all.

Instead of modifying this error, what do you think of adding logging to the `try_download` branches? So we can log something like "hit 5xx error, retrying" or "hit 4xx error, not retrying".
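For illustration, the logging could look roughly like this. This is only a sketch: `try_download` and the `(String, bool)` error shape (where the bool means "retryable") come from the diff above, but the `reqwest` client and `log` macros here are stand-ins, not the code the intrinsic actually uses.

```rust
use reqwest::StatusCode;

// Sketch only: classify the response status inside the download attempt and
// log whether the error will be retried. The `(String, bool)` error mirrors
// the diff above; the bool is the "retryable" flag checked by the predicate.
async fn try_download(url: &str) -> Result<reqwest::Response, (String, bool)> {
    let response = reqwest::get(url)
        .await
        // Connection-level failures are treated as retryable in this sketch.
        .map_err(|e| (format!("Failed to send request: {e}"), true))?;
    let status: StatusCode = response.status();
    if status.is_server_error() {
        log::warn!("Hit {status} fetching {url}: retrying.");
        Err((format!("Server error fetching {url}: {status}"), true))
    } else if status.is_client_error() {
        log::warn!("Hit {status} fetching {url}: not retrying.");
        Err((format!("Client error fetching {url}: {status}"), false))
    } else {
        Ok(response)
    }
}
```

The `|err: &(String, bool)| err.1` predicate already passed to `RetryIf::spawn` in the diff would then only retry the cases logged as retryable.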
Looks like `RetryIf::spawn` only cares that the 1st arg implements `IntoIterator` with `Duration` items - I could write a thin wrapper around `ExponentialBackoff` that counts/logs the retry attempts as part of `next()`.
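Roughly what that wrapper could look like (a sketch only: `LoggedRetries` and the `log` macro are illustrative, the strategy itself is the one from the diff):

```rust
use std::time::Duration;
use tokio_retry::strategy::{jitter, ExponentialBackoff};

// Sketch only: a thin Duration-iterator wrapper that counts and logs each
// retry delay as RetryIf pulls it via next(). Names here are made up.
struct LoggedRetries<I> {
    inner: I,
    attempts: usize,
}

impl<I: Iterator<Item = Duration>> Iterator for LoggedRetries<I> {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        let delay = self.inner.next()?;
        self.attempts += 1;
        log::info!("Retrying download (retry #{}) after {:?}.", self.attempts, delay);
        Some(delay)
    }
}

// The existing strategy from the diff, wrapped before handing it to RetryIf::spawn.
fn retry_strategy() -> impl Iterator<Item = Duration> {
    LoggedRetries {
        inner: ExponentialBackoff::from_millis(10).map(jitter).take(3),
        attempts: 0,
    }
}
```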
> I agree this would be nice but I'm not sure we have access to the true `num_attempts` (I think it's hidden away inside the `RetryIf::spawn` implementation) 🤔 e.g. if we hit a 4xx error we won't retry at all.
I just meant extracting the constant `3` (or `4`) from the code above, and then using it in two places.
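Something along these lines, reusing the names from the diff above (just a sketch dropped into the existing function; the constant name is made up, and note that `take(3)` means three retries, i.e. up to four attempts in total, hence the "(or `4`)"):

```rust
// Sketch only: one named constant shared by the retry strategy and the error
// message, so the reported count can't drift from the actual behaviour.
const MAX_RETRIES: usize = 3;

let retry_strategy = ExponentialBackoff::from_millis(10)
    .map(jitter)
    .take(MAX_RETRIES);
let response = RetryIf::spawn(retry_strategy, try_download, |err: &(String, bool)| err.1)
    .await
    .map_err(|(err, _)| format!("After {MAX_RETRIES} retries: {err}"))?;
```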
Ooooh. I see what you meant here. Was missing the `RetryIf` aspect of this. Hm, it's a bit awkward to lose the conditional retry for client errors... between the two, not reporting the number of retries would be preferable? Sorry for the pivot.
No problem, updated
@stuhood comments addressed!
Interesting... two timeouts in a row.

EDIT: No repro... 120s the first time, and 64s the second. Bumping the timeout.
Wow... it seems that the failure in ...
Mm, @Eric-Arellano is seeing this in #16795 as well, so it seems like it might be a network issue independent of this PR. Regardless: have opened #16841 to preserve stdio.
For now, hard-code a retry strategy of 10ms, 100ms, 1000ms. [ci skip-build-wheels]
[ci skip-build-wheels]
This reverts commit beec08e. [ci skip-rust] [ci skip-build-wheels]
[ci skip-build-wheels]
Force-pushed from 5c31558 to 4fb2b42.
@stuhood this was targeted to 2.13, but should probably be only for 2.14 instead?
Possibly. I felt that it might qualify as a bugfix. @danxmoran: Will you want this in 2.13.x?
It would be nice, we do hit flakiness in CI semi-regularly that this would address. That said, if it'd be a while until a 2.13.1 is released then it might not be worth the time - I'm currently working through getting us onto v2.14.
It doesn't need to be a long time, particularly if you can be a squeaky wheel and encourage us to do more frequent 2.13 releases :) I'll get out a 2.13 release today with it.
…ry-pick of #17298) (#17302)

As reported in #17294, if an HTTP stream is interrupted after it has opened, the retry that was added in #16798 won't kick in. This change moves the retry up a level to wrap the entire download attempt, and adds a test of recovering from "post-header" errors.

Fixes #17294.
Closes #6818
For now, hard-code a retry strategy of 10ms, 100ms, 1s, 10s.
[ci skip-build-wheels]