Add retries to NetDownload intrinsic. #16798

Merged: 4 commits, Sep 13, 2022
12 changes: 12 additions & 0 deletions src/rust/engine/Cargo.lock

Generated file; diff not rendered.

1 change: 1 addition & 0 deletions src/rust/engine/Cargo.toml
@@ -150,6 +150,7 @@ tempfile = "3"
 testutil_mock = { package = "mock", path = "testutil/mock" }
 time = "0.3"
 tokio = { version = "1.16", features = ["macros", "rt-multi-thread"] }
+tokio-retry = "0.3"
 tokio-util = { version = "0.7", features = ["io"] }
 tryfuture = { path = "tryfuture" }
 ui = { path = "ui" }

51 changes: 32 additions & 19 deletions src/rust/engine/src/downloads.rs
@@ -10,6 +10,8 @@ use bytes::{BufMut, Bytes};
 use futures::stream::StreamExt;
 use humansize::{file_size_opts, FileSize};
 use reqwest::Error;
+use tokio_retry::strategy::{jitter, ExponentialBackoff};
+use tokio_retry::RetryIf;
 use url::Url;

 use crate::context::Core;
@@ -26,30 +28,41 @@ struct NetDownload {

 impl NetDownload {
   async fn start(core: &Arc<Core>, url: Url, file_name: String) -> Result<NetDownload, String> {
-    // TODO: Retry failures
-    let response = core
+    let try_download = || async {
+      core
       .http_client
       .get(url.clone())
       .send()
       .await
-      .map_err(|err| format!("Error downloading file: {}", err))?;
+      .map_err(|err| (format!("Error downloading file: {}", err), true))
+      .and_then(|res|
+        // Handle common HTTP errors.
+        if res.status().is_server_error() {
+          Err((format!(
+            "Server error ({}) downloading file {} from {}",
+            res.status().as_str(),
+            file_name,
+            url,
+          ), true))
+        } else if res.status().is_client_error() {
+          Err((format!(
+            "Client error ({}) downloading file {} from {}",
+            res.status().as_str(),
+            file_name,
+            url,
+          ), false))
+        } else {
+          Ok(res)
+        })
+    };
+
+    // TODO: Allow the retry strategy to be configurable?

Member:

Fine to leave as a TODO I think, assuming the error is enriched.

+    // For now we retry after 10ms, 100ms, 1s, and 10s.
+    let retry_strategy = ExponentialBackoff::from_millis(10).map(jitter).take(4);
+    let response = RetryIf::spawn(retry_strategy, try_download, |err: &(String, bool)| err.1)
+      .await
+      .map_err(|(err, _)| err)?;

Member:

As mentioned: I think it's totally fine to leave making it configurable as a TODO, but the error message should probably be enriched with some information about the retries, e.g.:

Suggested change
-      .map_err(|(err, _)| err)?;
+      .map_err(|(err, _)| format!("After {num_attempts} attempts: {err}"))?;

...so that the next person who comes along has a clear hint of where to go looking if they want to add configurability.

Contributor Author (@danxmoran, Sep 8, 2022):

I agree this would be nice but I'm not sure we have access to the true num_attempts (I think it's hidden away inside the RetryIf::spawn implementation) 🤔 if we hit a 4xx error we won't retry at all.

Instead of modifying this error, what do you think of adding logging to the try_download branches? So we can log something like "hit 5xx error, retrying" or "hit 4xx error, not retrying"
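
To illustrate the logging idea, here is a minimal, std-only sketch of conditional retry. It is a synchronous stand-in, not the `tokio_retry` implementation: `classify` and `retry_if` are hypothetical names, sleeping is omitted, and the log lines are collected into a `Vec` instead of going through a real logger. Only the `(String, bool)` error shape is taken from the PR.

```rust
use std::time::Duration;

/// Error message paired with a "should we retry?" flag, mirroring the
/// `(String, bool)` tuple used in this PR.
type DownloadError = (String, bool);

/// Hypothetical classifier: 5xx responses are retryable, 4xx are not.
fn classify(status: u16) -> Result<u16, DownloadError> {
    match status {
        500..=599 => Err((format!("Server error ({status})"), true)),
        400..=499 => Err((format!("Client error ({status})"), false)),
        _ => Ok(status),
    }
}

/// Hand-rolled stand-in for conditional retry. Returns the final result plus
/// the log lines that would have been emitted along the way.
fn retry_if<I>(
    delays: I,
    mut attempt: impl FnMut() -> Result<u16, DownloadError>,
) -> (Result<u16, String>, Vec<String>)
where
    I: IntoIterator<Item = Duration>,
{
    let mut delays = delays.into_iter();
    let mut log = Vec::new();
    loop {
        match attempt() {
            Ok(value) => return (Ok(value), log),
            Err((msg, retryable)) => {
                if !retryable {
                    log.push(format!("{msg}, not retrying"));
                    return (Err(msg), log);
                }
                match delays.next() {
                    // A real implementation would sleep for `delay` here.
                    Some(delay) => log.push(format!("{msg}, retrying in {delay:?}")),
                    None => {
                        log.push(format!("{msg}, retries exhausted"));
                        return (Err(msg), log);
                    }
                }
            }
        }
    }
}
```

With this shape, the number of attempts actually made can be read from the log rather than from the final error message.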

Contributor Author:

Looks like RetryIf::spawn only cares that the 1st arg implements IntoIterator with Duration items - I could write a thin wrapper around ExponentialBackoff that counts/logs the retry attempts as part of next()
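
A rough sketch of what such a wrapper could look like, using only the standard library. `CountingStrategy` is a hypothetical name, and the sketch assumes the retry driver pulls exactly one delay from the strategy per retry it schedules; a plain iterator of `Duration`s stands in for `ExponentialBackoff`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical wrapper: forwards delays from an inner strategy and counts
/// each one it hands out.
struct CountingStrategy<I> {
    inner: I,
    retries: Arc<AtomicUsize>,
}

impl<I> CountingStrategy<I> {
    /// Returns the wrapper plus a shared handle to the counter, so the caller
    /// can still read the count after the strategy has been moved elsewhere.
    fn new(inner: I) -> (Self, Arc<AtomicUsize>) {
        let retries = Arc::new(AtomicUsize::new(0));
        let strategy = Self {
            inner,
            retries: Arc::clone(&retries),
        };
        (strategy, retries)
    }
}

impl<I: Iterator<Item = Duration>> Iterator for CountingStrategy<I> {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        // Only count when the inner strategy actually yields another delay.
        let delay = self.inner.next()?;
        let n = self.retries.fetch_add(1, Ordering::SeqCst) + 1;
        eprintln!("retry #{n} scheduled after {delay:?}");
        Some(delay)
    }
}
```

The shared counter is what would let the final `map_err` report how many retries were actually scheduled, including the zero-retry case for 4xx errors.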

Member:

> I agree this would be nice but I'm not sure we have access to the true num_attempts (I think it's hidden away inside the RetryIf::spawn implementation) 🤔 if we hit a 4xx error we won't retry at all.

I just meant extracting the constant "3" (or 4) from the code above, and then using it in two places.

Member:

Ooooh. I see what you meant here. Was missing the RetryIf aspect of this. Hm, it's a bit awkward to lose the conditional retry for client errors... between the two, not reporting the number of retries would be preferable? Sorry for the pivot.

Contributor Author:

No problem, updated
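
For reference, the schedule named in the code comment (10ms, 100ms, 1s, 10s) is what a base-10 exponential backoff produces when truncated to four delays. The sketch below reproduces those nominal values with the standard library only; `exponential_delays` is a hypothetical helper, not part of `tokio_retry`, and it ignores the `jitter` step, which randomizes each delay so that actual waits will differ from these values.

```rust
use std::time::Duration;

/// Std-only stand-in for a truncated exponential backoff: yields
/// base^1, base^2, ..., base^retries milliseconds.
fn exponential_delays(base_ms: u64, retries: u32) -> Vec<Duration> {
    (1..=retries)
        // saturating_pow avoids overflow panics for large exponents.
        .map(|n| Duration::from_millis(base_ms.saturating_pow(n)))
        .collect()
}
```

Calling `exponential_delays(10, 4)` yields the four delays from the comment. Four delays mean up to four retries on top of the initial attempt, which is likely the source of the "3 (or 4)" uncertainty in the thread.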


-    // Handle common HTTP errors.
-    if response.status().is_server_error() {
-      return Err(format!(
-        "Server error ({}) downloading file {} from {}",
-        response.status().as_str(),
-        file_name,
-        url,
-      ));
-    } else if response.status().is_client_error() {
-      return Err(format!(
-        "Client error ({}) downloading file {} from {}",
-        response.status().as_str(),
-        file_name,
-        url,
-      ));
-    }
     let byte_stream = Pin::new(Box::new(response.bytes_stream()));
     Ok(NetDownload {
       stream: byte_stream,