Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on Broken Connection #5589

Closed
westonpace opened this issue Apr 4, 2024 · 2 comments
Closed

Retry on Broken Connection #5589

westonpace opened this issue Apr 4, 2024 · 2 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@westonpace
Copy link
Member

westonpace commented Apr 4, 2024

Describe the bug
object_store appears to be fairly conservative when it comes to retrying errors:

The error is:

Generic GCS error: Error after 0 retries in 122.602µs, max_retries:10, retry_timeout:180s, source:error sending request for url (https://storage.googleapis.com/some-file.lance): connection error: unexpected end of file

Doing some reverse engineering we can see that this maps to:

reqwest::Error(kind = Kind::Request) // error sending request
hyper::Error(kind = Kind::Io) // connection error
std::io::Error(kind = Kind::UnexpectedEof) // unexpected end of file

I wonder if we were able to catch this in earlier versions of the retry logic because we had is_connect:

if let Some(e) = source.downcast_ref::<hyper::Error>() {
    if e.is_connect() || e.is_closed() || e.is_incomplete_message() {
        do_retry = true;
    }
}

Regrettably, is_connect seems to have disappeared with the latest version of hyper and now we have:

if let Some(e) = e.downcast_ref::<hyper::Error>() {
    do_retry = e.is_closed() || e.is_incomplete_message();
    break
}

Note that e.is_incomplete_message will be false in this case (this is hyper::Error(kind = Kind::Io) and not hyper::Error(kind = Kind::IncompleteMessage))

To Reproduce
Regrettably, I cannot easily reproduce this. I believe this is an error originating from the TCP stack (tokio) and so some kind of TCP emulator (or mock TCP implementation) would be needed.

Expected behavior
The request is retried

Additional context
It may be possible to catch this by further downcasting the source to std::io::Error and retrying if that succeeds. Or, maybe just retry anytime we can downcast to hyper::Error?

@westonpace westonpace added the bug label Apr 4, 2024
@tustvold
Copy link
Contributor

tustvold commented Apr 4, 2024

I wonder if we were able to catch this in earlier versions of the retry logic because we had is_connect:

This version hasn't been released yet, and even then #5536 which made this change moved this retry to a different location (which also might have fixed this).

I believe this is probably a duplicate of #5545

@tustvold tustvold added enhancement Any new improvement worthy of a entry in the changelog and removed bug labels Apr 4, 2024
@tustvold tustvold changed the title object_store does not retry in certain cases where it should Retry on Broken Connection Apr 4, 2024
@westonpace
Copy link
Member Author

We can close as duplicate if you'd like. Both are reqwest::Request error not being retried. I had opened a separate one since this is a GET and not a PUT. I'll comment over there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

2 participants