Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tonic Transport Errors not mapped to Unavailable #628

Closed
dfreese opened this issue May 4, 2021 · 3 comments · Fixed by #629
Closed

Tonic Transport Errors not mapped to Unavailable #628

dfreese opened this issue May 4, 2021 · 3 comments · Fixed by #629

Comments

@dfreese
Copy link
Contributor

dfreese commented May 4, 2021

Bug Report

Version

For completeness, all of the potentially relevant deps in our Cargo.toml are:

[dependencies]
async-stream = "=0.3.1"
futures = "=0.3.14"
futures-util = "=0.3.14"
http = "=0.2.4"
hyper = "=0.14.5"
prost = "=0.7.0"
prost-derive = "=0.7.0"
prost-types = "=0.7.0"
prost-build = "=0.7.0"
ring = "=0.16.20"
rustls = { version = "=0.19.0", features = ["default", "dangerous_configuration"] }
rustls-native-certs = "=0.5.0"
tokio = { version = "=1.5.0", features = ["full"] }
tokio-stream = "=0.1.5"
tokio-util = "=0.6.6"
tonic = { version = "=0.4.2", features = ["default", "tls", "tls-roots", "transport"] }
tonic-build = "=0.4.2"
tower = "=0.4.6"

Platform

Linux 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description

We were doing robustness testing of our connections. We start with a basic tonic::transport::let channel = tonic::transport::Channel::from_static("http://some-endpoint.com").connect_lazy(). We will either have the network disconnected at the start, or disconnect it during the use of the client. We've seen a copy different errors related to the underlying connection, but they all show up as transport errors.

Failed to upload file "4": Tonic gRPC Error: status: Unknown, message: "transport error: connection error: broken pipe", details: [], metadata: MetadataMap { headers: {} }

Failed to upload file "4": Tonic gRPC Error: status: Unknown, message: "transport error: error trying to connect: dns error: failed to lookup address information: Name or service not known", details: [], metadata: MetadataMap { headers: {} }

I believe Unavailable is the more correct code here, so that the client knows it's valid to retry. I'm relatively sure that is consistent with gRPC's implementation. It's been a little bit, so I'll need to put together a specific example.

@dfreese
Copy link
Contributor Author

dfreese commented May 4, 2021

Perhaps tonic::transport::Error could be handled, like TimeoutExpired?

tonic/tonic/src/status.rs

Lines 329 to 338 in 9ff4f7b

#[cfg(feature = "transport")]
{
if let Some(h2) = err.downcast_ref::<h2::Error>() {
return Some(Status::from_h2_error(h2));
}
if let Some(timeout) = err.downcast_ref::<crate::transport::TimeoutExpired>() {
return Some(Status::cancelled(timeout.to_string()));
}
}

@davidpdrsn
Copy link
Member

Could any of errors actually be fixed by retrying? If so then yeah it does seem appropriate. Though I'm no expert on the spec.

dfreese pushed a commit to dfreese/tonic that referenced this issue May 4, 2021
This came up when testing retry logic when using the
tonic::transport::Error.  Specifically, we would run into Unknown errors
during the following sequence:

* unplug ethernet
* create client with connect_lazy
* unary call -> Unknown error, representing a tonic::transport::Error

Fixes hyperium#628
@dfreese
Copy link
Contributor Author

dfreese commented May 4, 2021

Yeah. So what we did for testing was:

  1. Unplug the network cable
  2. Construct the client lazily
  3. Attempt to make a unary call -> Unknown dns error right away (This step can be repeated many times with the same result)
  4. Reconnect network cable
  5. Attempt to make a unary call -> succeeds

Though, because of how Connection and Reconnect interact, the behavior is slightly different depending on if things have succeeded previously or not

  1. Construct the client lazily
  2. Attempt to make a unary call -> succeeds
  3. Unplug the network cable
  4. Attempt to make a unary call -> hangs or succeeds depending on length of time between this step and next
  5. Reconnect network cable
  6. Previous unary call errors out (assuming step 4 hung)
  7. Attempt to make a unary call -> succeeds

Step 4 won't hang indefinitely if timeout is set. In that case it returns a:

Tonic gRPC Error: status: Unknown, message: "transport error: request timed out", details: [], metadata: MetadataMap { headers: {} }

I put up #629 as a draft approach to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants