Add PubSub publish timeout #8
Conversation
This adds a timeout to publishing messages to PubSub, and also improves some logging in the default retry implementation. The initial timeout implementation here is done in a semver-compatible way, with a relatively simple configurable fixed timeout. In the next major semver bump I'd like to make the configuration required, and potentially add some time-based retrying and timeout backoff.
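As a sketch of what the semver-compatible shape could look like (the struct and field names here are hypothetical, not the crate's actual API), the timeout can be added as an `Option` so existing configurations keep compiling and default to the old no-timeout behavior:

```rust
use std::time::Duration;

// Hypothetical config shape: `timeout: Option<Duration>` keeps the change
// semver-compatible because `Default` yields `None` (publish never times out),
// matching the previous behavior.
#[derive(Debug, Default)]
pub struct PublishConfig {
    pub timeout: Option<Duration>,
}

fn main() {
    // Existing callers get the old behavior without touching their code.
    let old_behavior = PublishConfig::default();
    assert_eq!(old_behavior.timeout, None);

    // New callers can opt in to a fixed publish timeout.
    let with_timeout = PublishConfig {
        timeout: Some(Duration::from_secs(10)),
    };
    assert_eq!(with_timeout.timeout, Some(Duration::from_secs(10)));
    println!("ok");
}
```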
Force-pushed 0bd0ded to b220eea
I see logic for exponential backoff and that code is part of the pubsub client struct. Are we just not using that logic yet, or is this referring to something else?
let publish = client.publish(tonic_request);

// apply a timeout to the publish
let publish_fut = tokio::time::timeout(timeout, publish).map_err(
I don't know all that much about gRPC... do we need a timeout on both the client and server side?
Yes. Client side is the more obvious one, in case the server never responds, there's a network problem, or whatever. Server side is more of a good convention: if the server takes too long and knows the client will time out, the server is allowed to free resources after that time, which can help responsiveness overall when under load.
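A toy, std-only sketch of that server-side convention (the helper name is made up for illustration): once the client's deadline has passed, the server can abandon the work, because nobody is waiting on the result anymore.

```rust
use std::time::Duration;

// Hypothetical helper: given how long a request has been running and the
// client's timeout, decide whether the server should abandon the work
// and free its resources.
fn should_abandon(elapsed: Duration, client_timeout: Duration) -> bool {
    elapsed >= client_timeout
}

fn main() {
    let client_timeout = Duration::from_secs(2);
    // Still within the client's deadline: keep working on the request.
    assert!(!should_abandon(Duration::from_secs(1), client_timeout));
    // Past the deadline: the client has already timed out, so drop the work.
    assert!(should_abandon(Duration::from_secs(5), client_timeout));
    println!("ok");
}
```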
// apply a timeout to the publish
let publish_fut = tokio::time::timeout(timeout, publish).map_err(
    |tokio::time::error::Elapsed { .. }| {
If we hit the server-side timeout, does tokio return the same time::error::Elapsed error here?
No. A server timeout would result in an error from the wrapped publish future. That's why the match is on a nested Result<Result<T, Status>, Status>. I'd have used Result::flatten, but it's not stable yet (rust-lang/rust#70142).
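The nested shape can be demonstrated with plain std types (a string error stands in for tonic's Status here): tokio::time::timeout wraps the publish future's own Result, and since Result::flatten is unstable, the two layers are collapsed with a match.

```rust
// Collapse the nested result produced by wrapping a fallible future in a
// timeout: the outer layer is the timeout result (after map_err has turned
// Elapsed into the same error type), the inner layer is the RPC's own result.
fn collapse<T, E>(nested: Result<Result<T, E>, E>) -> Result<T, E> {
    // `Result::flatten` would do this, but it's unstable (rust-lang/rust#70142).
    match nested {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(status)) | Err(status) => Err(status),
    }
}

fn main() {
    // Server-side failure: the inner result carries the error.
    assert_eq!(collapse::<u32, &str>(Ok(Err("status"))), Err("status"));
    // Client-side timeout: the outer result carries the (mapped) error.
    assert_eq!(collapse::<u32, &str>(Err("elapsed")), Err("elapsed"));
    // Success: both layers are Ok.
    assert_eq!(collapse::<u32, &str>(Ok(Ok(7))), Ok(7));
    println!("ok");
}
```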
backoff_ms = %interval.as_millis(),
remaining_attempts = match self.intervals.size_hint() {
    (_lower_bound, Some(upper_bound)) => upper_bound,
    (lower_bound, None) => lower_bound
I don't grok why the lower bound of the iterator would be indicative of the remaining attempts
I guess it's not exactly indicative. I thought about having "unknown" if there's no upper bound. In practice this will never matter, because the only possible iterator will always have an upper bound (.take(max_retries.unwrap_or(usize::MAX))).
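That can be checked with a std-only example: an interval iterator capped with .take() always reports an exact upper bound in its size_hint, so the match above resolves to the first arm; only an unbounded iterator falls through to the (uninformative) lower bound.

```rust
// The same match as in the logging code, extracted for illustration.
fn remaining_attempts<I: Iterator>(intervals: &I) -> usize {
    match intervals.size_hint() {
        (_lower_bound, Some(upper_bound)) => upper_bound,
        (lower_bound, None) => lower_bound,
    }
}

fn main() {
    // A `.take(n)`-limited iterator always has an exact upper bound...
    let bounded = std::iter::repeat(100u64).take(5);
    assert_eq!(remaining_attempts(&bounded), 5);

    // ...while an unbounded iterator like `repeat` reports (usize::MAX, None),
    // so the fallback arm yields the lower bound.
    let unbounded = std::iter::repeat(100u64);
    assert_eq!(remaining_attempts(&unbounded), usize::MAX);
    println!("ok");
}
```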
There's exponential backoff for retries (i.e. try again after an increasing sleep). Timeouts could also use such a backoff (i.e. wait for a response with an increasing timeout) instead of the current fixed timeout, potentially reusing the same or similar backoff code. However, the current retry logic is count-limited rather than time-limited, which means that if both the backoff and the timeout are exponential, retrying a series of timed-out requests takes N attempts that can add up to many minutes. If retries were time-limited, there'd be a better overall cap on the possible delay from a series of timeouts. Making retries time-limited is a worthwhile change, but it will take some work, and I want to get these timeouts in place first.
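To illustrate the concern about count-limited retries with exponential intervals (the numbers here are illustrative, not the crate's defaults): the worst-case delay from sleeps alone is the sum of a geometric series, which grows quickly with the attempt count.

```rust
use std::time::Duration;

fn main() {
    // Hypothetical exponential backoff: 100ms, doubling each retry, 10 attempts.
    let intervals: Vec<Duration> = std::iter::successors(
        Some(Duration::from_millis(100)),
        |d| d.checked_mul(2),
    )
    .take(10)
    .collect();

    // Count-limited retries bound the *number* of attempts, not the *time*:
    // the worst-case sleep alone is the sum of the geometric series,
    // 100ms * (2^10 - 1) = 102.3 seconds.
    let total: Duration = intervals.iter().sum();
    assert_eq!(total, Duration::from_millis(100 * (1024 - 1)));
    println!("total backoff: {:?}", total);
}
```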