Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry: Add StandardRetryPolicy and standard_policy mod #698

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

LucioFranco
Copy link
Member

This PR adds a new standard_policy module within the retry module that provides a batteries included policy to be used with the retry middleware. The policy combines the Budget type and generic backoff utlities from the backoff module to provide an easy to use policy with good defaults.

This PR also includes a StandardRetryPolicyBuilder as well as two new traits IsRetryable and CloneRequest. These each have blanket impls for closures. The reason that this implementation breaks these out into two different traits is to allow tower-http to provide a custom CloneRequest implementation that will be able to clone some sort of ReplayBody and let the user pass in the retry decision implementation.

Ref #682

This PR adds a new `standard_policy` module within the retry module
that provides a batteries included policy to be used with the retry
middleware. The policy combines the `Budget` type and generic backoff
utlities from the `backoff` module to provide an easy to use policy
with good defaults.

This PR also includes a `StandardRetryPolicyBuilder` as well as two
new traits `IsRetryable` and `CloneRequest`. These each have blanket
impls for closures. The reason that this implementation breaks these
out into two different traits is to allow `tower-http` to provide
a custom `CloneRequest` implementation that will be able to clone
some sort of `ReplayBody` and let the user pass in the retry decision
implementation.

Ref #682
@LucioFranco LucioFranco mentioned this pull request Oct 12, 2022
19 tasks
@LucioFranco LucioFranco force-pushed the lucio/standard-policy2 branch from 716f9d0 to 88b7df9 Compare October 12, 2022 15:12
Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


pub mod backoff;
pub mod budget;
pub mod future;
mod layer;
mod policy;
pub mod standard_policy;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about calling this "standard policy"; I might prefer a name that actually describes what the policy does, rather than just saying it's "standard"...

tower/src/retry/standard_policy.rs Outdated Show resolved Hide resolved
Comment on lines 46 to 56
/// A trait to determine if the request associated with the response should be
/// retried by [`StandardRetryPolicy`].
///
/// # Closure
///
/// This trait provides a blanket implementation for a closure of the type
/// `Fn(&mut Result<Res, E>) -> bool + Send + Sync + 'static`.
pub trait IsRetryable<Res, E>: Send + Sync + 'static {
/// Return `true` if the request associated with the response should be
/// retried and `false` if it should not be retried.
fn is_retryalbe(&self, response: &mut Result<Res, E>) -> bool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems like this trait should be able to consider the request as well as the response in determining whether a result is retryable. For example, some HTTP request methods are idempotent, and others are non-idempotent; and non-idempotent requests should generally not be retried. With the current implementation, the IsRetryable function can't determine that a given request method is never retryable (without some kind of unfortunate contortion to stuff metadata about the request into the response type, or something).

It seems like it would be good to allow the request type to be considered as part of making this decision, especially if we want the StandardPolicy to be usable for HTTP retries...

Comment on lines +59 to +68
/// A trait to clone a request for the [`StandardRetryPolicy`].
///
/// # Closure
///
/// This trait provides a blanket implementation for a closure of the type
/// `Fn(&Req) -> Option<Req> + Send + Sync + 'static`.
pub trait CloneRequest<Req>: Send + Sync + 'static {
/// Clone a request, if `None` is returned the request will not be retried.
fn clone_request(&self, request: &Req) -> Option<Req>;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like there probably ought to be an implementation of this that's impl CloneRequest<Req> where Req: Clone that just calls Clone. Perhaps that ought to be the default?

Comment on lines +8 to +9
//! - [`IsRetryable`] by default will always return `false`.
//! - [`CloneRequest`] by default will always return `None`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems kind of unfortunate to me that the default behavior for the batteries-included retry policy is to...never retry anything. In particular, defaulting to a CloneRequest implementation that always returns None feels a bit weird to me.

What do you think about an API where the default implementation expects that the request type implements Clone, and just calls its Clone method? That way, users whose requests implement Clone never have to think about the CloneRequest trait, and providing a CloneRequest implementation is only necessary if you want to retry !Clone requests?

tower/src/retry/mod.rs Outdated Show resolved Hide resolved
//!
//! The [`budget`] module contains utilities to reduce the amount of concurrent
//! retries made by a tower middleware stack. The goal for this is to reduce
//! congestive collapse when downstream systems degrade.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could maybe be nice to make "congestive collapse" a link to Wikipedia or something?

Comment on lines 5 to 14
//! The [`standard_policy`] module contains a default retry [`Policy`] that can
//! be used with the [`Retry`] middleware. For more information check the module
//! docs for [`standard_policy`].
//!
//! The [`backoff`] module contains backoff utlities that can be used in a
//! custom [`Policy`].
//!
//! The [`budget`] module contains utilities to reduce the amount of concurrent
//! retries made by a tower middleware stack. The goal for this is to reduce
//! congestive collapse when downstream systems degrade.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take it or leave it: it seems kind of like these paragraphs could make sense as a bulleted list?

}
}

/// A standard retry [`Policy`] that combines a retry budget and a backoff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "retry budget" and "backoff" here should link to the respective modules?

Comment on lines +97 to +98
is_retryable: Arc<dyn IsRetryable<Res, E>>,
clone_request: Arc<dyn CloneRequest<Req>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little bit on the fence about whether always Arcing these is the right choice or not --- a question that's definitely come up in the past with other middleware that needs to clone a Fn value. for example, MapRequest is generic over a F: FnMut(...) + Clone, and MapResponse is generic over a FnOnce + Clone, rather than Arcing a Fn...

F: FnMut(R1) -> R2 + Clone,

in this case, since the functions are always cloned into each request, it wouldn't make sense to allow FnMut + Clone, since the mutable state wouldn't be shared across instances of the closure. but, it would allow the use of function pointers (not closures) to avoid allocating an Arc and bumping two reference counts on every request...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, maybe a microoptimization, but it could be worth considering combining the IsRetryable and CloneRequest impls into a single struct that's then Arced, so that we only increment/decrement one atomic ref count on each request, rather than two...but, OTTOH, we would probably have to either box them or make the struct generic, if we did that, and boxing them would mean two heap ptr derefs...:woman_shrugging:

LucioFranco and others added 2 commits October 27, 2022 11:22
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants