-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide custom function for max_lifetime to prevent stampeding reconnections #2854
Comments
I think this would be good to take care of at the same time as #2848 (comment) However, since our pool implementation is more opinionated than a generic pool, I think a callback that returns a I'm thinking of a couple of different approaches here, which aren't necessarily mutually exclusive: An async impl<DB: Database> PoolOptions<DB> {
/// Set an async callback to be invoked before beginning a connection attempt.
///
/// The callback is passed a reference to the `ConnectOptions` that will be used to open the connection.
/// and is free to return a modified version.
///
/// The async nature of the callback allows for any pre-connection work to be done,
/// such as a DNS lookup to resolve a domain for a high-availability server,
/// or to insert an artificial delay to mitigate a potential stampeding herd effect.
pub fn before_connect<F>(self, before_connect: F) -> Self
where
F: for<'a> Fn(&'a DB::ConnectOptions) -> BoxFuture<'a, crate::Result<Cow<'a, DB::ConnectOptions>>>,
F: Send + Sync + 'static
{ ... }
} In my mind, this would only be invoked at the beginning of For a more high-level interface though, I was thinking something like this (mimicking impl<DB: Database> PoolOptions<DB> {
/// Set the base value for initial and backoff delay calculations for connection attempts.
///
/// This value will have random jitter applied to mitigate a potential stampeding herd effect.
///
/// See [`.connect_delay_randomization_factor()`][Self::connect_delay_randomization_factor].
pub fn base_connect_delay(self, delay: impl Into<Option<Duration>>) -> Self { ... }
// I would like to come up with a more concise name for this,
// or use something with a rigid definition in statistics like variance.
/// The maximum of a random jitter factor between 0 and 1 that will be applied to the connection delay.
///
/// A value of 0.5 means that each connect delay will have a random jitter applied between +/- 50%.
pub fn connect_delay_randomization_factor(self, factor: f64) -> Self { ... }
/// The exponential growth factor for connection backoffs.
///
/// The backoff delay of each subsequent connection attempt for a given `acquire()` call will be multiplied by this factor,
/// before jitter is applied.
pub fn connect_delay_multiplier(self, multiplier: f64) -> Self { ... }
} |
Is your feature request related to a problem? Please describe.
When enabling
max_lifetime
on a service, SQLx will recreate all of the initial connections at the same time everymax_lifetime
interval. This causes excessive load on the database & causes requests to hang until the connections have opened again.Take this example configuration:
Assuming no more than 8 connections were created in the pool, all 8 connections would have to be created every 30 minutes, therefore causing queries to hang.
Similarly, if the service always has a high load (even with
min_connections
= 0), the connections created on startup would have a stampeding effect when they all reconnect exactly every 30 minutes. The current defaults havemax_lifetime
enabled, which is prone to this issue.It looks like there's already code in place to limit the number of reaped connections which helps limit the number of connections affected if
min_connections
is configured (which is not in the default configuration).Here's an example of the connection rate from some minimal testing:
min_connections = 0, max_lifetime = 5 minutes, running under load
Describe the solution you'd like
I'd like to be able to control the
max_lifetime
for each individual connection by providing a function that returns a duration. This way, we can provide randomness tomax_lifetime
.async-nats
has a similar implementation forreconnect_delay_callback
.Describe alternatives you've considered
max_lifetime
max_lifetime
and leavingtest_before_acquire
set totrue
should work fine as long as there are no memory leaks in the database.The text was updated successfully, but these errors were encountered: