PoolInner::acquire() does not try the idle queue after a transient connection failure #2848

Open
@abonander

While thinking about potential sources of the infamous PoolTimedOut error, I realized that there's an interesting failure mode in acquire().

Once it decides to open a new connection, that's all it tries to do:

// Attempt to connect...
return self.connect(deadline, guard).await;

If a nonfatal connection error happens, it just continues in the backoff loop in connect() and never touches the idle queue again:

Ok(Err(Error::Database(error))) if error.is_transient_in_connect_phase() => (),

It will continue to do this until the timeout if the transient error does not resolve itself.
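To make the failure mode concrete, here is a minimal, synchronous toy model of that loop. It is not SQLx's code: `connect_only`, `ConnectError`, `AcquireError`, the `u32` connection ids, and the attempt counter standing in for the deadline are all hypothetical names made up for illustration.

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for illustration; these are not SQLx's real types.
#[derive(Debug, PartialEq)]
enum ConnectError {
    // Models an error for which is_transient_in_connect_phase() returns true.
    Transient,
}

#[derive(Debug, PartialEq)]
enum AcquireError {
    PoolTimedOut,
}

// Toy model of the backoff loop: once we are here, we only ever try to connect.
// `max_attempts` stands in for the deadline. The idle queue is passed to the
// closure only so a test can simulate another task releasing a connection;
// the loop itself never reads it.
fn connect_only(
    idle: &mut VecDeque<u32>,
    max_attempts: u32,
    mut try_connect: impl FnMut(&mut VecDeque<u32>) -> Result<u32, ConnectError>,
) -> Result<u32, AcquireError> {
    for _ in 0..max_attempts {
        match try_connect(&mut *idle) {
            Ok(conn) => return Ok(conn),
            // Transient error: back off (omitted here) and retry.
            Err(ConnectError::Transient) => (),
        }
    }
    Err(AcquireError::PoolTimedOut)
}

fn main() {
    let mut idle: VecDeque<u32> = VecDeque::new();
    let mut attempts = 0;
    let result = connect_only(&mut idle, 5, |idle| {
        attempts += 1;
        if attempts == 2 {
            // Another task returns connection 7 to the pool mid-loop.
            idle.push_back(7);
        }
        Err(ConnectError::Transient)
    });
    // The idle connection is never picked up, so the call times out.
    assert_eq!(result, Err(AcquireError::PoolTimedOut));
    assert_eq!(idle.len(), 1);
    println!("{:?} with {} idle connection(s) unused", result, idle.len());
}
```

In this model the task burns through every attempt and times out even though a usable connection has been sitting in the idle queue since the second attempt.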

Right now, only the Postgres driver overrides DatabaseError::is_transient_in_connect_phase(), but one of the error codes it considers transient is the "too many connections" error:

// too_many_connections
// This may be returned if we just un-gracefully closed a connection,
// give the database a chance to notice it and clean it up.
"53300",

This means that if the max_connections of the pool exceeds what is currently available on the server, tasks can get stuck in a loop trying to open new connections despite there being idle connections available, leading to surprising PoolTimedOut errors.
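One possible direction, sketched on a toy model rather than on SQLx's actual code, is to look at the idle queue again before each connection attempt. Everything here is an assumption made for illustration: `acquire_with_idle_recheck`, `ConnectError`, `AcquireError`, the `u32` connection ids, and the attempt counter standing in for the deadline are hypothetical, and a real fix would also have to deal with the async wakeups and connection permits that this sketch ignores.

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for illustration; these are not SQLx's real types.
#[derive(Debug, PartialEq)]
enum ConnectError {
    // Models a transient error such as Postgres 53300 (too_many_connections).
    Transient,
}

#[derive(Debug, PartialEq)]
enum AcquireError {
    PoolTimedOut,
}

// Toy retry loop that re-checks the idle queue before every connection attempt.
// `max_attempts` stands in for the deadline. The closure receives the idle
// queue only so a caller can simulate another task releasing a connection.
fn acquire_with_idle_recheck(
    idle: &mut VecDeque<u32>,
    max_attempts: u32,
    mut try_connect: impl FnMut(&mut VecDeque<u32>) -> Result<u32, ConnectError>,
) -> Result<u32, AcquireError> {
    for _ in 0..max_attempts {
        // Prefer an idle connection if one has appeared since the last attempt.
        if let Some(conn) = idle.pop_front() {
            return Ok(conn);
        }
        match try_connect(&mut *idle) {
            Ok(conn) => return Ok(conn),
            // Transient error: back off (omitted here) and go around again.
            Err(ConnectError::Transient) => (),
        }
    }
    Err(AcquireError::PoolTimedOut)
}

fn main() {
    let mut idle: VecDeque<u32> = VecDeque::new();
    let mut attempts = 0;
    let result = acquire_with_idle_recheck(&mut idle, 5, |idle| {
        attempts += 1;
        if attempts == 2 {
            // Another task returns connection 7 to the pool mid-loop.
            idle.push_back(7);
        }
        // The server keeps refusing new connections.
        Err(ConnectError::Transient)
    });
    // The released connection is picked up on the next iteration.
    assert_eq!(result, Ok(7));
    assert_eq!(attempts, 2);
    println!("acquired {:?} after {} failed connect attempt(s)", result, attempts);
}
```

With the re-check in place, the toy task stops retrying as soon as a connection is released, instead of waiting for the server to accept a new one.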

This may be the cause of some of the PoolTimedOut issues being reported, although it's only likely to occur with the Postgres driver.

Labels: bug, pool (Related to SQLx's included connection pool)