Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection pool hangs if authentication fails #141

Open
levi-nz opened this issue Nov 6, 2022 · 6 comments
Open

Connection pool hangs if authentication fails #141

levi-nz opened this issue Nov 6, 2022 · 6 comments

Comments

@levi-nz
Copy link

levi-nz commented Nov 6, 2022

When using a connection pool with tokio_postgres, the connection hangs until a TimedOut error is returned if the password is incorrect. This behaviour is undesirable and should just result in authentication failure.

To re-produce, simply try connect to any database using an incorrect password and the connection will hang until it times out.

@djc
Copy link
Owner

djc commented Nov 7, 2022

I'd be happy to review a PR for this, but I don't think I'll be able to work on this myself in the near future.

@couchand
Copy link
Contributor

I've spent a bit of time looking into this while debugging some TLS connection issues. It is decidedly not a simple fix, unfortunately. There are a few things that I've been doing to make life a little bit easier, however:

  • If your deployment allows it, set Builder::min_idle to require at least one open connection before considering the Pool ready. Consider testing the connection parameters on startup with a short-lived pool and very low timeout.
  • Be sure to register an ErrorSink with the method Builder::error_sink which will give you an opportunity to log errors that otherwise would fall off the table.
  • Also, I'll be taking a look at using the crate bb8_failsafe here for more proactive connection error management.

The touchpoint for the issues is at PoolInner::add_connection here:

bb8/bb8/src/inner.rs

Lines 199 to 225 in 5737736

let conn = shared
.manager
.connect()
.and_then(|mut c| async { self.on_acquire_connection(&mut c).await.map(|_| c) })
.await;
match conn {
Ok(conn) => {
let conn = Conn::new(conn);
shared
.internals
.lock()
.put(conn, Some(approval), self.inner.clone());
return Ok(());
}
Err(e) => {
if Instant::now() - start > self.inner.statics.connection_timeout {
let mut locked = shared.internals.lock();
locked.connect_failed(approval);
return Err(e);
} else {
delay = max(Duration::from_millis(200), delay);
delay = min(self.inner.statics.connection_timeout / 2, delay * 2);
sleep(delay).await;
}
}
}
This is the code that ultimately calls into the ManageConnection impl to make the connection, and (in the case of an error) keeps trying until it gets one successfully or the pool's connection timeout expires.

The right behavior in the general case is to delay returning any error until the timeout, but it would be great if there were a way to identify particular classes of errors that shouldn't wait for the timeout. It seems like that would require cooperation from the ManageConnection impl, which means it would require a new trait method (say something like fn error_is_fatal(&self, e: Self::Error) -> bool { return false; } which could then be used to exit early. Then we could override the default impl of that trait method where appropriate. Unfortunately for my use-case (bb8-postgres) the tokio-postgres crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.


As a side note @djc or anyone else looking more deeply, this call stack is Pool::get => PoolInner::get => PoolInner::make_pooled => PoolInner::spawn_replenishing_approvals raises my eyebrows. In particular, it seems like the waiter ought to be a Result channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of spawn_replenishing_approvals.

bb8/bb8/src/inner.rs

Lines 138 to 139 in 5737736

let approvals = locked.push_waiter(tx, &self.inner.statics);
self.spawn_replenishing_approvals(approvals);

@djc
Copy link
Owner

djc commented Mar 3, 2023

As a side note @djc or anyone else looking more deeply, this call stack is Pool::get => PoolInner::get => PoolInner::make_pooled => PoolInner::spawn_replenishing_approvals raises my eyebrows. In particular, it seems like the waiter ought to be a Result channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of spawn_replenishing_approvals.

That sounds sensible. Want to propose a PR?

Unfortunately for my use-case (bb8-postgres) the tokio-postgres crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.

Might be worth an issue against tokio-postgres? I would definitely be open to adding an error_is_fatal() like hook to the ManageConnection trait.

@couchand
Copy link
Contributor

couchand commented Mar 3, 2023

That sounds sensible. Want to propose a PR?

Happy to work something up. Since the waiter is stored in internals::PoolInternals and the error pops up in inner::PoolInner::spawn_start_connections, I think the move is to replace this line that throws to the error_sink

Err(e) => this.inner.statics.error_sink.sink(e),

with a call to a new method on SharedPool that looks roughly like:

impl<M: ManageConnection> PoolInternals<M> {
    fn forward_error(&self, err: M::Error) {
        let mut locked = self.internals.lock();
        while let Some(waiter) = locked.waiters.pop_front() {
            match waiter.send(Err(err)) {
                Ok(_) => return,
                Err(e) => err = e,
            }
        }
        self.statics.error_sink.sink(e);
    }
}

Might be worth an issue against tokio-postgres? I would definitely be open to adding an error_is_fatal() like hook to the ManageConnection trait.

It looks like it has been discussed a few times (sfackler/rust-postgres#571, sfackler/rust-postgres#583, sfackler/rust-postgres#790), though usually in a different context. There's even another project that is trying to do exactly the same thing: https://github.com/palfrey/wait-for-db/blob/c3fb9dbad94f06f91f83d6ef9e4af0b93ef64083/src/pg.rs#L34-L49

In at least one case a PR was merged that exposed a limited interface to the error kind (sfackler/rust-postgres#739) so it's reasonable to think a similar PR would be well received.

@djc
Copy link
Owner

djc commented Mar 13, 2023

Yeah, the forward_error() approach seems reasonable. Should probably verify all sink() callers (at least in PoolInner) to see if they should use this.

@couchand
Copy link
Contributor

Hmm, it's not so easy, I guess. The above strategy works, but only if the user has configured retry_connection(false). If not, in practice we hit the "outer" timeout on the waiter channel here:

match timeout(self.inner.statics.connection_timeout, rx).await {
before we exhaust the "inner" timeout on the retry loop here:
|| Instant::now() - start > self.inner.statics.connection_timeout

After a quick review I don't immediately see a great way to get these to line up that doesn't feel like a hack (i.e. make the outer one "a little bit longer" or some such). The outer timeout can't really be removed, since we're relying on that in the face of pool contention. The inner timeout in the context of the retry loop looks locally correct, and it's hard to picture what would need to change really.

I'll go ahead and submit a PR to bubble up the error for the case of a pool configured with retry_connection(false), since that at least makes a little bit of progress here.

couchand added a commit to couchand/bb8 that referenced this issue Mar 20, 2023
This ensures that an active waiter will be notified of a
connection failure before we resort to sending it to the sink.

Contributes to djc#141.
djc pushed a commit that referenced this issue May 26, 2023
This ensures that an active waiter will be notified of a
connection failure before we resort to sending it to the sink.

Contributes to #141.
jshearer added a commit to estuary/flow that referenced this issue Aug 16, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Aug 29, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Aug 29, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Sep 9, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Sep 12, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Sep 17, 2024
…mplementation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Sep 17, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
jshearer added a commit to estuary/flow that referenced this issue Sep 17, 2024
…tation. The problem is, `bb8` swallows all error messages as retry timeouts. See djc/bb8#141
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants