Pool handling of database connectivity #944

mgagliardo91 · 2024-10-01T20:00:27Z

Expected behavior

I am trying to ensure that we have a good grasp on how pg-promise pooling works with various network outages. I have a local setup where I have a pgp pool setup with a connection timeout, query timeout, idle timeout and an allowance of up to 10 connections in the pool.

I'm running into some interesting behavior if I kill the database after the pool has connected at least once. The subsequent queries reuse an idle connection which will then fail with a query timeout and get put back into the pool rather than failing with an actual connection timeout and removed from the pool altogether. This continues into a loop where the same client keeps getting reused, so long as it doesn't timeout, and it will never hit the connect timeout which would have the client thrown away.

Is there anyway, such as within an error handler, to indicate back to the pool that the client should not be reused and instead thrown away? Ultimately, if a database was suddenly inaccessible, I would expect each idle connection to fail and new connections fail on a connect error ultimately leading to alerting in our telemetry stack. Instead, we just get a ton of query timeouts and the same clients keep getting reused.

Actual behavior

Idle clients in the pool continue to get reused when a database is suddenly inaccessible and throw a query timeout instead of the correct connection timeout indicating that the database cannot be communicated with.

Steps to reproduce

Create a pool:

const pool = pgp({
  ...
  max: 10,
  min: 0,
  connectionTimeoutMillis: 5_000,
  query_timeout: 10_000,
  idleTimeoutMillis: 30_000,
})

// query once
await pool.query('SELECT 1 as test');

// pause/shutdown database

// query again
await pool.query('SELECT 1 as test'); // throws "Query read timeout"
await pool.query('SELECT 1 as test'); // throws "Query read timeout"

// wait for all idle clients to return to pool
await pool.query('SELECT 1 as test'); // throws "Connection terminated due to connection timeout"

In the above example, if your clients never fully release from idle timeout you will never hit the connection timeout error

Environment

Version of pg-promise: ^11.9.1
OS type (Linux/Windows/Mac): OSX
Version of Node.js: 18

mgagliardo91 · 2024-10-01T20:01:11Z

I appreciate the help with this @vitaly-t

mgagliardo91 · 2024-10-01T20:24:54Z

As a followup, if I use the error handler and force end the client connection on a query timeout, I do correctly see the client removed from the pool and subsequent connections fail correctly as a connection timeout. The only issue is the fact that a query read timeout doesn't necessarily indicate an issue with the database, but with the looping of idle clients, we will never throw away these clients to learn that the database has had an issue, etc.

const options = {
  error(e, ctx) {
    if ('client' in ctx && e.message == 'Query read timeout') {
      ctx.client.end()
    }
  }
}

vitaly-t · 2024-10-01T21:38:32Z

Maybe you can detect connection issues, as done within Robust-Listeners, and re-create the database object when connection is lost, which will re-create the Pool also.

mgagliardo91 · 2024-10-01T22:41:14Z

@vitaly-t I'm not sure if onLost would get called given the only errors being thrown are query timeouts which are not technically physical connection issues by default. It just so happens that the reuse of idle connections hides the fact that the database is actually not available.

Is there anyway for the underlying .query() method to understand the difference of a timeout due to the query not finishing vs the fact that the endpoint is not even available? I'm a bit surprised that the client is able to call .query() even with the client not technically being connected anymore

vitaly-t · 2024-10-01T23:04:03Z

I'm not really sure, tbh. Query timeouts were added only recently in the underlying driver, it would require an investigation of how they really work underneath, plus some heavy testing.

I'm not even sure if timeouts in the driver are supported on the driver level or on the server level, as that would have a significant effect on the subject.

mgagliardo91 · 2024-10-01T23:15:38Z

I'll dig into the underlying driver more tomorrow - can you think of any negative impact on calling .end() on the client in the error handler after some custom logic to essentially force a new connection? Referring to #944 (comment)

Just trying to determine if we can sort of hack a solution for now over having to consider an alternative

mgagliardo91 added the triage To be investigated label Oct 1, 2024

mgagliardo91 assigned vitaly-t Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pool handling of database connectivity #944

Pool handling of database connectivity #944

mgagliardo91 commented Oct 1, 2024

mgagliardo91 commented Oct 1, 2024

mgagliardo91 commented Oct 1, 2024 •

edited

Loading

vitaly-t commented Oct 1, 2024

mgagliardo91 commented Oct 1, 2024

vitaly-t commented Oct 1, 2024 •

edited

Loading

mgagliardo91 commented Oct 1, 2024 •

edited

Loading

Pool handling of database connectivity #944

Pool handling of database connectivity #944

Comments

mgagliardo91 commented Oct 1, 2024

Expected behavior

Actual behavior

Steps to reproduce

Environment

mgagliardo91 commented Oct 1, 2024

mgagliardo91 commented Oct 1, 2024 • edited Loading

vitaly-t commented Oct 1, 2024

mgagliardo91 commented Oct 1, 2024

vitaly-t commented Oct 1, 2024 • edited Loading

mgagliardo91 commented Oct 1, 2024 • edited Loading

mgagliardo91 commented Oct 1, 2024 •

edited

Loading

vitaly-t commented Oct 1, 2024 •

edited

Loading

mgagliardo91 commented Oct 1, 2024 •

edited

Loading