Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pool handling of database connectivity #944

Open
mgagliardo91 opened this issue Oct 1, 2024 · 6 comments
Open

Pool handling of database connectivity #944

mgagliardo91 opened this issue Oct 1, 2024 · 6 comments
Assignees
Labels
triage To be investigated

Comments

@mgagliardo91
Copy link

Expected behavior

I am trying to ensure that we have a good grasp on how pg-promise pooling works with various network outages. I have a local setup where I have a pgp pool setup with a connection timeout, query timeout, idle timeout and an allowance of up to 10 connections in the pool.

I'm running into some interesting behavior if I kill the database after the pool has connected at least once. The subsequent queries reuse an idle connection which will then fail with a query timeout and get put back into the pool rather than failing with an actual connection timeout and removed from the pool altogether. This continues into a loop where the same client keeps getting reused, so long as it doesn't timeout, and it will never hit the connect timeout which would have the client thrown away.

Is there anyway, such as within an error handler, to indicate back to the pool that the client should not be reused and instead thrown away? Ultimately, if a database was suddenly inaccessible, I would expect each idle connection to fail and new connections fail on a connect error ultimately leading to alerting in our telemetry stack. Instead, we just get a ton of query timeouts and the same clients keep getting reused.

Actual behavior

Idle clients in the pool continue to get reused when a database is suddenly inaccessible and throw a query timeout instead of the correct connection timeout indicating that the database cannot be communicated with.

Steps to reproduce

  1. Create a pool:
const pool = pgp({
  ...
  max: 10,
  min: 0,
  connectionTimeoutMillis: 5_000,
  query_timeout: 10_000,
  idleTimeoutMillis: 30_000,
})

// query once
await pool.query('SELECT 1 as test');

// pause/shutdown database

// query again
await pool.query('SELECT 1 as test'); // throws "Query read timeout"
await pool.query('SELECT 1 as test'); // throws "Query read timeout"

// wait for all idle clients to return to pool
await pool.query('SELECT 1 as test'); // throws "Connection terminated due to connection timeout"

In the above example, if your clients never fully release from idle timeout you will never hit the connection timeout error

Environment

  • Version of pg-promise: ^11.9.1
  • OS type (Linux/Windows/Mac): OSX
  • Version of Node.js: 18
@mgagliardo91 mgagliardo91 added the triage To be investigated label Oct 1, 2024
@mgagliardo91
Copy link
Author

I appreciate the help with this @vitaly-t

@mgagliardo91
Copy link
Author

mgagliardo91 commented Oct 1, 2024

As a followup, if I use the error handler and force end the client connection on a query timeout, I do correctly see the client removed from the pool and subsequent connections fail correctly as a connection timeout. The only issue is the fact that a query read timeout doesn't necessarily indicate an issue with the database, but with the looping of idle clients, we will never throw away these clients to learn that the database has had an issue, etc.

const options = {
  error(e, ctx) {
    if ('client' in ctx && e.message == 'Query read timeout') {
      ctx.client.end()
    }
  }
}

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 1, 2024

Maybe you can detect connection issues, as done within Robust-Listeners, and re-create the database object when connection is lost, which will re-create the Pool also.

@mgagliardo91
Copy link
Author

@vitaly-t I'm not sure if onLost would get called given the only errors being thrown are query timeouts which are not technically physical connection issues by default. It just so happens that the reuse of idle connections hides the fact that the database is actually not available.

Is there anyway for the underlying .query() method to understand the difference of a timeout due to the query not finishing vs the fact that the endpoint is not even available? I'm a bit surprised that the client is able to call .query() even with the client not technically being connected anymore

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 1, 2024

I'm not really sure, tbh. Query timeouts were added only recently in the underlying driver, it would require an investigation of how they really work underneath, plus some heavy testing.

I'm not even sure if timeouts in the driver are supported on the driver level or on the server level, as that would have a significant effect on the subject.

@mgagliardo91
Copy link
Author

mgagliardo91 commented Oct 1, 2024

I'll dig into the underlying driver more tomorrow - can you think of any negative impact on calling .end() on the client in the error handler after some custom logic to essentially force a new connection? Referring to #944 (comment)

Just trying to determine if we can sort of hack a solution for now over having to consider an alternative

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage To be investigated
Projects
None yet
Development

No branches or pull requests

2 participants