Memory issue on the big amount of queries #2894

Closed
avorozheev-mwb opened this issue Jan 9, 2023 · 1 comment

@avorozheev-mwb

I'm debugging a memory issue in my application, and I noticed an interesting pattern in how pg-pool works that leads to increased memory consumption. This may not be the fault of the library but rather of how I use it, so any insights or advice would be helpful.

I've got a NodeJS application with heavy traffic that uses Postgres integrated via node-postgres, and therefore a significant amount of reads from the database. Database access happens via typeorm.

The configuration is quite simple:

  • connectionsLimit = 100
  • idleTimeout = 10 000 ms
  • replication with one master and one read replica
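For reference, the settings above map onto pg-pool's options roughly like this. This is a sketch of my understanding, not the exact typeorm config (typeorm forwards extra driver options to pg-pool; the option names on the pg side are real, the mapping comments are assumptions):

```javascript
// Pool-related settings from the list above, expressed as
// node-postgres Pool options.
const poolSettings = {
  max: 100,                 // connectionsLimit = 100
  idleTimeoutMillis: 10000, // idleTimeout = 10 000 ms
};

// e.g. new Pool({ ...poolSettings, connectionString });
// (replication with one master + one read replica is configured
// on the typeorm side, one pool per node)
```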

The fun begins when the application is running in production: as the load increases, the amount of consumed memory increases significantly. When the load stops, memory gets released over a span of ~25 minutes.

I've taken a memory snapshot and uploaded it into the visualizing tool, which showed me that a big chunk of memory is occupied by the Postgres connection pool.

[screenshot: heap snapshot in the visualizer, showing the Postgres connection pool retaining a large share of memory]

I've surfed around the pg-pool internals, and figured out the following:

  • when a database query is executed, pg-pool tries to obtain an available connection from the pool
  • if there is a free idle connection or the room to create a new one - query gets executed
  • otherwise the query gets added to _pendingQueue and is executed once the next occupied connection becomes idle
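The three-step behavior above can be sketched as a toy model. To be clear, this is my simplified mental model of the checkout logic, not pg-pool's actual code (the `PoolModel` class and its names are made up for illustration):

```javascript
// Toy model of pool checkout: take an idle connection, create a new
// one if under the limit, otherwise park the query in a pending queue.
class PoolModel {
  constructor(max) {
    this.max = max;          // connection limit
    this.total = 0;          // connections created so far
    this.idle = [];          // released, reusable connections
    this.pendingQueue = [];  // queries waiting for a connection
  }

  acquire(run) {
    if (this.idle.length > 0) {
      run(this.idle.pop());              // reuse an idle connection
    } else if (this.total < this.max) {
      this.total += 1;
      run({ id: this.total });           // room to create a new one
    } else {
      this.pendingQueue.push(run);       // overload: queue grows unbounded
    }
  }

  release(conn) {
    const next = this.pendingQueue.shift();
    if (next) next(conn);                // hand the connection to a waiter
    else this.idle.push(conn);
  }
}

// Under load exceeding `max`, the excess queries pile up in the queue:
const pool = new PoolModel(2);
const held = [];
for (let i = 0; i < 5; i++) pool.acquire((c) => held.push(c));
// 2 queries got connections, 3 are waiting in pendingQueue
```

This is exactly the memory pattern I'm seeing: each queued entry keeps its query text, parameters, and closure alive until a connection frees up.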

So in my case:

  • the application load is higher than the available connections can handle
  • once queries arrive faster than the {N} connections can serve them, _pendingQueue starts to grow
  • it keeps growing under constant load, because the query arrival rate is higher than the release rate
  • over a long run it "eats" a significant amount of memory

The main questions at this point are:

  1. Am I doing something dramatically wrong here? What is the best pattern for handling huge database load?
  2. Is there a way to reduce the memory footprint used by pending queries?
  3. I suspect the best way out of this situation is to scale the application when the pending queue grows too large. Is consuming waitingCount on an instance of the pool and making a decision based on it the right thing to do?
brianc (Owner) commented Jan 23, 2023

  1. Am I doing something dramatically wrong here? What is the best pattern for handling huge database load?

Nope! You're doing things right....you just need to scale your app servers more if you're running out of memory.

  1. What is the best pattern for handling huge database load?

More read replicas, with one pg-pool instance per replica, will allow more read throughput from postgres (horizontally scaling the db readers)

2. Is there a way to reduce the memory footprint used by pending queries?

Outside of the text & params of the query there isn't much overhead. You could use something like promise-queue or something else higher up in your app to queue requests before submitting to postgres...but YMMV on the amount of memory saved
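A minimal sketch of that app-level queueing idea: a small concurrency limiter in front of the pool, similar in spirit to promise-queue but written out here by hand (this is not that library's API, and the memory saved depends on what each queued closure retains):

```javascript
// Cap the number of in-flight tasks; extra tasks wait here instead of
// in pg-pool's internal pending queue.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    task()
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next waiting task, if any
      });
  };

  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}
```

Usage would look something like `const limited = createLimiter(100);` and then `limited(() => pool.query(text, params))` instead of calling the pool directly, so at most 100 queries are ever handed to pg-pool at once.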

3. I suspect the best way out of this situation is scale the application when the pending queue grows too large. Is consuming waitingCount on an instance of the pool and making a decision based on it is a right thing to do?

Yup that's a public API you can rely on being there "forever" - there are tests written against it existing. As far as that being the absolute #1 with a bullet way to scale from a decision making perspective...I'd probably go with something more holistic like number of requests per second or something but that's very application/business specific.
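A sketch of what sampling `waitingCount` for a scaling decision could look like. `pool.waitingCount` is the documented pg-pool property discussed above; the threshold, the consecutive-breach debounce, and the `makeBacklogMonitor` helper are all illustrative choices, not anything from the library:

```javascript
// Signal "scale out" only after the backlog stays above a threshold
// for several consecutive samples, to avoid reacting to brief spikes.
function makeBacklogMonitor(threshold, consecutiveBreaches) {
  let breaches = 0;
  return function sample(waitingCount) {
    breaches = waitingCount > threshold ? breaches + 1 : 0;
    return breaches >= consecutiveBreaches; // true => sustained backlog
  };
}

// Hypothetical wiring (poll once a second, alert some autoscaler):
//   const sample = makeBacklogMonitor(50, 3);
//   setInterval(() => {
//     if (sample(pool.waitingCount)) notifyAutoscaler();
//   }, 1000);
```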

These are great questions - thank you for them! Please lmk if you have any more. ❤️

brianc closed this as completed Jan 23, 2023