Connection terminated due to connection timeout / Connection terminated #116
Comments
I've the same problem even with
I am not sure why the connection gets closed; the query fails with this stack trace:
I checked my server config and I don't have
Same issue here; it's spamming us with errors pretty hard. I suspect this has to do with the interplay between killed idle connections and the connection timeout: idle connections die, then a stampede of connections comes in, and on the DB side, for whatever reason, it can't handle all the new connections at once. Maybe it's a cloud-provider limitation on the VM hosting the DB, but it's probably a good idea not to kill idle workers if you can afford to have many connections open in the first place. We're probably going to try setting
Same problem here too. We set idleTimeoutMillis to 0 and the error persists. :/
I am also working through this error. It seems like connections are aborting instead of waiting?
+1
+1. Is this issue ever going to be fixed?
+1, same problem
+1
We have tried setting
We have ~40 web instances and see ~250 errors within a minute from a single instance a couple of times a day (max connections is set to 3), and from different instances at different times, which rules out a real DB connection issue and means it must be a code issue. We've also set the timeout to 20 s and still get the error regularly. Is it possible that the issue is related to #131, or is there something else wrong here?
@coderholic Unless it’s an unhandled error you expect to be handled by
Not sure that rules out a DB connection issue. Which error messages exactly are you seeing?
The full error we see is:
If there were an issue with the DB, I'd expect to see these errors on all instances rather than just one. We're doing a fairly consistent 150 queries/second to the DB, so it's not as if it's mostly idle and the other instances just aren't hitting the DB at the moment there could be an issue. I'm also not sure why we see ~250 errors every time this happens. Each instance has 3 max connections, so it seems like it's retrying a lot and throwing the same error again and again. We have a 20 s timeout set, though, so shouldn't we see this error at most 3 times per 20 s instead of 250 times within a few seconds?
@coderholic Okay, that different error message is caused by a timeout acquiring a client when there’s no room left in the pool to connect new ones. I don’t know what specifics could account for the difference between your instances, but seeing lots of errors is easy to explain. One way: 3 queries tie up the pool for 21⅔ seconds, the queries arriving in the first 1⅔ seconds after them time out and produce 250 errors, and then everything else continues as normal.
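That arithmetic can be checked with a quick back-of-envelope model (all numbers come from this thread, not from a measurement):

```javascript
// If queries keep arriving at a steady rate while the pool is saturated,
// every query that waits longer than connectionTimeoutMillis errors out.
const arrivalRate = 150;            // queries per second (reported above)
const timeoutSec = 20;              // connectionTimeoutMillis / 1000
const poolBlockedSec = 21 + 2 / 3;  // pool saturated for 21 2/3 seconds

// Arrivals during the first (poolBlockedSec - timeoutSec) seconds wait
// past the timeout and fail; later arrivals make it through in time.
const errored = Math.round(arrivalRate * (poolBlockedSec - timeoutSec));
console.log(errored); // 250
```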
@charmander Thanks for the additional details. If I'm understanding correctly, this error is about queued queries timing out while waiting for a connection to the DB to be returned to the pool; it's not actually about a pool connection timing out while connecting to the DB? And is an error thrown for every queued query? If that's the case, then it sounds like one fix would be to increase the pool size?
Also, I'm fairly sure we don't have any queries that take more than 1 s (we have slow-query logging turned on for anything over 1 s, and no queries from these instances trigger the log). Is there something else, other than a slow query, that could result in a connection not being returned to the pool for >20 s?
Right.
For every one that times out, yes.
Yes, if your PostgreSQL server(s) can handle it.
It doesn’t have to be one connection not being returned to the pool for >20 s; the query queue just has to fill up faster than it empties for at least that long. (It also doesn’t have to be the query execution itself that’s slow: try timing queries client-side, especially on the problematic client.)
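A minimal sketch of that client-side timing, assuming a node-postgres pool; the wrapper name and the `slowMs` threshold are placeholders, not anything this library provides:

```javascript
// Hypothetical wrapper: times pool.query from the caller's side, so the
// measurement includes time spent waiting to acquire a client from the
// pool, which server-side slow-query logging never sees.
async function timedQuery(pool, text, params, slowMs = 1000) {
  const start = Date.now();
  try {
    return await pool.query(text, params);
  } finally {
    const elapsed = Date.now() - start;
    if (elapsed > slowMs) {
      console.warn(`slow client-side query (${elapsed} ms): ${text}`);
    }
  }
}
```

Comparing these client-side timings against the server's slow-query log would show whether the time is going into query execution or into waiting for a pool slot.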
Hi there,
I got some issues with this pool.
Maybe this is related to brianc/node-postgres#1790 and brianc/node-postgres#1611.
First of all this is my environment:
I'm using pg-pool with this config:
About 10 times a day I get this error:
Connection terminated due to connection timeout
This leads to some queries failing and thus to errors for the end user.
I already tried to debug this in cooperation with my database provider, but it doesn't seem to be related to them; the ping to my DB is about 0.23 ms within the same data center.
After having a look at the code, enabling detailed logging, and hours of debugging, I think I found the issue.
I think this is related to ending clients on idle timeouts:
https://github.com/brianc/node-pg-pool/blob/master/index.js#L41
Here the idle timeout is triggered, but it does not check whether the client is currently in use. I know the timeout is cleared in https://github.com/brianc/node-pg-pool/blob/master/index.js#L138, but there seems to be an edge case where a client that has just been handed to the application gets destroyed.
For me that happened once in about 2000 requests.
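The suspected race can be illustrated with plain timers; this is a stand-in sketch of the failure mode described above, not node-pg-pool's actual code:

```javascript
// A stand-in "client" and an idle-reaper timer. If the reaper is not
// cancelled at the moment the client is checked out, it fires later and
// destroys a client the application believes it owns.
const client = { destroyed: false };

const reaper = setTimeout(() => {
  client.destroyed = true; // idle timeout fires: client is torn down
}, 10);

// The application checks the client out here. Normally clearTimeout(reaper)
// runs synchronously with the checkout; if that is missed (the suspected
// edge case), the next query runs on a destroyed client:
setTimeout(() => {
  console.log(client.destroyed); // true, so the query would fail
}, 20);
```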
To prove my intuition, I set idleTimeoutMillis to 0, which disables the removal of idle clients. This solved the issue for me; I never got this error again. I cannot really say how this can happen nor how to fix it, but I'm pretty sure it is related to the idle timeout.
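For reference, that workaround looks like this with node-pg-pool; the connection string, `max`, and timeout values are placeholders to adapt, not recommendations:

```javascript
const { Pool } = require('pg');

// idleTimeoutMillis: 0 disables the idle reaper entirely, so clients are
// never destroyed behind the application's back. The trade-off is that
// idle connections stay open against the server indefinitely.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // placeholder
  max: 10,                        // assumption: size to what your DB allows
  idleTimeoutMillis: 0,           // never evict idle clients
  connectionTimeoutMillis: 20000, // fail a checkout after 20 s of waiting
});
```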
Hope this helps debugging it.
Have a nice day,
Dustin