PoolInner::acquire() does not try the idle queue after a transient connection failure #2848
I forgot to mention how I think this should be solved. The logic of [...] To avoid wasting connection attempts, the connection process should be spawned as a separate task so that if the [...] The [...]
## What ❔

- Increases `max_connections` in postgres (locally) from 100 to 200.
- Hopefully, it'll reduce the problems with integration tests in CI.

## Why ❔

- A single server (main node or EN) instance by default uses a connection pool size of 50, and it spawns several additional unique connections, making the total slightly higher than 50.
- In integration tests, we run the server and the EN simultaneously, resulting in more than 100 connections being utilized.
- We recently bumped SQLx to 0.7, which [seems to have problems with that](launchbadge/sqlx#2848).
- ...with 200 connections supported, we will be below the threshold again, so it might help.

## Checklist

- [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [ ] Code has been formatted via `zk fmt` and `zk lint`.
- [ ] Spellcheck has been run via `zk spellcheck`.
- [ ] Linkcheck has been run via `zk linkcheck`.
While thinking about potential sources of the infamous `PoolTimedOut` error, I realized that there's an interesting failure mode in `acquire()`. Once it decides to open a new connection, that's all it tries to do:

sqlx/sqlx-core/src/pool/inner.rs, lines 283 to 284 in e1ac388
If a nonfatal connection error happens, it just continues in the backoff loop in `connect()` and never touches the idle queue again:

sqlx/sqlx-core/src/pool/inner.rs, line 348 in e1ac388
It will continue to do this until the timeout if the transient error does not resolve itself.
Right now, only the Postgres driver overrides `DatabaseError::is_transient_in_connect_phase()`, but one of the error codes it considers transient is the "too many connections" error:

sqlx/sqlx-postgres/src/error.rs, lines 192 to 195 in e1ac388
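
For readers without the source open, here is a rough sketch of what such a classification could look like. It is not a copy of the referenced lines: the free function `is_transient_sqlstate` is a hypothetical name, and the exact set of codes sqlx checks is an assumption. Only the "too many connections" case comes from this issue; its SQLSTATE, `53300`, and the `57P03` (`cannot_connect_now`) code used as a second plausible candidate are taken from the PostgreSQL error code table.

```rust
/// Hypothetical helper (not sqlx's API): decides whether a Postgres SQLSTATE
/// reported while connecting looks like a condition that may clear on its own.
pub fn is_transient_sqlstate(sqlstate: &str) -> bool {
    matches!(
        sqlstate,
        // too_many_connections: the case this issue is about
        "53300"
        // cannot_connect_now: assumed here as another retryable code
        | "57P03"
    )
}
```

Under this sketch, a server that has run out of connection slots produces a "retry later" result rather than a hard failure, which is what keeps the caller inside the backoff loop.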
This means that if the `max_connections` of the pool exceeds what is currently available on the server, tasks can get stuck in a loop trying to open new connections despite there being idle connections available, leading to surprising `PoolTimedOut` errors.

This is potentially the cause of some such issues being reported, although it's only likely to occur with the Postgres driver.
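
The failure mode can be illustrated with a small synchronous toy model. This is a sketch under loud assumptions, not sqlx's implementation: `ToyPool`, `Acquired`, `AcquireError`, and `acquire` are all hypothetical names, retries are counted instead of timed, and "another task returning a connection" is simulated by a counter. The `recheck_idle` flag switches between the behavior described above (commit to connecting and never look at the idle queue again) and one possible alternative that rechecks the queue before each retry.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Acquired {
    Idle(u32),  // reused a connection from the idle queue
    Fresh(u32), // opened a new connection
}

#[derive(Debug, PartialEq)]
enum AcquireError {
    PoolTimedOut,
}

/// Toy stand-in for a pool; connections are just numeric ids.
struct ToyPool {
    idle: VecDeque<u32>,
    server_full: bool,               // server rejects every new connection
    release_at_attempt: Option<u32>, // attempt at which another task releases
    released_id: u32,                // id of the connection it releases
}

impl ToyPool {
    /// Simulates another task returning its connection to the idle queue.
    fn tick(&mut self, attempt: u32) {
        if self.release_at_attempt == Some(attempt) {
            self.idle.push_back(self.released_id);
        }
    }

    /// Simulates a connect attempt; fails transiently while the server is full.
    fn try_connect(&mut self) -> Result<u32, &'static str> {
        if self.server_full {
            Err("too many connections")
        } else {
            Ok(100)
        }
    }
}

fn acquire(
    pool: &mut ToyPool,
    max_attempts: u32, // stands in for the acquire timeout
    recheck_idle: bool,
) -> Result<Acquired, AcquireError> {
    // Both variants check the idle queue once up front.
    if let Some(id) = pool.idle.pop_front() {
        return Ok(Acquired::Idle(id));
    }
    for attempt in 1..=max_attempts {
        pool.tick(attempt);
        if recheck_idle {
            // Alternative: look at the idle queue again before retrying.
            if let Some(id) = pool.idle.pop_front() {
                return Ok(Acquired::Idle(id));
            }
        }
        match pool.try_connect() {
            Ok(id) => return Ok(Acquired::Fresh(id)),
            // Transient error: a real pool would sleep with backoff here.
            Err(_) => continue,
        }
    }
    Err(AcquireError::PoolTimedOut)
}
```

With a full server and a connection released on the third attempt, `acquire(&mut pool, 10, false)` exhausts its attempts and returns `PoolTimedOut` even though an idle connection is sitting in the queue, while `acquire(&mut pool, 10, true)` picks that connection up. A real fix would more likely race the connect attempt against the idle queue rather than poll it between retries.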