Refactor Pool Stats to be based off of Server/Client stats #445
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What is wrong
Stats reported by
SHOW POOLS
seem to be leaking. We see lingeringcl_idle
,cl_waiting
, and similarly forsv_idle
,sv_active
. We confirmed that these are reporting issues not actual lingering clients.This behavior is readily reproducible by running
Why it happens
I wasn't able to get to figure our the reason for the bug but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check then update sequence which is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179
I am also suspecting that using Relaxed ordering might allow this behavior(I changed all operations to useOrdering::SeqCst
but still got lingering clients)How to fix
Since
SHOW POOLS
/SHOW SERVER
/SHOW CLIENTS
all show the current state of the proxy (as opposed toSHOW STATS
which show aggregate values), this PR refactorsSHOW POOLS
to have it construct the results directly fromSHOW SERVER
andSHOW CLIENT
datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats as we only care about updating individual client/server states.This will change the semantics of
maxwait
, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.