Add FIFO queue of waiters #42
Conversation
I'll also note two other desirable properties that we get from this waiter queue:
@tstat just FYI, I merged this branch into my fork via this PR, alongside other long-standing PRs: one, two. If you're interested, I'd appreciate it if you could review the validity of the merge, as I had to resolve conflicts on a few lines because of the changes introduced by the other two PRs. As the repo currently doesn't have public tests, I ran the resulting master branch with my internal project and it worked without visible issues, at least.
@tstat Thanks a lot for this PR! What do you think about limiting the size of the waiter queue? I was thinking that could serve as a way to apply backpressure and prevent the queue from getting too big.
Have you tried using the
I think this PR could be closed, as this package has changed hands and been re-implemented here: https://github.com/scrive/pool. I'll also note that the new implementation bears a lot of resemblance to this PR, but it is based on a queue of MVars rather than TVars; perhaps some attribution is in order? :)
@mitchellwrosen the current implementation is based on
Under the current design, resources are kept in a `TVar` pointing to a list of resources. If a thread attempts to acquire a resource and finds the list empty it calls `retry`, adding itself as a waiter to the resource list `TVar`. When the resource `TVar` is later modified all waiters are awoken and free to retry their STM transaction. This is problematic when there are many threads waiting on a resource: they will all be awoken by a thread that puts a resource back, but only one of the awoken threads will acquire that resource and the rest will fail their validation step. All of these wake-ups and retries create some overhead.
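To make the problem concrete, the acquire path in the current design is roughly the following (a minimal sketch of the STM logic only; the `LocalPool` and `available` names are illustrative, not the package's actual internals):

```haskell
import Control.Concurrent.STM

-- Illustrative stand-in for a stripe's state: a TVar holding the list
-- of currently available resources.
newtype LocalPool a = LocalPool { available :: TVar [a] }

-- Take a resource, or block until one is available. Every thread that
-- hits the empty-list case calls retry and becomes a waiter on the
-- 'available' TVar; any later write to that TVar wakes *all* of them,
-- even though only one will win the re-run of the transaction.
takeResource :: LocalPool a -> STM a
takeResource (LocalPool avail) = do
  resources <- readTVar avail
  case resources of
    []       -> retry
    (r : rs) -> do
      writeTVar avail rs
      pure r

-- Returning a resource is an ordinary write, which is what wakes every
-- blocked waiter.
putResource :: LocalPool a -> a -> STM ()
putResource (LocalPool avail) r = modifyTVar' avail (r :)
```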
To measure this overhead I created a benchmark that forks some number of threads, each of which executes the following loop 10 times: acquire a postgresql connection from a pool and execute a simple query that takes a fixed amount of time (using `pg_sleep`). The main thread waits for all forked threads to complete this work. Benchmark output names look like `t=<int>/c=<int>/d=<int>/s=<int>`, where `t` is the number of threads forked, `c` is the maximum number of connections the pool allows, `d` is the postgresql query delay in microseconds, and `s` is the number of stripes that `c` is spread over. I am running these benchmarks with the following flags: `--regress=mutatorCpuSeconds:iters +RTS -N20 -A128m -qn2 -I0 -T`.
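For reference, the body of the benchmark is roughly the following (a hedged sketch using `Data.Pool` and `postgresql-simple`; the connection string, the concrete `t`/`c`/`d`/`s` values, and the criterion harness that does the timing and regression are placeholders or omitted):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent.Async (replicateConcurrently_)
import Control.Monad (replicateM_, void)
import Data.Pool (createPool, withResource)
import Database.PostgreSQL.Simple (Only (..), close, connectPostgreSQL, query_)

main :: IO ()
main = do
  -- A pool of c connections spread over s stripes; the numbers below
  -- correspond to a t=25/c=25/d=10000/s=1 configuration and are
  -- illustrative only.
  pool <- createPool (connectPostgreSQL "dbname=bench") close
                     1   -- s: number of stripes
                     60  -- seconds an idle connection is kept around
                     25  -- connections per stripe (c divided by s)
  -- Fork t worker threads; each one acquires a connection and runs a
  -- query that sleeps for d microseconds (10 ms here), 10 times.
  replicateConcurrently_ 25 $
    replicateM_ 10 $
      withResource pool $ \conn ->
        void (query_ conn "SELECT pg_sleep(0.01)" :: IO [Only ()])
```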
Looking at the output:
We see 109.085 seconds of CPU time consumed when we use one stripe. We can lower this by increasing the number of stripes, as fewer threads will be awoken on each put. With 25 stripes of 1 connection we reduce the CPU time consumed to 11.292 seconds, and our running time is reduced from 6.158 seconds to 1.074 seconds. Of course, we are unfairly favoring high striping in this benchmark since each thread does exactly the same amount of work. In an actual application with uneven work across threads this would have undesirable consequences.
It is clear that performance improves if we wake up fewer threads, and an ideal scenario would be to wake up at most one thread when a resource is returned. To this end, I propose that we add a FIFO queue of `TMVar (Entry a)` to the pool. When a thread attempts to acquire a resource it first checks the available resource list. If this list is empty then the thread adds an empty `TMVar` to the waiter queue and waits on it. When returning a resource to the pool, the returning thread first checks whether there are any waiters in the queue, and if so hands the resource straight to the first waiter, waking up only one thread.

There is a complication though: a waiting thread might be killed, and if so we don't want to put a resource into its `TMVar`. To avoid this issue the queue implemented in this PR allows any entry to be removed. A thread that enqueues a `TMVar` and blocks on reading it first installs an exception handler that removes its entry from the queue and, if a resource has already been put into its `TMVar`, returns that resource to the pool.
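To make the two paths concrete, here is a minimal sketch of the idea (the `Pool`, `available`, and `waiters` names are illustrative; the PR's queue holds `TMVar (Entry a)` and supports O(1) removal of arbitrary entries, whereas this sketch uses a plain list of `TMVar a` to stay short):

```haskell
import Control.Concurrent.STM
import Control.Exception (mask, onException)
import Data.List (delete)

-- Illustrative pool state: available resources plus a FIFO of waiters.
data Pool a = Pool
  { available :: TVar [a]
  , waiters   :: TVar [TMVar a]  -- oldest waiter first
  }

takeResource :: Pool a -> IO a
takeResource pool = mask $ \restore -> do
  mr <- atomically $ do
    rs <- readTVar (available pool)
    case rs of
      (r : rest) -> do
        writeTVar (available pool) rest
        pure (Right r)
      [] -> do
        -- No resource available: enqueue an empty TMVar and wait on it.
        w <- newEmptyTMVar
        modifyTVar' (waiters pool) (++ [w])
        pure (Left w)
  case mr of
    Right r -> pure r
    Left w  -> restore (atomically (takeTMVar w)) `onException` giveUp w
  where
    -- If we are killed while waiting, unregister our TMVar; if a
    -- resource has already been delivered to it, return that resource
    -- to the pool so it is not lost.
    giveUp w = atomically $ do
      modifyTVar' (waiters pool) (delete w)
      delivered <- tryTakeTMVar w
      case delivered of
        Just r  -> modifyTVar' (available pool) (r :)
        Nothing -> pure ()

putResource :: Pool a -> a -> IO ()
putResource pool r = atomically $ do
  ws <- readTVar (waiters pool)
  case ws of
    -- Hand the resource straight to the oldest waiter: exactly one
    -- thread is woken, instead of every thread blocked in retry.
    (w : rest) -> writeTVar (waiters pool) rest >> putTMVar w r
    []         -> modifyTVar' (available pool) (r :)
```

The important property is in `putResource`: when waiters exist, the resource is delivered to exactly one `TMVar`, so only that one thread wakes up.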
The benchmark shows promising results for this change:
Here we see the benchmark run against this PR with a single stripe: the running time is less than half the running time of the 25-stripe pool above, while the CPU time is less than 2% of the CPU time consumed by the 25-stripe pool. Indeed, with this PR I did not run a benchmark in which striping was beneficial.
The full results may be found here and the HTML output may be found here.