RUST-556 POC of maxConnecting #259
Conversation
The clippy failures seem unrelated to this PR. The other failures are either due to the max files issue or should be fixed in #258 I think.
src/cmap/test/mod.rs (outdated)
const TEST_DESCRIPTIONS_TO_SKIP: &[&str] = &[
    "must destroy checked in connection if pool has been closed",
    "must throw error if checkOut is called on a closed pool",
];

const EVENT_DELAY: Duration = Duration::from_millis(500);
This is just a temporary solution until the CMAP spec test changes in #258 go in. I wanted to avoid duplicating them here and just opted for the easy solution. Even with this generous delay, we still sometimes don't see the events on Windows with async-std, which is a bit concerning. The failures should go away once the aforementioned changes are merged in though.
I've reviewed the refactor of the pool and added comments for it. I'll come back and do a second review of the updates for maxConnecting later this week
I’m going to mainly focus my review on the design you describe and the big picture here, rather than the code. I trust that Isabel and Sam can confirm your code does what you say it does 🙂 I mainly just have questions about the overall architecture. I don't think I necessarily know enough about CMAP or what an idiomatic async pool would look like to have strong opinions, so it might be worthwhile having Matt weigh in on this since he is familiar with the past pool designs, and also with maxConnecting, but I will defer to your judgement on whether that is needed.
This isn’t that important but just curious, IIUC something being clonable means you can make a deep copy of it, right? Why do we need that capability for our connection pool? So what I understand so far is that (I think) there is a 1:1 pool-worker relationship. Is there also a single manager and a single requester per pool? I thought so from the overview, but then this sentence confused me, because it sounds like there are multiple requesters which a single worker is paying attention to:
Also, at what point is a requester dropped?
Generally, yes, this is correct. However, Rust's mechanism for reference counting is to provide standard library types that implement the same Clone trait as other clonable types but with semantics that share the underlying data. In this case, we use those types for all of the mutable state in the connection pool, so cloning behaves like a shallow copy. We use the same strategy for the Client, Database, and Collection types to allow users to make copies to send to different threads/tasks.
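For illustration only, this is roughly what that pattern looks like in generic Rust (these are not the driver's actual types, just a minimal sketch of Clone sharing state via reference counting):

use std::sync::{Arc, Mutex};

// Cloning Pool copies only the Arc pointer, so every clone shares the same
// underlying mutable state rather than deep-copying it.
#[derive(Clone)]
struct Pool {
    inner: Arc<Mutex<Vec<String>>>,
}

fn main() {
    let pool = Pool { inner: Arc::new(Mutex::new(Vec::new())) };
    let clone = pool.clone();

    // Mutating through the clone...
    clone.inner.lock().unwrap().push("conn-1".to_string());

    // ...is visible through the original, since they share the same data.
    assert_eq!(pool.inner.lock().unwrap().len(), 1);
}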
@saghm Ahh I see, that makes sense. Tangential question: it makes sense to me that a client would be a reference type, since you want to be able to share all of its underlying resources between objects that reference it. But why make databases and collections reference types, too? Would doing so mean that copying them ends up deep copying everything they store references to (such as clients)? I'm comparing this to Swift, where if you copy a value type - say
You raise a good point here; I don't think we technically need to make Database and Collection reference counted right now, since the only mutable data they have (the internals of the underlying client) is already reference counted, so it doesn't actually make any semantic difference from the user's perspective. This is all internal implementation right now, though, so the only real effect of making Database and Collection reference counted is that the small amount of data specific to the Database and Collection types (their names, read preferences, and read/write concerns) is allocated on the heap rather than the stack.
Gotcha, thanks for the explanation!
So what I understand so far is that (I think) there is a 1:1 pool-worker relationship. Is there also a single manager and a single requester per pool? I thought so from the overview, but then this sentence confused me, because it sounds like there are multiple requesters which a single worker is paying attention to. Also, at what point is a requester dropped?

This is partially explained by the cloning discussion above. While there is a 1:1 relationship between requesters and ConnectionPools, there can be many requesters for a single worker because each instance of ConnectionPool has one and can be cloned. A requester is dropped when the ConnectionPool instance that owns it is dropped. ConnectionPools also contain managers, so there can be many managers per worker, and since checked out Connections also have managers, there isn't a 1:1 relationship between pools and managers for a given worker.
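To make the lifetimes concrete, here is a hedged sketch of the "many requesters, one worker" relationship, using tokio's mpsc channel purely as a stand-in (the driver's actual channel types and runtime abstraction differ):

use tokio::sync::mpsc;

// Hypothetical requester handle: cheap to clone, and every clone feeds the same worker.
#[derive(Clone)]
struct Requester {
    sender: mpsc::Sender<&'static str>,
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(16);
    let requester = Requester { sender: tx };
    // Cloning a pool clones its requester, so one worker can serve many requesters.
    let another = requester.clone();

    // The worker keeps running until every cloned sender has been dropped.
    let worker = tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            println!("worker handling: {msg}");
        }
        println!("all requesters dropped; worker exits");
    });

    requester.sender.send("check out a connection").await.unwrap();
    another.sender.send("check out another").await.unwrap();

    // Dropping the last requester closes the channel and lets the worker finish.
    drop(requester);
    drop(another);
    worker.await.unwrap();
}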
The lint failures are likely due to the new minor version of Rust that came out since you started this PR; Isabel merged a PR to fix them last week, so if you rebase with master, I think they should go away.
Force-pushed from dbe3ad5 to 45eda2c.
async fn execute(mut self) {
    let mut maintenance_interval = RUNTIME.interval(Duration::from_millis(500));

    loop {
I moved the unfinished request cache to the receiver itself, which made this loop/select a little simpler.
Awesome, this does make it a lot easier to follow!
Just rebased, hopefully that fixes them.
Force-pushed from 45eda2c to 09b1fc8.
All the questions I had about this have been answered, and the fully green build is super nice, so I think this is ready for my LGTM!
@patrickfreed Thanks for the explanation, that makes sense. Design SGTM then.
RUST-556
This PR implements a POC of the maxConnecting requirement of the "Avoiding Connection Storms" project. In doing so, it also re-architects the connection pool to use a channel-based WaitQueue rather than a semaphore-based one. This was done to allow for easier and more effective maintenance of the wait queue now that we need to support arbitrary conditions for exiting the queue (e.g. maxConnecting).
I realize that a rewrite of this magnitude typically goes through a design phase, so I apologize for suddenly dropping it like this. I decided it was the preferable choice for this POC largely because a counting-semaphore wait queue was frustrating to extend for this use case, and because this lockless, channel-based pattern is a much more common approach to accessing shared state in the async world than our previous lock-heavy one. Maintaining the old pool and adding support for maxConnecting would introduce even more locking while also producing a suboptimal solution (it is difficult or impossible to ensure fairness with multiple semaphores/locks in this case), so this seemed like a good time for a rewrite. Additionally, this is a proof-of-concept after all, so I figured it would be a good time to proof-of-concept what an idiomatic async pool would look like too. I think I should have opted for a pool like this back when I originally converted the old pool to async, but I was lacking experience with async patterns at the time. Ironically, the original sync pool with its queue of condvars was somewhat similar to this one.
Overview of new pool
I'll give a brief overview of the layout of the new pool here to make digesting the actual code a little easier. We have the same clonable ConnectionPool type as before with the same functionality; however, when we create a new pool, instead of creating a ConnectionPoolInner type that is arc'd and wrapping that, we start a worker task (ConnectionPoolWorker) that opens up a few different channels, and we store senders to those channels (ConnectionRequester, PoolManager) on the ConnectionPool. The worker task is in charge of all changes and access to the connection pool state; to do anything with the pool, you need to send a request via one of the two senders mentioned before. To check out a connection, for example, a thread sends a ConnectionRequest via a ConnectionRequester and waits for the worker to respond. To perform any other pool interactions (e.g. clearing, checking in), a thread uses the PoolManager to send the appropriate requests. The ConnectionRequesters act like strong references to the pool in that they keep the worker alive until they are all dropped, whereas PoolManagers are like weak references and can still be in scope when the worker ceases execution. A ConnectionPool instance holds onto both a ConnectionRequester and a PoolManager, but a Connection, for example, only holds on to a PoolManager in order to check itself back in. Instead of having a separate background task for ensuring minPoolSize, the worker task just takes care of that. A rough sketch of this layout is included after the summary below.

In summary:
- ConnectionPoolInner is gone, replaced with a background worker task (ConnectionPoolWorker) that has exclusive access to pool state
- ConnectionPoolWorker listens on a few channels for incoming requests, responding to them as necessary
- Connections are checked out via ConnectionRequesters; the ConnectionPoolWorker quits once all the requesters are dropped
- Other pool interactions (e.g. clearing, checking in) go through PoolManagers. These do not keep the worker running and may no-op if they outlive the worker.
- Connections use PoolManagers to check themselves back in
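Here is the rough sketch of that layout referenced above. The channel choices, request enum, and field names are assumptions made for illustration; the real driver goes through its own runtime abstraction rather than using tokio directly:

use tokio::sync::{mpsc, oneshot};

struct Connection; // placeholder

// Requests the worker services; a checkout is answered over a oneshot channel.
enum PoolRequest {
    CheckOut(oneshot::Sender<Connection>),
    CheckIn(Connection),
    Clear,
}

// Strong handle: the worker keeps running as long as any of these exist.
#[derive(Clone)]
struct ConnectionRequester {
    sender: mpsc::Sender<PoolRequest>,
}

// Weak-style handle: may outlive the worker, in which case sends simply fail.
#[derive(Clone)]
struct PoolManager {
    sender: mpsc::UnboundedSender<PoolRequest>,
}

// What users hold and clone; all mutable pool state lives in the worker task.
// A checked-out Connection would also carry a PoolManager to check itself back in.
#[derive(Clone)]
struct ConnectionPool {
    requester: ConnectionRequester,
    manager: PoolManager,
}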
Some notes

Given this architecture, it was trivially easy to introduce maxConnecting: we simply don't poll the ConnectionRequestReceiver when the pool is empty and has too many pending connections. The actual changes required to implement that part are all in the last few commits or so. Additionally, we now have a context which always has exclusive access to the pool state, making it much easier to reason about. Lastly, the drop implementations of the various connection types have been simplified with the removal of ConnectionPoolInner and DroppedConnectionState.
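To make the gating concrete, here is a hedged sketch of a worker loop that stops polling the checkout-request channel while the limit is hit. The names (CheckoutRequest, ManagementRequest, MAX_CONNECTING) and the use of tokio's mpsc and select! are illustrative assumptions, not the driver's actual code:

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time;

const MAX_CONNECTING: usize = 2;

struct CheckoutRequest;
enum ManagementRequest {
    CheckIn,
    Clear,
}

async fn worker(
    mut checkout_requests: mpsc::Receiver<CheckoutRequest>,
    mut management_requests: mpsc::UnboundedReceiver<ManagementRequest>,
) {
    let mut maintenance_interval = time::interval(Duration::from_millis(500));
    let mut available_connections: usize = 0; // idle connections in the pool
    let mut pending_connections: usize = 0; // establishments currently in flight

    loop {
        // The maxConnecting gate: only poll for new checkout requests when one
        // could be serviced without exceeding the pending-connection limit.
        let can_service_checkouts =
            available_connections > 0 || pending_connections < MAX_CONNECTING;

        tokio::select! {
            request = checkout_requests.recv(), if can_service_checkouts => {
                match request {
                    Some(_request) => {
                        if available_connections > 0 {
                            available_connections -= 1; // hand out an idle connection
                        } else {
                            pending_connections += 1; // begin establishing a new one
                        }
                    }
                    // All requesters have been dropped, so the worker shuts down.
                    None => break,
                }
            }
            Some(_request) = management_requests.recv() => {
                // Check-ins, clears, etc. are always processed.
            }
            _ = maintenance_interval.tick() => {
                // Ensure minPoolSize, close perished connections, etc.
            }
        }
    }
}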