io: rework I/O driver to use an intrusive linked list for wakers #2779
Comments
This seems to break some abstractions that work now. For example, now you can put the owned read half of a tcp stream into a …
Regarding the "split" APIs: we want to reduce the number of "split" APIs a user must pick from, like for TCP. There's the by-ref one, the owned one, and the generic io::split. I'd like to propose merging the TCP owned version and the generic one internally, all while only showing one API.
But in the end, this way means there's only one API a user needs to work with.
This refactors I/O registration in a few ways:

- Cleans up the cached readiness in `PollEvented`. This cache used to be helpful when readiness was a linked list of `*mut Node`s in `Registration`. Previous refactors have turned `Registration` into just an `AtomicUsize` holding the current readiness, so the cache is just extra work and complexity. Gone.
- Polling the `Registration` for readiness now gives a `ReadyEvent`, which includes the driver tick. This event must be passed back into `clear_readiness`, so that the readiness is only cleared from `Registration` if the tick hasn't changed. Previously, it was possible to clear the readiness even though another thread had *just* polled the driver and found the socket ready again.
- `Registration` now also contains an `async fn readiness`, which stores wakers in an intrusive linked list. This allows an unbounded number of tasks to register for readiness (previously, only 1 per direction, read and write). By using the intrusive linked list, there is no concern of leaking the storage of the wakers, since they are stored inside the `async fn` and released when the future is dropped.
- `Registration` retains a `poll_readiness(Direction)` method, to support `AsyncRead` and `AsyncWrite`. They aren't able to use `async fn`s, and so there are 2 reserved slots for those methods.
- I/O types where it makes sense to have multiple tasks waiting on them now take advantage of this new `async fn readiness`, such as `UdpSocket` and `UnixDatagram`.

Additionally, this makes the `io-driver` "feature" internal-only (no longer documented, not part of the public API), and adds a second internal-only feature, `io-readiness`, to group together the linked list part of registration that is only used by some of the I/O types.

After a bit of discussion, changing stream-based transports (like `TcpStream`) to have `async fn read(&self)` is punted, since that is likely too easy of a footgun to activate.

Refs: #2779, #2728
I don't see a way to do an owned … Edit: silly me, I didn't realize that the API is no longer …
@leshow
Yep! That was a misunderstanding on my part. I've actually included an example of the new concurrent send in my PR for the UdpSocket docs.
Enough of this issue has been done for 0.3. The remaining small tweaks for 0.3 are tracked by #2928. |
# Refactor I/O driver
Describes changes to the I/O driver for the Tokio 0.3 release.
## Goals

- Support `async fn` on I/O types with `&self` (e.g., `send`).
- Rework the `Registration` API.

## Non-goals

- Implementing `AsyncRead` / `AsyncWrite` for `&TcpStream` or other reference types.

## Overview
Currently, I/O types require `&mut self` for `async` functions. The reason for this is the task's waker is stored in the I/O resource's internal state (`ScheduledIo`) instead of in the future returned by the `async` function. Because of this limitation, I/O types limit the number of wakers to one per direction (a direction is either read-related events or write-related events).

Moving the waker from the internal I/O resource's state to the operation's future enables multiple wakers to be registered per operation. The "intrusive wake list" strategy used by `Notify` applies to this case, though there are some concerns unique to the I/O driver.
## Reworking the `Registration` type

While `Registration` is made private (per #2728), it remains in Tokio as an implementation detail backing I/O resources such as `TcpStream`. The API of `Registration` is updated to support waiting for an arbitrary interest set with `&self`. This supports concurrent waiters with different readiness interests.

A new registration is created for a `T: mio::Evented` and an `interest`. This creates a `ScheduledIo` entry with the I/O driver and registers the resource with `mio`.
Because Tokio uses edge-triggered notifications, the I/O driver only receives readiness from the OS once the ready state changes. The I/O driver must track each resource's known readiness state. This helps prevent syscalls when the process knows the syscall should return with `EWOULDBLOCK`.

A call to `readiness()` checks if the currently known resource readiness overlaps with `interest`. If it does, then `readiness()` immediately returns. If it does not, then the task waits until the I/O driver receives a readiness event.
The pseudocode to perform a TCP read is as follows.
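A sketch of that pseudocode, assuming the `readiness()` / `clear_readiness()` API described above and an inner `mio_socket` field:

```rust
async fn read(&self, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        // Wait until a readiness event overlapping our interest is known.
        let event = self.readiness(interest).await?;

        match self.mio_socket.read(buf) {
            // The cached readiness was stale: clear it and wait again.
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                self.clear_readiness(event);
            }
            res => return res,
        }
    }
}
```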
## Reworking the `ScheduledIo` type

The `ScheduledIo` type is switched to use an intrusive waker linked list. Each entry in the linked list includes the `interest` set passed to `readiness()`.

When an I/O event is received from `mio`, the associated resource's readiness is updated and the waiter list is iterated. All waiters with an `interest` that overlaps the received readiness event are notified. Any waiter with an `interest` that does not overlap the readiness event remains in the list.
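A simplified sketch of that dispatch; a `Vec` stands in for the intrusive list, and `interest`/`ready` are plain bitmasks:

```rust
use std::task::Waker;

struct Waiter {
    interest: u8, // readiness bits this waiter cares about
    waker: Waker,
}

fn notify(waiters: &mut Vec<Waiter>, ready: u8) {
    let mut i = 0;
    while i < waiters.len() {
        if waiters[i].interest & ready != 0 {
            // Overlapping interest: wake and remove from the list.
            waiters.swap_remove(i).waker.wake();
        } else {
            // Disjoint interest: the waiter stays in the list.
            i += 1;
        }
    }
}
```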
### Cancel interest on drop

The future returned by `readiness()` uses an intrusive linked list to store the waker with `ScheduledIo`. Because `readiness()` can be called concurrently, many wakers may be stored simultaneously in the list. If the `readiness()` future is dropped early, it is essential that the waker is removed from the list. This prevents leaking memory.
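A simplified illustration of the drop behavior, using a `Vec` plus an id where the real code unlinks an intrusive node directly:

```rust
use std::sync::{Arc, Mutex};
use std::task::Waker;

struct Readiness {
    waiters: Arc<Mutex<Vec<(u64, Waker)>>>,
    id: u64,
}

impl Drop for Readiness {
    fn drop(&mut self) {
        // Unregister this future's waker so the shared list never
        // holds a waker for a future that no longer exists.
        self.waiters.lock().unwrap().retain(|(id, _)| *id != self.id);
    }
}
```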
### Race condition
Consider how many tasks may concurrently attempt I/O operations. This, combined
with how Tokio uses edge-triggered events, can result in a race condition. Let's
revisit the TCP read function:
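Repeating the sketch from above for reference:

```rust
async fn read(&self, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        let event = self.readiness(interest).await?;

        match self.mio_socket.read(buf) {
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                // The race: a new readiness event may arrive here,
                // before clear_readiness() runs.
                self.clear_readiness(event);
            }
            res => return res,
        }
    }
}
```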
If care is not taken, the `read()` function could deadlock when a readiness event arrives between `mio_socket.read(buf)` returning and `clear_readiness(event)` being called. This happens because the readiness event is received, `clear_readiness()` unsets the readiness event, and on the next iteration, `readiness().await` will block forever as a new readiness event is not received.

The current I/O driver handles this condition by always registering the task's waker before performing the operation. This is not ideal, as it will result in unnecessary task notifications.
Instead, we will use a strategy to prevent clearing readiness if an "unseen" readiness event has been received. The I/O driver will maintain a "tick" value. Every time the `mio` `poll()` function is called, the tick is incremented. Each readiness event has an associated tick. When the I/O driver sets the resource's readiness, the driver's tick is packed into the atomic `usize`.
The `ScheduledIo` readiness `AtomicUsize` is structured as follows.
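A sketch of the packing (field widths as in the 0.3-era implementation; treat them as illustrative):

```text
| reserved | generation | driver tick | readiness |
|----------|------------|-------------|-----------|
|  1 bit   |   7 bits   |   8 bits    |  16 bits  |
```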
The `reserved` and `generation` components exist today.
The `readiness()` function returns a `ReadyEvent` value. This value includes the `tick` component read with the resource's readiness value. When `clear_readiness()` is called, the `ReadyEvent` is provided. Readiness is only cleared if the current `tick` matches the `tick` included in the `ReadyEvent`.

If the tick values do not match, the call to `readiness()` on the next iteration will not block, and the new `tick` is included in the new `ReadyEvent`.
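A sketch of the tick check, with the bit layout above reduced to hypothetical `TICK_*` constants:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const TICK_SHIFT: u32 = 16; // readiness occupies the low 16 bits
const TICK_MASK: usize = 0xff << TICK_SHIFT;

struct ReadyEvent {
    tick: u8,
    ready: u16, // readiness bits observed by readiness()
}

fn clear_readiness(readiness: &AtomicUsize, event: ReadyEvent) {
    let _ = readiness.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
        let cur_tick = ((cur & TICK_MASK) >> TICK_SHIFT) as u8;
        if cur_tick == event.tick {
            // No newer event observed: clear the reported bits.
            Some(cur & !(event.ready as usize))
        } else {
            // The driver ticked since `event` was produced; keep the
            // readiness set so the next readiness() won't block.
            None
        }
    });
}
```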
TODO
## Implementing `AsyncRead` / `AsyncWrite`

The `AsyncRead` and `AsyncWrite` traits use a "poll" based API. This means that it is not possible to use an intrusive linked list to track the waker. Additionally, there is no future associated with the operation, which means it is not possible to cancel interest in the readiness events.
To implement `AsyncRead` and `AsyncWrite`, `ScheduledIo` includes dedicated waker values for the read direction and the write direction. These values are used to store the waker. Specific `interest` is not tracked for `AsyncRead` and `AsyncWrite` implementations. It is assumed that the only events of interest are:

- read ready
- read closed
- write ready
- write closed

Note that "read closed" and "write closed" are only available with Mio 0.7. With Mio 0.6, things were a bit messy.
It is only possible to implement `AsyncRead` and `AsyncWrite` for resource types themselves and not for `&Resource`. Implementing the traits for `&Resource` would permit concurrent operations on the resource. Because only a single waker is stored per direction, any concurrent usage would result in deadlocks. An alternate implementation would call for a `Vec<Waker>`, but this would result in memory leaks.
## Enabling reads and writes for `&TcpStream`
Instead of implementing `AsyncRead` and `AsyncWrite` for `&TcpStream`, a new function is added to `TcpStream`.
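A sketch of the shape this could take; the exact name and fields are illustrative, not final:

```rust
use std::task::Waker;

// Stand-in for tokio's TcpStream.
pub struct TcpStream { /* mio socket, registration, ... */ }

pub struct TcpStreamRef<'a> {
    stream: &'a TcpStream,
    // Waker slots owned by this borrow rather than by the stream,
    // so independent borrows do not clobber each other's wakers.
    read_waker: Option<Waker>,
    write_waker: Option<Waker>,
}

impl TcpStream {
    /// Borrow the stream as a value that can itself implement
    /// `AsyncRead` and `AsyncWrite`.
    pub fn by_ref(&self) -> TcpStreamRef<'_> {
        TcpStreamRef {
            stream: self,
            read_waker: None,
            write_waker: None,
        }
    }
}
```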
Now, `AsyncRead` and `AsyncWrite` can be implemented on `TcpStreamRef<'a>`. When the `TcpStreamRef` is dropped, all associated waker resources are cleaned up.
split()
functionsWith
TcpStream::by_ref()
,TcpStream::split()
is no longer needed. Instead,it is possible to do something as follows.
It is also possible to store a `TcpStream` in an `Arc`.
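For example (again a sketch against the proposed API):

```rust
use std::sync::Arc;
use tokio::io::AsyncReadExt;

async fn run(my_tcp_stream: TcpStream) {
    // The stream is shared between tasks via Arc; each task borrows
    // it with by_ref() at the point of use.
    let stream = Arc::new(my_tcp_stream);

    let stream2 = Arc::clone(&stream);
    tokio::spawn(async move {
        let mut rd = stream2.by_ref();
        let mut buf = [0u8; 1024];
        let _ = rd.read(&mut buf).await;
    });
}
```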