Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor IO registration using intrusive linked list #2828

Merged
merged 13 commits into from
Sep 23, 2020

Conversation

seanmonstar
Copy link
Member

@seanmonstar seanmonstar commented Sep 10, 2020

This refactors IO registration in a few ways:

  • Cleans up the cached readiness in PollEvented. This cache used to be helpful when readiness was a linked list of *mut Nodes in Registration. Previous refactors have turned Registration into just an AtomicUsize holding the current readiness, so the cache is just extra work and complexity. Gone.
  • Polling the Registration for readiness now gives a ReadyEvent, which includes the driver tick. This event must be passed back into clear_readiness, so that the readiness is only cleared from Registration if the tick hasn't changed. Previously, it was possible to clear the readiness even though another thread had just polled the driver and found the socket ready again.
  • Registration now also contains an async fn readiness, which stores wakers in an instrusive linked list. This allows an unbounded number of tasks to register for readiness (previously, only 1 per direction (read and write)). By using the intrusive linked list, there is no concern of leaking the storage of the wakers, since they are stored inside the async fn and released when the future is dropped.
  • Registration retains a poll_readiness(Direction) method, to support AsyncRead and AsyncWrite. They aren't able to use async fns, and so there are 2 reserved slots for those methods.
  • IO types where it makes sense to have multiple tasks waiting on them now take advantage of this new async fn readiness, such as UdpSocket and UnixDatagram.

Additionally, this makes the io-driver "feature" internal-only (no longer documented, not part of public API), and adds a second internal-only feature, io-readiness, to group together linked list part of registration that is only used by some of the IO types.

After a bit of discussion, changing stream-based transports (like TcpStream) to have async fn read(&self) is punted, since that is likely too easy of a footgun to activate.

cc #2779, #2728

@seanmonstar seanmonstar force-pushed the io-driver-intrusive branch 4 times, most recently from 3dffee1 to b26ed76 Compare September 10, 2020 22:19
@seanmonstar seanmonstar force-pushed the io-driver-intrusive branch 11 times, most recently from ee8a4b6 to 9ce4e61 Compare September 11, 2020 01:19
@seanmonstar seanmonstar force-pushed the io-driver-intrusive branch 5 times, most recently from 2c4f851 to 1faa353 Compare September 12, 2020 02:31
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-io Module: tokio/io M-net Module: tokio/net labels Sep 14, 2020
Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me 👍 Great job. I appreciate keeping the PR as minimal as possible as well.

I had some super minor nits I posted inline, nothing is a blocker.

I would like to get another reviewer, especially on the unsafe bits. I tagged a few others on the PR.

tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
@@ -150,11 +150,11 @@ jobs:
run: cargo install cargo-hack

- name: check --each-feature
run: cargo hack check --all --each-feature -Z avoid-dev-deps
run: cargo hack check --all --each-feature --skip io-driver,io-readiness -Z avoid-dev-deps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only io-driver or io-readiness are enabled, it results in a bunch of unused warnings. The other option is to remove the features and tag all the related code with the cfg(any(..)) matrix of things that need the io-driver stuff.

tokio/src/net/udp/socket.rs Outdated Show resolved Hide resolved
#[cfg(feature = "io-driver")]
#[allow(unused)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably can delete it then.

@hawkw
Copy link
Member

hawkw commented Sep 21, 2020

It would definitely be nice to see a performance comparison --- I imagine this could make some cases faster but it would be nice to have some measurement rather than guesses.

Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked at the entire thing, but focused on the parts with unsafe.

tokio/src/net/tcp/stream.rs Outdated Show resolved Hide resolved
tokio/src/net/unix/stream.rs Outdated Show resolved Hide resolved
Comment on lines +422 to +430
return Poll::Pending;
}

// Explicit drop of the lock to indicate the scope that the
// lock is held. Because holding the lock is required to
// ensure safe access to fields not held within the lock, it
// is helpful to visualize the scope of the critical
// section.
drop(waiters);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is correct, but the return seems a bit confusing considering the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which return is confusing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the return Poll::Pending? I think it is ok as return clearly indicates the mutex goes away.

Comment on lines +435 to +436
// Safety: State::Done means it is no longer shared
let w = unsafe { &mut *waiter.get() };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to verify this by following the code around, but found it non-obvious. It is still in the intrusive linked list as far as I can see?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside ScheduledIo::wake, at line 216, the waiters are drained out of the list when notified.

tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me overall! I didn't see any obvious issues with the new implementation beyond a concern about leaving a pointer dangling if the mutex is poisoned.

tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
tokio/Cargo.toml Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Outdated Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Outdated Show resolved Hide resolved
tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved

impl Drop for Readiness<'_> {
fn drop(&mut self) {
let mut waiters = self.scheduled_io.waiters.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since removing the waiter from the list is necessary to maintain safety invariants, i think we should change this to use LockResult::into_inner to extract a mutex guard even if the lock was poisoned. if we don't ensure it's removed from the list, we'll leave the list with a potentially dangling pointer that is assumed to be valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of relying on Drop for safety is scary and probably wrong...

Also, I borrowed a lot of this from notify, like:

let mut waiters = notify.waiters.lock().unwrap();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of relying on Drop for safety is scary and probably wrong...

Pin guarantees that the memory location remains valid until it is dropped. If it is not dropped, then the memory must be leaked and remain valid forever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of relying on Drop for safety is scary and probably wrong...

We're relying on the drop impl to maintain safety regardless of whether or not we ignore mutex poisoning here, it's inherent to the intrusive list that nodes must be unlinked before they are deallocated.

I think Notify also needs to needs to ignore poisoning for the same reason. It might actually matter more in that case since there are probably going to be higher-level users of Notify that send a notification when dropped, potentially including while unwinding...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change should be done in a dedicated PR :) I tracked it here: #2867

For now, unwrap() is fine.

tokio/src/io/driver/scheduled_io.rs Show resolved Hide resolved
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Alice Ryhl <alice@ryhl.io>
@hawkw
Copy link
Member

hawkw commented Sep 22, 2020

Looks like rustfmt and loom CI jobs are sad (the loom one looks like an unnecessary reexport or something?)

@seanmonstar seanmonstar force-pushed the io-driver-intrusive branch 6 times, most recently from ac42251 to f823cf5 Compare September 23, 2020 16:36
@carllerche carllerche merged commit a055784 into master Sep 23, 2020
@carllerche carllerche deleted the io-driver-intrusive branch September 23, 2020 20:02
carllerche added a commit that referenced this pull request Oct 6, 2020
Uses the infrastructure added by #2828 to enable switching
`TcpListener::accept` to use `&self`.

This also switches `poll_accept` to use `&self`. While doing introduces
a hazard, `poll_*` style functions are considered low-level. Most users
will use the `async fn` variants which are more misuse-resistant.

TcpListener::incoming() is temporarily removed as it has the same
problem as `TcpSocket::by_ref()` and will be implemented later.
carllerche added a commit that referenced this pull request Oct 8, 2020
Uses the infrastructure added by #2828 to enable switching
`TcpListener::accept` to use `&self`.

This also switches `poll_accept` to use `&self`. While doing introduces
a hazard, `poll_*` style functions are considered low-level. Most users
will use the `async fn` variants which are more misuse-resistant.

TcpListener::incoming() is temporarily removed as it has the same
problem as `TcpSocket::by_ref()` and will be implemented later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-io Module: tokio/io M-net Module: tokio/net
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants