-
Notifications
You must be signed in to change notification settings - Fork 739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poll selector #1602
Poll selector #1602
Conversation
Before starting to work on large things like this, please open an issue first explain why you need this. |
This started out as an experiment to see if it even was possible - turns out, it is and works pretty well. I thought I'd PR that for a review - that being said, if you still want me to open an issue I can do so (though I guess that would mostly be a copy'n paste of the PR description) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I'm a huge fan of this PR, since it will go a long way towards supporting several currently unsupported systems, including the most important system: the Nintendo 3DS.
I have to wonder: should we also support the select()
libc function, for the systems that don't comply with POSIX?
Also, I'd at least have some note in there about how the implementation is partially derived from polling
.
use std::time::Duration; | ||
use std::{fmt, io}; | ||
|
||
#[cfg(all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't repeat this cfg
block so often; I'd follow along with what the rest of mio
does and create a macro that encapsulates the cfg
.
|
||
/// Interface to poll. | ||
#[derive(Debug)] | ||
struct SelectorState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be remembering this wrong, but didn't the polling
implementation also have an optional timerfd
object to support sub-millisecond timeouts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've copy & pasted most of this straight from polling
, which doesn't have a timerfd
. That being said, adding a timerfd
is certainly possible. I however would like to not introduce it in this PR as to not further increase complexity.
For the time being you can technically add your own timerfd
just fine to the registry and supply a None
timeout.
I thought about this as well, my reasoning for not using Fun fact: the platform I'm specifically targetting with this PR implements So if you need this right now, you can also just implement |
Yes please open an issue as we'll mostly need to figure out non-code stuff. Things such as who's going to maintain this, CI, OS targets, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honesty I'm not really a fan of the implementation. It just doesn't fit Mio's model very well.
Also just mentioned a couple of time you copy and pasted this from the polling
crate? If so, we can't accept that without a copyright notice. So, we'll likely simply not accept the code.
@@ -42,6 +42,10 @@ os-ext = [ | |||
# Enables `mio::net` module containing networking primitives. | |||
net = [] | |||
|
|||
# Forces the use of the poll system call instead of epoll on systems | |||
# where both are available | |||
force-old-poll = ["os-poll"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not going to add a feature for this or support poll
on any OS that has something better (e.g. epoll
or kqueue
). Mio is complex enough as-is it maintain with the matrix OSs architectures we're not getting selectors in the mix as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also be removed once this PR is ready, I currently use this to develop the poll
selector on Linux because development on Haiku and the espidf is quite hard. I still run the tests for Haiku, but this allows me to iron out bugs on a system which is easy to develop for.
TL;DR: Will be removed when PR is ready
Yes. I haven't yet added a copyright notice, because I'm not exactly sure how it is supposed to look. Both libraries use the MIT license ( |
That's because all files have the same copyright, so it's not needed to add one to each file.
No, I'm afraid it likely means this code will not be accepted. |
Well, as the sole contributor to |
That solves that then. 👍 |
@Janrupf are you still working on this? |
Not really - the implementation itself was (and still is) working, but I have no way to proof it using a CI pipeline or something along those lines (at least not with the constraints previously set by the maintainers). There are now a bunch of merge conflicts as you can see. I'm using an implementation of this in a production environment where it has been running on a few hundred embedded devices without any issues, but that's about as good as it gets. If you have interest on picking this up or getting it merged, I'm willing to help. |
We test it in |
The problem with that is that I'm not willing to maintain this implementation on any OS that supports something "better", e.g. epoll on Linux (better might be debatable here, but let's not). Having 1 implementation per OS is enough already. |
On the following month I'll try implementing CI for one of the poll-only OSes then. |
Hey, I've managed to compile and run the tests using qemu, but it takes a lot longer to run. Even if we optimize and cache in a lot of different ways, like building our own haiku image with rust pre-installed and running From the issue #1682, it looks like you're willing to reconsider unsing a cfg exclusively to run tests on linux? We could have a |
@pheki (see first part of your comment below)
I've documented my thoughts on it in #1684, but yes let's continue with a
I still like to see some kind of Haiku support in the CI if we want to officially support it, even if it's just a Thank you for looking to this. |
I'm interested in throwing my hat in the ring here to try to help get this over the line (and with luck I can get some other esp-rs folks onboard!). Before I get too far though I want to double check that I understand the issues, which AFAICT comes down to two missing requirements:
If this understanding is correct, I think the major issue we're facing right now is that the kinds of platforms that want the poll support aren't well set up for efficient, low cost automation which of course makes (1) difficult. Separately, (2) is complicated by the fact that these platforms don't tend to be stable over long periods of time and many dozens of such platforms have come and gone while Linux persists. This doesn't mean they don't deserve support, it just means that the overall community of people is likely going to be a tiny fraction of Linux. Combine that with the fact that the Rust community is a small fraction of developers as a whole and in practice you can never expect to find a large group of committed maintainers. An unfortunate situation for both tokio maintainers and the communities that would benefit from their great work. I think we can find good solutions and move forward though, hopefully others agree :) Let's take a closer look at (1) first. As an example, esp-idf requires a lot of extra build support and tools (e.g. https://github.com/esp-rs/embuild) as well as a somewhat unsupported qemu fork (https://esp-rs.github.io/book/tooling/simulating/qemu.html) to run the code. I'm confident that we could get this off the ground to run in github's CI, but I doubt very much that we'd ever be able to get the performance good enough to run within a couple of minutes. Realistically I expect this will be 10+ minutes as @pheki has discovered with Haiku as well. Most of the embedded community chooses instead to write platform agnostic code that can be tested on the host, as is the case with this PR being able to run on Linux, then move the rest to run on real hardware that requires significant overhead and cost to maintain in a CI infrastructure. Even companies with deep pockets do it this way just simply because it can become a real money pit to try to get $6 devices to ever be as robust and reliable as Linux rackmount servers (trust me, I tried to do this at Meta for quite some time hehe). While solutions do exist, I do think we need to maybe reexamine how flexible the requirements really are. For example, we are already in the business of pragmatic support for platforms like Android and iOS despite being unable to test those reliably in GitHub CI as well for much the same reasons. I'd propose that we're relying on the fact that a good number of supported targets are "close enough" to FreeBSD, Linux, Windows, and OS X that testing on those should be sufficient to cover the rest. The real issue with (1) then is that there are no platforms currently that actually would use the poll code path and so the tests for this code feel arbitrary and contrived to use Linux just because it technically supports poll. I agree here (and actually I agree with everything @Thomasdezeeuw has been saying), but consider for a moment that what we're really testing here is that Linux, a POSIX-compliant platform, should work the same as any other POSIX-compliant platform, in the same way we're saying that epoll on any obscure configuration of Linux should work the same as our desktop or rackmount server Linux configurations. Definitely not literally true, but true enough to say we've done our due diligence by relying on reliable, stable standards and doing automated testing against a well known and correct implementation. This actually makes the case stronger for using Linux with poll instead of, say, esp-idf's newlib poll which might be buggy and not a good proxy for, say, Haiku. For (2), I personally think we can find strong partners in the embedded community, and in particular esp-rs which is a large and growing community with committed maintainers across a number of components. I initially assumed that the gap to get tokio support for esp32 was so large that it wouldn't be practical even technically but honestly @Janrupf 's work will likely inspire others to step up and commit to getting this work landed and maintain it going forward, myself included. So I guess all that novel was really to say, what exactly do I need to do to get this landed? I see the PR is missing some of the cfg changes you proposed, and a few things could use tidying per your review. The CI checks are currently failing so obviously we gotta fix that. Is there anything else that's insufficient? What more do we need to convince tokio maintainers that we've got the right long term support? I know that for example Rust async has a big push from Espressif as evidenced by Scott Mabin's blog posts: https://mabez.dev/blog/posts/esp-rust-30-06-2023/. Probably not a stretch to imagine I could convince the esp-rs maintainers to commit to having tokio support on the roadmap... |
Hi @jasta, I just want to add as a note that I'm in fact using this with tokio on the ESP32. I created a support PR on the Tokio repository some time ago, which should contain the changes neccessary to also get tokio to run this. Once mio works as expected it's only a matter of compiling Tokio with the right features. And thanks a bunch for the long comment, I really appreciate it! Edit: I'm not home at the moment, but I may have unpushed changes which might make further improvements. I'll check once I'm hone |
Ideally the tests don't need to be modified (except for maybe adding a
This is pretty important, but mostly for Haiku or other OS(s) that I don't use. Basically if something breaks on Haiku I want to ask somewhere with at least the OS running to help. With the
This is spot on, we all have limited time so for me (other another Mio maintainer) to spend hours trying to setup a Haiku (or other) machine just to fix an issue is quite a tough ask. That is why (2) is important, for people with Haiku already running it can be a 2 minute check to at least reproduce an issue.
Agreed, testing on Linux with
👍
From my end:
I think that about it? (note that the list might not seem to long, but I only recently agreed to the |
Ok I'm going to work on the technical aspects of this first:
Once I'm ready I'll submit a new PR and then start socializing the maintenance issues a bit with the esp-rs folks. I'm feeling really good that we can get this over the line soon-ish! |
I noticed a couple of bugs in the poll implementation while doing this work that result directly from inconsistencies to how the Windows implementation works. For example, when poll yields POLLHUP the edge-triggered IoSourceState doesn't work properly and then the caller just repeatedly gets spammed POLLHUP because the fd is not deregistered internally. The Windows implementation solves this by using a Mutex and shared state between IoSourceState and the Selector, so I think I'll just further copy that behaviour. This will fix a failing test I'm seeing locally in tcp_stream.rs:tcp_reset_close_event |
Introduces a new backend for UNIX using the level triggered poll() syscall instead of epoll or kqueue. This support is crucial for embedded systems like the esp32 family but also for alternative operating systems like Haiku. This diff does not introduce any new platform support targets itself but provides the core technical implementation necessary to support these other targets. Future PRs will introduce specific platform support however due to reasons outlined in tokio-rs#1602 (many thanks for this initial effort BTW!) it is not possible to automate tests for those platforms. We will instead rely on the fact that existing strong POSIX platforms like Linux can serve as a proxy to prove that the mio code is working nominally.
Only one issue left to fix and I could use your help @Janrupf. Specifically I bumped into: tokio-rs/tokio#5866 which it looks like you should've hit as well unless you were only testing recv_from and send_to APIs or anything else not covered by PollEvented. |
Also see #1611. We could also decide that reading less then |
Introduces a new backend for UNIX using the level triggered poll() syscall instead of epoll or kqueue. This support is crucial for embedded systems like the esp32 family but also for alternative operating systems like Haiku. This diff does not introduce any new platform support targets itself but provides the core technical implementation necessary to support these other targets. Future PRs will introduce specific platform support however due to reasons outlined in tokio-rs#1602 (many thanks for this initial effort BTW!) it is not possible to automate tests for those platforms. We will instead rely on the fact that Linux can serve as a proxy to prove that the mio code is working nominally. Note that only Linux has a sufficiently complex implementation to pass all tests. This is due to SIGRDHUP missing on other platforms and is required for about a dozen or so tests that check is_read_closed().
FWIW, a (very basic, FDs/PROC only) implementation of Above, however, it was noted that some versions of this change caused Haiku to crash. Is that reproducible? Do you have a reproducer, or even just a screenshot of the crash? |
Nice. We still need the
|
I would also hope that Solaris would be able to use poll(2) too. See #1152. |
Introduces a new backend for UNIX using the level triggered poll() syscall instead of epoll or kqueue. This support is crucial for embedded systems like the esp32 family but also for alternative operating systems like Haiku. This diff does not introduce any new platform support targets itself but provides the core technical implementation necessary to support these other targets. Future PRs will introduce specific platform support however due to reasons outlined in #1602 (many thanks for this initial effort BTW!) it is not possible to automate tests for those platforms. We will instead rely on the fact that Linux can serve as a proxy to prove that the mio code is working nominally. Note that only Linux has a sufficiently complex implementation to pass all tests. This is due to SIGRDHUP missing on other platforms and is required for about a dozen or so tests that check is_read_closed().
Superseded by #1687, thanks to everyone who continued this! |
Thanks @Janrupf for the implementation! |
This PR adds support for using the Unix
poll
API as a backend.Why this is required
There are some operating systems out there which only support
poll
(or other API's). The main challenge here was to track the file descriptors, as thepoll
syscall requires all file descriptors at call site. This is solved by keeping a mutex around the state of the selector and thus tracking everything added/removed from it.The main motivation for this PR is to enable support for the esp-idf and it's growing eco system, eventually allowing tokio to fully run on these devices. As the esp-idf only supports
poll
, mio (and by this extent the full feature set of tokio) is unable to run on it. Additionally this benefits OS such as Haiku (see #1472), which also only provide apoll
based API.Does this work?
The short answer is: Yes, it should.
The long answer:
the
polling
crate already uses this, and in fact, the implementation here is largely a copy-and-paste of the one provided bypolling
. There are some notable differences whatsoever:Arc
poll
is level triggered (may be a breaking change!)What did change
In addition to a new a new polling backend, a feature called
force-old-poll
was added in order to allow further testing of thepoll
implementation (this feature is required to force use of thepoll
API, whenever the OS provideskqueue
orepoll
).It also now really is necessary to deregister event sources, else stale events may be fired (the general rules seems to always have been do deregister event sources, tokio's
PollEvented
for example does so as well as all mio examples).The mio
Waker
also gained a new method calleddid_wake
, which needs to be called after a task has been woken up in order to reset the event source (and not wake the task again when polling). This requires a very small patch in tokio which I have ready.Effect on downstream crates
Probably none, but this is still a breaking change (see the notes about
Waker
and the edge vs. level triggered situation).Since the
poll
implementation will virtually not be used anywhere other than new systems, no existing code should break on existing systems. Still, the API did change and does have new requirements.I expect most downstream crates to properly deregister event sources before destruction, but this now is absolutely required to not run into weird race conditions. Additionally, every crate using mio
Waker
s now needs to calldid_wake
after being woken by the Waker.Notes
This needs a lot of testing and feedback from the tokio and mio maintainers. Also thanks to @Kestrer, who implemented large portions of this code for the
polling
crate.