Windows TCP implementation hits unreachable code #921
Comments
Sorry for the delay. Hitting that code path indicates a bug. It has
been a long time since I worked in that part of the code and nothing
stands out. At this point, the windows implementation is due for a
rewrite (tokio-rs/gsoc#3), so I probably
won't be able to spend time debugging this. If you discover the issue,
I would be happy to accept a PR & release.
Cheers,
Carl
@rbtying Could you reproduce the crash or provide a test case? More information is definitely needed to clarify the situation here, since the crashes are very rare.
@FXTi @carllerche thanks for the responses! I'm trying to narrow down the crash conditions, but this is coming from a fairly complicated app and we have pretty limited telemetry. I'll let you know if I can get a repro!
This adds support for performing non-blocking network operations, such as reading and writing to/from a socket. The runtime API exposed is similar to Erlang's, allowing one to write code that uses non-blocking APIs without having to resort to callbacks. For example, in a typical callback-based language you might write the following to read from a socket:

    socket.create do (socket) {
      socket.read do (data) {
      }
    }

In Inko, you would instead (more or less) write the following:

    import std::net::socket::TcpStream

    let socket = try! TcpStream.new(ip: '192.0.2.0', port: 80)
    let message = try! socket.read_string(size: 4)

The VM then takes care of using the appropriate non-blocking operations, and will reschedule processes whenever necessary. This functionality is exposed through the following runtime modules:

* std::net::ip: used for parsing IPv4 and IPv6 addresses.
* std::net::socket: used for TCP and UDP sockets.
* std::net::unix: used for Unix domain sockets.

The VM uses the system's native polling mechanism to determine when a file descriptor is available for a read or write. On Linux we use epoll, while using kqueue for the various BSDs and Mac OS. For Windows we use wepoll (https://github.com/piscisaureus/wepoll). Wepoll exposes an API that is compatible with the epoll API, but uses Windows IO completion ports under the hood.

When a process attempts to perform a non-blocking operation, the process is registered (combined with the file descriptor to poll) in a global poller and suspended. When the file descriptor becomes available for a read or write, the corresponding process is rescheduled. The polling mechanism is set up in such a way that a process cannot be rescheduled multiple times at once.

We do not use MIO (https://github.com/tokio-rs/mio); instead we use epoll, kqueue, and wepoll (via https://crates.io/crates/wepoll-binding) directly.
At the time of writing, while MIO offers some form of support for Windows, it comes with various issues:

1. tokio-rs/mio#921
2. tokio-rs/mio#919
3. tokio-rs/mio#776
4. tokio-rs/mio#913

It's not clear when these issues will be addressed, as the maintainers of MIO appear not to have the experience and resources to resolve them themselves. MIO is part of the Google Summer of Code 2019, with the goal of improving Windows support. Unfortunately, this likely won't be done before the end of 2019, and we don't want to wait that long.

Another issue with MIO is its implementation. Internally, MIO uses various forms of synchronisation which can make it expensive to use a single poller across multiple threads; it certainly is not a zero-cost library. It also offers more than we need, such as being able to poll arbitrary objects.

We are not the first to run into these issues. For example, the Amethyst video game engine also ran into issues with MIO, as detailed in https://community.amethyst.rs/t/sorting-through-the-mio-mess/561.

With all of this in mind, I decided it was not worth the time to wait for MIO to get fixed, and to instead spend that time using epoll, kqueue, and wepoll directly. This gives us total control over the code, and allows us to implement what we need in the way we need it. Most important of all: it works on Linux, BSD, Mac, and Windows.
@rbtying Any update on this?
I honestly haven't had time to dig into this more -- mostly, we don't have
a clean lab repro, but we have panic stacks that point here. Unfortunately
I've gotten busy with other stuff, though.
@rbtying Do you have a crash dump? If so, can you determine what the state is, if it's not pending? It seems like IO cancellation might be involved here. You could try writing to a connection and closing it on the remote side in a loop.
The Windows implementation has been rewritten, so I'm closing this. Too bad we didn't find the root cause.
We've noticed that sometimes (very, very rarely), we get crashes on Windows which are caused by this unreachable! statement here: https://github.com/carllerche/mio/blob/master/src/sys/windows/tcp.rs#L571

It's nonobvious to me why we would ever end up in there, but immediately prior to that we have some logs which reference WSAECONNRESET -- is there a possibility that in the connection reset case the mio implementation will incorrectly enter write_done?

Edit (by @Thomasdezeeuw): fixed link to the unreachable statement: mio/src/sys/windows/tcp.rs, line 568 in ab099cb