-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libnative does not handle fds with O_NONBLOCK #13336
Comments
The two solutions I can think of for this that seem reasonable are:
|
Surely we can't be the first project/language that has run in to this. I would be curious what other languages/runtimes do in the face of this error. I don't think that libuv handles this test case specifically, I think it just falls out of the general architecture of libuv. |
@alexcrichton I suspect other projects/languages either don't have an opinion on I have a tentative fix over at kballard/rust/libnative_io_nonblock, although I suspect that hardcoding the values for |
Hm, regardless of the values of the flags, I'm not sure that's the best solution. It's not guaranteed that every file descriptor passed in to libnative wants blocking reads/writes. In theory you could pass one in, and then have a select/epoll loop running somewhere else. So far this problem seems isolated to only the stdout/stderr file descriptors, so I think those are the only ones that should be modified. |
@alexcrichton I don't see how you can restrict it to only stdout/stderr. Those can be dup'd or redirected, so you'd have to have some way, inside of The alternative implementation I considered was to fall back to If, in Rust code, you want to deal with non-blocking I/O, I think you need to roll your own on top of libc. At least, until such time as someone comes up with a good proposal for non-blocking I/O in libnative/libgreen. But for the moment, clients of libnative (and libgreen) expect their |
This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes #13336 Closes #13355
This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes rust-lang#13336 Closes rust-lang#13355
This update brings a few months of changes, but primarily a fix for the following situation. When creating a handle to stdin, libuv used to set the stdin handle to nonblocking mode. This would end up affect this stdin handle across all processes that shared it, which mean that stdin become nonblocking for everyone using the same stdin. On linux, this also affected *stdout* because stdin/stdout roughly point at the same thing. This problem became apparent when running the test suite manually on a local computer. The stdtest suite (running with libgreen) would set stdout to nonblocking mode (as described above), and then the next test suite would always fail for a printing failure (because stdout was returning EAGAIN). This has been fixed upstream, joyent/libuv@342e8c, and this update pulls in this fix. This also brings us in line with a recently upstreamed libuv patch. Closes #12827 Closes #13336 Closes #13355
FYI, libuv set stdin to nonblocking, which also set stdout to nonblocking. That is the bug I'm referring to. Flipping flags on a descriptor you don't own will result in bugs (as was just proven by libuv). |
@thestinger Any process expecting @alexcrichton Yes, it sounds like libuv was being stupid. I take it upgrading libuv made it stop doing that? However, stdin/stdout/stderr are expected to be blocking by default, and so I don't see the problem with libnative restoring the blocking behavior if they somehow get set to non-blocking (e.g. by a separate process). |
The error can't be handled; it can only be masked. It is not fixing the underlying problem and will cause hidden breakage elsewhere. When an error happens and there's no sane way to handle it internally, it should be reported. This allows the logic error to be discovered and the source of the bug can be fixed. |
@thestinger No, it can be handled. Turning off non-blocking is not masking. It's the proper solution to an fd that is unexpectedly nonblocking. The fundamental bug lies in whatever process decided to mark a shared file description as non-blocking and expecting it to stick. And there's nothing Rust can do about that. Again, "reporting" the error means telling the client that it needs to give up and consider this an unrecoverable error, which is pretty awful when it is not, in fact, an unrecoverable error. It's a pretty trivially recoverable error, but libnative just refused to recover from it. |
Rust should report the logic error so it can be fixed. The other process is not going to be able to cope with the file descriptor becoming blocking if it expected it to be non-blocking. It will break in mysterious ways and the painful debugging process with be Rust's fault for trying to correct the mistake. Attempting to remove the flag is a race, because a process setting the flag is very likely to keep setting it whenever it does a read/write. There's no way to know how long it will remain unset and the consequences for the other process. The programmer may never encounter the issue because Rust was masking it, and it could very easily cause data loss for users. |
@thestinger Why do you keep saying this? It's not an error in Rust. Trying to "report" it does not allow for any possible way to fix it. The bug is in whatever other process marked the file description as non-blocking. What you're insisting on doing is creating a bug in Rust in an attempt to try and avoid breaking an already-broken unknown separate program. |
For reference, not fixing this in Rust says absolutely nothing about whether another program will decide to turn off |
It could very well be an error in the Rust process. It's perfectly sane to intentionally share a non-blocking file descriptor with a child you always control.
That's why Rust should report an error as soon as possible. It found a horrible problem needing attention from a programmer. There's really no limit to what could go wrong if it's left masked. |
It doesn't really matter where the bug originates. If Rust is interacting with a program that's buggy over something like a pipe, it's a bug in the Rust program if they're unable to work correctly together. The other program may even document that the child receives a non-blocking file descriptor and needs to handle it correctly. |
@thestinger I cannot possibly fathom how you can believe that a third-party process that sets a shared file description as non-blocking can be considered a bug in Rust. But that's exactly what you're proposing. What's more, your proposed resolution, of "reporting" the bug, leaves the client code with no possible solution aside from discarding the file descriptor entirely as if it has an unrecoverable error. There is no sane world in which a bug in an arbitrary third-party program can possibly be fixed by your program written in Rust receiving this error report. And there's no sane world in which makes sense to introduce bad behavior in your Rust program as a result of a bug in a third-party program which you can diagnose and fix but intentionally chose not to. That just doesn't make sense. The third-party program is already fundamentally broken. Why do you insist on propagating the bug into Rust?
Besides the fact that pipes don't have this bug (FIFOs and TTYs do, but not pipes), this makes no sense. CLI programs interacting with each other over some file descriptor channel do not use the non-blocking nature of the fd as part of the protocol, only the actual contents of the sent data. The only scenario you've mentioned that makes any sense at all is intentionally sharing a non-blocking fd with a spawned child, but I don't see how that's relevant. Rust does not support fd-based non-blocking I/O. Period. Hopefully we will have a solution to non-blocking I/O at some point, but we don't today, and libnative's current implementation of I/O is very explicitly blocking. If you take this shared fd and hand it to libnative, then you are very explicitly asking for blocking behavior. As such, it is reasonable to assume that libnative will turn off In fact, it would be perfectly reasonable for libnative, upon receiving a new fd, to immediately check the status flags of the fd and turn off |
Fundamentally, non-blocking I/O in POSIX is completely broken. Rust's goal here should be to produce the expected behavior for the Rust program. This behavior means blocking I/O when using libnative. It does not matter if this exposes a bug in a third-party program. That program was already broken, it just didn't know it yet. |
It's fine to set O_NONBLOCK on a shared file descriptor when all processes that are going to be involved are designed to handle it. That's why Rust should report an error when the file descriptor is not set up as it's expected to be. It's either a bug in the process setting the flag or a bug in the Rust program, but it's certainly a logic error and should be reported as such. It's not a runtime error Rust should attempt to recover from. |
@thestinger I still don't understand this argument. What do you expect the caller to do when libnative returns the error? There is literally no way for the calling program to handle this error gracefully. As I have said before, I consider the bug to be in the program that set |
The Rust program could be the buggy one if it's expected that the shared file description is O_NONBLOCK. It's only wrong to set it if there are processes not under your control sharing the file descriptor. |
You've said that before, and I truly do not understand what you are trying to say. We're not talking here about fds that are being given to a Rust program either through forking, or through a unix socket. The only way for those fds to end up being touched by libnative is if you explicitly wrap them in a No, what we are talking about is file descriptions that are shared by virtue of being either a tty or a fifo (and it's plausible there's another type of file that behaves like this as well), that end up hooked up to the Rust programs stdin, stdout, or stderr. There is no circumstance in which it is justifiable to impose In any case, even talking about whether |
It's being enforced by reporting an error if the file descriptor has incorrect flags and requiring that the calling code handle the error via the unused result lint. Setting this for stdin, stdout or stderr is not really wrong if you have control of both ends and can guarantee there's nothing expecting it to be blocking. I don't really see how it's a bug as-is. It's a decision to report the error instead of trying to fix it by changing |
@thestinger What part of "the calling code has no possible way of handling the error" are you not understanding? The only way the calling code can possibly handle the error is if it's using
The only way this is correct is if you're using a FIFO, which is extremely rare. And if you are using a FIFO, and have control over both ends, then you're not even going to run into this problem in the first place because you won't be using libnative (because libnative is blocking I/O and you just stated that you're intentionally using non-blocking). Since you're not using libnative in this scenario, it's not at all applicable to this issue. The reason why that's the only way this is correct is because pipes don't exhibit this issue (pipes ends do not share their file description), and it's unambiguously a bug to set a tty to
I've stated repeatedly that the process that did this is buggy by definition. Causing the Rust program to be buggy too in an attempt to avoid breaking the already-buggy third-party program makes no sense to me. Why do you think that's a rational thing to do? |
I think it's rational to report an error when there's a logic error, rather than trying to handle it. This is always a programming flaw rather than a runtime error so I don't think attempting to handle it silently is a good idea.
This problem occurs when the programmer of the application on either end (often the same person) makes a mistake. The mistake could very well be using libnative instead of functions that are going to work with O_NONBLOCK. It could also be setting O_NONBLOCK without understanding the shared file descriptor issue, and reporting an error ASAP seems best. The process on the other end could lose data if this isn't fixed so catching it early is good. The Rust program will be buggy regardless of how this is dealt with, since it's not correctly communicating with the other process(es). There's a choice between potentially fixing the issue (but likely breaking another process) and reporting it loudly. |
I agree. The issue is that there is not a logic error here, at least not in the Rust program. Trying to report the error here means intentionally breaking the API contract that libnative provides, which is blocking I/O. And since there is no logic error in the calling code, there is no justification whatsoever for why libnative chose to break its guarantee of blocking I/O. There is a programming flaw, but the flaw is in the third-party program (the one that is presumably not written in Rust because Rust doesn't provide non-blocking I/O). Because the flaw is in the third-party program, trying to report an error in the current Rust program is completely nonsensical.
The phrase I've emphasized seems to be the crux of the problem. You seem to assume that the same programmer wrote both programs. But you've provided zero justification for this assumption, and it's one that I find to be extremely implausible. It is exceedingly rare that two programs run in the same terminal were written by the same author. Why are you making the assumption that this is the case? And even if it were the case, why does that even matter? A bug in program A being reported in program B, even if written by the same person, does not make sense. program B doesn't have the bug, so why is it intentionally reporting an otherwise-fixable error? The proper behavior is for program B to recover from the error, which will expose the bug in program A (the legitimately buggy program). Doing anything else is not only wrong, it also hinders proper diagnosis of the bug. Of course, since the programs were in fact almost certainly not written by the same programmer, and in fact are likely to be running on some user's machine instead of the programmer's machine, attempting to report the error in the Rust program does nothing but make that program unusable, and cause the user to (rightly) believe that the Rust program is buggy. |
I don't think we're going to get anywhere here, and there's not going to be enough input from other people. Can you file an RFC instead? |
I'm pulling a massive triage effort to get us ready for 1.0. As part of this, I'm moving stuff that's wishlist-like to the RFCs repo, as that's where major new things should get discussed/prioritized. This issue has been moved to the RFCs repo: rust-lang/rfcs#682 |
Suggest `Option<&T>` instead of `&Option<T>` closes rust-lang#13054 ```rust // bad code fn foo(a: &Option<T>) {} fn bar(&self) -> &Option<T> {} // Use instead fn foo(a: Option<&T>) {} fn bar(&self) -> Option<&T> {} ``` Handles argument types and return types in functions, methods, and closures with explicit types. Honors `avoid_breaking_exported_api` parameter. See this great [YouTube video](https://www.youtube.com/watch?v=6c7pZYP_iIE) with the in-depth explanation. ### Open Questions These are not blocking, and could be done in separate PRs if needed. * [ ] Should `&Option<Box<T>>` be suggested as `Option<&T>` -- without the box? Handled by [clippy::borrowed_box](https://rust-lang.github.io/rust-clippy/master/index.html#/borrowed_box) * [ ] Should `&Option<String>` be suggested as `Option<&str>` -- using de-refed type? ### Possible Future Improvements These cases might also be good to handle, probably in a separate PR. ```rust fn lambdas() { let x = |a: &Option<String>| {}; let x = |a: &Option<String>| -> &Option<String> { todo!() }; } fn mut_ref_to_ref(a: &mut &Option<u8>) {} ``` changelog: [`ref_option`]: Suggest `Option<&T>` instead of `&Option<T>`
Convert `&Option<T>` to `Option<&T>` Run `ref_option` (rust-lang#13336) on the Clippy's own code, quiet a few hits. Per mentioned video, this may actually improve performance as well. Switch lint to `pedantic` ---- changelog: [`ref_option`]: upgrade lint to `pedantic`
libnative does not detect when a read/write fails due to
O_NONBLOCK
being set on the fd. It makes the assumption that all of its files never have that flag set, because it never sets that flag on them. Unfortunately, this isn't necessarily the case. FIFOs and character device files (e.g. terminals) will actually share theO_NONBLOCK
flag among all processes that have the same open file description (e.g. the underlying kernel object that backs the fd).Using a tiny C program that uses
fcntl()
to setO_NONBLOCK
on its stdout, a FIFO, and a Rust program that writes 32k of output to stdout, I can reproduce this issue 100% of the time. The invocation looks likeand on the reading side I just do
This causes the rust program to return a "Resource temporarily unavailable" error from
stdout.write()
after writing 8k of output. Removing the call to./mknblock
restores the expected behavior where the rust program will block until the reading side has started consuming input. And further, switching the rust program over to libgreen also causes it to block even with./mknblock
.The C program looks like this:
The Rust program is a bit longer, mostly because it prints out information about stdout before it begins writing. It looks like this:
The text was updated successfully, but these errors were encountered: