-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libnative does not handle fds with O_NONBLOCK #9
Comments
The two solutions I can think of for this that seem reasonable are:
|
|
Surely we can't be the first project/language that has run in to this. I would be curious what other languages/runtimes do in the face of this error. I don't think that libuv handles this test case specifically, I think it just falls out of the general architecture of libuv. |
@alexcrichton I suspect other projects/languages either don't have an opinion on I have a tentative fix over at kballard/rust/libnative_io_nonblock, although I suspect that hardcoding the values for |
Hm, regardless of the values of the flags, I'm not sure that's the best solution. It's not guaranteed that every file descriptor passed in to libnative wants blocking reads/writes. In theory you could pass one in, and then have a select/epoll loop running somewhere else. So far this problem seems isolated to only the stdout/stderr file descriptors, so I think those are the only ones that should be modified. |
@alexcrichton I don't see how you can restrict it to only stdout/stderr. Those can be dup'd or redirected, so you'd have to have some way, inside of The alternative implementation I considered was to fall back to If, in Rust code, you want to deal with non-blocking I/O, I think you need to roll your own on top of libc. At least, until such time as someone comes up with a good proposal for non-blocking I/O in libnative/libgreen. But for the moment, clients of libnative (and libgreen) expect their |
This is not fixed. #13619 may fix a common way to trigger the issue, but my test code reproduces the issue without requiring libgreen. |
Sorry, I meant to comment on this soon after the PR landed, and I was in a meeting. I do not believe that this falls under the prerogative of libnative. Dealing with nonblocking file descriptors is clearly a tricky situation (as seen with libuv) that I do not believe is up to us. Only owned file descriptors should have the nonblocking bits fiddled with, and all of the file descriptors we own are guaranteed to be blocking. |
@alexcrichton Why do you believe this is something libnative shouldn't handle? As I've demonstrated, certain file descriptors are shared among multiple processes, which means that libnative could encounter a nonblocking fd even when it thinks it owns the fd (e.g. if you open a fifo, the other process using that fifo can mark it nonblock).
And this is precisely why I believe so strongly that libnative must deal with this issue. libnative guarantees that file descriptors it manages are done so in a blocking fashion. Therefore, when it encounters a file descriptor that has been marked as nonblock by another process, it must deal with this, or it will break its own guarantee. |
The code you're proposing is the exact code that caused the bug from libuv. That, plus the overwhelming precedence for not handling this is why I believe this is unnecessary. In summary: Yes, this can happen. No, we should not handle it specially. Our behavior now, returning that the I/O operation failed, is correct in my opinion. |
A program setting O_NONBLOCK on a file descriptor shared with processes it doesn't control is buggy. There's no sane way to deal with it, since removing it will break a guarantee expected by some other process. Trying to remove it and add it back before anyone notices is a racy hack. https://lkml.org/lkml/2007/8/14/135 @alexcrichton: I agree that considering it an error is the best solution. Does it actually return EAGAIN as the error so you know there's another buggy process involved? |
How can that possibly be true? My code is removing the As for the overwhelming precedence for not handling this, I think that the languages you've been looking at are wrong. Additionally I don't think they chose to not handle this, I think they just aren't aware the bug exists in the first place. Plus, I believe at least some of them surface the underlying @thestinger The POSIX non-blocking I/O stuff is horribly broken. There's no way around that. Any process that relies on An alternative fix would be to use |
Turning it off is masking a very real programming error instead of uncovering the problem. The process expecting O_NONBLOCK to be there is now going to be screwed up, without any indication of what's going on. It's far better to report the error. It could easily be set again before you actually perform a read or write to the file descriptor... it's best to report the error while you can. |
FYI, libuv set stdin to nonblocking, which also set stdout to nonblocking. That is the bug I'm referring to. Flipping flags on a descriptor you don't own will result in bugs (as was just proven by libuv). |
@thestinger Any process expecting @alexcrichton Yes, it sounds like libuv was being stupid. I take it upgrading libuv made it stop doing that? However, stdin/stdout/stderr are expected to be blocking by default, and so I don't see the problem with libnative restoring the blocking behavior if they somehow get set to non-blocking (e.g. by a separate process). |
The error can't be handled; it can only be masked. It is not fixing the underlying problem and will cause hidden breakage elsewhere. When an error happens and there's no sane way to handle it internally, it should be reported. This allows the logic error to be discovered and the source of the bug can be fixed. |
@thestinger No, it can be handled. Turning off non-blocking is not masking. It's the proper solution to an fd that is unexpectedly nonblocking. The fundamental bug lies in whatever process decided to mark a shared file description as non-blocking and expecting it to stick. And there's nothing Rust can do about that. Again, "reporting" the error means telling the client that it needs to give up and consider this an unrecoverable error, which is pretty awful when it is not, in fact, an unrecoverable error. It's a pretty trivially recoverable error, but libnative just refused to recover from it. |
Rust should report the logic error so it can be fixed. The other process is not going to be able to cope with the file descriptor becoming blocking if it expected it to be non-blocking. It will break in mysterious ways and the painful debugging process with be Rust's fault for trying to correct the mistake. Attempting to remove the flag is a race, because a process setting the flag is very likely to keep setting it whenever it does a read/write. There's no way to know how long it will remain unset and the consequences for the other process. The programmer may never encounter the issue because Rust was masking it, and it could very easily cause data loss for users. |
@thestinger Why do you keep saying this? It's not an error in Rust. Trying to "report" it does not allow for any possible way to fix it. The bug is in whatever other process marked the file description as non-blocking. What you're insisting on doing is creating a bug in Rust in an attempt to try and avoid breaking an already-broken unknown separate program. |
For reference, not fixing this in Rust says absolutely nothing about whether another program will decide to turn off |
It could very well be an error in the Rust process. It's perfectly sane to intentionally share a non-blocking file descriptor with a child you always control.
That's why Rust should report an error as soon as possible. It found a horrible problem needing attention from a programmer. There's really no limit to what could go wrong if it's left masked. |
It doesn't really matter where the bug originates. If Rust is interacting with a program that's buggy over something like a pipe, it's a bug in the Rust program if they're unable to work correctly together. The other program may even document that the child receives a non-blocking file descriptor and needs to handle it correctly. |
@thestinger I cannot possibly fathom how you can believe that a third-party process that sets a shared file description as non-blocking can be considered a bug in Rust. But that's exactly what you're proposing. What's more, your proposed resolution, of "reporting" the bug, leaves the client code with no possible solution aside from discarding the file descriptor entirely as if it has an unrecoverable error. There is no sane world in which a bug in an arbitrary third-party program can possibly be fixed by your program written in Rust receiving this error report. And there's no sane world in which makes sense to introduce bad behavior in your Rust program as a result of a bug in a third-party program which you can diagnose and fix but intentionally chose not to. That just doesn't make sense. The third-party program is already fundamentally broken. Why do you insist on propagating the bug into Rust?
Besides the fact that pipes don't have this bug (FIFOs and TTYs do, but not pipes), this makes no sense. CLI programs interacting with each other over some file descriptor channel do not use the non-blocking nature of the fd as part of the protocol, only the actual contents of the sent data. The only scenario you've mentioned that makes any sense at all is intentionally sharing a non-blocking fd with a spawned child, but I don't see how that's relevant. Rust does not support fd-based non-blocking I/O. Period. Hopefully we will have a solution to non-blocking I/O at some point, but we don't today, and libnative's current implementation of I/O is very explicitly blocking. If you take this shared fd and hand it to libnative, then you are very explicitly asking for blocking behavior. As such, it is reasonable to assume that libnative will turn off In fact, it would be perfectly reasonable for libnative, upon receiving a new fd, to immediately check the status flags of the fd and turn off |
Fundamentally, non-blocking I/O in POSIX is completely broken. Rust's goal here should be to produce the expected behavior for the Rust program. This behavior means blocking I/O when using libnative. It does not matter if this exposes a bug in a third-party program. That program was already broken, it just didn't know it yet. |
It's fine to set O_NONBLOCK on a shared file descriptor when all processes that are going to be involved are designed to handle it. That's why Rust should report an error when the file descriptor is not set up as it's expected to be. It's either a bug in the process setting the flag or a bug in the Rust program, but it's certainly a logic error and should be reported as such. It's not a runtime error Rust should attempt to recover from. |
@thestinger I still don't understand this argument. What do you expect the caller to do when libnative returns the error? There is literally no way for the calling program to handle this error gracefully. As I have said before, I consider the bug to be in the program that set |
The Rust program could be the buggy one if it's expected that the shared file description is O_NONBLOCK. It's only wrong to set it if there are processes not under your control sharing the file descriptor. |
You've said that before, and I truly do not understand what you are trying to say. We're not talking here about fds that are being given to a Rust program either through forking, or through a unix socket. The only way for those fds to end up being touched by libnative is if you explicitly wrap them in a No, what we are talking about is file descriptions that are shared by virtue of being either a tty or a fifo (and it's plausible there's another type of file that behaves like this as well), that end up hooked up to the Rust programs stdin, stdout, or stderr. There is no circumstance in which it is justifiable to impose In any case, even talking about whether |
It's being enforced by reporting an error if the file descriptor has incorrect flags and requiring that the calling code handle the error via the unused result lint. Setting this for stdin, stdout or stderr is not really wrong if you have control of both ends and can guarantee there's nothing expecting it to be blocking. I don't really see how it's a bug as-is. It's a decision to report the error instead of trying to fix it by changing |
@thestinger What part of "the calling code has no possible way of handling the error" are you not understanding? The only way the calling code can possibly handle the error is if it's using
The only way this is correct is if you're using a FIFO, which is extremely rare. And if you are using a FIFO, and have control over both ends, then you're not even going to run into this problem in the first place because you won't be using libnative (because libnative is blocking I/O and you just stated that you're intentionally using non-blocking). Since you're not using libnative in this scenario, it's not at all applicable to this issue. The reason why that's the only way this is correct is because pipes don't exhibit this issue (pipes ends do not share their file description), and it's unambiguously a bug to set a tty to
I've stated repeatedly that the process that did this is buggy by definition. Causing the Rust program to be buggy too in an attempt to avoid breaking the already-buggy third-party program makes no sense to me. Why do you think that's a rational thing to do? |
This was just a test for issue importing... |
wat |
sorry for the spam :-( I'll be moving some issues from rust to rfcs and this was a test run. |
Saturday Apr 05, 2014 at 07:13 GMT
Originally opened as rust-lang/rust#13336
libnative does not detect when a read/write fails due to
O_NONBLOCK
being set on the fd. It makes the assumption that all of its files never have that flag set, because it never sets that flag on them. Unfortunately, this isn't necessarily the case. FIFOs and character device files (e.g. terminals) will actually share theO_NONBLOCK
flag among all processes that have the same open file description (e.g. the underlying kernel object that backs the fd).Using a tiny C program that uses
fcntl()
to setO_NONBLOCK
on its stdout, a FIFO, and a Rust program that writes 32k of output to stdout, I can reproduce this issue 100% of the time. The invocation looks likeand on the reading side I just do
This causes the rust program to return a "Resource temporarily unavailable" error from
stdout.write()
after writing 8k of output. Removing the call to./mknblock
restores the expected behavior where the rust program will block until the reading side has started consuming input. And further, switching the rust program over to libgreen also causes it to block even with./mknblock
.The C program looks like this:
The Rust program is a bit longer, mostly because it prints out information about stdout before it begins writing. It looks like this:
The text was updated successfully, but these errors were encountered: