
Using one process's output as another's input creates challenges with non-blocking status #1707

Open
mic006 opened this issue Aug 31, 2020 · 11 comments

Comments

@mic006

mic006 commented Aug 31, 2020

When launching piped subprocesses, trio provides the stdin of the second process (at least; I have not checked whether other streams are affected) with O_NONBLOCK set, while the shell or Python's subprocess.Popen() does not (on a Linux platform).

See attachment pipe_nonblock.zip for the scripts used.

$ echo foo | ./desc_stdin.py
stdin F_GETFL 0
$ ./python_launch_pipe.py 
p2 stderr: data=b'stdin F_GETFL 0\n'
main end
$ ./trio_launch_pipe.py 
p2 stderr: data=b'stdin F_GETFL 2048\n'  <-- Here O_NONBLOCK is set
main end

I was expecting that, by default, all I/O for a new process would be blocking. As a workaround, I will have to explicitly clear the O_NONBLOCK flag in my spawned executable, but I believe this can cause problems with a variety of applications that expect blocking I/O by default.
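A minimal sketch of that child-side workaround, assuming a POSIX platform: the spawned program forces its stdin back to blocking mode before doing any I/O. The pipe below only simulates the non-blocking fd the child would inherit; in the real child you would call os.set_blocking(0, True) on stdin itself.

```python
import os

def ensure_blocking_stdin():
    """Child-side workaround: clear O_NONBLOCK on stdin (fd 0) at startup."""
    os.set_blocking(0, True)

# Demonstration with a pipe standing in for the inherited stdin:
r, w = os.pipe()
os.set_blocking(r, False)           # simulate what the trio parent did
assert os.get_blocking(r) is False  # O_NONBLOCK is set
os.set_blocking(r, True)            # the workaround
now_blocking = os.get_blocking(r)
os.close(r)
os.close(w)
print("blocking restored:", now_blocking)
```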

@oremanj oremanj changed the title trio.open_process() sets stdin as O_NONBLOCK for the child process Using one process's output as another's input creates c Aug 31, 2020
@oremanj oremanj changed the title Using one process's output as another's input creates c Using one process's output as another's input creates challenges with non-blocking status Aug 31, 2020
@njsmith
Member

njsmith commented Aug 31, 2020

Huh, that's an interesting case! I'm really not sure what the correct behavior here is.

What makes it tricky is that:

  • The parent process holds onto a reference to the pipe that it passes in to the second child, so in principle it could continue to read/write from it as well via its own FdStream object, and that FdStream requires that the fd be in non-blocking mode

  • In principle, you might want to explicitly set an fd to non-blocking before passing it in. This would only happen in a super-exotic case, but in general we do want to make super-exotic cases at least possible to handle. So maybe it would be bad to unconditionally remove the O_NONBLOCK flag when spawning a process?

One option: have a special case where if you pass in an FdStream object as a new process's stdin/stdout/stderr, then open_process sets it to blocking + closes the FdStream in the parent process. (And if someone wants the super-exotic case of passing in the fd in raw non-blocking mode, then they can pass a raw file descriptor instead.)

This feels... odd, but also convenient.
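A rough sketch of that special case, under heavy assumptions: FakeFdStream and prepare_child_fd are hypothetical stand-ins invented here for illustration; the real change would live inside trio.open_process and operate on trio.lowlevel.FdStream.

```python
import os

class FakeFdStream:
    """Stand-in for trio.lowlevel.FdStream, just enough for this sketch."""
    def __init__(self, fd):
        self._fd = fd
        self.closed = False
    def fileno(self):
        return self._fd
    def close(self):  # the real FdStream would use aclose()
        self.closed = True

def prepare_child_fd(value):
    """Hypothetical helper: if the caller passes a stream object, flip its
    fd back to blocking and retire the parent-side stream; raw fds pass
    through untouched (preserving the exotic non-blocking use case)."""
    if isinstance(value, FakeFdStream):
        fd = value.fileno()
        os.set_blocking(fd, True)   # the child expects blocking I/O
        value.close()               # poison the stream in the parent
        return fd
    return value

r, w = os.pipe()
os.set_blocking(r, False)           # trio would have made it non-blocking
stream = FakeFdStream(r)
fd = prepare_child_fd(stream)
blocking_after = os.get_blocking(r)
os.close(r)
os.close(w)
print("fd:", fd, "blocking:", blocking_after, "stream closed:", stream.closed)
```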

@njsmith
Member

njsmith commented Aug 31, 2020

...That might also let us simplify the fd creation code a bit, because stdin=PIPE would become equivalent to a, b = make_pipe(); stdin=b.

@oremanj
Member

oremanj commented Aug 31, 2020

When you say stdin=p1.stdout when constructing p2, that makes p2's stdin use the same open file description as p1's stdout. p1's stdout is also still available to be accessed as a Trio stream, for which purpose it must be in non-blocking mode. The non-blocking status is a property of the open file description, not of the file descriptor, so it's the same for Trio and for the child process. Result: one or the other is going to be unhappy.

I'm not sure how best to resolve this. As a workaround, you could use PIPE for both processes and spawn a Trio task to shovel data from one to the other. That's somewhat clumsy, though. Thoughts on a real solution...

  • On Linux we can open /proc/self/fd/N to get another open file description for the same pipe, with independent non-blockingness, but that doesn't work on other systems.
  • We could use socketpairs for process I/O instead of pipes (thus allowing individual calls to use MSG_NOWAIT), but that's an unusual choice and might confuse some subprocesses.
  • We could provide some way to keep our end of the pipe out of non-blocking mode entirely, like passing stdout=trio.BLOCKING_PIPE for p1 instead of subprocess.PIPE. That's kind of clumsy though.
  • We could detect the case where a trio FdStream is passed as one of the I/O descriptors for a child process, switch the underlying FD back to blocking mode and poison the stream against future uses within Trio (with a helpful error message explaining the situation). That's a little magical, but it would provide the best UI for people who expect their subprocess patterns to translate over unmodified.

@njsmith, thoughts?
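For the socketpair idea above, here is a small demonstration of the per-call non-blocking flag it relies on. With a socketpair, MSG_DONTWAIT makes a single recv() non-blocking even though the socket itself stays in blocking mode, so the shared open file description is never flipped:

```python
import socket

a, b = socket.socketpair()
was_blocking = b.getblocking()      # the socket is still blocking overall
raised = False
try:
    b.recv(128, socket.MSG_DONTWAIT)  # but this one call won't block
except BlockingIOError:
    raised = True                     # no data queued, so EAGAIN
a.close()
b.close()
print("blocking socket:", was_blocking, "recv would have blocked:", raised)
```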


Code so others don't have to unpack the zip file:

trio_launch_pipe.py:

#! /usr/bin/python3

import subprocess
import trio


async def main():
    """Main."""
    cmd1 = ("echo", "foo")
    cmd2 = ("./desc_stdin.py",)
    async with await trio.open_process(cmd1, stdout=subprocess.PIPE) as p1:
        async with await trio.open_process(
            cmd2, stdin=p1.stdout, stderr=subprocess.PIPE, stdout=subprocess.DEVNULL
        ) as p2:
            async with p2.stderr:
                async for data in p2.stderr:
                    print(f"p2 stderr: {data=}")
    print("main end")


if __name__ == "__main__":
    trio.run(main)

desc_stdin.py:

#! /usr/bin/python3

import fcntl
import sys

print("stdin F_GETFL", fcntl.fcntl(0, fcntl.F_GETFL), file=sys.stderr)
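For reference, the 2048 that desc_stdin.py prints under trio is O_NONBLOCK on Linux (0o4000); a more explicit version of the same check masks the flag rather than printing the raw value. This assumes a Linux platform, where the flag values are as above:

```python
import fcntl
import os

# Simulate a fd handed to the child in non-blocking mode:
r, w = os.pipe()
os.set_blocking(r, False)
flags = fcntl.fcntl(r, fcntl.F_GETFL)
is_nonblocking = bool(flags & os.O_NONBLOCK)
os.close(r)
os.close(w)
print("O_NONBLOCK set:", is_nonblocking)
```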

@oremanj
Member

oremanj commented Aug 31, 2020

Huh, it seems like we independently converged on the same solution. :-) Probably special handling for FdStream arguments is the way to go then!

@njsmith
Member

njsmith commented Aug 31, 2020

Portability is an interesting point... on Windows this problem doesn't happen at all, and on Linux like you note there's a simple workaround (modulo any exotic environments where /dev and /proc are inaccessible). Does macOS support MSG_NOWAIT? I feel like I looked into this at some point and it might not exist there?

@oremanj
Member

oremanj commented Aug 31, 2020

It's not documented but it does seem to work:

% python3.8
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket; socket.MSG_DONTWAIT
<MsgFlag.MSG_DONTWAIT: 128>
>>> a, b = socket.socketpair()
>>> b.recv(128)
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> b.recv(128, socket.MSG_DONTWAIT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 35] Resource temporarily unavailable
>>>

@mic006
Author

mic006 commented Sep 2, 2020

Following your discussion, os.set_blocking(p1.stdout.fileno(), True) between the two trio.open_process calls can be used as a workaround.

Using socket.socketpair() instead of a pipe may be a problem for some specific applications (splice system call may not work for example).

@njsmith
Member

njsmith commented Sep 2, 2020

@mic006

Following your discussion, os.set_blocking(p1.stdout.fileno(), True) between the two trio.open_process calls can be used as a workaround.

Right, that's a good workaround for now, but (1) it's awkward to force users to do that, and (2) if you then try to access p1.stdout from the parent, the parent will lock up, because Trio will try to access the fd in a non-blocking manner while the fd is in blocking mode. That's not a problem for your use case, because you're not going to use p1.stdout in the parent, but it would be nice if we could figure out an API design that didn't leave this footgun lying around.

Using socket.socketpair() instead of a pipe may be a problem for some specific applications (splice system call may not work for example).

splice should be OK, because we'd only need to use socketpair on macOS/BSDs, and those don't have splice :-). But yeah, in general I'm wary. Using a socket instead of a pipe should just work, but it's not a common thing to do so we might well discover a year later that there's some quirky program that objects for some obscure reason and then have to start over.

I guess one hacky but workable option would be to put p1.stdout into a mode where it toggles O_NONBLOCK on-and-off-again every time the parent process tries to use it.
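A sketch of that toggle-on-and-off idea, assuming POSIX; temporarily_nonblocking is a hypothetical helper invented here, not a trio API. Note the inherent race that makes this hacky: O_NONBLOCK lives on the shared open file description, so the child could observe the flag mid-toggle.

```python
import os
from contextlib import contextmanager

@contextmanager
def temporarily_nonblocking(fd):
    """Flip the fd to non-blocking for a single parent-side operation,
    then restore it so the child keeps seeing a blocking descriptor."""
    was_blocking = os.get_blocking(fd)
    os.set_blocking(fd, False)
    try:
        yield fd
    finally:
        os.set_blocking(fd, was_blocking)

r, w = os.pipe()                    # stands in for the pipe shared with the child
os.set_blocking(r, True)            # child-friendly default
with temporarily_nonblocking(r):
    inside_blocking = os.get_blocking(r)  # False inside the block
    try:
        os.read(r, 1)               # raises instead of hanging on an empty pipe
    except BlockingIOError:
        pass
after_blocking = os.get_blocking(r)       # restored for the child
os.close(r)
os.close(w)
print("inside:", inside_blocking, "after:", after_blocking)
```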

Some other cases worth at least thinking about:

@njsmith
Member

njsmith commented Sep 2, 2020

I guess one hacky but workable option would be to put p1.stdout into a mode where it toggles O_NONBLOCK on-and-off-again every time the parent process tries to use it.

...and we could probably use the same code for #174, now that I think about it.

@mic006
Author

mic006 commented Sep 4, 2020

Does it even make sense to keep the stdin or stdout open in the Python process once it has been used to spawn another process?
To make the piping work properly, I need to explicitly call aclose() to have the pipe closed. Somehow, once a second process has been started with the pipe of a previous one, the Python process loses responsibility for it, no?
Or are there more complex use cases where keeping it open can be useful?

@tjstum
Member

tjstum commented Sep 6, 2023

We ran into this at $work today. In our situation, explicitly setting the pipe back to blocking mode helped, as would the "trio magically sets the stream back to blocking mode and invalidates it" solution converged on above. So my vote would be for that approach.

Another possible idea would be to add some special constant to trio (akin to PIPE) that specifically invoked this magic (like stdout=trio.PIPE_NOT_FOR_USE_IN_TRIO_BUT_ANOTHER_SUBPROCESS_LATER, but... better named). Or the converse, for cases where people actually do want it to stay accessible in trio: stdout=trio.PIPE_WITH_FDSTREAM.
