Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for redirecting stdio of child processes to open file handles #1055

Closed
wants to merge 11 commits into from
126 changes: 126 additions & 0 deletions text/0000-process-stdio-redirection.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
- Feature Name: process-stdio-redirection
- Start Date: 2015-04-10
- RFC PR:
- Rust Issue:

# Summary

Update the `std::process` API with the ability to redirect stdio of child
processes to any opened file handle or equivalent.

# Motivation

The current API in `std::process` allows to either pipe stdio between parent and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this motivation section may want to be updated with the current state of the standard library because with the FromRaw{Fd,Handle} implementations on Stdio it's actually possible to do this. It looks like this RFC gets us to a point of a slightly-more-ergonomic version of what we have today, but I think it would be worth it to explore the motivation to make sure the gains are worth the additions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't realized those changes had landed! I'll update the RFC to just focus on a high-level design

child process or redirect stdio to `/dev/null`. It would also be largely useful
to allow stdio redirection to any currently opened file or pipe handles. This
would allow redirecting stdio with a physical file or even another process (via
OS pipe) without forcing the parent process to buffer the data itself.

For example, one may wish to spawn a process which prints gigabytes
of data (e.g. logs) and use another process to filter through it, and save the
result to a file. The current API would force the parent process to dedicate a
thread just to buffer all data between child processes and disk, which is
impractical given that the OS can stream the data for us via pipes and file
redirection.

# Detailed design

First, the standard library should provide an OS agnostic way of creating OS
in-memory pipes, i.e. a `Pipe`, providing reader and writer handles as
`PipeReader` and `PipeWriter` respectively. The reader and writer will simply be
an abstraction over the OS specific file descriptor/HANDLE, and will implement
the respective `Read`/`Write` trait, making it impossible to confuse the two
together. In addition, each should implement the appropriate `AsRaw{Fd,
Handle}`/`FromRaw{Fd, Handle}` for wrapping/unwrapping of the OS handles.

On Unix systems the pipe can be created with `libc::pipe` and the file
descriptors wrapped as appropriate (e.g. `sys::fs2::File`). On Windows the pipe
can be created via Windows' `CreatePipe` (a stub for which is missing in
`liblibc` at the moment) and the resulting HANDLEs also appropriately wrapped
(using `sys::fs2::File`).

This proposal considers making the `Pipe`'s reader and writer fields public
to allow easily moving ownership of the two handles, but any other appropriate
interface is acceptable.

```rust
pub struct Pipe {
pub reader: PipeReader,
pub writer: PipeWriter,
}

pub struct PipeReader(sys::fs2::File);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what's the use case you have in mind for a pipe? We should be able to implement AsRawFd and AsRawHandle for ChildStd{in,out,err} as well as being able to pass them into Stdio::redirect_foo, so I'm wondering if that would actually fit the bill for now.

I'm somewhat hesitant about adding three separate types (pair, reader, writer) as it'd be nice to keep the scope small for the RFC, but if it's needed it's needed! Could you be a little more precise here as well and indicate what functions are used to construct a Pipe and what module these are all located in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I completely failed to consider the possibility of extrancting the fd/HANDLE from ChildStd{in,out,err} (and my apologies for misunderstanding one of your earlier comments). It should work well for hooking up any number of processes together (in most cases) without having to create explicit pipes! (though it might still be nice to have a public interface for creating them)

I'll rework the proposal to include this strategy over the next few days!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I definitely think we'll want to add a pipe-creation interface at some point, and it's well worth considering here, but it's always nice to not add if we don't need to!

pub struct PipeWriter(sys::fs2::File);

impl Read for PipeReader { ... }
impl Write for PipeWriter { ... }
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you could also do something like blending strategies 1/2:

pub trait StdioExt {
    pub fn redirect<T: AsRawFd>(t: T) -> Stdio;
}

impl StdioExt for Stdio { ... }


Next, several `redirect_*` methods should be added to `Stdio` for certain
"blessed" types offered by the standard library, such as `File`, `PipeRead`, and
`PipeWrite`. By white-listing the accepted file-like types we can ensure the API
and its behavior is consistent across Windows and Unix.

```rust
fn redirect_file(f: File) -> Stdio { ... }
fn redirect_pipe_read(r: PipeRead) -> Stdio { ... }
fn redirect_pipe_write(w: PipeWrite) -> Stdio { ... }
```

These methods should take ownership of their arguments since storing references
will require `Stdio` and `Command` to gain lifetime parameters, which will break
the currently stabilized implementations. Thus the caller will be responsible
for duplicating their handles appropriately if they wish to retain ownership.

To make redirections easier to use `Stdio` should become clonable so that once a
file-like handle is wrapped, the wrapper can be passed to any number of
`Commands` simultaneously. This can be accomplished by reference counting the
wrapped file handle.

```rust
impl Clone for Stdio { ... }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this may be a bit ambitious for the standard library, it's pretty rare to see an Arc under the hood of a primitive for the purpose of implementing Clone right now. There's definitely a use case of being able to share descriptors among spawned processes, but I think that we may want to expose that via raw handles for now.

For example unix could have Stdio::from_raw_fd and windows could have Stdio::from_raw_handle which will be low-level wrappers to pass I/O handles to the child. This would satisfy the ability to share the same handle among many processes (probably unsafely) for now as it's probably a somewhat rare desire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the unsafety should be acceptable for now, especially since commands are probably executed as soon as they are built and all handles are still in scope.

Also I think it might be better to simply have a redirect method defined to take AsRawFd implementers on Unix, and AsRawHandle implementers on Windows to make usage of the API more portable. I'll update the proposal with the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think it might be better to simply have a redirect method defined to take AsRawFd implementers on Unix, and AsRawHandle implementers on Windows to make usage of the API more portable

You've mentioned this in the RFC as written, but this'll need to be a platform-specific extension as the set of types implementing AsRawFoo is different across platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something along the lines of a StdioExt trait which will be defined in std::os::$patform::io based on the appropriate AsRawFoo, but their method signatures would match lexically.

// Unix
pub trait StdioExt { fn redirect<T: AsRawFd>(t: &T) -> Stdio { ... } }
// Windows
pub trait StdioExt { fn redirect<T: AsRawHandle>(t: &T) -> Stdio { ... } }

So then you could write something like the following in both Unix and Windows

#[cfg(unix)] use std::os::unix::io::StdioExt as StdioExt;
#[cfg(windows)] use std::os::windows::io::StdioExt as StdioExt;

let file = File::open(...);
let stdio = unsafe { StdioExt::redirect(file) };
...

Instead of having to write the following everywhere you want to redirect something

#[cfg(unix)] use std::os::unix::io::StdioExt as StdioExt;
#[cfg(windows)] use std::os::windows::io::StdioExt as StdioExt;

let file = File::open(...);
let stdio = unsafe {
    if cfg!(unix) {
        StdioExt::from_raw_fd(file)
    } else {
        StdioExt::from_raw_handle(file)
    }
};
...

This way users are still somewhat aware that this is platform specific code, without suffering the constant need of cfg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Definitely sounds like a plausible idea to me!


#[deriving(Clone)]
struct StdioImp {
...
// sys::fs2::File is a safe wrapper over a fd/HANDLE,
// not to be confused with the public `std::fs::File`
Redirect(Rc<sys::fs2::File>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an Rc removes the Send and Sync bounds on Stdio, which is a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this would have to be Arc then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that should work.

}
```

# Drawbacks

If one wishes to close (drop) a redirected handle, for example, closing the
read end of a pipe and sending EOF to its child, they will have to manually
ensure all cloned `Stdio` wrappers are dropped so that the underlying handle is
closed. Otherwise it would be possible to deadlock while waiting on a child
which is waiting on and input handle held in the same thread.

In addition, if one desires to use a file-like handle outside of process
redirection, they will need to rely on an external mechanism for duplicating the
handle. In the case of an actual file, it can simply be reopened, but in the
case of OS pipes or sockets the underlying OS handle may need to be duplicated
via `libc` calls or the object itself would need to provide a duplication
mechanism.

# Alternatives

* Instead of specifying numerous `redirect_*` methods, simply accept anything
that implements the appropriate `AsRaw{Fd, Handle}` trait. This will cause OS
specific issues, however. For example, on Unix sockets are simple file
descriptors, but on Windows sockets aren't quite HANDLES, meaning the API
would be inconsistent across OS types.

* Duplicate the underlying OS handle and have the `std::process` APIs take
ownership of the copy. When working with OS pipes, however, the user would
have to manually keep track where the duplicates have gone if they wish to
close them all; failing to do so may cause deadlocks.

* Do not make `Stdio` clonable, and rely on caller duplicating all handles as
necessary. This could be particularly limiting if certain implementations do
not allow duplication.

# Unresolved questions

None at the moment.