Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process::Command doesn't close unused pipe endpoints when it should #98209

Open
jgoerzen opened this issue Jun 17, 2022 · 4 comments
Open

process::Command doesn't close unused pipe endpoints when it should #98209

jgoerzen opened this issue Jun 17, 2022 · 4 comments
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jgoerzen
Copy link

I am debugging a problem where I use process::Command to establish pipes between various commands. I have been experiencing occasional (roughly 0.1% of invocations) deadlocks while those deadlocks are not present when the same pipeline is created from a bash script.

In examining the behavior with strace, I have noticed at least one suspicious behavior: when piping data in to a child process's stdin, I see this pattern. Here PID 230999 is the parent and PID 231002 is the child. I am trying to include only relevant bits of the strace log:

230999 pipe2([5, 6], O_CLOEXEC)         = 0
...
230999 write(2, "DEBUG filespooler::exec: Preparing to run \"hd\" with params []\n", 62) = 62
230999 clone(child_stack=0x7f1c3eb69ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD <unfinished ...>
231002 dup2(5, 0)                       = 0
230999 write(2, "DEBUG filespooler::exec: Command PID 231002 started successfully\n", 65 <unfinished ...>
231002 exit_group(0)                    = ?
231002 +++ exited with 0 +++
230999 <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 231002
230999 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=231002, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
230999 write(2, "DEBUG filespooler::exec: Command exited successfully with status ExitStatus(ExitStatus(0))\n", 91) = 91
230999 close(5)                         = 0

The suspicious thing here is that the parent didn't close the reading side of the pipe (fd 5) until after it has waited for the child process to exit. This is an issue; see this from pipe(7) on Linux:

   If  all  file descriptors referring to the write end of a pipe have been
   closed, then an attempt to read(2) from the pipe  will  see  end-of-file
   (read(2)  will return 0).  If all file descriptors referring to the read
   end of a pipe have been closed, then a write(2)  will  cause  a  SIGPIPE
   signal  to be generated for the calling process.  If the calling process
   is ignoring this signal, then write(2) fails with the error  EPIPE.   An
   application  that  uses pipe(2) and fork(2) should use suitable close(2)
   calls to close unnecessary duplicate file descriptors; this ensures that
   end-of-file and SIGPIPE/EPIPE are delivered when appropriate.

The circumstances around this suggest to me that may not be my precise issue, though I am having difficulty figuring out what else may be.

An any case, the result of this is that when piping from one process to another, the first process will hang indefinitely until the Rust program waits on the second. Here fd 5 should have been closed on the parent side immediately after the fork/clone, before returning anything back to the caller, since it is no longer needed there.

Presumably the unused fds on the child side (post-dup2) were closed due to the use of O_CLOEXEC, but I would argue it is somewhat bad form to just not even bother to do it.

The relevant code is at and around https://salsa.debian.org/jgoerzen/filespooler/-/blob/v1.1.0/src/exec.rs#L47

In this case, payload is of type ChildStdout.

Elsewhere, it gets things a bit better. Here fd 4 is a File that had been opened in the parent process, and is now the stdin for the child, and fds 5 and 6 are used to obtain the stdout FROM the child. This also happens to be the place where the first command is spawned (whose stdout will be the stdin for the second command listed above); the pipe2 call shown here is the same one shown above.

230999 write(2, "DEBUG with_decoder{decoder=\"cat\"}: filespooler::jobqueue: Preparing to invoke decoder: \"/bin/bash\" ["..., 113) = 113
230999 pipe2([5, 6], O_CLOEXEC)         = 0
230999 clone(child_stack=0x7f1c3eb69ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD <unfinished ...>
231001 dup2(4, 0)                       = 0
231001 dup2(6, 1)                       = 1
230999 close(6 <unfinished ...>
230999 <... close resumed>)             = 0
230999 close(4)                         = 0
230999 write(2, "DEBUG with_decoder{decoder=\"cat\"}: filespooler::jobqueue: Decoder PID 231001 started successfully\n", 98 <unfinished ...>

Here it closes the unnecessary fds (4 and 6) on the parent before returning to the caller. That's what it should do. Why it does that for a File piped to stdin and not for a ChildStdout piped to stdin, I don't know. (Recall that the fd 5 here later becomes stdin for the second spawned command as shown above, and is not closed by the parent until after the second child exits)

This trace corresponds to the code at https://salsa.debian.org/jgoerzen/filespooler/-/blob/v1.1.0/src/jobqueue.rs#L258 and here input is a File.

Meta

rustc --version --verbose:

rustc 1.56.1 (59eed8a2a 2021-11-01)
binary: rustc
commit-hash: 59eed8a2aac0230a8b53e89d4e99d55912ba6b35
commit-date: 2021-11-01
host: x86_64-unknown-linux-gnu
release: 1.56.1
LLVM version: 13.0.0

@tmiasko
Copy link
Contributor

tmiasko commented Jun 22, 2022

Command::{stdin,stdout,stderr} methods transfer ownership of the file descriptor to the command. Those file descriptors will be dropped when command itself is dropped.

Consequently, if you would like to close a configured file descriptor in the parent before waiting on the child, you need to drop the command before calling Child::wait.

@jgoerzen
Copy link
Author

I did some additional playing around with it. I found this diff https://salsa.debian.org/jgoerzen/filespooler/-/commit/0c80f15c2f5c1e32c899a2bbb13891805e0f1dc9 to actually produce the close at the expected spot, as you indicated.

I don't know why the previous approach I took didn't do this, as I'd have expected the Drop to run when the let rebinding occurred (with the spawn).

But in any case, shouldn't the spawn cause the drop of those things itself? That is, it shouldn't be possible to get into this predicament, I'd think? That is, since spawn() doesn't consume the Command, shouldn't spawn() at least consume (or explicitly drop) the fds after the fork?

@psychon
Copy link

psychon commented Jun 22, 2022

You could do command.spawn(); command.spawn();. That spawns two children with the same FDs. Thus, spawn() cannot close the FDs, can it?

I don't know why the previous approach I took didn't do this, as I'd have expected the Drop to run when the let rebinding occurred (with the spawn).

Doesn't the let binding just shadow the variable? I do not think it actually makes "anything end".
This does not compile:

fn main() {
    let foo = vec![1, 2, 3];
    let borrow = &foo;
    std::mem::drop(foo);
    dbg!(borrow);
}

This compiles:

fn main() {
    let foo = vec![1, 2, 3];
    let foo = &foo;
    dbg!(foo);
}

@jgoerzen
Copy link
Author

Hmm. I suppose there isn't much that could be done about the multiple spawns, despite the dubiousness of spawning twice with the same fd hooked to stdin (particularly without waiting for the first spawn to exit). Perhaps the solution is to document this then?

@workingjubilee workingjubilee added the A-process Area: `std::process` and `std::env` label Jul 22, 2023
@ChrisDenton ChrisDenton added A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage-legacy labels Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants