Skip to content

adding RFC process-handle-for-async #2823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

adding RFC process-handle-for-async #2823

wants to merge 5 commits into from

Conversation

frehberg
Copy link

@frehberg frehberg commented Nov 28, 2019

@sfackler
Copy link
Member

How does this differ from the AsRawHandle trait for Windows and id method for Unix? https://doc.rust-lang.org/std/os/windows/io/trait.AsRawHandle.html

@frehberg
Copy link
Author

frehberg commented Nov 28, 2019

@sfackler Windows-Handles are the perfect way, the Unix-PIDs are a nightmare, citing from tokio-process https://github.com/alexcrichton/tokio-process/blob/master/src/unix/mod.rs

//! Unix handling of child processes
//!
//! Right now the only "fancy" thing about this is how we implement the
//! `Future` implementation on `Child` to get the exit status. Unix offers
//! no way to register a child with epoll, and the only real way to get a
//! notification when a process exits is the SIGCHLD signal.
//!
//! Signal handling in general is *super* hairy and complicated, and it's even
//! more complicated here with the fact that signals are coalesced, so we may
//! not get a SIGCHLD-per-child.
//!
//! Our best approximation here is to check *all spawned processes* for all
//! SIGCHLD signals received. To do that we create a `Signal`, implemented in
//! the `tokio-signal` crate, which is a stream over signals being received.
//!
//! Later when we poll the process's exit status we simply check to see if a
//! SIGCHLD has happened since we last checked, and while that returns "yes" we
//! keep trying.
//!
//! Note that this means that this isn't really scalable, but then again
//! processes in general aren't scalable (e.g. millions) so it shouldn't be that
//! bad in theory...

This breaks the ownership concept and relies heavily on signal-hander setups. So, instead of using the hairy signal concept of legacy Unix, Rust should rely on the newer Unix APIs for process handling, such as pidfd, pdfork, anf forkfd, providing now a similar functionality as Windows-Handle.
QProcess demonstrates an API, being portable for all kind of OS.

For each child-process Windows provides both, a PID and a HANDLE. On Unix Rust API just supports a PID per child. So using the new concepts of pidfd, pdfork, etc. we may extend the public API of std::process:Process now to provide both id() and handle() as well on Unix

@joshtriplett
Copy link
Member

joshtriplett commented Dec 5, 2019

I'd love to see this change, but I don't know if we can reasonably handle the non-pidfd case or the POSIX case without placing requirements on the caller that we don't want to place.

@thiagomacieira, can you weigh in regarding the Qt code for the non-pidfd compatibility case? We talked about the requirements the self-pipe and SIGCHLD approach had, to avoid disrupting any other code in the same program (given all the issues with signal handling). Can you talk about some of those requirements here?

@thiagomacieira
Copy link

The code I have at https://github.com/qt/qtbase/tree/dev/src/3rdparty/forkfd works pretty well for all Unix operating systems. The pidfd version of it will come soon, as soon as I can find the time to test on a Linux 5.4.0 to make sure things still run (change is at https://codereview.qt-project.org/c/qt/qtbase/+/108456). Here's what you need to know:

  • implementation works on all Unix I could get my hands on, including an older broken version of macOS
  • implementation works with Bionic, MUSL and uClibc on Linux too
  • implementation does catch multiple children exiting and won't miss them due to queued SIGCHLD
  • implementation works in single- and multi-threaded environments
  • regular operation is thread-safe, lock-free and requires no extra threads. It strictly obeys man signal-safety

But:

  • you can only listen for child processes. You can't open an already-running process like you can on Windows.
  • you can pass the file descriptor via Unix SCM_CREDENTIALS to another process, but it's not adviseable
  • installing the signal handler is not thread-safe. There's no way to implement this thread-safely. Some environments may want to install the handler early on, while there's a single thread running to avoid an issue.
  • removing the signal handler is not possible. There's no way to safely do this (including in single-threaded mode). Therefore, it's highly adviseable to never unload the signal handler code.
  • handling the signal depends on cooperation with any other libraries installing SIGCHLD handlers. Specifically:
    • requires any overriding handlers to call our handler if they didn't handle the child in question
    • requires any overriders not to uninstall our handler if they attempt to remove their handler (they shouldn't remove, actually)

Glib's GSpawn code is guilty of both cooperation shortcomings. Therefore, if there's a chance that GSpawn will be used in the same process, the forkfd signal handler should be installed after Glib's.

@thiagomacieira
Copy link

An alternative implementation is to have a babysitter, intermediary process. Instead of the process that is the final objective being the direct child of the main process, it's a grandchild, with the intermediate child stopped at waitpid. When the grandchild dies, the babysitter wakes up, reaps and writes to the pipe. At that point, the main process waitpids the intermediary process too.

Advantages:

  • does not depend on cooperation with other frameworks in the same process
  • depending on how it's coded, the babysitter may outlive the main process and therefore passing its pipe via SCM_CRED to another process is possible

Disadvantages:

  • if the babysitter is a fork-no-exec child, then it could be holding a lot of resources from the parent process. Closing all non-relevant file descriptors is easy, but that won't solve other resources like memory-mapped deleted files. At a minimum, it will still have the entire memory footprint of the original process when forked, thus causing copy-on-write faults on the parent process.
  • if the babysitter instead exec's another process, then we have the problem of needing this file to exist in the filesystem somewhere (execveat of a memfd executable that was stored inside the parent process' binaries is possible, but then won't share memory with other babysitters). Plus there's a start-up time cost.

The minimum cost of an intermediary babysitter, for a very, very tiny, statically-linked executable, is 1 shared page + 2 pages per process thus launched (1 for .data/.bss and one for the stack, which could be bigger since the environment is stored there).

@thiagomacieira
Copy link

thiagomacieira commented Dec 5, 2019

Finally, a third alternative is to dedicate a thread of the main process to waitpid on each child. That means you have one thread per child in the main process. You remove the disadvantages of a separate process (all the memory is shared), the signal handler cooperation and consuming a file descriptor.

But threads aren't free. You do consume at least one page per thread, for the thread's stack. Like the babysitter case, this means you have more Linux processes[*] running at any point in time, which means you could get much closer to your fork limit.

One big disadvantage of this method is that, unlike the other two, you can't simply abandon a child after the threaded waitpid has started. Once you do that, you're committing to the wait until the child exits. The waitpid call can be interrupted by a signal, but there's no reliable, portable way to interrupt a thread. Glibc has pthread_sigqueue and Bionic and uClibc copied it, but MUSL doesn't have it and neither do other Unix systems. On those systems, you could fall back to pthread_cancel and set the thread to asynchronous cancellation (pthread_setcanceltype) and all the issues that come with it.

[*] "process" here is the kernel term: any userspace thread.

@frehberg
Copy link
Author

As already stated in the associated issue-ticket: #2817

As for the question, how to deal with child-processes on older Linux/Posix-Systems (the non-pidfd compatibility case), the following code illustrates the way, the QProcess implementation is working in this legacy environment successfully.
https://gist.github.com/frehberg/58cac82a332febd6aff0e188c825b699

The code can be executed in two modes 1) Not-Qt-Mode, keeping the read-end inparent-process and write-end in child-process. 2) Qt-Mode, closing both ends in forked child-process and keeping both ends in parent-process.

Compile the code using "gcc -o a.out test_dath_pipe.c"

And execute as "./a.out not-qt-mode" or "./a.out qt-mode"

It shows, that in both modes the parent-process receives the termination signal as POLLIN-event.

The way QProcess has been implemented is favourable, indeed, as it does not leak file-descriptors to sub-sub-processes.

I hope, this short code snippet illustrates the mechanisms, implementing the death-pipe on top of basic POSIX API in case the underlying OS does not provide a suitable native functionality.

frehberg and others added 4 commits January 3, 2020 02:46
Co-Authored-By: the8472 <the8472@users.noreply.github.com>
Co-Authored-By: the8472 <the8472@users.noreply.github.com>
Co-Authored-By: the8472 <the8472@users.noreply.github.com>
Co-Authored-By: the8472 <the8472@users.noreply.github.com>
@dtolnay dtolnay added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 16, 2020
@KodrAus KodrAus added A-async-await Proposals re. async / await Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@frehberg frehberg closed this by deleting the head repository Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-async-await Proposals re. async / await Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants