Description
I am working on implementing a set of brand new APIs for running processes. The new APIs shall not repeat any design mistakes from the past. They need to be simple, consistent across all operating systems, and easy to use correctly, while being hard to use incorrectly. And of course I want them to be as performant as possible.
Some of the issues with the existing APIs are caused by bad API design. But some need a better underlying implementation. I've implemented a prototype of the new APIs, but before I proceed further with the API proposal (so far only for the low-level part that can be used to build anything on top of it), I would like to verify that I've made the right choices.
PID recycling problem
PIDs are globally unique only while the process is alive. Once a process exits and is reaped, the PID can be recycled by the kernel for a new process. This creates a race condition window: code that holds a PID might inadvertently operate on a different process if that PID gets reused. For example, sending a kill signal to a PID that has just been reassigned could terminate an unrelated process. In security-sensitive contexts, this is more than theoretical – there have been real vulnerabilities exploiting PID reuse.
Currently, the `ProcessWaitState` type (ProcessWaitState.Unix.cs) maintains a static/shared table that maps each process ID to a `ProcessWaitState` object. Access to this table is synchronized using a lock. This allows us to avoid some of the PID recycling issues (as long as the user does not perform syscalls on their own, bypassing managed APIs).
It took the Unix community a while (pun intended) to realize that PID recycling is a real problem. But the great news is that there is a solution: process descriptors (on Windows known as process handles).
Linux (starting with 5.3) and FreeBSD (starting with 9) have introduced the same concept, with slightly different APIs:
- A child process can be created with `clone3(..., CLONE_PIDFD)` (Linux) or `pdfork()` (FreeBSD), which returns a process descriptor.
- It's possible to wait on the process descriptor with `poll(..., timeout)` (and `epoll` and `select`).
- It's possible to kill the child process using the process descriptor with `pidfd_send_signal` (Linux) or `pdkill` (FreeBSD).
- Handling zombie child processes is the same, but using `pidfd` rather than `pid`.
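To make the list above concrete, here is a minimal Linux-only sketch in C. It is an illustration under assumptions, not the prototype's code: the helper names `kill_child_via_pidfd` and `abandon_child` are hypothetical, and it uses `fork` followed by `pidfd_open` in place of `clone3(..., CLONE_PIDFD)` to stay short (this is safe for our own child because its PID cannot be recycled until we reap it). It needs Linux 5.4+ for `waitid(P_PIDFD, ...)` and returns -1 where the syscalls are unavailable or blocked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older libc headers (these are the Linux values). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/* Hypothetical cleanup helper: falls back to the PID (still valid because
 * the child has not been reaped), preserves errno, and returns -1. */
static int abandon_child(pid_t pid, int pidfd)
{
    int saved = errno;
    if (pidfd >= 0)
        close(pidfd);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    errno = saved;
    return -1;
}

/* Starts an idle child, kills it through a process descriptor, waits for it
 * with poll, and reaps it with waitid. Returns the terminating signal number,
 * or -1 on failure (for example when the kernel lacks pidfd support). */
static int kill_child_via_pidfd(int timeout_ms)
{
    assert(timeout_ms >= 0); /* sketch: bounded waits only */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause(); /* child: idle until signalled */
    }

    /* Assumption: stands in for clone3(..., CLONE_PIDFD). */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return abandon_child(pid, -1);

    /* Signal the process the descriptor refers to, not whoever owns the PID. */
    if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) != 0)
        return abandon_child(pid, pidfd);

    /* The descriptor becomes readable once the process has terminated. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return abandon_child(pid, pidfd);

    /* Reap the zombie through the descriptor rather than the PID. */
    siginfo_t info = { 0 };
    if (waitid(P_PIDFD, (id_t)pidfd, &info, WEXITED) != 0)
        return abandon_child(pid, pidfd);

    close(pidfd);
    return info.si_code == CLD_KILLED ? info.si_status : -1;
}
```

Calling `kill_child_via_pidfd(5000)` on a kernel with full pidfd support should return `SIGKILL`; on older kernels, or where the syscalls are filtered, it returns -1 after cleaning up through the PID.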
But, as always, there are some caveats:
- macOS does not support process descriptors at all.
- We do support RHEL 8.0 (Linux 4.18+), but process descriptors are only available starting with Linux 5.3. Other supported RHEL versions (9.0, 10.0) do support process descriptors.
We can take advantage of process descriptors when they are available, and fall back to another approach when they are not (more details below).
@stephentoub @jkotas Do you believe that having two code paths is worth the effort to avoid PID recycling issues on supported platforms?
Process exit monitoring
Even with process descriptors, we still need to monitor process exit. To do that asynchronously with cancellation support, we would need to use io_uring. However, io_uring is Linux-specific and can simply be blocked.
While reading this amazing blog post, I've stumbled upon another idea: the self-pipe trick. The idea is as follows:
- Create a pipe for exit monitoring; use CLOEXEC to prevent other processes started in parallel from inheriting it.
- Clone/fork the child process.
- Duplicate the exit pipe in the child process (so it survives execve).
- Close the write end of the pipe in the parent process.
- Call `execve`.
With such an approach, the child process runs with one additional file descriptor. When it exits, the kernel automatically closes all file descriptors owned by the child process, including the duplicated write end of the exit pipe. The parent process can monitor the read end of the exit pipe using poll/epoll/select. When the read end becomes readable, it means that the child process has exited.
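The steps above can be sketched in portable C as follows. This is a minimal illustration under assumptions, not the prototype's implementation: the helper names `spawn_with_exit_pipe` and `wait_for_exit` are hypothetical, `fork`/`execvp` stand in for whatever process-creation path is actually used, and the CLOEXEC flag is set with `fcntl` after `pipe` for portability (where `pipe2(..., O_CLOEXEC)` exists, it avoids the small gap between the two calls).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] with an exit pipe and stores the read
 * end in *exit_fd. Returns the child PID, or -1 on failure. */
static pid_t spawn_with_exit_pipe(char *const argv[], int *exit_fd)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* CLOEXEC keeps both ends out of unrelated children started in parallel. */
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: dup() returns a descriptor without FD_CLOEXEC, so this copy
         * of the write end survives exec; the originals are closed by exec. */
        if (dup(fds[1]) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    /* Parent: close its write end so only the child keeps the pipe open. */
    close(fds[1]);
    *exit_fd = fds[0];
    return pid;
}

/* Waits up to timeout_ms for the child to exit. Returns the exit code,
 * 128 + signal number if the child was killed by a signal, or -1 on timeout
 * or error. On timeout exit_fd stays open, so the call can be repeated. */
static int wait_for_exit(pid_t pid, int exit_fd, int timeout_ms)
{
    assert(pid > 0 && exit_fd >= 0);

    struct pollfd pfd = { .fd = exit_fd, .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);
    if (ready <= 0)
        return -1;

    /* The write end is gone: the child exited (or closed the descriptor).
     * waitpid reaps the child and yields the real exit status. */
    int status;
    pid_t reaped;
    do {
        reaped = waitpid(pid, &status, 0);
    } while (reaped < 0 && errno == EINTR);
    if (reaped < 0)
        return -1;

    close(exit_fd);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

For example, spawning `sh -c "exit 7"` and then calling `wait_for_exit(pid, fd, 5000)` should yield 7. The final `waitpid` is still required because the pipe only signals that the write end was closed; the exit status comes from the kernel's wait APIs.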
And this works on all Unix-like operating systems, including macOS. And we already have epoll support implemented by Socket on Unix, so async process exit monitoring with cancellation support can be implemented in the following way:
```csharp
using Socket socket = new(new SafeSocketHandle(_exitPipeFd));
int bytesRead = await socket.ReceiveAsync(s_exitPipeBuffer, SocketFlags.None, cancellationToken);
```

The disadvantages of this approach that I was able to come up with:
- The user could somehow find this `fd` and close it from the child process, breaking the exit monitoring. But this is a very unlikely scenario, especially if we use a high fd number. We always need to check the process exit status with `waitid`/`waitpid`, so in the worst-case scenario the async method would perform a blocking syscall.
- The dependency on `Socket` and other networking-related APIs for the Unix implementation. Size on disk matters, but it would affect only the applications using the `Async` overloads of the new Process APIs on Unix. And IMO this is an acceptable trade-off.
- There is a small performance overhead of creating a pipe and duplicating the fd in the child process. But process creation is already expensive, so this should not be a big deal. And BTW we already create an additional pipe per process for the benefit of knowing when the child process has called exec.
One of the requirements I was given by @jkotas is to let users start the process on their own (for example, to support a very niche scenario for a config switch that we don't expose because it's OS-specific) and then use our new APIs to work with it (for example, to wait for exit asynchronously).
The proposed approach works great as long as we orchestrate the process creation ourselves. If the user creates a process using fork/execve on their own, we cannot guarantee that they will duplicate the exit pipe correctly. We could document this requirement clearly and even expose a public ctor that requires an exit pipe fd.
If we decide to go with this approach, I would like to introduce a new SafeHandle-derived type that represents a child process (not just any arbitrary process). There would be no breaking changes and a clear contract that this type is only for child processes created by us or by the user following our guidelines.
```csharp
public sealed class SafeChildProcessHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeChildProcessHandle(IntPtr existingHandle, bool ownsHandle);

    [UnsupportedOSPlatform("windows")] // [SupportedOSPlatform("unix")] is not supported
    public SafeChildProcessHandle(int pid, IntPtr exitPipeFd, bool ownsHandle);
}
```

The name is just a proposal (I am open to other suggestions like Subprocess).
@stephentoub @jkotas What do you think about this design?