Design of the Unix implementation of the new Process APIs #122819

@adamsitnik

I am working on implementing a set of brand new APIs for running processes. The new APIs shall not repeat any design mistakes from the past. They need to be simple, consistent across all operating systems, and easy to use correctly, while being hard to use incorrectly. And of course I want them to be as performant as possible.

Some of the issues with the existing APIs are caused by bad API design. But some need a better underlying implementation. I've implemented a prototype of the new APIs, but before I proceed further with the API proposal (so far only for the low-level part that can be used to build anything on top of it), I would like to verify that I've made the right choices.

PID recycling problem

PIDs are globally unique only while the process is alive. Once a process exits and is reaped, the PID can be recycled by the kernel for a new process. This creates a race condition window: code that holds a PID might inadvertently operate on a different process if that PID gets reused. For example, sending a kill signal to a PID that has just been reassigned could terminate an unrelated process. In security-sensitive contexts, this is more than theoretical – there have been real vulnerabilities exploiting PID reuse.

Currently, the ProcessWaitState type (ProcessWaitState.Unix.cs) maintains a static, shared table that maps each process ID to a ProcessWaitState object. Access to this table is synchronized using a lock. This allows us to avoid some of the PID recycling issues (as long as the user does not perform syscalls on their own, bypassing managed APIs).

It took the Unix community a while (pun intended) to realize that PID recycling is a real problem. But the great news is that there is a solution: process descriptors (on Windows known as process handles).

Linux (starting with 5.3) and FreeBSD (starting with 9) have introduced the same concept, with slightly different APIs:

  • A child process can be created with clone3(..., CLONE_PIDFD) (Linux) or pdfork() (FreeBSD) which returns a process descriptor.
  • It's possible to wait on the process descriptor with poll(..., timeout) (and epoll and select)
  • It's possible to kill the child process using the process descriptor with pidfd_send_signal (Linux) or pdkill (FreeBSD)
  • Handling zombie child processes is the same, but using pidfd rather than pid
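The Linux flavor of the operations listed above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses fork() followed by the pidfd_open syscall instead of clone3(..., CLONE_PIDFD) to keep the code short (that is safe here only because the parent has not reaped the child yet), it calls the raw syscalls through syscall() in case the C library has no wrappers, and the function name kill_child_via_pidfd is made up for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts a child that blocks forever, waits up to
   timeout_ms on its process descriptor, then kills it through the descriptor.
   Returns 1 if the child was killed via the pidfd, 0 if it terminated on its
   own first, and -1 if pidfds are unavailable or a call failed. */
int kill_child_via_pidfd(int timeout_ms)
{
#if defined(SYS_pidfd_open) && defined(SYS_pidfd_send_signal)
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();                /* child: block until a signal arrives */
        _exit(0);
    }

    /* The child has not been reaped, so its PID cannot have been recycled. */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0) {
        int saved = errno;      /* e.g. ENOSYS on kernels older than 5.3 */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        errno = saved;
        return -1;
    }

    /* A pidfd reports POLLIN once the process has terminated. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    int result = 0;
    if (ready == 0) {
        /* Timed out: signal the process through the descriptor, not the PID. */
        if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) == 0) {
            result = 1;
        } else {
            kill(pid, SIGKILL); /* fallback so the child never leaks */
            result = -1;
        }
    } else if (ready < 0) {
        kill(pid, SIGKILL);
        result = -1;
    }

    /* Reap the zombie; we are still the parent, so waiting by PID is safe. */
    int status = 0;
    waitpid(pid, &status, 0);
    close(pidfd);

    if (result == 1 && !(WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL))
        result = -1;
    return result;
#else
    (void)timeout_ms;
    errno = ENOSYS;             /* kernel headers predate pidfd support */
    return -1;
#endif
}
```

The -1 path is the interesting one for this discussion: it is where a fallback for Linux 4.18 or macOS would have to take over.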

But, as always, there are some caveats:

  • macOS does not support process descriptors at all.
  • We do support RHEL 8.0 (Linux 4.18+), but process descriptors are only available starting with Linux 5.3. Other supported RHEL versions (9.0, 10.0) do support process descriptors.

We can take advantage of process descriptors when they are available, and fall back to another approach when they are not (more details below).

@stephentoub @jkotas Do you believe that having two code paths is worth the effort to avoid PID recycling issues on supported platforms?

Process exit monitoring

Even with process descriptors, we still need to monitor process exit. Doing that asynchronously with cancellation support would require io_uring, but io_uring is Linux-specific and can simply be blocked.

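While reading this amazing blog post, I've stumbled upon another idea: the self-pipe trick. The idea is as follows: the parent creates a pipe (the "exit pipe") before starting the child, the child keeps a duplicate of the write end and never writes to it, and the parent keeps only the read end.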

With such an approach, the child process runs with one additional file descriptor. When it exits, the kernel automatically closes all file descriptors owned by the child process, including the duplicated write end of the exit pipe. The parent process can monitor the read end of the exit pipe using poll/epoll/select. When the read end becomes readable, it means that the child process has exited.
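A minimal C sketch of the exit pipe, assuming plain fork() and poll(). The function name wait_for_exit_via_pipe and its parameters are invented for this example, and the child merely sleeps where a real implementation would call execve.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that lives for child_sleep_ms, then
   waits up to timeout_ms for the exit pipe to report that the child is gone.
   Returns 1 if the exit was observed, 0 on timeout, -1 on error. */
int wait_for_exit_via_pipe(int child_sleep_ms, int timeout_ms)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the write end and never write to it. If this were
           followed by execve, the write end would have to stay open across
           exec, so it must not be marked close-on-exec. */
        close(fds[0]);
        poll(NULL, 0, child_sleep_ms);  /* portable millisecond sleep */
        _exit(0);
    }

    /* Parent: close our copy of the write end. If we kept it open, the read
       end would never see end-of-file when the child exits. */
    close(fds[1]);

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    /* ready > 0 means POLLHUP and/or POLLIN: every write end is closed. */
    int observed = ready > 0 ? 1 : (ready == 0 ? 0 : -1);

    if (observed != 1)
        kill(pid, SIGKILL);     /* do not leave the child behind in the sketch */
    waitpid(pid, NULL, 0);      /* the exit status still comes from wait* */
    close(fds[0]);
    return observed;
}
```

Note that the pipe only signals that the write end is gone. The exit status itself still has to be collected with waitpid or waitid.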

This works on all Unix-like operating systems, including macOS. We already have epoll support implemented by Socket on Unix, so async process exit monitoring with cancellation support can be implemented in the following way:

using Socket socket = new(new SafeSocketHandle(_exitPipeFd));

int bytesRead = await socket.ReceiveAsync(s_exitPipeBuffer, SocketFlags.None, cancellationToken);

The disadvantages of this approach that I was able to come up with:

  • The user could somehow find this fd and close it from the child process, breaking the exit monitoring. But this is a very unlikely scenario, especially if we use a high fd number. We always need to check the process exit status with waitid/waitpid anyway, so in the worst case the async method would perform a blocking syscall.
  • The dependency on Socket and other networking-related APIs for the Unix implementation. Size on disk matters, but it would affect only the applications using the Async overloads of the new Process APIs on Unix. IMO this is an acceptable trade-off.
  • There is a small performance overhead of creating a pipe and duplicating the fd in the child process. But process creation is already expensive, so this should not be a big deal. And BTW, we already create an additional pipe per process so that we know when the child process has called exec.
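The first point relies on the exit status always being collected with waitid/waitpid. A rough sketch of that step, assuming waitid with P_PID (the struct child_exit type and the reap_child name are invented for this example): passing WNOHANG lets the caller find out whether the child has really exited without blocking, while passing 0 gives the blocking worst case described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct child_exit {
    int exited_normally;   /* 1 if the child called exit, 0 if a signal ended it */
    int code_or_signal;    /* exit code, or the terminating signal number */
};

/* Reaps the child identified by pid. `options` is 0 for a blocking wait or
   WNOHANG for a non-blocking check.
   Returns 1 if the child was reaped, 0 if it is still running (WNOHANG only),
   and -1 on error. */
int reap_child(pid_t pid, int options, struct child_exit *out)
{
    siginfo_t info;
    /* With WNOHANG, waitid succeeds even when nothing is ready. Zeroing the
       struct first lets us detect that case via si_pid == 0. */
    memset(&info, 0, sizeof info);

    if (waitid(P_PID, (id_t)pid, &info, WEXITED | options) != 0)
        return -1;
    if (info.si_pid == 0)
        return 0;          /* child has not exited yet */

    out->exited_normally = (info.si_code == CLD_EXITED);
    out->code_or_signal = info.si_status;
    return 1;
}
```

If the exit pipe fires because the child closed the fd by hand, a WNOHANG call returns 0 here, and the implementation can then decide whether to fall back to the blocking wait.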

One of the requirements I was given by @jkotas is to let users start the process on their own (for example, to support a very niche scenario for a config switch that we don't expose because it's OS-specific) and then use our new APIs to work with it (for example, to wait for exit asynchronously).

The proposed approach works great as long as we orchestrate the process creation ourselves. If the user creates a process using fork/execve on their own, we cannot guarantee that they will duplicate the exit pipe correctly. We could document this requirement clearly and even expose a public ctor that requires an exit pipe fd.

If we decide to go with this approach, I would like to introduce a new SafeHandle-derived type that represents a child process (not just any process). There would be no breaking changes and a clear contract that this type is only for child processes created by us or by the user following our guidelines.

public sealed class SafeChildProcessHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeChildProcessHandle(IntPtr existingHandle, bool ownsHandle);

    [UnsupportedOSPlatform("windows")] // [SupportedOSPlatform("unix")] is not supported
    public SafeChildProcessHandle(int pid, IntPtr exitPipeFd, bool ownsHandle);
}

The name is just a proposal (I am open to other suggestions like Subprocess).

@stephentoub @jkotas What do you think about this design?

@tmds @am11 Please share your perspective as well.
