Design of the Unix implementation of the new Process APIs

I am working on implementing a set of brand new APIs for running processes. The new APIs shall not repeat any design mistakes from the past. They need to be simple, consistent across all operating systems, and easy to use correctly, while being hard to use incorrectly. And of course I want them to be as performant as possible.

Some of the issues with the existing APIs are caused by bad API design. But some need a better underlying implementation. I've implemented a [prototype](https://github.com/adamsitnik/ProcessPlayground) of the new APIs, but before I proceed further with the API proposal (so far only for the low-level part that can be used to build anything on top of it), I would like to verify that I've made the right choices.

## PID recycling problem

PIDs are globally unique only while the process is alive. Once a process exits and is reaped, the PID can be recycled by the kernel for a new process. This creates a race condition window: code that holds a PID might inadvertently operate on a different process if that PID gets reused. For example, sending a kill signal to a PID that has just been reassigned could terminate an unrelated process. In security-sensitive contexts, this is more than theoretical – there have been [real vulnerabilities](https://access.redhat.com/security/cve/cve-2025-4598) exploiting PID reuse.

Currently, the [ProcessWaitState.Unix.cs](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessWaitState.Unix.cs) type maintains a static, shared table that maps each process ID to a `ProcessWaitState` object. Access to this table is synchronized using a lock. This allows us to avoid **some** of the PID recycling issues (as long as the user does not perform syscalls on their own, bypassing managed APIs).

It took the Unix community a while (pun intended) to realize that PID recycling is a real problem. But the great news is that there is a solution: **process descriptors** (on Windows known as process handles).

Linux (starting with 5.3) and FreeBSD (starting with 9) have introduced the same concept, with slightly different APIs:

- A child process can be created with `clone3(..., CLONE_PIDFD)` (Linux) or `pdfork()` (FreeBSD) which returns a process descriptor.
- It's possible to wait on the process descriptor with `poll(..., timeout)` (and `epoll` and `select`).
- It's possible to kill the child process using the process descriptor with `pidfd_send_signal` (Linux) or `pdkill` (FreeBSD).
- Zombie child processes are handled the same way as before, but through the process descriptor (`pidfd`) rather than the `pid`.
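
To make the Linux side of this list concrete, here is a minimal C sketch (not code from the prototype). It assumes Linux 5.3+ and, to stay short, uses `fork` followed by `pidfd_open` instead of `clone3(..., CLONE_PIDFD)`; that substitution is safe only because the child has not been reaped yet, so its PID cannot have been recycled. The helper names (`open_child_pidfd`, `pidfd_wait`, `pidfd_kill_and_reap`) are hypothetical, and the fallback syscall numbers are the values used on x86-64 and arm64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Older headers may lack these; the numbers below are the x86-64/arm64 values. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Returns a process descriptor for a not-yet-reaped child, or -1 with errno set
   (ENOSYS on kernels without pidfd_open). */
int open_child_pidfd(pid_t child)
{
    return (int)syscall(SYS_pidfd_open, child, 0);
}

/* Returns 1 once the process has exited, 0 on timeout, -1 on error.
   A pidfd becomes readable when the process it refers to terminates. */
int pidfd_wait(int pidfd, int timeout_ms)
{
    assert(pidfd >= 0);
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc > 0 ? 1 : rc;
}

/* Sends SIGKILL through the descriptor, waits for the exit, then reaps the child.
   Returns 0 and fills *status on success, -1 on error. */
int pidfd_kill_and_reap(pid_t child, int pidfd, int *status)
{
    if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) != 0)
        return -1;
    if (pidfd_wait(pidfd, -1) != 1)
        return -1;
    /* The child is still a zombie here, so using its PID for the reap is safe. */
    return waitpid(child, status, 0) == child ? 0 : -1;
}
```

The signal is addressed to the descriptor, so it can only ever reach the process the descriptor was opened for, even if the numeric PID is later reused.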

But, as always, there are some caveats:

- macOS does not support process descriptors at all.
- We do support RHEL 8.0 (Linux 4.18+), but process descriptors are only available starting with Linux 5.3. Other supported RHEL versions (9.0, 10.0) do support process descriptors.

We can take advantage of process descriptors when they are available, and fall back to another approach when they are not (more details below).

@stephentoub @jkotas Do you believe that having two code paths is worth the effort to avoid PID recycling issues on supported platforms?

## Process exit monitoring

Even if we have process descriptors, we still need to monitor process exit. To do that asynchronously with cancellation support, we would need to use `io_uring`. However, `io_uring` is Linux-specific and can simply be [blocked](https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html).

While reading [this amazing blog post](https://gaultier.github.io/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html), I've stumbled upon another idea: the [self-pipe trick](https://gaultier.github.io/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html#third-approach-self-pipe-trick). The idea is as follows:

- [Create a pipe for exit monitoring](https://github.com/adamsitnik/ProcessPlayground/blob/510ea020f666b481d404c9f810a7e032cdcd1738/Library/native/process_spawn.c#L82), using `CLOEXEC` so that other processes spawned in parallel do not inherit it.
- Clone/fork the child process.
- [Duplicate the exit pipe](https://github.com/adamsitnik/ProcessPlayground/blob/510ea020f666b481d404c9f810a7e032cdcd1738/Library/native/process_spawn.c#L167-L169) in the child process (so it survives `execve`).
- Close the write end of the pipe in the parent process.
- Call `execve` in the child process.

With this approach, the child process runs with one additional file descriptor. When it exits, the kernel automatically closes all of its file descriptors, including the duplicated write end of the exit pipe. The parent process can monitor the read end of the exit pipe using `poll`/`epoll`/`select`. Once the last write end is closed, the read end reports end-of-file (it becomes readable and `read` returns 0), which means that the child process has exited.
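
The steps above can be sketched in portable C roughly as follows. This is a simplified illustration under my own assumptions, not the prototype's code: `spawn_with_exit_pipe` and `wait_for_exit` are hypothetical names, error handling is minimal, and `pipe` + `fcntl` is used for portability even though it leaves a small window before `FD_CLOEXEC` is set (on Linux, `pipe2(..., O_CLOEXEC)` closes that window).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawns argv[0] and stores the read end of the exit pipe in *exit_fd.
   Returns the child's pid, or -1 on failure. */
pid_t spawn_with_exit_pipe(char *const argv[], int *exit_fd)
{
    assert(exit_fd != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* Close-on-exec keeps unrelated children from inheriting the pipe via exec. */
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: dup() returns a copy with FD_CLOEXEC cleared, so only the copy
           survives exec; the two original ends are closed by exec. */
        if (dup(fds[1]) < 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    /* Parent: drop our write end, otherwise EOF would never be reported. */
    close(fds[1]);
    *exit_fd = fds[0];
    return pid;
}

/* Returns 1 if the child exited (and *status is filled in), 0 on timeout, -1 on error. */
int wait_for_exit(pid_t pid, int exit_fd, int timeout_ms, int *status)
{
    struct pollfd pfd = { .fd = exit_fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    /* The pipe only says "all write ends are closed"; waitpid gives the real status. */
    do {
        rc = waitpid(pid, status, 0);
    } while (rc < 0 && errno == EINTR);
    return rc == pid ? 1 : -1;
}
```

A caller would spawn the process once and then call `wait_for_exit` with whatever timeout it needs, repeating the call after a timeout. Note that anything the child itself spawns inherits the duplicated descriptor as well, so end-of-file arrives only after every holder of the write end has exited or closed it.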

This works on all Unix-like operating systems, including macOS. We also already have `epoll`/`kqueue` support implemented by `Socket` on Unix, so async process exit monitoring with cancellation support can be implemented in the [following way](https://github.com/adamsitnik/ProcessPlayground/blob/510ea020f666b481d404c9f810a7e032cdcd1738/Library/SafeChildProcessHandle.Unix.cs#L443-L451):

```csharp
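// Wrap the read end of the exit pipe so the socket engine can watch it.
// ownsHandle: false is an assumption: the process handle keeps owning the fd.
using Socket socket = new(new SafeSocketHandle(_exitPipeFd, ownsHandle: false));

// Completes with 0 bytes (EOF) once the child has exited and closed the write end.
int bytesRead = await socket.ReceiveAsync(s_exitPipeBuffer, SocketFlags.None, cancellationToken);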
```

The disadvantages of this approach that I was able to come up with:

- The user could somehow find this `fd` and close it from the child process, breaking exit monitoring. This is a very unlikely scenario, especially if we use a high fd number. We always need to check the process exit status with `waitid`/`waitpid` anyway, so in the worst case the async method would perform a blocking syscall.
- The Unix implementation would depend on `Socket` and other networking-related APIs. Size on disk matters, but this would affect only applications that use the `Async` overloads of the new Process APIs on Unix. IMO this is an acceptable trade-off.
- There is a small performance overhead from creating a pipe and duplicating the fd in the child process. But process creation is already expensive, so this should not be a big deal. BTW, we already create an [additional pipe](https://github.com/dotnet/runtime/blob/1b4d0422cd72a09b8e589c5704ea0b45fefd44dd/src/native/libs/System.Native/pal_process.c#L300-L303) per process in order to know when the child process has called exec.
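
For the "high fd number" idea in the first bullet, one possible way to do the duplication in the child is `fcntl` with `F_DUPFD`. This is only a sketch of that option, not what the prototype does; `dup_exit_pipe_high` and the `min_fd` threshold are hypothetical, and the threshold has to stay below the process's `RLIMIT_NOFILE` limit.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Meant to run in the child between fork and execve.
   Duplicates write_fd onto the lowest free descriptor >= min_fd.
   F_DUPFD (unlike F_DUPFD_CLOEXEC) leaves FD_CLOEXEC clear on the copy,
   so the copy survives execve. Returns the new fd, or -1 with errno set. */
int dup_exit_pipe_high(int write_fd, int min_fd)
{
    assert(min_fd >= 0);
    return fcntl(write_fd, F_DUPFD, min_fd);
}
```

A high number does not prevent the child from closing the descriptor, it only makes accidental collisions with code that assumes low, "well-known" descriptors less likely.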

One of the requirements I was given by @jkotas is to let users start the process on their own (for example, to support a very niche config switch that we don't expose because it's OS-specific) and then use our new APIs to work with it (for example, to wait for exit asynchronously).

The proposed approach works great as long as we orchestrate the process creation ourselves. If the user creates a process using `fork`/`execve` on their own, we cannot guarantee that they will duplicate the exit pipe correctly. We could document this requirement clearly and even expose a public `ctor` that requires an exit pipe fd.

If we decide to go with this approach, I would like to introduce a new `SafeHandle`-derived type that represents a child process (rather than an arbitrary process). There would be no breaking changes, and the contract would be clear: this type is only for child processes created by us or by the user following our guidelines.

```csharp
public sealed class SafeChildProcessHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeChildProcessHandle(IntPtr existingHandle, bool ownsHandle);

    [UnsupportedOSPlatform("windows")] // [SupportedOSPlatform("unix")] is not supported
    public SafeChildProcessHandle(int pid, IntPtr exitPipeFd, bool ownsHandle);
}
```

The name is just a proposal (I am open to other suggestions like `Subprocess`). 

@stephentoub @jkotas What do you think about this design?

@tmds @am11 Please share your perspective as well.
