Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Linux-specific pidfd process extensions #77168

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions library/std/src/os/linux/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,5 @@
#![stable(feature = "raw_ext", since = "1.1.0")]

pub mod fs;
pub mod process;
pub mod raw;
47 changes: 47 additions & 0 deletions library/std/src/os/linux/process.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
//! Linux-specific extensions to primitives in the `std::process` module.

#![unstable(feature = "linux_pidfd", issue = "none")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open a tracking issue and update this attribute before landing.
Putting a review comment as a reminder; doesn't need to be done now if the PR is not in shape to land yet.


use crate::process;
use crate::sys_common::AsInnerMut;
use crate::io::Result;

/// Os-specific extensions to [`process::Child`]
///
/// [`process::Child`]: crate::process::Child
pub trait ChildExt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a sealed trait (like https://doc.rust-lang.org/std/slice/trait.SliceIndex.html) so it cannot be implemented outside of the standard library and we are free to add methods.

/// Obtains the pidfd created for this child process, if available.
///
/// A pidfd will only ever be available if `create_pidfd(true)` was called
/// when the corresponding `Command` was created.
///
/// Even if `create_pidfd(true)` is called, a pidfd may not be available
/// due to an older version of Linux being in use, or if
/// some other error occured.
///
/// See `man pidfd_open` for more details about pidfds.
fn pidfd(&self) -> Result<i32>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Result<RawFd>?

}

/// Os-specific extensions to [`process::Command`]
///
/// [`process::Command`]: crate::process::Command
pub trait CommandExt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a sealed trait.

/// Sets whether or this `Command` will attempt to create a pidfd
/// for the child. If this method is never called, a pidfd will
/// not be crated.
///
/// The pidfd can be retrieved from the child via [`ChildExt::pidfd`]
///
/// A pidfd will only be created if it is possible to do so
/// in a guaranteed race-free manner (e.g. if the `clone3` system call is
/// supported). Otherwise, [`ChildExit::pidfd`] will return an error.
fn create_pidfd(&mut self, val: bool) -> &mut process::Command;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the user need to request it explicitly? As long as clone3 is available is there any cost to always obtaining it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always opening a pidfd would cause it to be leaked, since we don't ever close it in the standard library. This would eventually exhaust all of the file descriptors for the process with leaked pidfds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it could be closed when the Child is dropped and one would have to take it out if one wants to keep using it, but yeah that would indeed be an extra cost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ownership and moving would be appropriate here since someone has to be responsible for closing it and if one can retrieve copies it becomes unclear who is responsible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Without seeing this discussion, I would presume that Process owns the pidfd and will close it on drop.
  2. The pidfd should be closed when clone3 succeeds but exec fails, since a user has no ability to do so on their own, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without seeing this discussion, I would presume that Process owns the pidfd and will close it on drop.

If we do that, then I think we should add a way to prevent this behavior (other then calling mem::forget on the Child). Otherwise, users who just need the pidfd will be forced to potentially leak memory (if Child ever heap allocates internally), or keep a Child around when it's not actually needed.

The pidfd should be closed when clone3 succeeds but exec fails, since a user has no ability to do so on their own, right?

The pidfd only exists in the parent process. Once clone3 succeeds, the pidfd is perfectly useable, regardless of what happens in the child process from that point on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pidfd only exists in the parent process. Once clone3 succeeds, the pidfd is perfectly useable, regardless of what happens in the child process from that point on.

My point was that spawn returns an error in that case, so the user cannot access pidfd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, then I think we should add a way to prevent this behavior (other then calling mem::forget on the Child).

To keep things RAII we should introduce a new wrapper type akin to other fd-wrapping IO types such as File, TcpStream and so on. E.g. struct ProcessDescriptor(RawFd) which could implement Drop.

Then Child could have

  • take_pidfd(&mut self) -> ProcessDescriptor
  • pidfd(&self) -> &ProcessDescriptor

ProcessDescriptor then could implement IntoRawFd and AsRawFd

Copy link
Member Author

@Aaron1011 Aaron1011 Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that spawn returns an error in that case, so the user cannot access pidfd.

Oh, that's a good point. I'll fix that.

}

impl CommandExt for process::Command {
fn create_pidfd(&mut self, val: bool) -> &mut process::Command {
self.as_inner_mut().create_pidfd(val);
self
}
}
2 changes: 1 addition & 1 deletion library/std/src/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ use crate::sys_common::{AsInner, AsInnerMut, FromInner, IntoInner};
/// [`wait`]: Child::wait
#[stable(feature = "process", since = "1.0.0")]
pub struct Child {
handle: imp::Process,
pub(crate) handle: imp::Process,

/// The handle for writing to the child's standard input (stdin), if it has
/// been captured. To avoid partially moving
Expand Down
5 changes: 5 additions & 0 deletions library/std/src/sys/unix/process/process_common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ pub struct Command {
stdin: Option<Stdio>,
stdout: Option<Stdio>,
stderr: Option<Stdio>,
pub(crate) make_pidfd: bool,
}

// Create a new type for argv, so that we can make it `Send` and `Sync`
Expand Down Expand Up @@ -149,6 +150,7 @@ impl Command {
stdin: None,
stdout: None,
stderr: None,
make_pidfd: false,
}
}

Expand Down Expand Up @@ -181,6 +183,9 @@ impl Command {
pub fn gid(&mut self, id: gid_t) {
self.gid = Some(id);
}
pub fn create_pidfd(&mut self, val: bool) {
self.make_pidfd = val;
}

pub fn saw_nul(&self) -> bool {
self.saw_nul
Expand Down
129 changes: 111 additions & 18 deletions library/std/src/sys/unix/process/process_unix.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ use crate::ptr;
use crate::sys;
use crate::sys::cvt;
use crate::sys::process::process_common::*;
use crate::sync::atomic::{AtomicBool, Ordering};

use libc::{c_int, gid_t, pid_t, uid_t};

Expand Down Expand Up @@ -34,18 +35,7 @@ impl Command {

let (input, output) = sys::pipe::anon_pipe()?;

// Whatever happens after the fork is almost for sure going to touch or
// look at the environment in one way or another (PATH in `execvp` or
// accessing the `environ` pointer ourselves). Make sure no other thread
// is accessing the environment when we do the fork itself.
//
// Note that as soon as we're done with the fork there's no need to hold
// a lock any more because the parent won't do anything and the child is
// in its own process.
let result = unsafe {
let _env_lock = sys::os::env_lock();
cvt(libc::fork())?
};
let (result, pidfd) = self.do_fork()?;

let pid = unsafe {
match result {
Expand All @@ -70,11 +60,11 @@ impl Command {
rtassert!(output.write(&bytes).is_ok());
libc::_exit(1)
}
n => n,
n => n as pid_t,
}
};

let mut p = Process { pid, status: None };
let mut p = Process { pid, status: None, pidfd };
drop(output);
let mut bytes = [0; 8];

Expand Down Expand Up @@ -107,6 +97,95 @@ impl Command {
}
}

// Attempts to fork the process. If successful, returns
// Ok((0, -1)) in the child, and Ok((child_pid, child_pidfd)) in the parent.
fn do_fork(&mut self) -> Result<(libc::c_long, libc::pid_t), io::Error> {
// Whatever happens after the fork is almost for sure going to touch or
// look at thbe environment in one way or another (PATH in `execvp` or
// accessing the `environ` pointer ourselves). Make sure no other thread
// is accessing the environment when we do the fork itself.
//
// Note that as soon as we're done with the fork there's no need to hold
// a lock any more because the parent won't do anything and the child is
// in its own process.
let _env_lock = unsafe { sys::os::env_lock() };

// If we fail to create a pidfd for any reason, this will
// stay as -1, which indicates an error
let mut pidfd: libc::pid_t = -1;

// On Linux, attempt to use the `clone3` syscall, which
// supports more argument (in particular, the ability to create a pidfd).
// If this fails, we will fall through this block to a call to `fork()`
cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
static HAS_CLONE3: AtomicBool = AtomicBool::new(true);

const CLONE_PIDFD: u64 = 0x00001000;

#[repr(C)]
struct clone_args {
flags: u64,
pidfd: u64,
child_tid: u64,
parent_tid: u64,
exit_signal: u64,
stack: u64,
stack_size: u64,
tls: u64,
set_tid: u64,
set_tid_size: u64,
cgroup: u64,
}

syscall! {
fn clone3(cl_args: *mut clone_args, len: libc::size_t) -> libc::c_long
}

if HAS_CLONE3.load(Ordering::Relaxed) {
let mut flags = 0;
if self.make_pidfd {
flags |= CLONE_PIDFD;
Copy link
Member

@joshtriplett joshtriplett Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the error path for "the caller asked for a pidfd but the kernel doesn't support clone3"?

(In theory we could have a fallback that tries pidfd_open, but that's less safe for a variety of reasons, so let's just not.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the Some(libc::ENOSYS) => branch below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see that; I just missed that you'd initialized pidfd to -1 at the top, so that it's safe to always return it. That seems fine.

OK, this LGTM. r=me when you're ready.

}

let mut args = clone_args {
flags,
pidfd: &mut pidfd as *mut libc::pid_t as u64,
child_tid: 0,
parent_tid: 0,
exit_signal: libc::SIGCHLD as u64,
stack: 0,
stack_size: 0,
tls: 0,
set_tid: 0,
set_tid_size: 0,
cgroup: 0
};

let args_ptr = &mut args as *mut clone_args;
let args_size = crate::mem::size_of::<clone_args>();

let res = cvt(unsafe { clone3(args_ptr, args_size) });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't run atfork callbacks. This is for example used by rand to force all ReseedingRng instances to reseed themselves. (https://github.com/rust-random/rand/blob/42247d8616b0f8fb62d8a5a6e8dd8f6a17e1b3eb/src/rngs/adapter/reseeding.rs#L287) The rand usage is not relevant here, but there may be other cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought that posix_spawnnp didn't run atfork handlers, but it looks like I was mistaken.
This will require some further thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that I used __clone3 is that glibc's clone() implementation requires the caller to allocate a stack for the new process (even though the kernel allows passing a null stack pointer to the raw clone syscall). We could allocate a stack ourself, but I think it would be better to avoid that if at all possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the posix_spawn man page states:

       According to POSIX, it is unspecified whether fork handlers
       established with pthread_atfork(3) are called when posix_spawn() is
       invoked.  Since glibc 2.24, the fork handlers are not executed in any
       case.  On older implementations, fork handlers are called only if the
       child is created using fork(2).

so I think this should be fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some extensive discussion going on in glibc upstream about handling and wrapping clone3. See:

match res {
Ok(n) => return Ok((n, pidfd)),
Err(e) => match e.raw_os_error() {
// Multiple threads can race to execute this store,
// but that's fine - that just means that multiple threads
// will have tried and failed to execute the same syscall,
// with no other side effects.
Some(libc::ENOSYS) => HAS_CLONE3.store(false, Ordering::Relaxed),
_ => return Err(e)
}
}
}
}
}
// If we get here, we are either not on Linux,
// or we are on Linux and the 'clone3' syscall does not exist
cvt(unsafe { libc::fork() }.into()).map(|res| (res, pidfd))
}


pub fn exec(&mut self, default: Stdio) -> io::Error {
let envp = self.capture_env();

Expand Down Expand Up @@ -252,8 +331,6 @@ impl Command {
#[cfg(not(any(
target_os = "macos",
target_os = "freebsd",
all(target_os = "linux", target_env = "gnu"),
all(target_os = "linux", target_env = "musl"),
)))]
fn posix_spawn(
&mut self,
Expand All @@ -268,8 +345,6 @@ impl Command {
#[cfg(any(
target_os = "macos",
target_os = "freebsd",
all(target_os = "linux", target_env = "gnu"),
all(target_os = "linux", target_env = "musl"),
))]
fn posix_spawn(
&mut self,
Expand Down Expand Up @@ -404,6 +479,12 @@ impl Command {
pub struct Process {
pid: pid_t,
status: Option<ExitStatus>,
// On Linux, stores the pidfd created for this child.
// This is -1 if the user did not request pidfd creation,
// or if the pidfd could not be created for some reason
// (e.g. the `clone3` syscall was not available).
#[cfg(target_os = "linux")]
pidfd: libc::c_int,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using RawFd for this.

}

impl Process {
Expand Down Expand Up @@ -494,3 +575,15 @@ impl fmt::Display for ExitStatus {
}
}
}

#[cfg(target_os = "linux")]
#[unstable(feature = "linux_pidfd", issue = "none")]
impl crate::os::linux::process::ChildExt for crate::process::Child {
fn pidfd(&self) -> crate::io::Result<i32> {
if self.handle.pidfd > 0 {
Ok(self.handle.pidfd)
} else {
Err(crate::io::Error::from(crate::io::ErrorKind::Other))
}
}
}
27 changes: 27 additions & 0 deletions src/test/ui/command/command-create-pidfd.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// run-pass
// linux-only - pidfds are a linux-specific concept

#![feature(linux_pidfd)]
use std::os::linux::process::{CommandExt, ChildExt};
use std::process::Command;

fn main() {
// We don't assert the precise value, since the standard libarary
// may be opened other file descriptors before our code ran.
let _ = Command::new("echo")
.create_pidfd(true)
.spawn()
.unwrap()
.pidfd().expect("failed to obtain pidfd");

let _ = Command::new("echo")
.create_pidfd(false)
.spawn()
.unwrap()
.pidfd().expect_err("pidfd should not have been created when create_pid(false) is set");

let _ = Command::new("echo")
.spawn()
.unwrap()
.pidfd().expect_err("pidfd should not have been created");
}