Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File descriptors leak control #27609

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions src/liblibc/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3410,6 +3410,8 @@ pub mod consts {
pub const F_GETFL : c_int = 3;
pub const F_SETFL : c_int = 4;

pub const FD_CLOEXEC : c_int = 1;

pub const O_ACCMODE : c_int = 3;

pub const SIGTRAP : c_int = 5;
Expand Down Expand Up @@ -3522,6 +3524,8 @@ pub mod consts {
pub const F_GETFL : c_int = 3;
pub const F_SETFL : c_int = 4;

pub const FD_CLOEXEC : c_int = 1;

pub const SIGTRAP : c_int = 5;
pub const SIG_IGN: size_t = 1;

Expand Down Expand Up @@ -4174,6 +4178,8 @@ pub mod consts {
pub const F_GETFL : c_int = 3;
pub const F_SETFL : c_int = 4;

pub const FD_CLOEXEC : c_int = 1;

pub const SIGTRAP : c_int = 5;
pub const SIG_IGN: size_t = 1;

Expand Down Expand Up @@ -4633,6 +4639,8 @@ pub mod consts {
pub const F_SETLKW : c_int = 9;
pub const F_DUPFD_CLOEXEC : c_int = 10;

pub const FD_CLOEXEC : c_int = 1;

pub const SIGTRAP : c_int = 5;
pub const SIG_IGN: size_t = 1;

Expand Down Expand Up @@ -5072,6 +5080,8 @@ pub mod consts {
pub const F_GETFL : c_int = 3;
pub const F_SETFL : c_int = 4;

pub const FD_CLOEXEC : c_int = 1;

pub const O_ACCMODE : c_int = 3;

pub const SIGTRAP : c_int = 5;
Expand Down Expand Up @@ -5795,6 +5805,7 @@ pub mod funcs {

extern {
pub fn closedir(dirp: *mut DIR) -> c_int;
pub fn dirfd(dirp: *const DIR) -> c_int;
pub fn rewinddir(dirp: *mut DIR);
pub fn seekdir(dirp: *mut DIR, loc: c_long);
pub fn telldir(dirp: *mut DIR) -> c_long;
Expand Down
4 changes: 4 additions & 0 deletions src/libstd/fs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -674,6 +674,10 @@ impl Iterator for ReadDir {
}
}

impl AsInner<fs_imp::ReadDir> for ReadDir {
fn as_inner(&self) -> &fs_imp::ReadDir { &self.0 }
}

#[stable(feature = "rust1", since = "1.0.0")]
impl DirEntry {
/// Returns the full path to the file that this entry represents.
Expand Down
4 changes: 4 additions & 0 deletions src/libstd/sys/unix/ext/io.rs
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,10 @@ impl AsRawFd for net::TcpListener {
impl AsRawFd for net::UdpSocket {
fn as_raw_fd(&self) -> RawFd { *self.as_inner().socket().as_inner() }
}
#[unstable(feature = "raw_dirfd", reason = "recently added")]
impl AsRawFd for fs::ReadDir {
fn as_raw_fd(&self) -> RawFd { self.as_inner().as_inner().dirfd() }
}

#[stable(feature = "from_raw_os", since = "1.1.0")]
impl FromRawFd for net::TcpStream {
Expand Down
25 changes: 25 additions & 0 deletions src/libstd/sys/unix/ext/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#![stable(feature = "rust1", since = "1.0.0")]

use collections::HashSet;
use os::unix::raw::{uid_t, gid_t};
use os::unix::io::{FromRawFd, RawFd, AsRawFd, IntoRawFd};
#[cfg(stage0)]
Expand Down Expand Up @@ -44,6 +45,18 @@ pub trait CommandExt {
/// spawn another process (the daemon) in the same session.
#[unstable(feature = "process_session_leader", reason = "recently added")]
fn session_leader(&mut self, on: bool) -> &mut process::Command;

/// Set to `false` to prevent file descriptors leak (default is `true`).
#[unstable(feature = "process_leak_fds", reason = "recently added")]
fn leak_fds(&mut self, on: bool) -> &mut process::Command;

/// Allow to prevent file descriptors leak except for an authorized whitelist.
///
/// The file descriptors in the whitelist will leak through *all* the subsequent executions
/// (cf. `open(2)` and `O_CLOEXEC`). The new process should change the property of this file
/// descriptors to avoid unintended leaks (cf. `fcntl(2)` and `FD_CLOEXEC`).
#[unstable(feature = "process_leak_fds", reason = "recently added")]
fn leak_fds_whitelist(&mut self, leak: HashSet<RawFd>) -> &mut process::Command;
}

#[stable(feature = "rust1", since = "1.0.0")]
Expand All @@ -62,6 +75,18 @@ impl CommandExt for process::Command {
self.as_inner_mut().session_leader = on;
self
}

fn leak_fds(&mut self, on: bool) -> &mut process::Command {
self.as_inner_mut().leak_fds = on;
self
}

fn leak_fds_whitelist(&mut self, whitelist: HashSet<RawFd>) -> &mut process::Command {
self.as_inner_mut().leak_fds_whitelist = whitelist;
// Do not leak any FDs except those from the whitelist
self.as_inner_mut().leak_fds = false;
self
}
}

/// Unix-specific extensions to `std::process::ExitStatus`
Expand Down
10 changes: 10 additions & 0 deletions src/libstd/sys/unix/fs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,16 @@ impl Iterator for ReadDir {
}
}

impl AsInner<Dir> for ReadDir {
fn as_inner(&self) -> &Dir { &self.dirp }
}

impl Dir {
pub fn dirfd(&self) -> RawFd {
unsafe { ::libc::dirfd(self.0) }
}
}

impl Drop for Dir {
fn drop(&mut self) {
let r = unsafe { libc::closedir(self.0) };
Expand Down
100 changes: 99 additions & 1 deletion src/libstd/sys/unix/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
use prelude::v1::*;
use os::unix::prelude::*;

use collections::HashMap;
use collections::{HashMap, HashSet};
use env;
use ffi::{OsString, OsStr, CString, CStr};
use fmt;
Expand All @@ -37,6 +37,8 @@ pub struct Command {
pub uid: Option<uid_t>,
pub gid: Option<gid_t>,
pub session_leader: bool,
pub leak_fds: bool,
pub leak_fds_whitelist: HashSet<RawFd>,
}

impl Command {
Expand All @@ -49,6 +51,9 @@ impl Command {
uid: None,
gid: None,
session_leader: false,
// Backward compatibility:
leak_fds: true,
leak_fds_whitelist: HashSet::new(),
}
}

Expand Down Expand Up @@ -254,6 +259,81 @@ impl Process {
unsafe { libc::_exit(1) }
}

fn walk_open_fd_brute<F>(action_fd: F) where F: Fn(RawFd) {
// The `getdtablesize(3)` and `sysconf(_SC_OPEN_MAX)` could miss file descriptors if
// the RLIMIT_NOFILE is lowered (cf. #12148) but there is no good way to list them all
// except with procfs (cf. `walk_open_fd_proc()`).
// However, an unprivileged process (i.e. without CAP_SYS_RESOURCE on Linux) is not
// allowed to use a file descriptor upper or equal to it's hard limit nor to change
// this limit upwards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's simply not true.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit lim;
    lim.rlim_cur = 0;
    lim.rlim_max = 0;

    setrlimit(RLIMIT_NOFILE, &lim);

    puts("Hello world!");
    return EXIT_SUCCESS;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my tests was misleading, any process can use a file descriptor upper or equal to it's hard limit.

Do you know a way to get the full list of all usable file descriptors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, there is not such a method.

let begin = libc::STDERR_FILENO + 1;
let end = unsafe { libc::sysconf(libc::consts::os::sysconf::_SC_OPEN_MAX) } as RawFd;
for fd in begin..end {
action_fd(fd);
}
}

#[cfg(any(target_os = "linux", target_os = "android"))]
fn walk_open_fd_proc<F>(action_fd: F) where F: Fn(RawFd) {
// Take care to look at a real procfs
let is_procfs = match ::fs::metadata("/proc") {
Ok(fs) => {
// The device ID is 3 for a bare Linux but change for other namespaces (e.g.
// with LXC). With Linux user namespace, the UID and GID may be different than
// 0, so we only check the root inode.
fs.is_dir() && fs.ino() == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If /proc is not as expected then the system operator is very deliberately lying to your program. Most probably, you are in a sandboxed environment where /proc/self/fd is a bind mount and the rest of /proc has been very deliberately left out on purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system operator may have create the /proc directory but forgot to mount it. This is only a safeguard but I'm not sure it's really useful.

I never seen any sandbox bind-mounting /proc/<pid>/fd and I don't see any good reason to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sandboxing is an obvious reason for doing this. Lots of people like to run their web servers in very restricted chroots and /proc exposes lots of potentially dangerous functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I known well sandboxing. To be compatible with absent /proc, my patch fall back to the brute force FDs closing.

},
Err(..) => false
};

if !is_procfs {
return walk_open_fd_brute(action_fd);
}

// There is currently no way to atomically stat a file/directory and use it (cf.
// #15643).
match ::fs::read_dir("/proc/self/fd") {
Ok(dir) => {
let current = dir.as_raw_fd();
for ret in dir {
match ret {
Ok(entry) => {
let filename = entry.file_name();
let filename = match filename.to_str() {
Some(s) => s,
None => return walk_open_fd_brute(action_fd),
};
let fd = match ::str::FromStr::from_str(filename) {
Ok(fd) => fd,
Err(..) => return walk_open_fd_brute(action_fd),
};
match fd {
libc::STDIN_FILENO => {}
libc::STDOUT_FILENO => {}
libc::STDERR_FILENO => {}
other => if other != current {
action_fd(other);
}
}
}
Err(..) => return walk_open_fd_brute(action_fd),
}
}
},
Err(..) => walk_open_fd_brute(action_fd),
}
}

#[cfg(not(any(target_os = "linux", target_os = "android")))]
fn walk_open_fd<F>(action_fd: F) where F: Fn(RawFd) {
walk_open_fd_brute(action_fd)
}

#[cfg(any(target_os = "linux", target_os = "android"))]
fn walk_open_fd<F>(action_fd: F) where F: Fn(RawFd) {
walk_open_fd_proc(action_fd)
}

let setup = |src: Stdio, dst: c_int| {
match src {
Stdio::Inherit => true,
Expand Down Expand Up @@ -283,6 +363,24 @@ impl Process {
if !setup(out_fd, libc::STDOUT_FILENO) { fail(&mut output) }
if !setup(err_fd, libc::STDERR_FILENO) { fail(&mut output) }

let close_or_leak = |fd: RawFd| {
if cfg.leak_fds_whitelist.contains(&fd) {
let flag_orig = libc::fcntl(fd, libc::F_GETFD, 0);
if flag_orig < 0 {
return;
}
let flag_new = flag_orig & !libc::FD_CLOEXEC;
if flag_orig != flag_new {
let _ = libc::fcntl(fd, libc::F_SETFD, flag_new);
}
} else {
let _ = libc::close(fd);
}
};
if !cfg.leak_fds {
walk_open_fd(close_or_leak);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. You can't close fds as you walk over the /proc/self/fd directory. That's like modifying a list as you iterate over it. Also, allocating memory (with opendir) after forking (in a multithreaded program) is just plain unsafe.

The "correct" way to do this is to use the FD_CLOEXEC flag to mark the fds you want to close to close on exec as you iterate over /proc/self/fd using the raw getdents system call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called just after the fork, when there is no other thread nor any open(2) call, so no more FD are created (except the one from opendir which is ignored).

This post-fork code may deadlock for OSX and FreeBSD (as says the child_after_fork comment) but walk_open_fd_proc() is only called on Linux. Not sure it's worth using the raw getdents(2) here.

We could use fcntl/FD_CLOEXEC instead of close(2) but it imply two syscalls (one to get the flags and another to update them) which should be slower.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the FIOCLEX ioctl to set the flag atomically and in one system call. I'm not sure what you mean about how no more file descriptors are created because there are no threads. I was talking about how other threads that weren't copied over to the new process by the fork may be holding locks deep inside malloc or many other functions and so cause deadlock in the forked process. I'm not sure what you mean about whether or not getdents is worth it. If you want to implement this feature correctly you have to use getdents. If you cannot implement this feature correctly then you should not implement it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIOCLEX ioctl seems to fit well.

I'm not sure what you mean about how no more file descriptors are created because there are no threads.

The main (parent) process may have multiple threads. After the fork, the child process have only one thread. The parent threads file descriptors should not be impacted by the FD_CLOEXEC operations on the child (post-fork) file descriptors (which are shared between all threads of a same process).

I was talking about how other threads that weren't copied over to the new process by the fork may be holding locks deep inside malloc or many other functions and so cause deadlock in the forked process. I'm not sure what you mean about whether or not getdents is worth it.

As I said before and like the comment said: "this is what pthread_atfork() takes care of".

If you cannot implement this feature correctly then you should not implement it.

Yeah, of course I want to do it right :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the situation with standard libraries using pthread_atfork to correctly unlock locks has always been messy (see https://bugzilla.redhat.com/show_bug.cgi?id=906468, see https://stackoverflow.com/questions/29106483/glibc-malloc-deadlock-during-string-reserve, see https://code.google.com/p/gperftools/issues/detail?id=496, see https://code.google.com/p/android/issues/detail?id=178033, see https://bugzilla.mozilla.org/show_bug.cgi?id=680190). These were just a few of the bugs I was able to quickly search for. Being able to use malloc after a fork in a multi-threaded process isn't supported by standards and not well supported across platforms.

One horrible hack to support this might be to exec a helper binary (which can be the same file as a linked-in shared object).

}

if let Some(u) = cfg.gid {
if libc::setgid(u as libc::gid_t) != 0 {
fail(&mut output);
Expand Down
Loading