Stop probing for statx unless necessary #106661

Merged
merged 2 commits into from
Jan 15, 2023
68 changes: 41 additions & 27 deletions library/std/src/sys/unix/fs.rs
@@ -149,12 +149,13 @@ cfg_has_statx! {{
         ) -> Option<io::Result<FileAttr>> {
             use crate::sync::atomic::{AtomicU8, Ordering};

-            // Linux kernel prior to 4.11 or glibc prior to glibc 2.28 don't support `statx`
-            // We store the availability in global to avoid unnecessary syscalls.
-            // 0: Unknown
-            // 1: Not available
-            // 2: Available
-            static STATX_STATE: AtomicU8 = AtomicU8::new(0);
+            // Linux kernel prior to 4.11 or glibc prior to glibc 2.28 don't support `statx`.
+            // We check for it on first failure and remember availability to avoid having to
+            // do it again.
+            #[repr(u8)]
+            enum STATX_STATE{ Unknown = 0, Present, Unavailable }
+            static STATX_SAVED_STATE: AtomicU8 = AtomicU8::new(STATX_STATE::Unknown as u8);

             syscall! {
                 fn statx(
                     fd: c_int,
@@ -165,31 +166,44 @@ cfg_has_statx! {{
                 ) -> c_int
             }

-            match STATX_STATE.load(Ordering::Relaxed) {
-                0 => {
-                    // It is a trick to call `statx` with null pointers to check if the syscall
-                    // is available. According to the manual, it is expected to fail with EFAULT.
-                    // We do this mainly for performance, since it is nearly hundreds times
-                    // faster than a normal successful call.
-                    let err = cvt(statx(0, ptr::null(), 0, libc::STATX_ALL, ptr::null_mut()))
-                        .err()
-                        .and_then(|e| e.raw_os_error());
-                    // We don't check `err == Some(libc::ENOSYS)` because the syscall may be limited
-                    // and returns `EPERM`. Listing all possible errors seems not a good idea.
-                    // See: https://github.com/rust-lang/rust/issues/65662
-                    if err != Some(libc::EFAULT) {
-                        STATX_STATE.store(1, Ordering::Relaxed);
-                        return None;
-                    }
-                    STATX_STATE.store(2, Ordering::Relaxed);
-                }
-                1 => return None,
-                _ => {}
+            if STATX_SAVED_STATE.load(Ordering::Relaxed) == STATX_STATE::Unavailable as u8 {
+                return None;
             }

             let mut buf: libc::statx = mem::zeroed();
             if let Err(err) = cvt(statx(fd, path, flags, mask, &mut buf)) {
-                return Some(Err(err));
+                if STATX_SAVED_STATE.load(Ordering::Relaxed) == STATX_STATE::Present as u8 {
+                    return Some(Err(err));
+                }
Comment on lines +175 to +177
Member

@the8472 Jan 15, 2023

It is not quite correct to rely on this because syscall filters can be installed during program runtime. So the behavior can change. At that point ENOSYS could bubble up to user code because the flag now says all errors should be bubbled up.

If ENOSYS and EPERM can be reliably distinguished from other errors, then updating the atomic every time such an error occurs is better, as we do for splice and sendfile.
If EPERM can also convey a real error, then the pre-probing behavior that existed before this PR is correct.
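
For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of "try the fast syscall, and record unavailability whenever ENOSYS or EPERM shows up". It is not the standard library's splice/sendfile code: the helper name `with_fallback`, the closures standing in for syscalls, and the hard-coded errno values (x86-64 Linux) are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Errno values as on x86-64 Linux, hard-coded so the sketch needs no `libc` crate.
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;

/// Hypothetical helper: run `fast` unless it was previously found unusable,
/// and fall back to `slow` whenever `fast` reports ENOSYS or EPERM.
/// Errors are modelled as raw errno values.
fn with_fallback<T>(
    unavailable: &AtomicBool,
    fast: impl FnOnce() -> Result<T, i32>,
    slow: impl FnOnce() -> Result<T, i32>,
) -> Result<T, i32> {
    if !unavailable.load(Ordering::Relaxed) {
        match fast() {
            // The flag is updated at whatever point the error appears, so a
            // filter installed mid-run is still noticed on the next failure.
            Err(ENOSYS) | Err(EPERM) => unavailable.store(true, Ordering::Relaxed),
            // Success, or an error attributed to the request itself.
            other => return other,
        }
    }
    slow()
}
```

The trade-off discussed in this thread is visible in the match arm: the sketch treats every EPERM as "syscall blocked", which is only sound if EPERM cannot also be a genuine result of the call.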

Contributor Author

I don't think there is a real issue here.

Detection only showed up because some deployments ran Rust code using statx on kernels that did not have it yet, or whose configuration had not been updated to cope with it, resulting in EPERM from seccomp.

To my understanding statx is the way to stat going forward and as such disabling it on purpose is a misconfiguration of the system, on par with disabling a syscall like open.

Member

Any sane seccomp filters are deny-by-default, which means any filter list that doesn't explicitly opt into statx can end up blocking the call.
Combine that with the fact that people run outdated software (which can contain equally outdated filter lists), and you can end up in a situation where stat is allowed but statx isn't.

And what happens in the wild is not really relevant for correctness. For correctness it only matters that syscall availability can change at runtime.

People do all kinds of crazy stuff and we need to be defensive.

Contributor Author

@mjguzik Jan 15, 2023

Just in case, I have to stress that with and without the patch the Rust binary will ultimately detect statx (un)availability the same way, whether the kernel does not support statx to begin with or seccomp is configured to disable it, as long as the latter happened prior to running the detection code.

I have to concede it is possible someone has outdated seccomp filters which happen to be disabled when a Rust binary starts execing and only get enabled later. In such a case errors from seccomp will indeed be returned. But this was already the case with the code prior to me patching it (read: no regression on that front).

I also feel inclined to note the sendfile example does not "update the atomic every time". The code speculatively executes the syscall, presumably to avoid paying specifically for detection, and the atomic store is a one time ordeal per exec -- any new calls past that fail to get there. Well, one may nitpick a multithreaded program can have several stores, but the point stands.

If anything my patch moved statx handling closer to that spirit, but in contrast it still does not handle such a failure at an arbitrary moment.

Now, legitimately encountered permission problems return EACCES. The kernel has some hairy code to support all of this, and I definitely would not stake my neck on EPERM being returned only by seccomp.

All that said, as mentioned in another comment, I plan to post a followup to fix the pre-existing bugs I mentioned in FIXME comments. If it does not get hairy, I may as well throw an extra check for seccomp if EPERM is seen.

Member

Yes, totally agreed, it's an edge-case. But seccomp filters getting installed during the runtime of a program after doing some setup work is definitely a thing that happens in daemons that lock themselves down.

> All that said, as mentioned in another comment, I plan to post a followup to fix the pre-existing bugs

👍


+                // Availability not checked yet.
+                //
+                // First try the cheap way.
+                if err.raw_os_error() == Some(libc::ENOSYS) {
+                    STATX_SAVED_STATE.store(STATX_STATE::Unavailable as u8, Ordering::Relaxed);
+                    return None;
+                }
+
+                // Error other than `ENOSYS` is not a good enough indicator -- it is
+                // known that `EPERM` can be returned as a result of using seccomp to
+                // block the syscall.
+                // Availability is checked by performing a call which expects `EFAULT`
+                // if the syscall is usable.
+                // See: https://github.com/rust-lang/rust/issues/65662
+                // FIXME this can probably just do the call if `EPERM` was received, but
+                // previous iteration of the code checked it for all errors and for now
+                // this is retained.
+                // FIXME what about transient conditions like `ENOMEM`?
+                let err2 = cvt(statx(0, ptr::null(), 0, libc::STATX_ALL, ptr::null_mut()))
+                    .err()
+                    .and_then(|e| e.raw_os_error());
+                if err2 == Some(libc::EFAULT) {
+                    STATX_SAVED_STATE.store(STATX_STATE::Present as u8, Ordering::Relaxed);
+                    return Some(Err(err));
+                } else {
+                    STATX_SAVED_STATE.store(STATX_STATE::Unavailable as u8, Ordering::Relaxed);
+                    return None;
+                }
             }

             // We cannot fill `stat64` exhaustively because of private padding fields.
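
The control flow of the new code in `fs.rs` can be followed more easily in a standalone model. The sketch below is not the library code: `try_statx_model`, `StatxState`, `new_state`, and the closures `call` and `probe` (standing in for the real `statx` call and the null-pointer probe) are hypothetical names, errors are modelled as raw errno values, and the errno constants are hard-coded x86-64 Linux values. The visible part of the diff does not show what happens on success, so the model assumes the saved state is left untouched there.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Errno values as on x86-64 Linux, hard-coded so the sketch needs no `libc` crate.
const EFAULT: i32 = 14;
const ENOSYS: i32 = 38;

#[repr(u8)]
enum StatxState {
    Unknown = 0,
    Present,
    Unavailable,
}

/// Fresh state, as at program start.
fn new_state() -> AtomicU8 {
    AtomicU8::new(StatxState::Unknown as u8)
}

/// Model of the lazy probe. `call` stands in for the real `statx` call and
/// `probe` for the null-pointer call, returning the errno it failed with.
/// `None` means "fall back to plain `stat`".
fn try_statx_model<T>(
    saved: &AtomicU8,
    call: impl FnOnce() -> Result<T, i32>,
    probe: impl FnOnce() -> i32,
) -> Option<Result<T, i32>> {
    if saved.load(Ordering::Relaxed) == StatxState::Unavailable as u8 {
        return None;
    }
    let err = match call() {
        // Assumption: the success path leaves the saved state untouched.
        Ok(value) => return Some(Ok(value)),
        Err(errno) => errno,
    };
    // Already known to work: the error belongs to the request itself.
    if saved.load(Ordering::Relaxed) == StatxState::Present as u8 {
        return Some(Err(err));
    }
    // Cheap check first: ENOSYS means the syscall does not exist.
    if err == ENOSYS {
        saved.store(StatxState::Unavailable as u8, Ordering::Relaxed);
        return None;
    }
    // Any other error is ambiguous (for example EPERM from seccomp), so probe.
    if probe() == EFAULT {
        saved.store(StatxState::Present as u8, Ordering::Relaxed);
        Some(Err(err))
    } else {
        saved.store(StatxState::Unavailable as u8, Ordering::Relaxed);
        None
    }
}
```

In this model the probe runs at most once per state transition out of `Unknown`. Once the state is `Present`, every later error, including an ENOSYS or EPERM caused by a filter installed afterwards, is returned to the caller, which is the behavior questioned in the review thread above.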
5 changes: 3 additions & 2 deletions src/tools/miri/tests/pass-dep/shims/libc-fs-with-isolation.rs
@@ -22,7 +22,8 @@ fn main() {
     }

     // test `stat`
-    assert_eq!(fs::metadata("foo.txt").unwrap_err().kind(), ErrorKind::PermissionDenied);
+    let err = fs::metadata("foo.txt").unwrap_err();
+    assert_eq!(err.kind(), ErrorKind::PermissionDenied);
     // check that it is the right kind of `PermissionDenied`
-    assert_eq!(Error::last_os_error().raw_os_error(), Some(libc::EACCES));
+    assert_eq!(err.raw_os_error(), Some(libc::EACCES));
Member

I'm fine with this change. However, the raw_os_error docs do indicate that this does return the error from the previous stdlib call, so isn't it strange that this changed?

Member

errno ends up as EFAULT because the availability probe runs after the initial statx call has already returned an error. Since EFAULT from the probe indicates that statx is available, the error returned is the one from the first statx call, which carries a different error code (like EACCES).
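
The same effect can be reproduced outside Miri with ordinary standard library calls. This is a small sketch, not part of the PR: the helper name and both paths are assumptions chosen for illustration, and the second call merely plays the role of the probe by failing with some other error code.

```rust
use std::fs;
use std::io;

/// Hypothetical helper for illustration: fail one call, then fail a second
/// one before the first error is inspected.
fn first_error_after_second_failure() -> io::Error {
    // The path is an assumption; any path that does not exist would do.
    let first = fs::metadata("/this/path/should/not/exist").unwrap_err();

    // A later failing call may overwrite the thread's errno, much like the
    // availability probe in the diff leaves EFAULT behind. Creating "/" fails
    // because the directory already exists, so nothing is modified.
    let _ = fs::create_dir("/");

    // `first` still carries the code captured when it was constructed,
    // whereas `io::Error::last_os_error()` would read errno as it is now.
    first
}
```

This is why the test now asserts on `err.raw_os_error()` rather than on `Error::last_os_error()`: the stored error is stable, while errno reflects whichever call failed most recently.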

}