-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the robustness of closing open file descriptors in jailer #3543
Conversation
678ef96
to
108c764
Compare
Firecracker currently has full supported for 4.14 (kernel-policy). I believe the
Please correct me if I'm wrong, but otherwise I think this blocks this. (I recall this was what prevented us using this approach previously) |
@JonathanWoollett-Light that's why I introduced fallbacks for older kernels. Essentially this patch on: Please provide feedback if having those fall-backs is feasible in firecracker. |
108c764
to
7020692
Compare
@JonathanWoollett-Light I just pushed a V2 of this patch. Changes compared to V1:
|
2286426
to
813f471
Compare
@JonathanWoollett-Light renamed the patch to a more appropriate name |
I think we can do this check at compile time. If we include a fn main() {
let process = std::process::Command::new("uname").arg("-r").output().unwrap();
assert!(process.status.success());
let version = std::str::from_utf8(&process.stdout).unwrap();
let mut iter = version.split(".");
let major = iter.next().unwrap().parse::<u32>().unwrap();
let minor = iter.next().unwrap().parse::<u32>().unwrap();
// Firecracker and the jailer officaily supports 2 kernel versions 5.10 and 4.14.
//
// Some improved functionality is supported in the 5.10 version that is not supported in 4.14,
// to accmodate this we set conditionl compilation flags for these versions.
//
// In developing Firecracker and the jailer a newer kernel may be used, to support this we don't
// match the versions exactly.
match (major,minor) {
(5.., 10..) => println!("cargo:rustc-cfg=kernel=\"5.10\""),
(4, 14..) => println!("cargo:rustc-cfg=kernel=\"4.14\""),
_ => panic!("unsupported kernel version")
}
} and then change fn sanitize_process() {
// First thing to do is make sure we don't keep any inherited FDs
// other that IN, OUT and ERR.
#[cfg(kernel="5.10")]
{
// This syscall was introduced in 5.9 and is not supported in 4.14.
// SAFETY: The values are constant and valid.
unsafe {
libc::syscall(
libc::SYS_close_range,
3,
libc::c_uint::MAX,
libc::CLOSE_RANGE_UNSHARE,
)
}
}
#[cfg(kernel="4.14")]
{
// SAFETY: Always safe.
let fd_limit = i32::try_from(unsafe { libc::sysconf(libc::_SC_OPEN_MAX) }).unwrap();
// Close all file descriptors excluding 0 (STDIN), 1 (STDOUT) and 2 (STDERR).
for fd in 3..fd_limit {
// SAFETY: Safe because close() cannot fail when passed a valid parameter.
unsafe {
libc::close(fd);
}
}
}
// Cleanup environment variables
clean_env_vars();
} it should fix this. Does this solution appear good to you? This can be altered but I think doing this at compile time is preferable. |
@JonathanWoollett-Light
This PR suggest having 3 methods for doing the same thing and in case of errors fall backing to the other methods in runtime until one of the approaches works. If having the same functionality written in 3 different ways is a problem then only reading /proc using the nix crate would be OK and solve the initial issue, the drawback to this is that it won't work on systems without /proc mounted. The jailer setup is a one time cost and one might have a pool of jailer instances ready ahead of time. Please correct me if I'm wrong, but I think adding one additional syscall during startup on the 4.14 kernel when jailer was already doing thousands of them to close all possible fds won't change the performance much. |
Introduces multiple methods for closing all open FDs in jailer. The method is choosen based on the environment. On host kernel >= 5.9 uses the close_range syscall On host kernel < 5.9 when OPEN_MAX <= 2**16 iterates over all FDs On host kernel < 5.9 when OPEN_MAX > 2**16 discovers open FDs by reading from /proc/self/fds using the nix crate Fixes a bug where firecracker would end up spending minutes starting which arises when running on systems with OPEN_MAX set to a very high number. Signed-off-by: Grzegorz Uriasz <gorbak25@gmail.com>
813f471
to
447fae1
Compare
@JonathanWoollett-Light any feedback on what approach this PR should take? |
I think what you suggest is right. A 1st commit could implement the fix on >5.9, this can look like: fn sanitize_process() {
// First thing to do is make sure we don't keep any inherited FDs
// other that IN, OUT and ERR.
// Try using the close_range syscall to close all open FDs in the range of 4..UINT_MAX
// SAFETY: The parameters are valid constants.
let res = unsafe { libc::syscall(
libc::SYS_close_range,
3,
libc::c_uint::MAX,
libc::CLOSE_RANGE_UNSHARE,
) };
// If `SYS_close_range` fails as it is not supported (as is the case on a pre 5.9 kernel), we
// instead close by iterating through all possible file descriptors.
match res {
0 => (),
// The syscall failed due to not being supported.
// SAFETY: Always safe.
-1 => if unsafe { *libc::__errno_location } == libc::ENOSYS {
// SAFETY: Always safe.
let fd_limit = i32::try_from(unsafe { libc::sysconf(libc::_SC_OPEN_MAX) }).unwrap();
// Close all file descriptors excluding 0 (STDIN), 1 (STDOUT) and 2 (STDERR).
for fd in 3..fd_limit {
// SAFETY: Safe because close() cannot fail when passed a valid parameter.
unsafe {
libc::close(fd);
}
}
},
else { panic!("Failed to close inherited FDs."); }
},
_ => unreachable!()
}
// Cleanup environment variables
clean_env_vars();
} I am unsure about pulling in Nix, if this is required for the fallback, this will need to be discussed. If we want to get the 1st commit merged quickly I think using Nix to iterate through |
@JonathanWoollett-Light will come back to this PR on wednesday, I'm on a trip without my laptop right now. |
hello @gorbak25, Thanks a lot for the proposed fix. The changes look reasonable to me in general, there are a couple of things I would like to adjust:
Since this issue introduces a customer-facing regression, I would like to speed up the merging process so I opened a separate PR that starts from your commit and contains the proposed adjustments. Your contribution is mentioned there and kept you as the commit author! Please take a look at this alternate PR and review. Thanks a lot again for looking into this! |
Changes
Closes all open FDs in a single syscall. In case close_range is not available then fallback to the old approach. In case the fallback fails then discover open FDs by reading from /proc/self/fds.
Reason
Fixes #3542 - a bug where firecracker would end up spending minutes starting which arises when running on systems with OPEN_MAX set to a very high number.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
CHANGELOG.md
.TODO
s link to an issue.rust-vmm
.