Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for removing unsafe code from bootstrap #109859

Closed
2 of 8 tasks
jyn514 opened this issue Apr 2, 2023 · 7 comments
Closed
2 of 8 tasks

Tracking Issue for removing unsafe code from bootstrap #109859

jyn514 opened this issue Apr 2, 2023 · 7 comments
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. O-unix Operating system: Unix-like O-windows Operating system: Windows T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@jyn514
Copy link
Member

jyn514 commented Apr 2, 2023

This is a tracking issue for removing unsafe code from bootstrap. This is an implementation detail of the build system and there is no associated RFC.

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

If you are interested in working on this issue, please leave a comment saying which part you are working on. Do not use @rustbot claim; this issue is too large to be done by a single person. Do not claim a checkbox that has already been claimed without first asking the existing assignee.

Note that much of this code is platform-specific. I strongly recommend not working on Windows-specific code if you don't have a computer that can run Windows, and vice-versa for Unix.

You can ask for help in https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap, #dark-arts on the community discord (https://discord.gg/rust-lang-community), or #windows-dev on the community discord. Please do not ask for help on this tracking issue since it will quickly get overwhelming.

Here are the current uses of unsafe:

  • Creating a directory junction on Windows:

    rust/src/bootstrap/util.rs

    Lines 149 to 249 in d6f2740

    // Creating a directory junction on windows involves dealing with reparse
    // points and the DeviceIoControl function, and this code is a skeleton of
    // what can be found here:
    //
    // http://www.flexhex.com/docs/articles/hard-links.phtml
    #[cfg(windows)]
    fn symlink_dir_inner(target: &Path, junction: &Path) -> io::Result<()> {
    use std::ffi::OsStr;
    use std::os::windows::ffi::OsStrExt;
    use windows::{
    core::PCWSTR,
    Win32::Foundation::{CloseHandle, HANDLE},
    Win32::Storage::FileSystem::{
    CreateFileW, FILE_ACCESS_FLAGS, FILE_FLAG_BACKUP_SEMANTICS,
    FILE_FLAG_OPEN_REPARSE_POINT, FILE_SHARE_DELETE, FILE_SHARE_READ, FILE_SHARE_WRITE,
    MAXIMUM_REPARSE_DATA_BUFFER_SIZE, OPEN_EXISTING,
    },
    Win32::System::Ioctl::FSCTL_SET_REPARSE_POINT,
    Win32::System::SystemServices::{GENERIC_WRITE, IO_REPARSE_TAG_MOUNT_POINT},
    Win32::System::IO::DeviceIoControl,
    };
    #[allow(non_snake_case)]
    #[repr(C)]
    struct REPARSE_MOUNTPOINT_DATA_BUFFER {
    ReparseTag: u32,
    ReparseDataLength: u32,
    Reserved: u16,
    ReparseTargetLength: u16,
    ReparseTargetMaximumLength: u16,
    Reserved1: u16,
    ReparseTarget: u16,
    }
    fn to_u16s<S: AsRef<OsStr>>(s: S) -> io::Result<Vec<u16>> {
    Ok(s.as_ref().encode_wide().chain(Some(0)).collect())
    }
    // We're using low-level APIs to create the junction, and these are more
    // picky about paths. For example, forward slashes cannot be used as a
    // path separator, so we should try to canonicalize the path first.
    let target = fs::canonicalize(target)?;
    fs::create_dir(junction)?;
    let path = to_u16s(junction)?;
    let h = unsafe {
    CreateFileW(
    PCWSTR(path.as_ptr()),
    FILE_ACCESS_FLAGS(GENERIC_WRITE),
    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
    None,
    OPEN_EXISTING,
    FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_BACKUP_SEMANTICS,
    HANDLE::default(),
    )
    }
    .map_err(|_| io::Error::last_os_error())?;
    unsafe {
    #[repr(C, align(8))]
    struct Align8<T>(T);
    let mut data = Align8([0u8; MAXIMUM_REPARSE_DATA_BUFFER_SIZE as usize]);
    let db = data.0.as_mut_ptr() as *mut REPARSE_MOUNTPOINT_DATA_BUFFER;
    let buf = core::ptr::addr_of_mut!((*db).ReparseTarget) as *mut u16;
    let mut i = 0;
    // FIXME: this conversion is very hacky
    let v = br"\??\";
    let v = v.iter().map(|x| *x as u16);
    for c in v.chain(target.as_os_str().encode_wide().skip(4)) {
    *buf.offset(i) = c;
    i += 1;
    }
    *buf.offset(i) = 0;
    i += 1;
    (*db).ReparseTag = IO_REPARSE_TAG_MOUNT_POINT;
    (*db).ReparseTargetMaximumLength = (i * 2) as u16;
    (*db).ReparseTargetLength = ((i - 1) * 2) as u16;
    (*db).ReparseDataLength = ((*db).ReparseTargetLength + 12) as u32;
    let mut ret = 0u32;
    DeviceIoControl(
    h,
    FSCTL_SET_REPARSE_POINT,
    Some(db.cast()),
    (*db).ReparseDataLength + 8,
    None,
    0,
    Some(&mut ret),
    None,
    )
    .ok()
    .map_err(|_| io::Error::last_os_error())?;
    }
    unsafe { CloseHandle(h) };
    Ok(())
    }
    This can use https://docs.rs/junction/latest/junction/fn.create.html instead.
  • std::path::absolute, copied into bootstrap:

    rust/src/bootstrap/util.rs

    Lines 560 to 585 in d6f2740

    unsafe {
    // encode the path as UTF-16
    let path: Vec<u16> = path.as_os_str().encode_wide().chain([0]).collect();
    let mut buffer = Vec::new();
    // Loop until either success or failure.
    loop {
    // Try to get the absolute path
    let len = GetFullPathNameW(
    path.as_ptr(),
    buffer.len().try_into().unwrap(),
    buffer.as_mut_ptr(),
    null_mut(),
    );
    match len as usize {
    // Failure
    0 => return Err(Error::last_os_error()),
    // Buffer is too small, resize.
    len if len > buffer.len() => buffer.resize(len, 0),
    // Success!
    len => {
    buffer.truncate(len);
    return Ok(OsString::from_wide(&buffer).into());
    }
    }
    }
    }
    This is blocked on stabilization: Tracking Issue for std::path::absolute #92750
  • rusage metadata on unix:
    #[cfg(unix)]
    /// Tries to build a string with human readable data for several of the rusage
    /// fields. Note that we are focusing mainly on data that we believe to be
    /// supplied on Linux (the `rusage` struct has other fields in it but they are
    /// currently unsupported by Linux).
    fn format_rusage_data(_child: Child) -> Option<String> {
    let rusage: libc::rusage = unsafe {
    let mut recv = std::mem::zeroed();
    // -1 is RUSAGE_CHILDREN, which means to get the rusage for all children
    // (and grandchildren, etc) processes that have respectively terminated
    // and been waited for.
    let retval = libc::getrusage(-1, &mut recv);
    if retval != 0 {
    return None;
    }
    recv
    };
    // Mac OS X reports the maxrss in bytes, not kb.
    let divisor = if env::consts::OS == "macos" { 1024 } else { 1 };
    let maxrss = (rusage.ru_maxrss + (divisor - 1)) / divisor;
    let mut init_str = format!(
    "user: {USER_SEC}.{USER_USEC:03} \
    sys: {SYS_SEC}.{SYS_USEC:03} \
    max rss (kb): {MAXRSS}",
    USER_SEC = rusage.ru_utime.tv_sec,
    USER_USEC = rusage.ru_utime.tv_usec,
    SYS_SEC = rusage.ru_stime.tv_sec,
    SYS_USEC = rusage.ru_stime.tv_usec,
    MAXRSS = maxrss
    );
    // The remaining rusage stats vary in platform support. So we treat
    // uniformly zero values in each category as "not worth printing", since it
    // either means no events of that type occurred, or that the platform
    // does not support it.
    let minflt = rusage.ru_minflt;
    let majflt = rusage.ru_majflt;
    if minflt != 0 || majflt != 0 {
    init_str.push_str(&format!(" page reclaims: {} page faults: {}", minflt, majflt));
    }
    let inblock = rusage.ru_inblock;
    let oublock = rusage.ru_oublock;
    if inblock != 0 || oublock != 0 {
    init_str.push_str(&format!(" fs block inputs: {} fs block outputs: {}", inblock, oublock));
    }
    let nvcsw = rusage.ru_nvcsw;
    let nivcsw = rusage.ru_nivcsw;
    if nvcsw != 0 || nivcsw != 0 {
    init_str.push_str(&format!(
    " voluntary ctxt switches: {} involuntary ctxt switches: {}",
    nvcsw, nivcsw
    ));
    }
    return Some(init_str);
    }
    This can be replaced with https://docs.rs/nix/latest/nix/sys/resource/fn.getrusage.html.
  • rusage metadata on windows:
    #[cfg(windows)]
    fn format_rusage_data(child: Child) -> Option<String> {
    use std::os::windows::io::AsRawHandle;
    use windows::{
    Win32::Foundation::HANDLE,
    Win32::System::ProcessStatus::{
    K32GetProcessMemoryInfo, PROCESS_MEMORY_COUNTERS, PROCESS_MEMORY_COUNTERS_EX,
    },
    Win32::System::Threading::GetProcessTimes,
    Win32::System::Time::FileTimeToSystemTime,
    };
    let handle = HANDLE(child.as_raw_handle() as isize);
    let mut user_filetime = Default::default();
    let mut user_time = Default::default();
    let mut kernel_filetime = Default::default();
    let mut kernel_time = Default::default();
    let mut memory_counters = PROCESS_MEMORY_COUNTERS::default();
    unsafe {
    GetProcessTimes(
    handle,
    &mut Default::default(),
    &mut Default::default(),
    &mut kernel_filetime,
    &mut user_filetime,
    )
    }
    .ok()
    .ok()?;
    unsafe { FileTimeToSystemTime(&user_filetime, &mut user_time) }.ok().ok()?;
    unsafe { FileTimeToSystemTime(&kernel_filetime, &mut kernel_time) }.ok().ok()?;
    // Unlike on Linux with RUSAGE_CHILDREN, this will only return memory information for the process
    // with the given handle and none of that process's children.
    unsafe {
    K32GetProcessMemoryInfo(
    handle,
    &mut memory_counters,
    std::mem::size_of::<PROCESS_MEMORY_COUNTERS_EX>() as u32,
    )
    }
    .ok()
    .ok()?;
    // Guide on interpreting these numbers:
    // https://docs.microsoft.com/en-us/windows/win32/psapi/process-memory-usage-information
    let peak_working_set = memory_counters.PeakWorkingSetSize / 1024;
    let peak_page_file = memory_counters.PeakPagefileUsage / 1024;
    let peak_paged_pool = memory_counters.QuotaPeakPagedPoolUsage / 1024;
    let peak_nonpaged_pool = memory_counters.QuotaPeakNonPagedPoolUsage / 1024;
    Some(format!(
    "user: {USER_SEC}.{USER_USEC:03} \
    sys: {SYS_SEC}.{SYS_USEC:03} \
    peak working set (kb): {PEAK_WORKING_SET} \
    peak page file usage (kb): {PEAK_PAGE_FILE} \
    peak paged pool usage (kb): {PEAK_PAGED_POOL} \
    peak non-paged pool usage (kb): {PEAK_NONPAGED_POOL} \
    page faults: {PAGE_FAULTS}",
    USER_SEC = user_time.wSecond + (user_time.wMinute * 60),
    USER_USEC = user_time.wMilliseconds,
    SYS_SEC = kernel_time.wSecond + (kernel_time.wMinute * 60),
    SYS_USEC = kernel_time.wMilliseconds,
    PEAK_WORKING_SET = peak_working_set,
    PEAK_PAGE_FILE = peak_page_file,
    PEAK_PAGED_POOL = peak_paged_pool,
    PEAK_NONPAGED_POOL = peak_nonpaged_pool,
    PAGE_FAULTS = memory_counters.PageFaultCount,
    ))
    }
    Not sure what to replace this with, maybe there's something in docs.rs/windows? https://microsoft.github.io/windows-docs-rs/doc/windows/Win32/System/Threading/fn.GetProcessTimes.html appears to also be unsafe, though.
  • libc::setpriority:
    pub unsafe fn setup(build: &mut crate::Build) {
    if build.config.low_priority {
    libc::setpriority(libc::PRIO_PGRP as _, 0, 10);
    This does not appear to have a counterpart in nix.
  • libc::getuid:

    rust/src/bootstrap/lib.rs

    Lines 349 to 358 in 249198c

    #[cfg(unix)]
    // keep this consistent with the equivalent check in x.py:
    // https://github.com/rust-lang/rust/blob/a8a33cf27166d3eabaffc58ed3799e054af3b0c6/src/bootstrap/bootstrap.py#L796-L797
    let is_sudo = match env::var_os("SUDO_USER") {
    Some(_sudo_user) => {
    let uid = unsafe { libc::getuid() };
    uid == 0
    }
    None => false,
    };
    This can be replaced with https://docs.rs/nix/latest/nix/unistd/struct.Uid.html#method.current.
  • everything in job.rs, I don't really understand what's going on there:

    rust/src/bootstrap/job.rs

    Lines 50 to 143 in d6f2740

    pub unsafe fn setup(build: &mut Build) {
    // Enable the Windows Error Reporting dialog which msys disables,
    // so we can JIT debug rustc
    let mode = SetErrorMode(THREAD_ERROR_MODE::default());
    let mode = THREAD_ERROR_MODE(mode);
    SetErrorMode(mode & !SEM_NOGPFAULTERRORBOX);
    // Create a new job object for us to use
    let job = CreateJobObjectW(None, PCWSTR::null()).unwrap();
    // Indicate that when all handles to the job object are gone that all
    // process in the object should be killed. Note that this includes our
    // entire process tree by default because we've added ourselves and our
    // children will reside in the job by default.
    let mut info = JOBOBJECT_EXTENDED_LIMIT_INFORMATION::default();
    info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
    if build.config.low_priority {
    info.BasicLimitInformation.LimitFlags |= JOB_OBJECT_LIMIT_PRIORITY_CLASS;
    info.BasicLimitInformation.PriorityClass = BELOW_NORMAL_PRIORITY_CLASS.0;
    }
    let r = SetInformationJobObject(
    job,
    JobObjectExtendedLimitInformation,
    &info as *const _ as *const c_void,
    mem::size_of_val(&info) as u32,
    )
    .ok();
    assert!(r.is_ok(), "{}", io::Error::last_os_error());
    // Assign our process to this job object. Note that if this fails, one very
    // likely reason is that we are ourselves already in a job object! This can
    // happen on the build bots that we've got for Windows, or if just anyone
    // else is instrumenting the build. In this case we just bail out
    // immediately and assume that they take care of it.
    //
    // Also note that nested jobs (why this might fail) are supported in recent
    // versions of Windows, but the version of Windows that our bots are running
    // at least don't support nested job objects.
    let r = AssignProcessToJobObject(job, GetCurrentProcess()).ok();
    if r.is_err() {
    CloseHandle(job);
    return;
    }
    // If we've got a parent process (e.g., the python script that called us)
    // then move ownership of this job object up to them. That way if the python
    // script is killed (e.g., via ctrl-c) then we'll all be torn down.
    //
    // If we don't have a parent (e.g., this was run directly) then we
    // intentionally leak the job object handle. When our process exits
    // (normally or abnormally) it will close the handle implicitly, causing all
    // processes in the job to be cleaned up.
    let pid = match env::var("BOOTSTRAP_PARENT_ID") {
    Ok(s) => s,
    Err(..) => return,
    };
    let parent = match OpenProcess(PROCESS_DUP_HANDLE, false, pid.parse().unwrap()).ok() {
    Some(parent) => parent,
    _ => {
    // If we get a null parent pointer here, it is possible that either
    // we have an invalid pid or the parent process has been closed.
    // Since the first case rarely happens
    // (only when wrongly setting the environmental variable),
    // it might be better to improve the experience of the second case
    // when users have interrupted the parent process and we haven't finish
    // duplicating the handle yet. We just need close the job object if that occurs.
    CloseHandle(job);
    return;
    }
    };
    let mut parent_handle = HANDLE::default();
    let r = DuplicateHandle(
    GetCurrentProcess(),
    job,
    parent,
    &mut parent_handle,
    0,
    false,
    DUPLICATE_SAME_ACCESS,
    )
    .ok();
    // If this failed, well at least we tried! An example of DuplicateHandle
    // failing in the past has been when the wrong python2 package spawned this
    // build system (e.g., the `python2` package in MSYS instead of
    // `mingw-w64-x86_64-python2`. Not sure why it failed, but the "failure
    // mode" here is that we only clean everything up when the build system
    // dies, not when the python parent does, so not too bad.
    if r.is_err() {
    CloseHandle(job);
    }
    }
  • all the unsafe code in cache.rs:
    unsafe impl<T> Send for Interned<T> {}
    unsafe impl<T> Sync for Interned<T> {}
    impl fmt::Display for Interned<String> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    let s: &str = &*self;
    f.write_str(s)
    }
    }
    impl<T, U: ?Sized + fmt::Debug> fmt::Debug for Interned<T>
    where
    Self: Deref<Target = U>,
    {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    let s: &U = &*self;
    f.write_fmt(format_args!("{:?}", s))
    }
    }
    impl<T: Internable + Hash> Hash for Interned<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
    let l = T::intern_cache().lock().unwrap();
    l.get(*self).hash(state)
    }
    }
    impl<T: Internable + Deref> Deref for Interned<T> {
    type Target = T::Target;
    fn deref(&self) -> &Self::Target {
    let l = T::intern_cache().lock().unwrap();
    unsafe { mem::transmute::<&Self::Target, &Self::Target>(l.get(*self)) }
    }
    }
    impl<T: Internable + AsRef<U>, U: ?Sized> AsRef<U> for Interned<T> {
    fn as_ref(&self) -> &U {
    let l = T::intern_cache().lock().unwrap();
    unsafe { mem::transmute::<&U, &U>(l.get(*self).as_ref()) }
    }
    }
    We can replace this with https://docs.rs/lasso/

cc @rust-lang/bootstrap

Unresolved Questions

Are there any safe wrappers for the following APIs?

  • job.rs
  • libc::setpriority
  • rusage on Windows

Implementation history

@jyn514 jyn514 added C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC O-windows Operating system: Windows O-unix Operating system: Unix-like E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. labels Apr 2, 2023
@Mark-Simulacrum
Copy link
Member

I'm not convinced that replacing trivial unsafe for calling libc or windows APIs with dependencies makes sense. That code is rarely changed and the dependency isn't necessarily better vetted (certainly on updates more work).

IMO unsafe code is fine where there's not much in the way of alternative implementations.

@jyn514
Copy link
Member Author

jyn514 commented Apr 2, 2023

The symlink code at least I think makes sense to replace, it's quite complicated and I don't know anyone on the team who could maintain it. I could be convinced getuid and setpriority are ok, but the rusage code is also non-trivial. And once we pull in nix for rusage I don't see a reason not to also use it for getuid and setpriority.

What worries me most is that we don't have any sorts of tests for this code :/ @saethlin has already found one unsoundness in the cache code which broke local-rebuild for a stable release until we backported a fix. I'm not convinced there isn't more unsoundness that we just haven't noticed yet.

@Mark-Simulacrum
Copy link
Member

Third party code that matches our code is just as likely to have problems, imo, and is harder to patch. So I don't really see soundness or lack thereof as a strong argument in this case.

The majority of this code has worked fine for years at this point; I believe the soundness problem wasn't actually leading to UB? But in any case, certainly if we're changing the code replacing it with a library that does equivalent things is fine. But I wouldn't take a goal of "unsafe is special" here.

@Noratrieb
Copy link
Member

cc #109960 :)

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 14, 2023
…afe, r=clubby789

add `SAFETY` block on the usage of unsafe `getuid`

We pointed out this unsafe usage in rust-lang#109859, and as a result, we received a fix PR rust-lang#116476. However, it's important to note that the `libc::getuid()` never actually fails. This PR aims to clarify its safety.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 14, 2023
…afe, r=clubby789

add `SAFETY` block on the usage of unsafe `getuid`

We pointed out this unsafe usage in rust-lang#109859, and as a result, we received a fix PR rust-lang#116476. However, it's important to note that the `libc::getuid()` never actually fails. This PR aims to clarify its safety.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 14, 2023
Rollup merge of rust-lang#116577 - onur-ozkan:add-safety-block-on-unsafe, r=clubby789

add `SAFETY` block on the usage of unsafe `getuid`

We pointed out this unsafe usage in rust-lang#109859, and as a result, we received a fix PR rust-lang#116476. However, it's important to note that the `libc::getuid()` never actually fails. This PR aims to clarify its safety.
@GrigorenkoPV
Copy link
Contributor

GrigorenkoPV commented Jul 27, 2024

The OP is a bit outdated, here's the current state of affairs.

@jieyouxu
Copy link
Member

Closing because:

  • cache.rs had unsound usages but have gone through multiple reviews now.
  • jobs.rs contains unsafe and syscalls, but using a dependency means another dependency (and its transitive deps) and we'll have to audit those anyway.
  • Replacing syscalls with dependencies (which means adding more dependencies) is not a trade-off that seems very reasonable.

E.g. replacing std::path::absolute and cleanups are very beneficial, but Just Using Deps is not really a strict improvement.

@onur-ozkan
Copy link
Member

For more context, see the relevant Zulip thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. O-unix Operating system: Unix-like O-windows Operating system: Windows T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

No branches or pull requests

6 participants