Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds check fail/integer overflow in unicode_data::skip_search via cc when building with LTO #82890

Closed
saethlin opened this issue Mar 8, 2021 · 23 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly.
Milestone

Comments

@saethlin
Copy link
Member

saethlin commented Mar 8, 2021

RUSTFLAGS="-Cembed-bitcode=yes -Clto=fat -Cdebuginfo=2" cargo install cargo-make --force
  thread 'main' panicked at 'index out of bounds: the len is 31 but the index is 18446744073709551615', library/core/src/unicode/unicode_data.rs:82:62

I forgot the --target=x86_64-unknown-linux-gnu that would be required to make this actually work (build will eventually fail because LTO and proc-macro dylib crates). If I add the --target, I don't see any crashes. About half the time I run the cargo install invocation, I get a panic and the backtrace below (-Cdebuginfo=2 for the backtrace, doesn't seem related to the crash). I do not see this panic if I swap out -Clto=fat for -Copt-level=3.

Meta

rustc --version --verbose:

rustc 1.52.0-nightly (234781afe 2021-03-07)
binary: rustc
commit-hash: 234781afe33d3f339b002f85f948046d8476cfc9
commit-date: 2021-03-07
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0
Backtrace

error: failed to run custom build command for `openssl-sys v0.9.60`

Caused by:
  process didn't exit successfully: `/tmp/cargo-installjmgYde/release/build/openssl-sys-ad87746b93d0b377/build-script-main` (exit code: 101)
  --- stdout
  cargo:rustc-cfg=const_fn
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_LIB_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_LIB_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_LIB_DIR
  OPENSSL_LIB_DIR unset
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_INCLUDE_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_INCLUDE_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_INCLUDE_DIR
  OPENSSL_INCLUDE_DIR unset
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_DIR
  OPENSSL_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_NO_PKG_CONFIG
  cargo:rerun-if-env-changed=PKG_CONFIG
  cargo:rerun-if-env-changed=OPENSSL_STATIC
  cargo:rerun-if-env-changed=OPENSSL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_SYSROOT_DIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
  cargo:rerun-if-env-changed=SYSROOT
  cargo:rerun-if-env-changed=OPENSSL_STATIC
  cargo:rerun-if-env-changed=OPENSSL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
  cargo:rustc-link-lib=ssl
  cargo:rustc-link-lib=crypto
  cargo:rerun-if-env-changed=PKG_CONFIG
  cargo:rerun-if-env-changed=OPENSSL_STATIC
  cargo:rerun-if-env-changed=OPENSSL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_SYSROOT_DIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR

  --- stderr
  thread 'main' panicked at 'index out of bounds: the len is 31 but the index is 18446744073709551615', library/core/src/unicode/unicode_data.rs:82:62
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/std/src/panicking.rs:493:5
     1: core::panicking::panic_fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/panicking.rs:92:14
     2: core::panicking::panic_bounds_check
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/panicking.rs:69:5
     3: core::unicode::unicode_data::skip_search::{{closure}}
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/unicode/unicode_data.rs:82:62
     4: core::option::Option<T>::map
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/option.rs:487:29
     5: core::unicode::unicode_data::skip_search
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/unicode/unicode_data.rs:82:9
     6: core::unicode::unicode_data::grapheme_extend::lookup
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/unicode/unicode_data.rs:305:9
     7: core::char::methods::<impl char>::is_grapheme_extended
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/char/methods.rs:859:9
     8: core::char::methods::<impl char>::escape_debug_ext
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/char/methods.rs:415:46
     9: core::char::methods::<impl char>::escape_debug
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/char/methods.rs:461:9
    10: <str as core::fmt::Debug>::fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/mod.rs:2057:23
    11: <alloc::string::String as core::fmt::Debug>::fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/alloc/src/string.rs:1966:9
    12: <&T as core::fmt::Debug>::fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/mod.rs:2010:62
    13: core::fmt::builders::DebugTuple::field::{{closure}}
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/builders.rs:345:17
    14: core::result::Result<T,E>::and_then
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/result.rs:704:22
    15: core::fmt::builders::DebugTuple::field
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/builders.rs:332:23
    16: <core::option::Option<T> as core::fmt::Debug>::fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/option.rs:158:48
    17: core::fmt::write
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/mod.rs:1092:17
    18: core::fmt::Write::write_fmt
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/fmt/mod.rs:182:9
    19: alloc::fmt::format
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/alloc/src/fmt.rs:578:5
    20: cc::Build::getenv
               at /home/ben/.cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.67/src/lib.rs:2553:21
    21: cc::Build::getenv_unwrap
               at /home/ben/.cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.67/src/lib.rs:2559:15
    22: cc::Build::get_opt_level
               at /home/ben/.cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.67/src/lib.rs:2520:24
    23: cc::Build::try_get_compiler
               at /home/ben/.cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.67/src/lib.rs:1291:25
    24: cc::Build::try_expand
               at /home/ben/.cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.67/src/lib.rs:1217:24
    25: build_script_main::validate_headers
               at ./build/main.rs:138:26
    26: build_script_main::find_normal::try_pkg_config
               at ./build/find_normal.rs:205:5
    27: build_script_main::find_normal::find_openssl_dir
               at ./build/find_normal.rs:75:5
    28: build_script_main::find_normal::get_openssl::{{closure}}
               at ./build/find_normal.rs:15:68
    29: core::option::Option<T>::unwrap_or_else
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/option.rs:427:21
    30: build_script_main::find_normal::get_openssl
               at ./build/find_normal.rs:15:31
    31: build_script_main::find_openssl
               at ./build/main.rs:55:5
    32: build_script_main::main
               at ./build/main.rs:63:34
    33: core::ops::function::FnOnce::call_once
               at /rustc/234781afe33d3f339b002f85f948046d8476cfc9/library/core/src/ops/function.rs:227:5
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `cargo-make v0.32.13`, intermediate artifacts can be found at `/tmp/cargo-installjmgYde`

@saethlin saethlin added the C-bug Category: This is a bug. label Mar 8, 2021
@saethlin saethlin changed the title Bounds check fail in unicode_data::skip_search via cc when building with LTO Bounds check fail/integer overflow in unicode_data::skip_search via cc when building with LTO Mar 8, 2021
@slightlyoutofphase
Copy link
Contributor

slightlyoutofphase commented Mar 8, 2021

I can confirm that this issue also exists when running tests in release mode with LTO for the x86_64-pc-windows-msvc target for at least the past couple of nightlies.

For me, it fails with exactly the same unicode_data.rs error and points to this line of code in my crate, which oddly is not even "code" really but just the opening triple-backticks of a previously-working doctest example.

@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

I've spent some time trying to repro @slightlyoutofphase's error. Turns out rustdoc does weird things to tests... Anyhow

[profile.release]
opt-level = 0  # important!
lto = true 
codegen-units = 1

[dependencies]
staticvec = {git = "https://github.com/slightlyoutofphase/staticvec.git", rev="ad812d18da514fbb551aea09e37279c602742508"}
use staticvec::{StaticString, StringError};

#[derive(Debug)]
pub struct User {
  pub username: StaticString<20>,
  pub role: StaticString<5>,
}

fn main() -> Result<(), StringError> {
  let user = User {
    username: StaticString::try_from_str("user")?,
    role: StaticString::try_from_str("admin")?,
  };
  println!("{:?}", user);
  Ok(())
}

and

> cargo +nightly run --release
    Finished release [unoptimized] target(s) in 0.29s
     Running `target\release\rust-issue-82890.exe`
thread 'main' panicked at 'index out of bounds: the len is 31 but the index is 18446744073709551615', library\core\src\unicode\unicode_data.rs:82:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
User { username: StaticString { array: "error: process didn't exit successfully: `target\release\rust-issue-82890.exe` (exit code: 101)

@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

Minified, kinda (as in: it is now just a single self-contained file with no deps) (the Cargo.toml from above post still necessary):

#![feature(
    const_mut_refs,
    const_panic,
    const_ptr_is_null,
    const_raw_ptr_deref,
    const_slice_from_raw_parts,
)]

use core::char::DecodeUtf16Error;
use core::fmt::{self, Debug, Formatter};
use core::mem::{size_of, MaybeUninit};
use core::str::Utf8Error;

#[derive(Debug, Clone, Eq, PartialEq)]
pub struct CapacityError<const N: usize>;

/// This enum represents several different possible "error states" that may be encountered
/// while using a [`StaticString`](crate::string::StaticString).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum StringError {
    /// Indicates a failed conversion from a `u8` slice to a
    /// [`StaticString`](crate::string::StaticString).
    Utf8(Utf8Error),
    /// Indicates a failed conversion from a `u16` slice to a
    /// [`StaticString`](crate::string::StaticString).
    Utf16(DecodeUtf16Error),
    /// Indicates an attempted access of an invalid UTF-8 character index.
    NotCharBoundary,
    /// Indicates an out-of-bounds indexed access of a [`StaticString`](crate::string::StaticString)
    /// instance.
    OutOfBounds,
}
impl<const N: usize> From<CapacityError<N>> for StringError {
    #[inline(always)]
    fn from(_err: CapacityError<N>) -> Self {
        Self::OutOfBounds
    }
}

/// A [`Vec`](alloc::vec::Vec)-like struct (mostly directly API-compatible where it can be)
/// implemented with const generics around an array of fixed `N` capacity.
struct StaticVec<T, const N: usize> {
    // We create this field in an uninitialized state, and write to it element-wise as needed
    // via pointer methods. At no time should `assume_init` *ever* be called through it.
    data: MaybeUninit<[T; N]>,
    // The constant `N` parameter (and thus the total span of `data`) represent capacity for us,
    // while the field below represents, as its name suggests, the current length of a StaticVec
    // (that is, the current number of "live" elements) just as is the case for a regular `Vec`.
    length: usize,
}

/// A local (identically written) `const fn` version of `slice::from_raw_parts`.
#[inline(always)]
pub(crate) const fn slice_from_raw_parts<'a, T>(data: *const T, length: usize) -> &'a [T] {
    debug_assert!(
        /*
        is_aligned_and_not_null(data),
        "Attempted to create an unaligned or null slice!"
        */
        // See comment starting at line 154 for more info about what's going on here.
        !data.is_null(),
        "Attempted to create a null slice!"
    );
    debug_assert!(
        size_of::<T>().saturating_mul(length) <= isize::MAX as usize,
        "Attempted to create a slice covering at least half of the address space!"
    );
    unsafe { &*core::ptr::slice_from_raw_parts(data, length) }
}

impl<T, const N: usize> StaticVec<T, N> {
    #[inline(always)]
    const fn new() -> Self {
        Self {
            data: Self::new_data_uninit(),
            length: 0,
        }
    }
    #[inline(always)]
    pub(crate) const fn new_data_uninit() -> MaybeUninit<[T; N]> {
        MaybeUninit::uninit()
    }
    #[inline(always)]
    const fn remaining_capacity(&self) -> usize {
        N - self.length
    }
    #[inline(always)]
    const fn len(&self) -> usize {
        self.length
    }
    #[inline(always)]
    pub(crate) const fn first_ptr_mut(this: &mut MaybeUninit<[T; N]>) -> *mut T {
        this as *mut MaybeUninit<[T; N]> as *mut T
    }
    #[inline(always)]
    const fn as_mut_ptr(&mut self) -> *mut T {
        Self::first_ptr_mut(&mut self.data)
    }
    #[inline(always)]
    unsafe fn mut_ptr_at_unchecked(&mut self, index: usize) -> *mut T {
        // We check against `N` as opposed to `length` in our debug assertion here, as these
        // `_unchecked` versions of `ptr_at` and `mut_ptr_at` are primarily intended for
        // initialization-related purposes (and used extensively that way internally throughout the
        // crate.)
        debug_assert!(
            index <= N,
            "In `StaticVec::mut_ptr_at_unchecked`, provided index {} must be within `0..={}`!",
            index,
            N
        );
        self.as_mut_ptr().add(index)
    }
    #[inline(always)]
    unsafe fn set_len(&mut self, new_len: usize) {
        // Most of the `unsafe` functions in this crate that are heavily used internally
        // have debug-build-only assertions where it's useful.
        debug_assert!(
            new_len <= N,
            "In `StaticVec::set_len`, provided length {} exceeds the maximum capacity of {}!",
            new_len,
            N
        );
        self.length = new_len;
    }
    #[inline(always)]
    pub(crate) const fn first_ptr(this: &MaybeUninit<[T; N]>) -> *const T {
        this as *const MaybeUninit<[T; N]> as *const T
    }
    #[inline(always)]
    const fn as_ptr(&self) -> *const T {
        Self::first_ptr(&self.data)
    }

    #[inline(always)]
    const fn as_slice(&self) -> &[T] {
        // Safety: `self.as_ptr()` is a pointer to an array for which the first `length`
        // elements are guaranteed to be initialized. Therefore this is a valid slice.
        slice_from_raw_parts(self.as_ptr(), self.length)
    }
}

pub(crate) struct StaticString<const N: usize> {
    pub(crate) vec: StaticVec<u8, N>,
}

impl<const N: usize> StaticString<N> {
    #[inline(always)]
    pub(crate) const fn new() -> Self {
        Self {
            vec: StaticVec::new(),
        }
    }
    #[inline(always)]
    pub(crate) fn try_from_str<S: AsRef<str>>(string: S) -> Result<Self, CapacityError<N>> {
        let mut res = Self::new();
        res.try_push_str(string)?;
        Ok(res)
    }

    #[inline(always)]
    pub(crate) const fn len(&self) -> usize {
        self.vec.len()
    }

    #[inline(always)]
    pub(crate) const fn remaining_capacity(&self) -> usize {
        self.vec.remaining_capacity()
    }

    #[inline(always)]
    pub(crate) unsafe fn push_str_unchecked(&mut self, string: &str) {
        let string_length = string.len();
        debug_assert!(string_length <= self.remaining_capacity());
        let old_length = self.len();
        let dest = self.vec.mut_ptr_at_unchecked(old_length);
        string.as_ptr().copy_to_nonoverlapping(dest, string_length);
        self.vec.set_len(old_length + string_length);
    }

    #[inline(always)]
    pub(crate) fn try_push_str<S: AsRef<str>>(
        &mut self,
        string: S,
    ) -> Result<(), CapacityError<N>> {
        let string_ref = string.as_ref();
        match self.vec.remaining_capacity() < string_ref.len() {
            false => {
                unsafe { self.push_str_unchecked(string_ref) };
                Ok(())
            }
            true => Err(CapacityError {}),
        }
    }
    #[inline(always)]
    pub(crate) const fn as_str(&self) -> &str {
        unsafe { &*(self.as_bytes() as *const [u8] as *const str) }
    }

    #[inline(always)]
    pub(crate) const fn as_bytes(&self) -> &[u8] {
        self.vec.as_slice()
    }
}

impl<const N: usize> Debug for StaticString<N> {
    #[inline(always)]
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        f.debug_struct("StaticString")
            .field("array", &self.as_str())
            .field("size", &self.len())
            .finish()
    }
}

#[derive(Debug)]
struct User {
    username: StaticString<20>,
    role: StaticString<5>,
}

fn main() -> Result<(), StringError> {
    let user = User {
        username: StaticString::try_from_str("user")?,
        role: StaticString::try_from_str("admin")?,
    };
    println!("{:?}", user);
    Ok(())
}

P.s.: and minimized further:

use core::mem::MaybeUninit;

struct StaticString<const N: usize> {
    data: MaybeUninit<[u8; N]>,
    length: usize,
}

impl<const N: usize> StaticString<N> {
    fn try_from_str(string: &str) -> Self {
        let mut data = MaybeUninit::uninit();
        let length = string.len();

        unsafe {
            let dest = &mut data as *mut MaybeUninit<[u8; N]> as *mut u8;
            string.as_ptr().copy_to_nonoverlapping(dest, length);
        };

        Self { data, length }
    }

    fn as_slice(&self) -> &[u8] {
        unsafe {
            &*core::ptr::slice_from_raw_parts(
                &self.data as *const MaybeUninit<[u8; N]> as *const u8,
                self.length,
            )
        }
    }
}

fn main() {
    let username = StaticString::<20>::try_from_str("user");
    // works
    println!("{}  ", &unsafe { &*(username.as_slice() as *const [u8] as *const str) });  
    // doesn't work
    println!("{:?}", &unsafe { &*(username.as_slice() as *const [u8] as *const str) });  
}

@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

The code does use a bit of unsafe, but at a glance looks roughly sane (as in: I've spent 5 minutes staring at it and still see no issues). So this might be unsoundness somewhere in rustc/core?
(And, yes, this did start breaking with upgrade to LLVM 12; nightly-2021-03-04 is the last working version)

@rustbot label: +I-prioritize

@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 13, 2021
@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

update:

fn main() {
    let data_ref: &[u8] = &[0u8];  // or `&[b'a']` or something

    let username_str: &str = std::str::from_utf8(data_ref).unwrap();
    println!("{:?}", username_str);
}

This is firmly in the "I-unsound 💥" now 😛

Edit: Thanks to @danielhenrymantilla for an even more amusing

fn main() {
  println!("{:?}", "\0");  // or "hello world", or any string really
}

@Stupremee Stupremee added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Mar 13, 2021
@Aaron1011
Copy link
Member

@rustbot ping icebreakers-llvm

@rustbot
Copy link
Collaborator

rustbot commented Mar 13, 2021

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Mar 13, 2021
@nagisa
Copy link
Member

nagisa commented Mar 13, 2021

Is this reproduced after #83030? (EDIT: yes.)

@slightlyoutofphase
Copy link
Contributor

slightlyoutofphase commented Mar 13, 2021

@moxian Nice investigation! I will note that that the error was happening for me with a regular release profile for which opt-level is 3, and not 0, though. The lto and codegen-units settings you had do match mine exactly however.

@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

@slightlyoutofphase when rustdoc runs doctests, it unsets opt-level but happens to keep lto. Which was very confusing trying to figure out - compiling the code with opt-level=3 was stubbornly refusing to repro despite "clearly" having that in the original Cargo.toml.

If you happen to have a different piece of code that demonstrates the issue with "honest" rustc -C opt-level=3 that might be nice to know though! (as in: that might be a different bug 😛 )

@slightlyoutofphase
Copy link
Contributor

@moxian Ah, I didn't know that about doctests. I don't think I have something that demonstrates anything different then - in fact it seemed extra strange because I did try copying that code out into a normal program with the same release profile, and in that context the error failed to surface.

So it seems like your narrowing it down to opt-level = 0 is specifically correct for sure.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Mar 13, 2021

FWIW, here is a Playground repro; seems to be related to an integer underflow: could .checked_sub None branch somehow have been deemed unreachable?

@Aaron1011
Copy link
Member

This can be reproduced with format! instead of println!

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Mar 13, 2021

@moxian
Copy link
Contributor

moxian commented Mar 13, 2021

Minimized to not have any fmt machinery at all:

pub fn main() {
  'a'.escape_debug();
}

@camelid camelid added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 14, 2021
@nagisa
Copy link
Member

nagisa commented Mar 14, 2021

Some investigation I did inside a debugger so far:

let last_idx =
match short_offset_runs.binary_search_by_key(&(needle << 11), |header| header << 11) {
Ok(idx) => idx + 1,
Err(idx) => idx,
};

gets computed as 0 initially, which then has 1 subtracted from it:

let prev =
last_idx.checked_sub(1).map(|prev| decode_prefix_sum(short_offset_runs[prev])).unwrap_or(0);

resulting in a -1 stored in the A register, which is then stored on a stack and a byte of the same register is reset to 1 (indicating that an unsigned overflow has occurred)

   0x0000557f25342a61 <+449>:   sub    $0x1,%rax
   0x0000557f25342a65 <+453>:   mov    %rax,0x78(%rsp)
   0x0000557f25342a6a <+458>:   setb   %al

However the next immediate instruction is...

   0x0000557f25342a6d <+461>:   xor    %eax,%eax
   0x0000557f25342a6f <+463>:   mov    %eax,0x84(%rsp)
   0x0000557f25342a76 <+470>:   jb     0x557f25342ab6 <_ZN4core7unicode12unicode_data15grapheme_extend6lookup17hb68bf130fc9c0d7eE+534>

which is a short for… clear the 32 bits of A register, in an apparent attempt to clear a stack slot? This, unfortunately ends up entirely ignoring the fact an overflow has occurred (xor also clears the flags).

This suggests to me that there's an issue with register liveness tracking (the register holding the overflow flag ends up being killed early), and since in the underlying LLVM-IR the overflowing sub operation returns a { T, i1 }, there's going to be an extractvalue somewhere in there. As thus the bug sounds extremely similar to the issue fixed in #83030 (https://bugs.llvm.org/show_bug.cgi?id=49467), except this time it apparently crops up through LTO.

@Mark-Simulacrum Mark-Simulacrum added this to the 1.52.0 milestone Mar 14, 2021
@nikic
Copy link
Contributor

nikic commented Mar 14, 2021

IR for the relevant function: https://gist.github.com/nikic/74a6461a4c09cdf2bf6b2f6ce9d8582a Reproduces under llc -O0.

Initial machine instrs produced by FastISel already look broken:

  %14:gr64 = PHI %47:gr64, %bb.7, %56:gr64, %bb.8
  %64:gr64_nosp = SUB64ri8 %11:gr64_nosp(tied-def 0), 1, implicit-def $eflags
  %65:gr8 = SETCCr 2, implicit $eflags
  %60:gr32 = MOV32r0 implicit-def $eflags
  JCC_1 %bb.13, 2, implicit $eflags

@nikic
Copy link
Contributor

nikic commented Mar 14, 2021

Upstream report: https://bugs.llvm.org/show_bug.cgi?id=49587

@nikic nikic self-assigned this Mar 14, 2021
@JohnTitor
Copy link
Member

JohnTitor commented Mar 16, 2021

Assigning P-critical as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@JohnTitor JohnTitor removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 16, 2021
@JohnTitor JohnTitor added the P-critical Critical priority label Mar 16, 2021
@nikic
Copy link
Contributor

nikic commented Mar 29, 2021

Upstream fix: llvm/llvm-project@7669455

@jhwgh1968
Copy link

I see that the LLVM submodule has been updated a couple times since the upstream fix, and there are a couple of PRs currently in queue for other LLVM issues.

Any idea when this fix will get into nightly? Or did it already?

@nikic
Copy link
Contributor

nikic commented Apr 16, 2021

Should be fixed by #84230.

@Mark-Simulacrum
Copy link
Member

Fixed in 1.52 by #84271 as well, backporting #84230.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly.
Projects
None yet
Development

No branches or pull requests