Bounds check fail/integer overflow in unicode_data::skip_search via cc when building with LTO #82890
I can confirm that this issue also exists when running tests in release mode with LTO for the For me, it fails with exactly the same |
I've spent some time trying to repro @slightlyoutofphase's error. Turns out rustdoc does weird things to tests... Anyhow
[profile.release]
opt-level = 0 # important!
lto = true
codegen-units = 1
[dependencies]
staticvec = {git = "https://github.com/slightlyoutofphase/staticvec.git", rev="ad812d18da514fbb551aea09e37279c602742508"}

use staticvec::{StaticString, StringError};
#[derive(Debug)]
pub struct User {
pub username: StaticString<20>,
pub role: StaticString<5>,
}
fn main() -> Result<(), StringError> {
let user = User {
username: StaticString::try_from_str("user")?,
role: StaticString::try_from_str("admin")?,
};
println!("{:?}", user);
Ok(())
}
and
> cargo +nightly run --release
Finished release [unoptimized] target(s) in 0.29s
Running `target\release\rust-issue-82890.exe`
thread 'main' panicked at 'index out of bounds: the len is 31 but the index is 18446744073709551615', library\core\src\unicode\unicode_data.rs:82:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
User { username: StaticString { array: "error: process didn't exit successfully: `target\release\rust-issue-82890.exe` (exit code: 101) |
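The huge index in the panic message above, 18446744073709551615, is `usize::MAX` on a 64-bit target, which is exactly what `0usize - 1` becomes when the subtraction wraps. The sketch below only illustrates that arithmetic; it does not reproduce the miscompilation, and the helper names are hypothetical.

```rust
/// Hypothetical helper: computes `idx - 1` with wrapping semantics,
/// so `0` wraps around to `usize::MAX` instead of panicking.
fn prev_index_wrapping(idx: usize) -> usize {
    idx.wrapping_sub(1)
}

/// Hypothetical helper: the checked form reports the underflow as `None`.
fn prev_index_checked(idx: usize) -> Option<usize> {
    idx.checked_sub(1)
}

fn main() {
    let wrapped = prev_index_wrapping(0);
    // On a 64-bit target this prints 18446744073709551615 (usize::MAX),
    // the same index that appears in the panic message.
    println!("{}", wrapped);
    assert_eq!(wrapped, usize::MAX);
    assert_eq!(prev_index_checked(0), None);
    assert_eq!(prev_index_checked(5), Some(4));
}
```

A bounds check on a 31-element table then rejects that index, which is the `index out of bounds` panic seen above.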
Minified, kinda (as in: it is now just a single self-contained file with no deps) (the
#![feature(
const_mut_refs,
const_panic,
const_ptr_is_null,
const_raw_ptr_deref,
const_slice_from_raw_parts,
)]
use core::char::DecodeUtf16Error;
use core::fmt::{self, Debug, Formatter};
use core::mem::{size_of, MaybeUninit};
use core::str::Utf8Error;
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct CapacityError<const N: usize>;
/// This enum represents several different possible "error states" that may be encountered
/// while using a [`StaticString`](crate::string::StaticString).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum StringError {
/// Indicates a failed conversion from a `u8` slice to a
/// [`StaticString`](crate::string::StaticString).
Utf8(Utf8Error),
/// Indicates a failed conversion from a `u16` slice to a
/// [`StaticString`](crate::string::StaticString).
Utf16(DecodeUtf16Error),
/// Indicates an attempted access of an invalid UTF-8 character index.
NotCharBoundary,
/// Indicates an out-of-bounds indexed access of a [`StaticString`](crate::string::StaticString)
/// instance.
OutOfBounds,
}
impl<const N: usize> From<CapacityError<N>> for StringError {
#[inline(always)]
fn from(_err: CapacityError<N>) -> Self {
Self::OutOfBounds
}
}
/// A [`Vec`](alloc::vec::Vec)-like struct (mostly directly API-compatible where it can be)
/// implemented with const generics around an array of fixed `N` capacity.
struct StaticVec<T, const N: usize> {
// We create this field in an uninitialized state, and write to it element-wise as needed
// via pointer methods. At no time should `assume_init` *ever* be called through it.
data: MaybeUninit<[T; N]>,
// The constant `N` parameter (and thus the total span of `data`) represent capacity for us,
// while the field below represents, as its name suggests, the current length of a StaticVec
// (that is, the current number of "live" elements) just as is the case for a regular `Vec`.
length: usize,
}
/// A local (identically written) `const fn` version of `slice::from_raw_parts`.
#[inline(always)]
pub(crate) const fn slice_from_raw_parts<'a, T>(data: *const T, length: usize) -> &'a [T] {
debug_assert!(
/*
is_aligned_and_not_null(data),
"Attempted to create an unaligned or null slice!"
*/
// See comment starting at line 154 for more info about what's going on here.
!data.is_null(),
"Attempted to create a null slice!"
);
debug_assert!(
size_of::<T>().saturating_mul(length) <= isize::MAX as usize,
"Attempted to create a slice covering at least half of the address space!"
);
unsafe { &*core::ptr::slice_from_raw_parts(data, length) }
}
impl<T, const N: usize> StaticVec<T, N> {
#[inline(always)]
const fn new() -> Self {
Self {
data: Self::new_data_uninit(),
length: 0,
}
}
#[inline(always)]
pub(crate) const fn new_data_uninit() -> MaybeUninit<[T; N]> {
MaybeUninit::uninit()
}
#[inline(always)]
const fn remaining_capacity(&self) -> usize {
N - self.length
}
#[inline(always)]
const fn len(&self) -> usize {
self.length
}
#[inline(always)]
pub(crate) const fn first_ptr_mut(this: &mut MaybeUninit<[T; N]>) -> *mut T {
this as *mut MaybeUninit<[T; N]> as *mut T
}
#[inline(always)]
const fn as_mut_ptr(&mut self) -> *mut T {
Self::first_ptr_mut(&mut self.data)
}
#[inline(always)]
unsafe fn mut_ptr_at_unchecked(&mut self, index: usize) -> *mut T {
// We check against `N` as opposed to `length` in our debug assertion here, as these
// `_unchecked` versions of `ptr_at` and `mut_ptr_at` are primarily intended for
// initialization-related purposes (and used extensively that way internally throughout the
// crate.)
debug_assert!(
index <= N,
"In `StaticVec::mut_ptr_at_unchecked`, provided index {} must be within `0..={}`!",
index,
N
);
self.as_mut_ptr().add(index)
}
#[inline(always)]
unsafe fn set_len(&mut self, new_len: usize) {
// Most of the `unsafe` functions in this crate that are heavily used internally
// have debug-build-only assertions where it's useful.
debug_assert!(
new_len <= N,
"In `StaticVec::set_len`, provided length {} exceeds the maximum capacity of {}!",
new_len,
N
);
self.length = new_len;
}
#[inline(always)]
pub(crate) const fn first_ptr(this: &MaybeUninit<[T; N]>) -> *const T {
this as *const MaybeUninit<[T; N]> as *const T
}
#[inline(always)]
const fn as_ptr(&self) -> *const T {
Self::first_ptr(&self.data)
}
#[inline(always)]
const fn as_slice(&self) -> &[T] {
// Safety: `self.as_ptr()` is a pointer to an array for which the first `length`
// elements are guaranteed to be initialized. Therefore this is a valid slice.
slice_from_raw_parts(self.as_ptr(), self.length)
}
}
pub(crate) struct StaticString<const N: usize> {
pub(crate) vec: StaticVec<u8, N>,
}
impl<const N: usize> StaticString<N> {
#[inline(always)]
pub(crate) const fn new() -> Self {
Self {
vec: StaticVec::new(),
}
}
#[inline(always)]
pub(crate) fn try_from_str<S: AsRef<str>>(string: S) -> Result<Self, CapacityError<N>> {
let mut res = Self::new();
res.try_push_str(string)?;
Ok(res)
}
#[inline(always)]
pub(crate) const fn len(&self) -> usize {
self.vec.len()
}
#[inline(always)]
pub(crate) const fn remaining_capacity(&self) -> usize {
self.vec.remaining_capacity()
}
#[inline(always)]
pub(crate) unsafe fn push_str_unchecked(&mut self, string: &str) {
let string_length = string.len();
debug_assert!(string_length <= self.remaining_capacity());
let old_length = self.len();
let dest = self.vec.mut_ptr_at_unchecked(old_length);
string.as_ptr().copy_to_nonoverlapping(dest, string_length);
self.vec.set_len(old_length + string_length);
}
#[inline(always)]
pub(crate) fn try_push_str<S: AsRef<str>>(
&mut self,
string: S,
) -> Result<(), CapacityError<N>> {
let string_ref = string.as_ref();
match self.vec.remaining_capacity() < string_ref.len() {
false => {
unsafe { self.push_str_unchecked(string_ref) };
Ok(())
}
true => Err(CapacityError {}),
}
}
#[inline(always)]
pub(crate) const fn as_str(&self) -> &str {
unsafe { &*(self.as_bytes() as *const [u8] as *const str) }
}
#[inline(always)]
pub(crate) const fn as_bytes(&self) -> &[u8] {
self.vec.as_slice()
}
}
impl<const N: usize> Debug for StaticString<N> {
#[inline(always)]
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
f.debug_struct("StaticString")
.field("array", &self.as_str())
.field("size", &self.len())
.finish()
}
}
#[derive(Debug)]
struct User {
username: StaticString<20>,
role: StaticString<5>,
}
fn main() -> Result<(), StringError> {
let user = User {
username: StaticString::try_from_str("user")?,
role: StaticString::try_from_str("admin")?,
};
println!("{:?}", user);
Ok(())
}
P.s.: and minimized further:
use core::mem::MaybeUninit;
struct StaticString<const N: usize> {
data: MaybeUninit<[u8; N]>,
length: usize,
}
impl<const N: usize> StaticString<N> {
fn try_from_str(string: &str) -> Self {
let mut data = MaybeUninit::uninit();
let length = string.len();
unsafe {
let dest = &mut data as *mut MaybeUninit<[u8; N]> as *mut u8;
string.as_ptr().copy_to_nonoverlapping(dest, length);
};
Self { data, length }
}
fn as_slice(&self) -> &[u8] {
unsafe {
&*core::ptr::slice_from_raw_parts(
&self.data as *const MaybeUninit<[u8; N]> as *const u8,
self.length,
)
}
}
}
fn main() {
let username = StaticString::<20>::try_from_str("user");
// works
println!("{} ", &unsafe { &*(username.as_slice() as *const [u8] as *const str) });
// doesn't work
println!("{:?}", &unsafe { &*(username.as_slice() as *const [u8] as *const str) });
} |
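The two `println!` lines above differ only in the formatting trait: `{}` uses `Display`, which writes the string through unchanged, while `{:?}` uses `Debug`, which adds quotes and escapes characters one by one. In the `core` of this issue that per-character escaping asked whether each `char` is grapheme-extending (the `'a'.escape_debug()` reduction further down still panics), which is how a plain `Debug` print ends up in `unicode_data`. A minimal sketch of the observable difference on a working compiler; it does not reproduce the bug, and `display`/`debug` are hypothetical helper names.

```rust
/// `Display` for `str`: the bytes are written as-is.
fn display(s: &str) -> String {
    format!("{}", s)
}

/// `Debug` for `str`: surrounding quotes are added and characters
/// such as tabs are written as escape sequences.
fn debug(s: &str) -> String {
    format!("{:?}", s)
}

fn main() {
    let s = "hello\tworld";
    // Prints hello, a literal tab character, then world.
    println!("{}", display(s));
    // Prints "hello\tworld" including the quotes and a backslash-t.
    println!("{}", debug(s));
}
```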
The code does use a bit of unsafe, but at a glance looks roughly sane (as in: I've spent 5 minutes staring at it and still see no issues). So this might be unsoundness somewhere in rustc/core? @rustbot label: +I-prioritize |
update:
fn main() {
let data_ref: &[u8] = &[0u8]; // or `&[b'a']` or something
let username_str: &str = std::str::from_utf8(data_ref).unwrap();
println!("{:?}", username_str);
}
This is firmly in the "I-unsound 💥" now 😛
Edit: Thanks to @danielhenrymantilla for an even more amusing repro:
fn main() {
println!("{:?}", "\0"); // or "hello world", or any string really
} |
@rustbot ping icebreakers-llvm |
Hey LLVM ICE-breakers! This bug has been identified as a good cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique |
Is this reproduced after #83030? (EDIT: yes.) |
@moxian Nice investigation! I will note that the error was happening for me with a regular release profile for which |
@slightlyoutofphase when rustdoc runs doctests, it unsets If you happen to have a different piece of code that demonstrates the issue with "honest" |
@moxian Ah, I didn't know that about doctests. I don't think I have something that demonstrates anything different, then. In fact, it seemed extra strange because I did try copying that code out into a normal program with the same release profile, and in that context the error failed to surface. So it seems like you're narrowing it down to |
FWIW, here is a Playground repro; seems to be related to an integer underflow: could |
This can be reproduced with |
Minimized to not have any
pub fn main() {
'a'.escape_debug();
} |
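`char::escape_debug` is the per-character escaping that `{:?}` relies on. Besides control characters and quotes, it escapes grapheme-extending characters such as U+0301 COMBINING ACUTE ACCENT, so for an ordinary character like `'a'` it has to check whether the character is grapheme-extending; in this issue that check is the `grapheme_extend::lookup` function named in the disassembly below. A small sketch of its observable behavior on a working compiler; `escaped` is a hypothetical helper.

```rust
/// Hypothetical helper: collects `char::escape_debug` output into a String.
fn escaped(c: char) -> String {
    c.escape_debug().collect()
}

fn main() {
    // Printable ASCII passes through unchanged.
    println!("{}", escaped('a'));
    // Control characters become backslash escapes.
    println!("{}", escaped('\n'));
    // A grapheme-extending combining mark becomes a \u{...} escape.
    println!("{}", escaped('\u{301}'));
}
```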
Some investigation I did inside a debugger so far: rust/library/core/src/unicode/unicode_data.rs Lines 69 to 73 in acca818
gets computed as 0 initially, which then has 1 subtracted from it: rust/library/core/src/unicode/unicode_data.rs Lines 81 to 82 in acca818
resulting in a
0x0000557f25342a61 <+449>: sub $0x1,%rax
0x0000557f25342a65 <+453>: mov %rax,0x78(%rsp)
0x0000557f25342a6a <+458>: setb %al
However the next immediate instruction is...
0x0000557f25342a6d <+461>: xor %eax,%eax
0x0000557f25342a6f <+463>: mov %eax,0x84(%rsp)
0x0000557f25342a76 <+470>: jb 0x557f25342ab6 <_ZN4core7unicode12unicode_data15grapheme_extend6lookup17hb68bf130fc9c0d7eE+534>
which is a short for… clear the 32 bits of
This suggests to me that there's an issue with register liveness tracking (the register holding the overflow flag ends up being killed early), and since in the underlying LLVM-IR the overflowing sub operation returns a |
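The `sub` / `setb` / `jb` sequence above is the machine-level form of a subtraction whose borrow (carry) flag should decide which way the code branches; `xor %eax,%eax` both zeroes `%eax` and clears the carry flag, so the following `jb` can no longer see the borrow. At the Rust level, `usize::overflowing_sub` returns the same two pieces of information (the wrapped value plus an overflow bit), much like LLVM's `llvm.usub.with.overflow` intrinsic returns an `{iN, i1}` pair. The sketch below shows the kind of guard such code is meant to implement; `prev_entry` and its table are hypothetical and are not the actual `skip_search` code.

```rust
/// Hypothetical helper: returns the table entry before `idx`, if any.
fn prev_entry(table: &[u32], idx: usize) -> Option<u32> {
    // Analogous to x86 `sub` + `setb`: wrapped result plus a borrow flag.
    let (prev, borrowed) = idx.overflowing_sub(1);
    if borrowed {
        // Correct code must take this branch when idx == 0. Per the
        // analysis above, the flag was lost before the branch instead.
        return None;
    }
    table.get(prev).copied()
}

fn main() {
    let table = [10, 20, 30];
    assert_eq!(0usize.overflowing_sub(1), (usize::MAX, true));
    println!("{:?}", prev_entry(&table, 0)); // None
    println!("{:?}", prev_entry(&table, 2)); // Some(20)
}
```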
IR for the relevant function: https://gist.github.com/nikic/74a6461a4c09cdf2bf6b2f6ce9d8582a
Reproduces under
Initial machine instrs produced by FastISel already look broken:
|
Upstream report: https://bugs.llvm.org/show_bug.cgi?id=49587 |
Assigning |
Upstream fix: llvm/llvm-project@7669455 |
I see that the LLVM submodule has been updated a couple times since the upstream fix, and there are a couple of PRs currently in queue for other LLVM issues. Any idea when this fix will get into nightly? Or did it already? |
Should be fixed by #84230. |
I forgot the `--target=x86_64-unknown-linux-gnu` that would be required to make this actually work (the build will eventually fail because of LTO and proc-macro dylib crates). If I add the `--target`, I don't see any crashes. About half the time I run the `cargo install` invocation, I get a panic and the backtrace below (`-Cdebuginfo=2` is only there for the backtrace; it doesn't seem related to the crash). I do not see this panic if I swap out `-Clto=fat` for `-Copt-level=3`.
Meta
`rustc --version --verbose`:
Backtrace