Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment of the bytes of Allocation to match align parameter #100467

Closed
wants to merge 25 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
0fe75b8
Stop x.py from being mean
maurer Aug 2, 2022
f37fe37
DO NOT MERGE - scratchwork
maurer Aug 3, 2022
fde4235
point to my miri fork
Aug 3, 2022
e0e8e00
no more direct hole punch to alloc bytes
Aug 4, 2022
a7b7f19
no more holepunch to bytes
Aug 4, 2022
f7a991b
commented out alignment check for int-aligned addrs -- need to make t…
Aug 6, 2022
e2ed272
proper check for size of allocation being aligned
Aug 7, 2022
51269b8
using the align parameter to properly align allocations
Aug 8, 2022
bca203e
redoing allocation bytes realignment -- still testing with miri
Aug 12, 2022
0ddff36
allocation bytes alignment, and cleanup .gitmodules
Aug 12, 2022
a72a057
double free detected in tcache 2: could not compile core in stage 1 c…
Aug 12, 2022
04f29dc
using slice directly, no intermediate vec
Aug 12, 2022
2fd7606
removing miri submodule updates
Aug 15, 2022
ab1a61f
removing miri submodule updates
Aug 15, 2022
b87f5ef
removing miri submodule updates
Aug 15, 2022
cade1c1
going back to previous version of miri to match commit of rustc
Aug 15, 2022
c31d404
moving AllocBytes into a trait -- partially done
emarteca Sep 5, 2022
e993680
more moving allocbytes into trait
emarteca Sep 7, 2022
17ac36b
allocbytes moved into a trait, implemented for `Box<[u8]>` as default
emarteca Sep 7, 2022
c2e142b
merging in changes from upstream master
emarteca Sep 8, 2022
99f6708
cleanup
emarteca Sep 8, 2022
a12d111
cleanup
emarteca Sep 8, 2022
f75649b
propagating Allocation Bytes type
emarteca Sep 14, 2022
8db066f
adding deref and derefmut to allocbytes, removing now unnecessary met…
emarteca Nov 9, 2022
f075a12
nit
emarteca Nov 9, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions Cargo.lock
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,12 @@
# It is not intended for manual editing.
version = 3

[[package]]
name = "abort_on_panic"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "955f37ac58af2416bac687c8ab66a4ccba282229bd7422a28d2281a5e66a6116"

[[package]]
name = "addr2line"
version = "0.16.0"
Expand Down Expand Up @@ -2151,6 +2157,26 @@ dependencies = [
"rustc-std-workspace-core",
]

[[package]]
name = "libffi"
version = "3.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e08093a2ddeee94bd0c830a53d895ff91f1f3bb0f9b3c8c6b00739cdf76bc1d"
dependencies = [
"abort_on_panic",
"libc",
"libffi-sys",
]

[[package]]
name = "libffi-sys"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab4106b7f09d7b87d021334d5618fac1dfcfb824d4c5fe111ff0074dfd242e15"
dependencies = [
"cc",
]

[[package]]
name = "libgit2-sys"
version = "0.13.2+1.4.2"
Expand Down Expand Up @@ -2482,6 +2508,8 @@ dependencies = [
"getrandom 0.2.0",
"lazy_static",
"libc",
"libffi",
"libloading",
"log",
"measureme",
"rand 0.8.5",
Expand Down
7 changes: 7 additions & 0 deletions compiler/rustc_const_eval/src/interpret/memory.rs
Original file line number Diff line number Diff line change
Expand Up @@ -518,6 +518,13 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
)
}

/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists.
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, Size> {
let alloc = self.get_alloc_raw(id)?;
Ok(alloc.get_bytes_addr())
}

/// Gives raw access to the `Allocation`, without bounds or alignment checks.
/// The caller is responsible for calling the access hooks!
fn get_alloc_raw(
Expand Down
2 changes: 2 additions & 0 deletions compiler/rustc_middle/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
//! This API is completely unstable and subject to change.

#![doc(html_root_url = "https://doc.rust-lang.org/nightly/nightly-rustc/")]
#![feature(maybe_uninit_write_slice)]
#![feature(allocator_api)]
#![feature(array_windows)]
#![feature(assert_matches)]
Expand Down Expand Up @@ -60,6 +61,7 @@
#![feature(intra_doc_pointers)]
#![feature(yeet_expr)]
#![feature(const_option)]
#![feature(vec_into_raw_parts)]
#![recursion_limit = "512"]
#![allow(rustc::potential_query_instability)]

Expand Down
74 changes: 55 additions & 19 deletions compiler/rustc_middle/src/mir/interpret/allocation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ use std::hash;
use std::iter;
use std::ops::{Deref, Range};
use std::ptr;
use std::mem::MaybeUninit;

use rustc_ast::Mutability;
use rustc_data_structures::intern::Interned;
Expand Down Expand Up @@ -207,8 +208,19 @@ impl<Prov> Allocation<Prov> {
align: Align,
mutability: Mutability,
) -> Self {
let bytes = Box::<[u8]>::from(slice.into());
let size = Size::from_bytes(bytes.len());
let slice: Cow<'a, [u8]> = slice.into();
let size = Size::from_bytes(slice.len());
let align_usize: usize = align.bytes().try_into().unwrap();
let layout = std::alloc::Layout::from_size_align(slice.len(), align_usize).unwrap();
let bytes = unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest concern is that this introduces unsafe code deep in the core of the interpreter engine. This is a rather significant cost and risk that the interpreter has to carry, even the compile-time interpreter that definitely does not want or need this.

At the least, we should find a way to factor this such that all the unsafe code lives in the Miri repository. Even that is still not great though, it is exactly the kind of global cost I was worried about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point, but I don't think we can align the bytes of the allocation without unsafe code.
If we want to factor this out into Miri, then one option could be to refactor Allocation to expose bytes so that the bytes alignment can be done on the Miri side. Although, this would be a bigger change to Allocation.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we certainly don't want to expose bytes. But we can make Allocation generic over how bytes are allocated.

You can add a new type parameter for that:

pub struct Allocation<Prov = AllocId, Extra = (), Bytes = Box<[u8]>> {
  bytes: Bytes

and then define a suitable trait

trait AllocationBytes

and add a bound Bytes: AllocationBytes whereveer needed.

The trait needs to then contain all the operations you need. The implementation in rustc can use Box<[u8]> and be completely safe, and that way only Miri needs unsafe code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's smart -- I had tried earlier to make Allocation generic over the allocator for bytes which didn't work. But this makes sense!

let buf = std::alloc::alloc(layout);
let mut boxed = Box::<[MaybeUninit<u8>]>::from_raw(std::slice::from_raw_parts_mut(buf as *mut MaybeUninit<u8>, slice.len()));
the8472 marked this conversation as resolved.
Show resolved Hide resolved
MaybeUninit::write_slice(&mut boxed, &slice);
boxed.assume_init()
};

assert!(bytes.as_ptr() as u64 % align.bytes() == 0);

Self {
bytes,
relocations: Relocations::new(),
Expand All @@ -228,22 +240,29 @@ impl<Prov> Allocation<Prov> {
///
/// If `panic_on_fail` is true, this will never return `Err`.
pub fn uninit<'tcx>(size: Size, align: Align, panic_on_fail: bool) -> InterpResult<'tcx, Self> {
let bytes = Box::<[u8]>::try_new_zeroed_slice(size.bytes_usize()).map_err(|_| {
// This results in an error that can happen non-deterministically, since the memory
// available to the compiler can change between runs. Normally queries are always
// deterministic. However, we can be non-deterministic here because all uses of const
// evaluation (including ConstProp!) will make compilation fail (via hard error
// or ICE) upon encountering a `MemoryExhausted` error.
if panic_on_fail {
panic!("Allocation::uninit called with panic_on_fail had allocation failure")
}
ty::tls::with(|tcx| {
tcx.sess.delay_span_bug(DUMMY_SP, "exhausted memory during interpretation")
});
InterpError::ResourceExhaustion(ResourceExhaustionInfo::MemoryExhausted)
})?;
// SAFETY: the box was zero-allocated, which is a valid initial value for Box<[u8]>
let bytes = unsafe { bytes.assume_init() };
let align_usize: usize = align.bytes().try_into().unwrap();
let layout = std::alloc::Layout::from_size_align(size.bytes_usize(), align_usize).unwrap();
let vec_align = unsafe {
// https://doc.rust-lang.org/nightly/std/alloc/trait.GlobalAlloc.html#tymethod.alloc
// std::alloc::alloc returns null to indicate an allocation failure:
// "Returning a null pointer indicates that either memory is exhausted
// or layout does not meet this allocator’s size or alignment constraints."
let buf = std::alloc::alloc(layout);
// Handle allocation failure.
if buf.is_null() {
if panic_on_fail {
panic!("Allocation::uninit called with panic_on_fail had allocation failure")
}
ty::tls::with(|tcx| {
tcx.sess.delay_span_bug(DUMMY_SP, "exhausted memory during interpretation")
});
Err(InterpError::ResourceExhaustion(ResourceExhaustionInfo::MemoryExhausted))?
}
Vec::from_raw_parts(buf as *mut u8, size.bytes_usize(), size.bytes_usize())
};

let bytes = vec_align.into_boxed_slice();
assert!(bytes.as_ptr() as u64 % align.bytes() == 0);
Ok(Allocation {
bytes,
relocations: Relocations::new(),
Expand All @@ -265,7 +284,17 @@ impl Allocation {
mut adjust_ptr: impl FnMut(Pointer<AllocId>) -> Result<Pointer<Prov>, Err>,
) -> Result<Allocation<Prov, Extra>, Err> {
// Compute new pointer provenance, which also adjusts the bytes.
let mut bytes = self.bytes;
// Realign the pointer
let align_usize: usize = self.align.bytes().try_into().unwrap();
let layout = std::alloc::Layout::from_size_align(self.bytes.len(), align_usize).unwrap();
let mut bytes = unsafe {
let buf = std::alloc::alloc(layout);
let mut boxed = Box::<[MaybeUninit<u8>]>::from_raw(std::slice::from_raw_parts_mut(buf as *mut MaybeUninit<u8>, self.bytes.len()));
the8472 marked this conversation as resolved.
Show resolved Hide resolved
MaybeUninit::write_slice(&mut boxed, &self.bytes);
boxed.assume_init()
};
assert!(bytes.as_ptr() as usize % align_usize == 0);

let mut new_relocations = Vec::with_capacity(self.relocations.0.len());
let ptr_size = cx.data_layout().pointer_size.bytes_usize();
let endian = cx.data_layout().endian;
Expand All @@ -278,6 +307,8 @@ impl Allocation {
write_target_uint(endian, ptr_bytes, ptr_offset.bytes().into()).unwrap();
new_relocations.push((offset, ptr_prov));
}
assert!(bytes.as_ptr() as u64 % self.align.bytes() == 0);

// Create allocation.
Ok(Allocation {
bytes,
Expand Down Expand Up @@ -321,6 +352,11 @@ impl<Prov, Extra> Allocation<Prov, Extra> {

/// Byte accessors.
impl<Prov: Provenance, Extra> Allocation<Prov, Extra> {
/// Get the pointer of the [u8] of bytes.
pub fn get_bytes_addr(&self) -> Size {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deliberately the host address, right? So it should have return type usize, not Size.

Also, to actually allow C code to use this address, we need to be a bit more careful with aliasing here. The pointer you are exposing here will be invalidated the next time someone mutates this memory, causing UB in Miri itself due to violating the Rust aliasing rules.

Under my proposal where bytes: Bytes, I think the AllocationBytes trait should have a method expose_base_addr with return type usize, and this function here simply calls expose_base_addr. (It should probably also be called expose_base_addr.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- thanks for the suggestion, that makes sense -- AllocationBytes trait is a clean solution.

Size::from_bytes(self.bytes.as_ptr() as u64)
}

/// This is the entirely abstraction-violating way to just grab the raw bytes without
/// caring about relocations. It just deduplicates some code between `read_scalar`
/// and `get_bytes_internal`.
Expand Down