Make Allocation::bytes private #63561

Merged · 12 commits · Sep 3, 2019
20 changes: 8 additions & 12 deletions src/librustc/ich/impls_ty.rs
@@ -168,25 +168,21 @@ impl<'a> HashStable<StableHashingContext<'a>> for mir::interpret::AllocId {
}
}

// Allocations treat their relocations specially
impl<'a> HashStable<StableHashingContext<'a>> for mir::interpret::Allocation {
// `Relocations` with default type parameters is a sorted map.
impl<'a, Tag> HashStable<StableHashingContext<'a>>
for mir::interpret::Relocations<Tag>
where
Tag: HashStable<StableHashingContext<'a>>,
{
fn hash_stable<W: StableHasherResult>(
&self,
hcx: &mut StableHashingContext<'a>,
hasher: &mut StableHasher<W>,
) {
let mir::interpret::Allocation {
bytes, relocations, undef_mask, align, mutability,
extra: _,
} = self;
bytes.hash_stable(hcx, hasher);
relocations.len().hash_stable(hcx, hasher);
for reloc in relocations.iter() {
self.len().hash_stable(hcx, hasher);
for reloc in self.iter() {
reloc.hash_stable(hcx, hasher);
}
undef_mask.hash_stable(hcx, hasher);
align.hash_stable(hcx, hasher);
mutability.hash_stable(hcx, hasher);
}
}

140 changes: 131 additions & 9 deletions src/librustc/mir/interpret/allocation.rs
@@ -13,18 +13,33 @@ use rustc_data_structures::sorted_map::SortedMap;
use rustc_target::abi::HasDataLayout;
use std::borrow::Cow;

#[derive(Clone, Debug, Eq, PartialEq, PartialOrd, Ord, Hash, RustcEncodable, RustcDecodable)]
// NOTE: When adding new fields, make sure to adjust the Snapshot impl in
// `src/librustc_mir/interpret/snapshot.rs`.
#[derive(
Clone,
Debug,
Eq,
PartialEq,
PartialOrd,
Ord,
Hash,
RustcEncodable,
RustcDecodable,
HashStable,
)]
pub struct Allocation<Tag=(),Extra=()> {
/// The actual bytes of the allocation.
/// Note that the bytes of a pointer represent the offset of the pointer
pub bytes: Vec<u8>,
/// Note that the bytes of a pointer represent the offset of the pointer.
bytes: Vec<u8>,
/// Maps from byte addresses to extra data for each pointer.
/// Only the first byte of a pointer is inserted into the map; i.e.,
/// every entry in this map applies to `pointer_size` consecutive bytes starting
/// at the given offset.
pub relocations: Relocations<Tag>,
Member: Can you make this private and provide a read-only accessor, similar to what you did for undef_mask?

Contributor Author: There already exists a method relocations, which returns relocations within a range specified by ptr and size, so I'd name the corresponding accessor all_relocations instead.

Contributor Author: Also, interpret/memory.rs tries to access it directly. I'd move code in a similar manner as the compressed undef mask.

Contributor: Sounds good

Contributor Author: I'm not available until Wednesday next week but the rest should be ready.

Contributor: take all the time you need

Contributor Author (@HeroicKatora, Aug 29, 2019): The name fn relocations for the accessor is already taken:

/// Returns all relocations overlapping with the given ptr-offset pair.
pub fn relocations(
&self,
cx: &impl HasDataLayout,
ptr: Pointer<Tag>,
size: Size,
) -> &[(Size, (Tag, AllocId))] {

I'd rename the existing method to get_relocations, which would align it with get_bytes, check_bytes, and check_relocations, all of which are concerned with interpreter execution.

Member: Sounds good.
Makes me wonder who will even need the general read-only accessor. Maybe that should also be raw_relocations or so, to make it less appealing to call?
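
For illustration, a minimal sketch of the accessor split being discussed: the field goes private, a whole-map read-only accessor takes the proposed name all_relocations, and the range query becomes get_relocations. The types here are deliberately simplified stand-ins, not the real Relocations/Size/Tag machinery.

// Hypothetical, simplified sketch of the naming scheme under discussion.
struct Allocation {
    // Private: sorted (offset, target) pairs standing in for `Relocations`.
    relocations: Vec<(usize, u64)>,
}

impl Allocation {
    /// Read-only view of every relocation (the proposed `all_relocations`).
    fn all_relocations(&self) -> &[(usize, u64)] {
        &self.relocations
    }

    /// Range lookup used by the interpreter (the proposed `get_relocations`).
    fn get_relocations(&self, start: usize, end: usize) -> &[(usize, u64)] {
        let lo = self.relocations.partition_point(|&(off, _)| off < start);
        let hi = self.relocations.partition_point(|&(off, _)| off < end);
        &self.relocations[lo..hi]
    }
}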

/// Denotes undefined memory. Reading from undefined memory is forbidden in miri
pub undef_mask: UndefMask,
/// Denotes which part of this allocation is initialized.
undef_mask: UndefMask,
/// The size of the allocation. Currently, must always equal `bytes.len()`.
pub size: Size,
/// The alignment of the allocation to detect unaligned reads.
pub align: Align,
/// Whether the allocation is mutable.
@@ -85,11 +100,12 @@ impl<Tag> Allocation<Tag> {
/// Creates a read-only allocation initialized by the given bytes
pub fn from_bytes<'a>(slice: impl Into<Cow<'a, [u8]>>, align: Align) -> Self {
let bytes = slice.into().into_owned();
let undef_mask = UndefMask::new(Size::from_bytes(bytes.len() as u64), true);
let size = Size::from_bytes(bytes.len() as u64);
Self {
bytes,
relocations: Relocations::new(),
undef_mask,
undef_mask: UndefMask::new(size, true),
size,
align,
mutability: Mutability::Immutable,
extra: (),
@@ -106,13 +122,34 @@ impl<Tag> Allocation<Tag> {
bytes: vec![0; size.bytes() as usize],
relocations: Relocations::new(),
undef_mask: UndefMask::new(size, false),
size,
align,
mutability: Mutability::Mutable,
extra: (),
}
}
}

/// Raw accessors. Provide access to otherwise private bytes.
impl<Tag, Extra> Allocation<Tag, Extra> {
pub fn len(&self) -> usize {
self.size.bytes() as usize
}

/// Look at a slice which may describe undefined bytes or describe a relocation. This differs
/// from `get_bytes_with_undef_and_ptr` in that it does no relocation checks (even on the
/// edges) at all. It further ignores `AllocationExtra` callbacks.
/// This must not be used for reads affecting the interpreter execution.
pub fn inspect_with_undef_and_ptr_outside_interpreter(&self, range: Range<usize>) -> &[u8] {
&self.bytes[range]
}

/// View the undef mask.
pub fn undef_mask(&self) -> &UndefMask {
&self.undef_mask
}
}
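
As a usage sketch (hypothetical caller, not part of this PR), these accessors replace direct reads of alloc.bytes in code that only inspects final interpreter state:

// Hypothetical consumer, e.g. a pretty-printer running after interpretation.
fn render_bytes<Tag, Extra>(alloc: &Allocation<Tag, Extra>) -> String {
    // Relocations and undef bytes are acceptable here because the result
    // never feeds back into interpreter execution.
    let bytes = alloc.inspect_with_undef_and_ptr_outside_interpreter(0..alloc.len());
    format!("{:?}", bytes)
}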

impl<'tcx> rustc_serialize::UseSpecializedDecodable for &'tcx Allocation {}

/// Byte accessors
Expand All @@ -132,9 +169,9 @@ impl<'tcx, Tag: Copy, Extra: AllocationExtra<Tag>> Allocation<Tag, Extra> {
);
let end = end.bytes() as usize;
assert!(
end <= self.bytes.len(),
end <= self.len(),
"Out-of-bounds access at offset {}, size {} in allocation of size {}",
offset.bytes(), size.bytes(), self.bytes.len()
offset.bytes(), size.bytes(), self.len()
);
(offset.bytes() as usize)..end
}
@@ -536,6 +573,91 @@ impl<'tcx, Tag, Extra> Allocation<Tag, Extra> {
}
}

/// Run-length encoding of the undef mask.
/// Used to copy parts of a mask multiple times to another allocation.
pub struct AllocationDefinedness {
ranges: smallvec::SmallVec::<[u64; 1]>,
first: bool,
}

/// Transferring the definedness mask to other allocations.
impl<Tag, Extra> Allocation<Tag, Extra> {
/// Creates a run-length encoding of the undef_mask.
pub fn compress_defined_range(
&self,
src: Pointer<Tag>,
size: Size,
) -> AllocationDefinedness {
// Since we are copying `size` bytes from `src` to `dest + i * size` (`for i in 0..repeat`),
// a naive undef mask copying algorithm would repeatedly have to read the undef mask from
// the source and write it to the destination. Even if we optimized the memory accesses,
// we'd be doing all of this `repeat` times.
// Therefore we precompute a compressed version of the undef mask of the source value and
// then write it back `repeat` times without computing any more information from the source.

// a precomputed cache for ranges of defined/undefined bits
// 0000010010001110 will become
// [5, 1, 2, 1, 3, 3, 1]
// where each element toggles the state

let mut ranges = smallvec::SmallVec::<[u64; 1]>::new();
let first = self.undef_mask.get(src.offset);
let mut cur_len = 1;
let mut cur = first;

for i in 1..size.bytes() {
// FIXME: optimize to bitshift the current undef block's bits and read the top bit
if self.undef_mask.get(src.offset + Size::from_bytes(i)) == cur {
cur_len += 1;
} else {
ranges.push(cur_len);
cur_len = 1;
cur = !cur;
}
}

ranges.push(cur_len);

AllocationDefinedness { ranges, first, }
}

/// Apply multiple instances of the run-length encoding to the undef_mask.
pub fn mark_compressed_range(
&mut self,
defined: &AllocationDefinedness,
dest: Pointer<Tag>,
size: Size,
repeat: u64,
) {
// an optimization where we can just overwrite an entire range of definedness bits if
// they are going to be uniformly `1` or `0`.
if defined.ranges.len() <= 1 {
self.undef_mask.set_range_inbounds(
dest.offset,
dest.offset + size * repeat,
defined.first,
);
return;
}

for mut j in 0..repeat {
j *= size.bytes();
j += dest.offset.bytes();
let mut cur = defined.first;
for range in &defined.ranges {
let old_j = j;
j += range;
self.undef_mask.set_range_inbounds(
Size::from_bytes(old_j),
Size::from_bytes(j),
cur,
);
cur = !cur;
}
}
}
}
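
To make the run-length scheme concrete, here is a self-contained editorial sketch that encodes a plain boolean mask the same way and expands it repeat times; the real implementation works on UndefMask and Size rather than slices.

/// Editorial sketch of the run-length encoding described above.
fn compress(mask: &[bool]) -> (bool, Vec<u64>) {
    let first = mask[0];
    let mut ranges = Vec::new();
    let mut cur = first;
    let mut cur_len = 1u64;
    for &bit in &mask[1..] {
        if bit == cur {
            cur_len += 1;
        } else {
            ranges.push(cur_len);
            cur_len = 1;
            cur = !cur;
        }
    }
    ranges.push(cur_len);
    (first, ranges)
}

/// Expand the encoding `repeat` times, mirroring `mark_compressed_range`.
fn expand(first: bool, ranges: &[u64], repeat: u64) -> Vec<bool> {
    let mut out = Vec::new();
    for _ in 0..repeat {
        let mut cur = first;
        for &len in ranges {
            out.extend(std::iter::repeat(cur).take(len as usize));
            cur = !cur;
        }
    }
    out
}

// For the mask 0000010010001110 this yields first = false and
// ranges = [5, 1, 2, 1, 3, 3, 1], matching the comment above.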

/// Relocations
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, RustcEncodable, RustcDecodable)]
pub struct Relocations<Tag=(), Id=AllocId>(SortedMap<Size, (Tag, Id)>);
10 changes: 8 additions & 2 deletions src/librustc/ty/print/pretty.rs
@@ -944,10 +944,16 @@ pub trait PrettyPrinter<'tcx>:
.get_bytes(&self.tcx(), ptr, Size::from_bytes(n)).unwrap())
},
(ConstValue::Slice { data, start, end }, ty::Slice(t)) if *t == u8 => {
Some(&data.bytes[start..end])
// The `inspect` here is okay since we checked the bounds, and there are no
// relocations (we have an active slice reference here). We don't use this
// result to affect interpreter execution.
Some(data.inspect_with_undef_and_ptr_outside_interpreter(start..end))
},
(ConstValue::Slice { data, start, end }, ty::Str) => {
let slice = &data.bytes[start..end];
// The `inspect` here is okay since we checked the bounds, and there are no
// relocations (we have an active `str` reference here). We don't use this
// result to affect interpreter execution.
let slice = data.inspect_with_undef_and_ptr_outside_interpreter(start..end);
let s = ::std::str::from_utf8(slice)
.expect("non utf8 str from miri");
p!(write("{:?}", s));
55 changes: 48 additions & 7 deletions src/librustc_codegen_llvm/consts.rs
@@ -35,11 +35,21 @@ pub fn const_alloc_to_llvm(cx: &CodegenCx<'ll, '_>, alloc: &Allocation) -> &'ll
assert_eq!(offset as usize as u64, offset);
let offset = offset as usize;
if offset > next_offset {
llvals.push(cx.const_bytes(&alloc.bytes[next_offset..offset]));
// This `inspect` is okay since we have checked that it is not within a relocation, it is
// within the bounds of the allocation, and it doesn't affect interpreter execution (we
// inspect the result after interpreter execution). Any undef byte is replaced with
// some arbitrary byte value.
//
// FIXME: relay undef bytes to codegen as undef const bytes
let bytes = alloc.inspect_with_undef_and_ptr_outside_interpreter(next_offset..offset);
llvals.push(cx.const_bytes(bytes));
}
let ptr_offset = read_target_uint(
dl.endian,
&alloc.bytes[offset..(offset + pointer_size)],
// This `inspect` is okay since it is within the bounds of the allocation, it doesn't
// affect interpreter execution (we inspect the result after interpreter execution),
// and we properly interpret the relocation as a relocation pointer offset.
alloc.inspect_with_undef_and_ptr_outside_interpreter(offset..(offset + pointer_size)),
).expect("const_alloc_to_llvm: could not read relocation pointer") as u64;
llvals.push(cx.scalar_to_backend(
Pointer::new(alloc_id, Size::from_bytes(ptr_offset)).into(),
Expand All @@ -51,8 +61,16 @@ pub fn const_alloc_to_llvm(cx: &CodegenCx<'ll, '_>, alloc: &Allocation) -> &'ll
));
next_offset = offset + pointer_size;
}
if alloc.bytes.len() >= next_offset {
llvals.push(cx.const_bytes(&alloc.bytes[next_offset ..]));
if alloc.len() >= next_offset {
let range = next_offset..alloc.len();
// This `inspect` is okay since we have checked that it is after all relocations, it is
// within the bounds of the allocation, and it doesn't affect interpreter execution (we
// inspect the result after interpreter execution). Any undef byte is replaced with some
// arbitrary byte value.
//
// FIXME: relay undef bytes to codegen as undef const bytes
let bytes = alloc.inspect_with_undef_and_ptr_outside_interpreter(range);
llvals.push(cx.const_bytes(bytes));
}

cx.const_struct(&llvals, true)
@@ -437,7 +455,23 @@ impl StaticMethods for CodegenCx<'ll, 'tcx> {
//
// We could remove this hack whenever we decide to drop macOS 10.10 support.
if self.tcx.sess.target.target.options.is_like_osx {
let sect_name = if alloc.bytes.iter().all(|b| *b == 0) {
assert_eq!(alloc.relocations.len(), 0);

let is_zeroed = {
// Treats undefined bytes as if they were defined with the byte value that
// happens to be currently assigned in mir. This is valid since reading
// undef bytes may yield arbitrary values.
//
// FIXME: ignore undef bytes even with representation `!= 0`.
//
// The `inspect` method is okay here because we checked relocations, and
// because we are doing this access to inspect the final interpreter state
// (not as part of the interpreter execution).
alloc.inspect_with_undef_and_ptr_outside_interpreter(0..alloc.len())
.iter()
.all(|b| *b == 0)
};
let sect_name = if is_zeroed {
CStr::from_bytes_with_nul_unchecked(b"__DATA,__thread_bss\0")
} else {
CStr::from_bytes_with_nul_unchecked(b"__DATA,__thread_data\0")
@@ -456,10 +490,17 @@ impl StaticMethods for CodegenCx<'ll, 'tcx> {
section.as_str().as_ptr() as *const _,
section.as_str().len() as c_uint,
);
assert!(alloc.relocations.is_empty());

// The `inspect` method is okay here because we checked relocations, and
// because we are doing this access to inspect the final interpreter state (not
// as part of the interpreter execution).
let bytes = alloc.inspect_with_undef_and_ptr_outside_interpreter(
0..alloc.len());
let alloc = llvm::LLVMMDStringInContext(
self.llcx,
alloc.bytes.as_ptr() as *const _,
alloc.bytes.len() as c_uint,
bytes.as_ptr() as *const _,
bytes.len() as c_uint,
);
let data = [section, alloc];
let meta = llvm::LLVMMDNodeInContext(self.llcx, data.as_ptr(), 2);
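The const_alloc_to_llvm loop above alternates plain byte runs with relocated pointers; the following editorial sketch restates that chunking with simplified types (assuming relocations sorted by offset, each covering ptr_size <= 8 bytes).

// Editorial sketch of the chunking in const_alloc_to_llvm, simplified types.
enum Chunk<'a> {
    Bytes(&'a [u8]),                  // raw constant bytes between relocations
    Ptr { target: u64, offset: u64 }, // a relocation: pointer into another allocation
}

fn chunk<'a>(bytes: &'a [u8], relocs: &[(usize, u64)], ptr_size: usize) -> Vec<Chunk<'a>> {
    let mut out = Vec::new();
    let mut next = 0;
    for &(offset, target) in relocs {
        if offset > next {
            out.push(Chunk::Bytes(&bytes[next..offset]));
        }
        // The real code reads the pointer offset with `read_target_uint`;
        // a little-endian read stands in for it here.
        let mut buf = [0u8; 8];
        buf[..ptr_size].copy_from_slice(&bytes[offset..offset + ptr_size]);
        out.push(Chunk::Ptr { target, offset: u64::from_le_bytes(buf) });
        next = offset + ptr_size;
    }
    if bytes.len() > next {
        out.push(Chunk::Bytes(&bytes[next..]));
    }
    out
}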
2 changes: 1 addition & 1 deletion src/librustc_mir/interpret/intrinsics.rs
@@ -81,7 +81,7 @@ impl<'mir, 'tcx, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
let alloc = alloc_type_name(self.tcx.tcx, substs.type_at(0));
let name_id = self.tcx.alloc_map.lock().create_memory_alloc(alloc);
let id_ptr = self.memory.tag_static_base_pointer(name_id.into());
let alloc_len = alloc.bytes.len() as u64;
let alloc_len = alloc.size.bytes();
let name_val = Immediate::new_slice(Scalar::Ptr(id_ptr), alloc_len, self);
self.write_immediate(name_val, dest)?;
}
2 changes: 1 addition & 1 deletion src/librustc_mir/interpret/intrinsics/type_name.rs
@@ -221,7 +221,7 @@ pub fn type_name<'tcx>(tcx: TyCtxt<'tcx>, ty: Ty<'tcx>) -> &'tcx ty::Const<'tcx>
val: ConstValue::Slice {
data: alloc,
start: 0,
end: alloc.bytes.len(),
end: alloc.len(),
},
ty: tcx.mk_static_str(),
})