False positive when int-to-ptr "confuses" which allocation (provenance) to use for new ptr #1866

niluxv · 2021-08-04T13:47:00Z

Miri accepts dereferencing a (fat) pointer to a slice of ZSTs, except when the pointer to the slice lies in a previously freed allocation.

Example 1 (fresh dangling pointer, succeeds):

fn main() {
    // create fresh dangling ptr `ptr`
    let ptr: *const () = std::ptr::NonNull::<()>::dangling().as_ptr();
    let slice_ptr: *const [()] = std::ptr::slice_from_raw_parts(ptr, 1);
    let slice_ref: &[()] = unsafe { &*slice_ptr } ;
}

playground

Example 2 (pointer to freed allocation, errors):

fn main() {
    let vec: Vec<u8> = vec![1, 2, 3];
    let vec_ptr: *const u8 = vec.as_ptr();
    drop(vec);
    // now `vec_ptr` is dangling, and we create a dangling `ptr` from it
    let ptr: *const () = vec_ptr as *const ();
    let slice_ptr: *const [()] = std::ptr::slice_from_raw_parts(ptr, 1);
    let slice_ref: &[()] = unsafe { &*slice_ptr } ; // ERROR
    // error: Undefined Behavior: pointer to alloc1414 was dereferenced after this allocation got freed
}

playground

But soundness of code does not depend on previous allocations (I hope); in both cases ptr is 'just' a *const () dangling non-null pointer. Therefore either both examples are unsound/UB, or both are sound.

(I think both examples are sound, so that this is a false positive. At least the entire bitvec crate seems to be built around the idea that this is sound. Issue #135 in bitvec is an example of this issue in the wild.)

RalfJung · 2021-08-06T14:06:09Z

Thanks for the report!

This is surprising, but deliberate. See for example here

Even for operations of size zero, the pointer must not be pointing to deallocated memory, i.e., deallocation makes pointers invalid even for zero-sized operations. However, casting any non-zero integer literal to a pointer is valid for zero-sized accesses, even if some memory happens to exist at that address and gets deallocated. This corresponds to writing your own allocator: allocating zero-sized objects is not very hard. The canonical way to obtain a pointer that is valid for zero-sized accesses is NonNull::dangling.

I would love to change this, but LLVM currently won't let me -- see rust-lang/unsafe-code-guidelines#299 and rust-lang/unsafe-code-guidelines#93. I hope some day we can convince LLVM to adjust its spec so that we can make Rust have more reasonable behavior.

So, closing as "Miri works according to its spec; the spec might be strange but that's a separate and complicated problem".

niluxv · 2021-08-06T15:37:25Z

Interesting, I didn't know that. Working with ZSTs is even trickier than I thought!

But there still seems to be a problem. The documentation you linked says

However, casting any non-zero integer literal to a pointer is valid for zero-sized accesses, even if some memory happens to exist at that address and gets deallocated.

But miri errors on the following example (which is sound by the previous citation):

fn main() {
    let vec: Vec<u8> = vec![1, 2, 3];
    drop(vec);
    let zst_ptr: *const () = 165000 as *const ();
    let zst_ref: &() = unsafe { &*zst_ptr } ; // ERROR
    // error: Undefined Behavior: pointer to alloc1415 was dereferenced after this allocation got freed
}

playground

RalfJung · 2021-08-06T15:44:33Z

Hm... yeah that's a nasty example. :/ (It becomes even more nasty if you move the drop down by one line.)

165000 happens to be inside a former allocation so Miri "confuses" the pointers. We might need a much more complicated model of provenance to more precisely describe this. OTOH that would be a huge waste of effort if we can convince LLVM to slightly adjust their spec.^^

I'll reopen this because the "pointer confusion" part is more of a Miri issue than a spec issue (though writing a precise spec that avoids this problem is on its own a hard problem).

tavianator · 2021-12-03T19:23:35Z

Here is a reproducer of what I think is happening in the bitvec tests.

$ cat foo.rs
fn foo() -> u64 {
    0
}

fn main() {
    for _ in 0..1024 {
        let n = 0u64;
        let ptr: *const u64 = &n;
        foo();
        let iptr = ptr as usize;
        unsafe {
            let start = &*std::ptr::slice_from_raw_parts(iptr as *const (), 1);
            let end = &*std::ptr::slice_from_raw_parts((iptr + 8) as *const (), 1);
            assert_eq!(start.len(), end.len());
        }
    }
}
$ ./miri run foo.rs
error: Undefined Behavior: pointer to alloc2000 was dereferenced after this allocation got freed
  --> ./foo.rs:13:23
   |
13 |             let end = &*std::ptr::slice_from_raw_parts((iptr + 8) as *const (), 1);
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pointer to alloc2000 was dereferenced after this allocation got freed

Basically we try to create a ZST slice at the very end of an allocation (iptr + 8). But sometimes that address is also the start of another allocation that has already been deallocated. ptr_from_addr() will find the start of the dead alloc rather than the end of the live one. The loop is just to try for a zero-slack allocation.

This patch might be too simplistic but it fixes it:

diff --git a/src/intptrcast.rs b/src/intptrcast.rs
index 665a1341..c61b3675 100644
--- a/src/intptrcast.rs
+++ b/src/intptrcast.rs
@@ -92,7 +92,7 @@ impl<'mir, 'tcx> GlobalState {
                 let slack = {
                     let mut rng = memory.extra.rng.borrow_mut();
                     // This means that `(global_state.next_base_addr + slack) % 16` is uniformly distributed.
-                    rng.gen_range(0..16)
+                    rng.gen_range(1..16)
                 };
                 // From next_base_addr + slack, round up to adjust for alignment.
                 let base_addr = global_state.next_base_addr.checked_add(slack).unwrap();

RalfJung · 2021-12-03T20:20:38Z

Ah, integer-pointer-casts and one-past-the-end pointers... fun, fun, fun...

I don't have a proper fix for this. Your patch avoids the problem by simply never having allocations touch, so you need more than just-at-the-edge pointers to cause issues; you need to go OOB for real. In principle there will still be correct programs that this rejects, but they are much less likely. Though instead of changing slack, we should change this to always add size+1 when computing next_base_addr.
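The ambiguity and the effect of the extra byte can be illustrated with a small toy model. This is only a sketch: `Alloc`, `candidates`, and `layout` are hypothetical names made up for illustration and do not correspond to Miri's actual intptrcast code. It assumes allocations are handed out by a simple bump allocator and that an integer address is matched against every allocation whose range, including the one-past-the-end address, contains it.

```rust
// Toy model only: these types and functions are hypothetical, not Miri's code.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Alloc {
    id: u32,
    base: u64,
    size: u64,
}

/// Returns the ids of all allocations whose range, including the
/// one-past-the-end address, contains `addr`.
/// More than one hit means the provenance of `addr` is ambiguous.
fn candidates(allocs: &[Alloc], addr: u64) -> Vec<u32> {
    allocs
        .iter()
        .filter(|a| a.base <= addr && addr <= a.base + a.size)
        .map(|a| a.id)
        .collect()
}

/// Bump-allocates `sizes` in order, leaving `gap` unused bytes after each one.
fn layout(sizes: &[u64], gap: u64) -> Vec<Alloc> {
    let mut next_base = 0x1000;
    let mut out = Vec::new();
    for (i, &size) in sizes.iter().enumerate() {
        out.push(Alloc { id: i as u32, base: next_base, size });
        next_base += size + gap;
    }
    out
}

fn main() {
    // Two 8-byte allocations placed back to back: the end of the first
    // is also the start of the second, so both are candidates.
    let touching = layout(&[8, 8], 0);
    let end_of_first = touching[0].base + touching[0].size;
    println!("gap 0: {:?}", candidates(&touching, end_of_first));

    // With one byte of padding, the same address matches only the first.
    let padded = layout(&[8, 8], 1);
    let end_of_first = padded[0].base + padded[0].size;
    println!("gap 1: {:?}", candidates(&padded, end_of_first));
}
```

If one of the two candidates has already been freed and the lookup happens to pick it, a zero-sized access at that address is reported as a use after free, which matches the reproducer above. The padding does not resolve the question of which provenance is correct; it only makes the ambiguous case unreachable for pointers that stay within or exactly at the end of their allocation.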

However, I am also curious why the code is going through an integer in the first place. Is there no way to preserve the original provenance by keeping things at pointer type?

Ptr-to-int casts are evil and should be avoided at all cost. ;) Well actually I am only half-joking... they do require the operational semantics to "guess" a provenance, and I am not sure if there is a right answer here. An operation like this is much more well-behaved since it avoids the guessing:

// Converts 'addr' to a pointer using the provenance of 'prov'.
fn int_to_ptr_with_provenance<T>(addr: usize, prov: *const T) -> *const T {
  let ptr = prov.cast::<u8>();
  ptr.wrapping_add(addr.wrapping_sub(ptr as usize)).cast()
}

tavianator · 2021-12-03T20:28:57Z

The bitvec implementation uses the trailing bits of aligned pointers to store some extra information. This necessitates bit operations on pointers. It's possible that the provenance could be preserved through those operations but I'm not familiar enough with the bitvec code to ensure that.

RalfJung · 2021-12-03T20:40:32Z

In theory, what could be done is to do the bitops on usize but then cast back to a ptr using int_to_ptr_with_provenance to get a ptr with the encoded extra information and the original provenance. Then you never have to use <int> as <ptr>, avoiding all these nasty problems. Whether that actually works well in practice I do not know. ;)
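A minimal sketch of that idea, assuming a hypothetical scheme that stores a tag in the low alignment bits of a `*const u64`. The `set_tag` and `split_tag` helpers and the `TAG_MASK` constant are made up for illustration and are not bitvec's API; only `int_to_ptr_with_provenance` comes from this thread.

```rust
/// Converts `addr` to a pointer using the provenance of `prov`
/// (the helper proposed earlier in this thread).
fn int_to_ptr_with_provenance<T>(addr: usize, prov: *const T) -> *const T {
    let ptr = prov.cast::<u8>();
    ptr.wrapping_add(addr.wrapping_sub(ptr as usize)).cast()
}

// Hypothetical tag layout: the low bits that are always zero in an
// aligned `*const u64` are used to hold the tag.
const TAG_MASK: usize = std::mem::align_of::<u64>() - 1;

/// Does the bit operations on `usize`, then rebuilds the pointer from the
/// original one so that no `<int> as <ptr>` cast is needed.
fn set_tag(ptr: *const u64, tag: usize) -> *const u64 {
    let addr = (ptr as usize & !TAG_MASK) | (tag & TAG_MASK);
    int_to_ptr_with_provenance(addr, ptr)
}

/// Splits a tagged pointer into the untagged pointer and the tag.
fn split_tag(tagged: *const u64) -> (*const u64, usize) {
    let addr = tagged as usize;
    (int_to_ptr_with_provenance(addr & !TAG_MASK, tagged), addr & TAG_MASK)
}

fn main() {
    let value: u64 = 42;
    let tagged = set_tag(&value, 1);
    let (ptr, tag) = split_tag(tagged);
    // `ptr` was derived from `&value` by pointer arithmetic only, so it
    // keeps that provenance and the read below is in bounds.
    let read = unsafe { *ptr };
    println!("tag = {tag}, value = {read}");
}
```

The tagged pointer itself is misaligned and must not be dereferenced; only the pointer returned by `split_tag` is used for the access.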

When two objects directly follow each other in memory, what is the provenance of an integer cast to a pointer that points directly between them? For a zero-size region, it could point into the end of the first object, or the start of the second. We can avoid answering this difficult question by simply never allocating two objects directly beside each other. This fixes some of the false positives from rust-lang#1866.

RalfJung · 2021-12-03T21:54:44Z

I created rust-lang/unsafe-code-guidelines#313 for the underlying problem here.

niluxv · 2021-12-04T17:42:03Z

Edit: as pointed out in the comment below this comment was wrong

@tavianator Hm, I was thinking of the following explanation for `bitvec`'s test failures under miri (but on reconsidering, the probability that this occurs really should be negligible):

In `bitvec::ptr::span::BitSpan` there is this function:

```rust
pub(crate) fn to_bitslice_ref<'a>(self) -> &'a BitSlice {
    unsafe { &*self.to_bitslice_ptr() }
}
```

(it is crate-local but ends up being called all over the place) which creates something which should function as a 'reference to a bit in a bitvector/bitarray'. What is important:

- `BitSlice` is a ZST
- `self.to_bitslice_ptr()` returns a pointer that doesn't directly point into the allocation where the bits are stored; instead it uses a pointer which is essentially computed like this: `ptr_data | ptr_head`, where `ptr_data` is the real pointer and `ptr_head` codes the bit index (inside the byte) in the high bits (head) of the pointer (it is a [tagged pointer](https://en.wikipedia.org/wiki/Tagged_pointer))
- dereferencing `self.to_bitslice_ptr()` (or casting this raw ptr to a reference) is fine as long as this part of the address space is never used for allocations (EDIT: well, fine is too big a word; I don't think the current UB rules say that this is fine, but `miri` at least doesn't complain in this case)
- on 64-bit systems the high bits of addresses are never used, as the address space doesn't span the entire 2^{64} possible addresses (IIUC), so this assumption should hold in practice (on 64-bit machines)
- miri is platform independent and can freely use the entire address space of 2^{64} addresses, so it is possible that the memory at the `self.to_bitslice_ptr()` address has been part of an allocation, which is now deallocated

@RalfJung
I don't think an int_to_ptr_with_provenance would work here because there is no source of correct provenance for the tagged pointer.

tavianator · 2021-12-04T18:02:44Z

@niluxv The BitSpan type already contains a pointer. It is that pointer that needs the correct provenance. That pointer is created here, and doing int_to_ptr_with_provenance(ptr_data | ptr_head, addr.to_const()) instead does fix the errors reported by Miri. I'm not sure that's the only int->ptr cast that needs to be adjusted, but it's the main one.

The pointer tagging scheme used by bitvec is described here. It is the low bits that are used for tag data, the high bits come from the actual pointer to a live allocation so it should be safe.

niluxv · 2021-12-04T18:21:59Z

Oops, yes. I should have looked up the precise encoding again.

RalfJung · 2021-12-06T01:19:11Z

That pointer is created here, and doing int_to_ptr_with_provenance(ptr_data | ptr_head, addr.to_const()) instead does fix the errors reported by Miri.

Nice. :)
Indeed the idea is that when you have data that might have provenance, you should use a pointer type to store it -- even if the pointer is dangling (since the data is not a plain pointer but somewhat modified). This approach probably doesn't always work, and it wouldn't work in C with its strict rules on pointers always being inbounds, but in Rust it works well and it can maintain provenance with our current set of language primitives and no special new rules for how provenance behaves.

intptrcast: Never allocate two objects directly adjecent

When two objects directly follow each other in memory, what is the provenance of an integer cast to a pointer that points directly between them? For a zero-size region, it could point into the end of the first object, or the start of the second. We can avoid answering this difficult question by simply never allocating two objects directly beside each other. This fixes some of the false positives from #1866.

the8472 · 2021-12-11T17:10:42Z

The bitvec implementation uses the trailing bits of aligned pointers to store some extra information. This necessitates bit operations on pointers.

Wouldn't it be possible to use a *const () + PhantomData instead so that the pointer is u8-aligned which allows you to manipulate all bits as needed via ptr.offset?
It's practically the same as int_to_ptr_with_provenance but perhaps more readable.
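One possible reading of this suggestion, as a sketch: the `Tagged` type below is hypothetical and not taken from bitvec. It stores a `*const u8` rather than a `*const ()`, because pointer arithmetic on `*const ()` advances in units of zero bytes, and it uses `wrapping_add`/`wrapping_sub` instead of `offset` so that no in-bounds requirement applies to the intermediate tagged pointer. It assumes the incoming pointer is aligned for `T`.

```rust
use std::marker::PhantomData;

/// Hypothetical tagged pointer: keeps a byte pointer so the low address
/// bits can be adjusted with pointer arithmetic instead of integer casts.
struct Tagged<T> {
    raw: *const u8,
    _marker: PhantomData<*const T>,
}

impl<T> Tagged<T> {
    /// Low bits that are zero in any pointer aligned for `T`.
    fn mask() -> usize {
        std::mem::align_of::<T>() - 1
    }

    /// Assumes `ptr` is aligned for `T`, so adding the tag sets the low bits.
    fn new(ptr: *const T, tag: usize) -> Self {
        let raw = ptr.cast::<u8>().wrapping_add(tag & Self::mask());
        Tagged { raw, _marker: PhantomData }
    }

    /// Reads the tag back out of the address.
    fn tag(&self) -> usize {
        self.raw as usize & Self::mask()
    }

    /// Recovers the original pointer by subtracting the tag again.
    fn ptr(&self) -> *const T {
        self.raw.wrapping_sub(self.tag()).cast()
    }
}

fn main() {
    let value: u32 = 7;
    let tagged = Tagged::new(&value as *const u32, 3);
    // The recovered pointer was produced by pointer arithmetic only,
    // so it keeps the provenance of `&value`.
    let read = unsafe { *tagged.ptr() };
    println!("tag = {}, value = {}", tagged.tag(), read);
}
```

Compared with int_to_ptr_with_provenance, the only pointer-to-integer cast left here is the one that reads the tag; the pointer itself is never rebuilt from an integer.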

@tavianator

Fix the miri failures in doctests, see issue ferrilab#135. The issue is that miri doesn't guess correct provenance in the int-to-ptr cast in `BitSpan::new_unchecked`, as was found by @tavianator [here](rust-lang/miri#1866 (comment)). The solution is to preserve provenance and was proposed by @tavianator [here](rust-lang/miri#1866 (comment)). With this change the entire test suite passes under miri.

796: epoch: Remove ptr-to-int casts r=taiki-e a=taiki-e Use [this hack](rust-lang/miri#1866 (comment)) to fix compatibility issues with Miri (see #490 (comment) for details). Due to the #545, still not compatible with stacked borrows. This will be fixed by the subsequent PR (#871). Note: this is a breaking change because changes API of Pointable and Pointer traits Fixes #579 881: Remove deprecated items r=taiki-e a=taiki-e This removes the following deprecated items: - crossbeam-epoch: - `CompareAndSetError` - `CompareAndSetOrdering` - `Atomic::compare_and_set` - `Atomic::compare_and_set_weak` - crossbeam-utils: - `AtomicCell::compare_and_swap` Co-authored-by: Taiki Endo <te316e89@gmail.com>

RalfJung closed this as completed Aug 6, 2021

RalfJung reopened this Aug 6, 2021

gui1117 mentioned this issue Sep 9, 2021

Fix CI failures on master paritytech/parity-scale-codec#288

Merged

RalfJung mentioned this issue Dec 3, 2021

Miri tests failing ferrilab/bitvec#135

Closed

tavianator mentioned this issue Dec 3, 2021

intptrcast: Never allocate two objects directly adjecent #1930

Merged

RalfJung mentioned this issue Dec 3, 2021

What is the provenance of an int-to-ptr result for not strictly inbounds live pointers? rust-lang/unsafe-code-guidelines#313

Closed

RalfJung changed the title ~~False positive: ZST slice pointer deref to freed allocation~~ False positive when int-to-ptr "confuses" which allocation (provenance) to use for new ptr Dec 16, 2021

RalfJung added A-intptrcast Area: affects int2ptr and ptr2int casts C-bug Category: This is a bug. labels Dec 16, 2021

RalfJung mentioned this issue Dec 16, 2021

3 case studies for -Zmiri-tag-raw-pointers #1936

Closed

niluxv mentioned this issue Dec 25, 2021

Fix miri failures by preserving provenance in BitSpan::new_unchecked ferrilab/bitvec#153

Closed

RalfJung mentioned this issue Mar 2, 2022

Miri can't track pointer through atomic bit-flip, which happens as usize #1993

Closed

taiki-e mentioned this issue Mar 3, 2022

epoch: Remove ptr-to-int casts crossbeam-rs/crossbeam#796

Merged

RalfJung mentioned this issue May 20, 2022

The plan for provenance #2133

Closed

6 tasks

RalfJung mentioned this issue Jun 27, 2022

Enable permissive provenance by default #2275

Merged

bors closed this as completed in 7fafbde Jun 28, 2022

bors closed this as completed in #2275 Jun 28, 2022
