Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a ZST reference like a raw pointer to avoid asserting aliasing #314

Open
kupiakos opened this issue Dec 15, 2021 · 8 comments
Open

Comments

@kupiakos
Copy link

I'm working on possible solutions for tock/tock#2882, where kernel code needs to handle buffers as possibly mutating when shared with an application, even if derived from a &[u8]. Currently, it transmutes a &[u8] to &[Cell<u8>], which is UB.

The obvious solution to this is to only use raw pointers. A *const [u8] can soundly read from either a &[u8] or &[Cell<u8>]. However, this would involve non-trivial code changes and would preclude using useful traits like Index.

One solution I was thinking of is to use a reference to ZST as a sort of "medium-rare pointer", somewhere between a &T and *const T. Since ZST references don't assert the aliasing properties of any memory, they could theoretically be used to hold an address and be read/written from from like a raw pointer, while erasing the difference between &T and &Cell<T>. I developed this idea into an experimental ZST type that I call "existential references", implemented in this crate. Example usage:

// The only other way to have this read from `&i32` and `&Cell<i32>` is with `*const i32`.
fn get_val_plus_two(x: &Exists<i32>) -> i32 {
    x.get() + 2
}
let x = 10;
let mut y = 20;
let z = Cell::new(30);
assert_eq!(get_val_plus_two(Exists::from_ref(&x)), 12);
assert_eq!(get_val_plus_two(Exists::from_mut(&mut y)), 22);
assert_eq!(get_val_plus_two(Exists::from_cell(&z)), 32);

However, I'm not entirely confident in the soundness of this design. This passes under Miri, but only with raw pointer tagging disabled:

let x: u64 = 10;

// `&x as *const _` creates an untagged SharedReadOnly at the top of its borrow stack
// y has its own tag that will be thrown away when turned back into a raw pointer
let y: &() = unsafe { &*(&x as *const _ as *const ()) };

// The read checks x's borrow stack, and since the top is untagged SRO, this is fine.
// However, with -Zmiri-tag-raw-pointers, a different tag than what's on the borrow stack is used!
assert_eq!(unsafe { *(y as *const _ as *const u64) }, 10);

So this leads to the question: how sound is this pattern? Are existential references a pipe dream?

  • Can a reference soundly round-trip from &T to *const T to &(), and then back to &T, where size_of::<T>() > 0?
  • Does pointer provenance as implemented act like -Zmiri-tag-raw-pointers, which declares the above invalid due to a retag? Or does it treat the provenance of raw pointers as less strict than for references?
  • What is the risk of introducing this pattern to production code? Other than causing -Zmiri-tag-raw-pointers to always fail.
  • Would special-casing ZST references as described in A really sad solution to extern type: special case reference-to-ZST #305 explicitly cause this pattern to be supported?
@thomcc
Copy link
Member

thomcc commented Dec 15, 2021

Can a reference soundly round-trip from &T to *const T to &(), and then back to &T, where size_of::<T>() > 0?

Not currently, see #256 and #134.

@RalfJung
Copy link
Member

RalfJung commented Dec 15, 2021

The underlying issue is a duplicate of #303.

@RalfJung
Copy link
Member

And indeed the ZST-based proposal fails under current Stacked Borrows since the &() reference (and everything derived from it) may only be used to access the 0 bytes of memory that they point do...

@kupiakos
Copy link
Author

kupiakos commented Dec 15, 2021

Not currently, see #256 and #134.

The primary difference I see between this and #134 is that in that case, the pointer is initially derived from a part of the object. In this case, the initial pointer is derived from the entire object. This seems to be pretty similar to #256, though, agreed. Would extern type being supported by SB then allow this to be sound, by instead of storing a ZST, storing an extern type?

since the &() reference (and everything derived from it) may only be used to access the 0 bytes of memory that they point do...

I'm a bit confused about this, and about raw pointer tagging in general. If raw pointers are truly untagged as stated here, then two raw pointers in the same allocation with the same permission/protector should be equivalent, right? The &() was derived from a *const T, so the object is accessible by an untagged SRO pointer. Casting back to a *const T results in an untagged SRO pointer. The language in the SB reference is pretty clear to me that this should work - it requires a change in the rules, that raw pointers are tagged, to cause a problem.

If the problem is incorrect provenance information deriving a *const T from a &(), then would first casting to usize erase provenance information and be sound?

(apologies on the accidental close/reopen, cat walked on keyboard)

@kupiakos kupiakos reopened this Dec 15, 2021
@RalfJung
Copy link
Member

Would extern type being supported by SB then allow this to be sound, by instead of storing a ZST, storing an extern type?

It's more that whatever needs to be done to support extern type will probably let us fix or at least greatly improve the other issues as well and hence might fix the problem here.

I'm a bit confused about this, and about raw pointer tagging in general. If raw pointers are truly untagged as stated here, then two raw pointers in the same allocation with the same permission/protector should be equivalent, right? The &() was derived from a *const T, so the object is accessible by an untagged SRO pointer. Casting back to a *const T results in an untagged SRO pointer. The language in the SB reference is pretty clear to me that this should work - it requires a change in the rules, that raw pointers are tagged, to cause a problem.

Indeed, with untagged raw pointers, Stacked Borrows gets "sufficiently confused" that this works out, But raw pointers being untagged is just a cheap way to support ptr-int-ptr roundtrips; to match what LLVM does, raw pointers do need to be tagged.

If the problem is incorrect provenance information deriving a *const T from a &(), then would first casting to usize erase provenance information and be sound?

usize roundtrips just make everything extremely speculative (LLVM is self-contradicting in its support for them, as shown by examples such as this one), but I don't think they are actually a good way to solve anything.^^

@comex
Copy link

comex commented Dec 18, 2021

In my opinion, it's reasonably safe to assume that #303 is a spec bug and transmuting &[T] to &[UnsafeCell<T>] isn't "really" undefined behavior. UnsafeCell is supposed to weaken assumptions (about immutability), not strengthen them.

The same can't be said of accessing references beyond the bounds of the type, which is very intentionally UB (or at least potential UB) to enable specific optimizations. I think there should be an opt-out of some sort, but, other than extern type, it doesn't exist yet...

@RalfJung
Copy link
Member

In my opinion, it's reasonably safe to assume that #303 is a spec bug and transmuting &[T] to &[UnsafeCell] isn't "really" undefined behavior. UnsafeCell is supposed to weaken assumptions (about immutability), not strengthen them.

I tend to agree.

The same can't be said of accessing references beyond the bounds of the type, which is very intentionally UB (or at least potential UB) to enable specific optimizations. I think there should be an opt-out of some sort, but, other than extern type, it doesn't exist yet...

I don't think this actually enables any optimizations. What is important is that you cannot access things that are in-bounds of other references in conflicting ways, but accessing "unclaimed memory" that does not really belong to any currently active reference should be fine. Stacked Borrows is IMO unnecessarily restrictive here.

@comex
Copy link

comex commented Dec 21, 2021

I don't think this actually enables any optimizations. What is important is that you cannot access things that are in-bounds of other references in conflicting ways, but accessing "unclaimed memory" that does not really belong to any currently active reference should be fine. Stacked Borrows is IMO unnecessarily restrictive here.

Hrm… I agree that forbidding accessing "unclaimed memory" is not necessary for the main optimizations in question, and it would be nice not to forbid it, but I'm hesitant to guarantee it won't end up being forbidden anyway for implementation convenience.

For example, imagine enforcing reference bounds on slice indexing by adding the inrange flag on the corresponding LLVM getelementptr instructions. From the LangRef:

If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange.

The LangRef does go on to mention some things that disqualify inrange from being usable for that purpose in its current state…

The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the range of the element selected by the inrange keyword, inclusive of the address one past the end of that element.

But suppose those restrictions were hypothetically loosened, and we started to use inrange for slice indexing. Then it would be UB to access "unclaimed memory" past the end of a slice from any reference that was derived from indexing into that slice. Combine with std::slice::from_ref for more havoc.

Perhaps this drawback would be enough to disqualify inrange by itself, but I'm hesitant to guarantee it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants