-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support field projection in any #[repr(transparent)]
wrapper type
#196
Comments
Unaligned
(and other wrappers?)Unalign
(and other wrappers?)
Unalign
(and other wrappers?)#[repr(transparent)]
wrapper type
Note that this allowes |
Good catch, thanks for that, @SkiFire13! Took a quick stab at fixing it and ran into inference issues (specifically when assigning |
A couple of ideas for this:
|
Found unsoundness in the way we're using |
Ended up taking this approach, thanks for the suggestion, @SkiFire13! |
Another challenge we've just run into: The code as it's currently designed allows for an arbitrary sequence of field accesses, index operations, and slice operations. However, if a field is a reference type, this has the consequence of indirecting through the reference. This breaks the soundness of the We need to figure out a way to reject any projection through references. |
Overview
Support field projection inside of
#[repr(transparent)]
wrapper types defined in zerocopy and in the standard library.Many thanks to @jswrenn, @kupiakos, @djkoloski, @SkiFire13, @cuviper, @danielhenrymantilla, and @DanielKeep for invaluable feedback and input on this design.
Motivation
There are a number of wrapper types - both in zerocopy and in the standard library - whose length and field positions are identical to those of a single, wrapped type, but which modify other aspects of the memory of that type in some way. Examples include:
MaybeUninit<T>
- layout is identical toT
, but any sequence of bytes (including uninitialized bytes) are a valid instanceUnsafeCell<T>
- layout and bit validity are identical toT
, but permits aliased mutation through a shared reference (&
)Cell<T>
- likeUnsafeCell<T>
, but limits the allowed mutations to those that can be guaranteed to be soundUnalign<T>
- identical toT
, except with alignment 1ByteArray<T>
- identical toT
, except with alignment 1 and any sequence of bytes (not including uninitialized bytes) are a valid instanceOur most important uses of field projection are:
MaybeValid<T>
type required by ourTryFromBytes
design in order to support deriving theTryFromBytes
trait on users' types.Unalign<T>
type, which is exposed to users of zerocopyMost of the complexity of field projection is not specific to the wrapper type. Thus, by solving field projection in the general case, we avoid having to solve it multiple times for different types (e.g. for
MaybeValid
andUnalign
), and it gives us one central location to encapsulate all of the complexity. In other words, the primary motivations for this design are to support theMaybeValid
andUnalign
types, but field projection in other types comes for free.API
User
A user who wishes to perform field projection does so using the
project!
macro, which can be invoked in a number of ways:Container author
Container authors implement the
Projectable
trait:Design
This design is being prototyped in the field-project branch.
The core of the design is captured in this code snippet, which is explained below:
In order to explain this design, consider the following hypothetical wrapper type:
Projectable
and projection validityProjectable
can only be implemented by types for which projection is valid. In particular, it must be the case that, if a memory region contains aWrapper<T>
, andT
contains a field of typeF
at a particular offset, it is valid to treat the bytes at the field offset as aWrapper<F>
instead of anF
. Roughly speaking, this means thatWrapper
must be#[repr(transparent)]
or some equivalent repr (like#[repr(C, packed)]
). Examples of types which do not support projection are:PantomData<T>
- a zero-sized type regardless ofT
; any byte offset other than 0 is out-of-bounds (see "Open Questions and Alternatives" for a caveat)#[repr(C)] struct Foo<T> { t: T, u: u8 }
- performing field projection would cause the trailingu
field to overlap with other parts ofT
Thankfully, all of the wrapper types this design is aimed at support projection.
Field offset computation
One of the core challenges of supporting field projection is to calculate the field's offset. For example, consider projecting
&Wrapper<(u8, u16)>
into&Wrapper<u16>
. Given the byte offset of theu16
field within the(u8, u16)
type, field projection is trivial - perform the appropriate pointer offset math, and then materialize a&Wrapper<u16>
at the computed memory address. So how do we determine the field offset?The standard library provides the
addr_of!
macro for this purpose. It operates on a "place" expression such as:Thus, given a
&Wrapper<(u8, u16)>
, we can convert it to a*const (u8, u16)
and then useaddr_of!
to compute the address at which we should materialize our&Wrapper<u16>
.The role of the
inner_to_field
argument to theproject
andproject_mut
functions is to encapsulate a call toaddr_of!
oraddr_of_mut!
. These calls themselves cannot happen inside ofproject
orproject_mut
because the field access operation must operate on a concrete type, and all types are generic inside ofproject
/project_mut
. Theinner_to_field
argument allows theaddr_of!
/addr_of_mut!
call to be synthesized inside of theproject!
macro, where the types are concrete.Alignment
Currently, the
addr_of!
macro requires that any dereferences that happen - even dereferences that don't result in a load from memory - can only operate on properly-aligned pointers. This means that the following naive implementation would be unsound:Since
inner_ptr
may not be properly aligned as required by(u8, u16)
, the*inner_ptr
inaddr_of!
may be unsound.In order to avoid this problem, we don't operate directly on the
Projectable::Inner
type. Instead of converting a*const P
to a*const P::Inner
, we convert it to a*const Unalign<P::Inner>
(whereUnalign
is a#[doc(hidden)]
type used only byproject!
). Then, instead of emitting anaddr_of!
call like this (which works ifinner: *const P::Inner
):...we insert a
.0
sinceinner
is actually*const Unalign<P::Inner>
:This solves the alignment problem, but has the unfortunate side effect of preventing us from supporting projection from unsized types. The reason is that
Unalign
isrepr(packed)
, andrepr(packed)
types cannot be unsized. Luckily, the behavior ofaddr_of!
may soon change in rust-lang/reference#1387 to permit dereferencing unaligned pointers. If that happens, we can remove theUnalign
trick entirely and support unsized projection (this would also depend upon Miri being taught that unaligned dereferences are sound, which is implemented in rust-lang/rust#114330).Projectable
's safety invariantsThe
Projectable
trait's safety invariants might seem to someone familiar with unsafe Rust to be surprisingly complex. See this document for an explanation of why supporting unsized types introduces subtlety that necessitates such complex safety invariants.Preventing field projection through references or other
Deref
/DerefMut
fieldsCurrently,
project!
is unsafe to call. In order to see why, consider the following invocation:In the first code block, we perform a normal field projection into the type
Foo
, which works as expected, and is sound. While the containedFoo
might be uninitialized, we at least know it's layout and field offsets. Thus, based on wherem0
lives in memory, we can compute the address of the containedFoo::b
field. Since we return it as&MaybeUninit<u16>
, we're not exposing any uninitialized memory.In the second code block, we perform a field projection into the type
&Foo
. This is unsound. Unlike in the first code block,Foo::b
does not live inm1
, but instead lives at some memory address which is referred to bym1
. Since the contents ofm1
might be uninitialized, the address that we need to dereference might be uninitialized. Thus, it's possible that the&MaybeUninit<u16>
returned fromproject!
might itself point anywhere in memory (and semantically, the address itself - not its referent - might be uninitialized).Thus, we need to prevent field projection through references like
&Foo
. References at the top level aren't the only issue - we also need to prevent field projection through references which are nested arbitrarily deep within another type such as(u8, &Foo)
, etc. Even projection through non-reference fields which implementDeref
/DerefMut
(e.g.,Box
) need to be prevented.Unfortunately, there doesn't seem to be any good way to statically forbid such projections without significantly worsening the usability of
project!
, for example by requiring the caller to provide a full copy of the definition of the type at the call site.For this reason, we've decided to simply make the
project!
macro unsafe, and put the onus on the caller to avoid projection through references.Here are some other alternatives we considered and rejected:
project
to take aFnOnce(P::Inner) -> F
closure. This closure would never be called, but it would only compile if it was possible to moveP::Inner
by ownership and extract an ownedF
. This has a few problems:let { a, b } = foo
; you need to be able to nameFoo { a, b }
)inner.a.b.c
because that would work behind references orDeref
/DerefMut
if one of the types behind the reference isCopy
P::Inner
andF
need to be moved by valueoffset_of!
macro. We can't use it while it's unstable, of course, but it also has other drawbacks: It currently doesn't support unsized types or array indexing, both of which our macro supports.Solving this problem without requiring
project!
to be unsafe may require a change to the language such as the addition of an alternative toaddr_of!
which bans any memory loads in the expression that is its argument.Why
project
/project_mut
take anUnsafeToken
instead of beingunsafe fn
sAn earlier version of this design had
project
andproject_mut
asunsafe fn
s, resulting in macro code like (simplified for brevity):This has the effect of allowing the the caller to pass unsafe code to
project!
(in$c
or$f
), and that code will silently be placed inside anunsafe
block, permitting the caller to invoke unsafe code without needing to writeunsafe
themselves. The current version of the design shadows this soundness hole by still requiring anunsafe
block in order to callpromise_no_deref
; however, if a future version of the design were to statically prevent projection-through-deref and thus remove this call, the soundness hole would be uncovered.An obvious solution to this problem is simply move the macro variable expansion outside of the
unsafe
block by assigning to variables:Unfortunately, due to a quirk of rustc's type inference algorithm, this causes type inference to fail. The use of
UnsafeToken
allows us to avoid both the type inference issue and the unsafe-code-smuggling issue by passing theinner_to_field
closure inline and not needing to place it inside anunsafe
block.Credit to @SkiFire13 for pointing out the unsafe-code-smuggling issue and for suggesting the
UnsafeToken
solution.Why does
Projectable
have type parameters?Naively, we might expect
Projectable
to be defined like this:In other words, since we have GATs, we shouldn't need to have
Projectable
be parameteric over every pair of field type and wrapped-version-of-that-field type (ie,F
andW
). We should be able to use aWrapped
GAT to express this.However, this doesn't play nicely with wrapper types that don't support wrapping unsized types. Consider what would happen if we tried to implement
Projectable
as defined here forMaybeUninit
, which doesn't support unsized types:Alternatively, we could remove the
T: ?Sized
bound, but then we wouldn't be able to support projection into unsized fields. By contrast, placingF
andW
in the definition ofProjectable
allows types to support or not support unsized types as desired:Does this design support projection into an
UnsafeCell
?We might expect that it'd be valid to support projection into an
UnsafeCell
:This design could easily support such projection by implementing
Projectable
forUnsafeCell
(and wrapper types likeCell
).As of this writing, it seems that the intention is for this to be sound, but that it's not currently guaranteed. In fact, there was very recently a bug in the standard library that rendered this unsound in practice (it's been fixed).
Thus, while it will likely be officially sound at some point, we need to wait until that decision is made before we can implement
Projectable
forUnsafeCell
and support this.Miscellaneous
A few other aspects of this design are worth calling out:
project
andproject_mut
functions aren't strictly necessary - all of their logic could be inlined inside ofproject!
. However, in a macro context, there's no way to name the concrete types being operated on, and similarly no way to name bounds on those types. There are ways to hack around these limitations to generate code that will only compile in the right circumstances, but it's unwieldy and error-prone. Theproject
andproject_mut
functions provide a place to encode these types and bounds so that theproject!
macro can be simpler.Projectable::Inner
type is unnecessary - if theProjectable
type is#[repr(transparent)]
, then it doesn't matter whether the field offset is calculated from a pointer of typeSelf
or of typeSelf::Inner
. However, the call toaddr_of!
must happen in a context in which the inner type is known and concrete, which in turn requires that theinner_to_field
argument toproject
/project_mut
takes aP::Inner
pointer as its argument. The only way to avoid this would be to get rid ofproject
/project_mut
entirely, and instead put all logic inside ofproject!
, which is undesirable for the reasons discussed above.Performance
Testing on Godbolt confirms that generated code is well-optimized (
-C opt-level=3
):Open Questions
project!
uses$c.borrow()
to make it so that it works with both borrowed and owned containers. Is there a way to do this that wouldn't conflict if a type had an inherent method calledborrow
(ie, make sureBorrow::borrow
is always used)? One possibility would be to add our own extension trait that is implemented for allT: Borrow
with a method whose name is much less likely to conflict like__project_borrow
.dyn Trait
s? It is legal toas
castdyn Trait
fat pointers, so it's not clear how the safety requirements onProjectable
apply to them. It's probably fine in practice because you can't access the fields of adyn Trait
object, and so you couldn't generate a valid call toproject!
. That said, what about something likeMaybeUninit<(u8, dyn Foo)>
? What would happen if you tried to project into.1
?Potential future improvements
PhantomData<T>
beProjectable
? SincePhantomData
is a ZST, maybe it's fine to materialize references to it which are technically out-of-bounds?f
, directly, I can call aget_f(&self) -> &F
method.transmute
or a union (the latter might be necessary if the type system doesn't know that both pointers are fat and so have the same size) instead of anas
cast.W<[T]>
->[W<T>]
,W<[T; N]>
->[W<T>; N]
, etc).Alternatives
MaybeValid
type as part of the implementation ofTryFromBytes
, and for theUnalign
type) rather than supporting generic field projection? We could, but most of the complexity in this design is inherent to field projection rather than specific to the wrapper type. We would have to solve the same problems twice - once for each type - and again for any future type we wanted to support. Supporting generic field projection is not significantly more complicated, cuts our work roughly in half, and allows us to encapsulate all of the complexity of field projection in a single place.The text was updated successfully, but these errors were encountered: