Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Pointer metadata & VTable #2580

Merged
merged 31 commits into from
Jan 29, 2021
Merged
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
0bb757e
RFC: DynTrait and VTable
SimonSapin Oct 25, 2018
d4acf9b
Rename to vtable RFC to ptr-meta
SimonSapin Oct 26, 2018
8de354c
RFC: Pointer metadata
SimonSapin Oct 26, 2018
24a2ff8
Pointer metadata: add bounds on the associated type
kennytm Oct 28, 2018
dd3de2f
Pointer metadata: typo fix
oli-obk Oct 28, 2018
637647f
Pointer metadata: typo fix
oli-obk Oct 28, 2018
99fbed9
Pointer metadata: typo fix
oli-obk Oct 28, 2018
74e185c
Pointer metadata: typo fix
oli-obk Oct 28, 2018
65418d0
Pointer metadata: grammar
oli-obk Oct 28, 2018
ef9ee32
Pointer metadata: grammar
oli-obk Oct 28, 2018
f6d95d7
Pointer metadata: grammar
oli-obk Oct 28, 2018
e1aa0df
Pointer metadata: fix unfinished sentences
SimonSapin Oct 28, 2018
18f1fef
Pointer metadata: remove paragraph made redundant
SimonSapin Oct 28, 2018
3dd1484
Pointer metadata: remove `VTable::drop_in_place`?
SimonSapin Oct 31, 2018
8ede231
Pointer metadata: more NonNull API
SimonSapin Oct 31, 2018
fb6ebad
Pointer metadata: generic code can assume `T: Pointee`.
SimonSapin Oct 31, 2018
bf80163
Pointer metadata: rewrite Guide-level explanation with more useful ex…
SimonSapin Oct 31, 2018
2edbe38
Pointer metadata: more unresolved questions
SimonSapin Oct 31, 2018
bb5e09a
Pointer metadata: typo
SimonSapin Nov 14, 2018
bc2a2e7
Pointer metadata: add an `Metadata: Unpin` bound
SimonSapin Nov 14, 2018
7be2349
Pointer metadata RFC: remove `VTable::drop_in_place`
SimonSapin Jun 21, 2019
538b9aa
Pointer metadata RFC: replace `&'static VTable` with `DynMetadata`
SimonSapin Jun 21, 2019
e47aa6e
Pointer metadata RFC: mention extern types in doc-comment.
SimonSapin Jun 22, 2019
29c7547
Pointer metadata RFC: use `NonNull` instead of `&'static` for vtable …
SimonSapin Sep 13, 2020
1e5622f
Pointer metadata RFC: add unresolved question for parameterizing `Dyn…
SimonSapin Sep 13, 2020
04ba25f
Pointer metadata RFC: the Drop impl example written two years ago app…
SimonSapin Sep 14, 2020
f94f627
Pointer metadata: remove 'static bound on the Metadata associated type
SimonSapin Dec 29, 2020
236e657
Pointer metadata: add unresolved question for `into_raw_parts`
SimonSapin Dec 29, 2020
7b0175e
Pointer metadata: parameterize `DynMetadata` over its `dyn Trait` type
SimonSapin Dec 29, 2020
3ff39f0
Typo fix
SimonSapin Jan 23, 2021
50b567b
RFC 2580: Pointer metadata
KodrAus Jan 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
397 changes: 397 additions & 0 deletions text/2580-ptr-meta.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,397 @@
- Feature Name: `ptr-meta`
- Start Date: 2018-10-26
- RFC PR: https://github.com/rust-lang/rfcs/pull/2580
- Rust Issue: https://github.com/rust-lang/rust/issues/81513

# Summary
[summary]: #summary

Add generic APIs that allow manipulating the metadata of fat pointers:

* Naming the metadata’s type (as an associated type)
* Extracting metadata from a pointer
* Reconstructing a pointer from a data pointer and metadata
* Representing vtables, the metadata for trait objects, as a type with some limited API

This RFC does *not* propose a mechanism for defining custom dynamically-sized types,
but tries to stay compatible with future proposals that do.


# Background
[background]: #background

Typical high-level code doesn’t need to worry about fat pointers,
a reference `&Foo` “just works” wether or not `Foo` is a DST.
But unsafe code such as a custom collection library may want to access a fat pointer’s
components separately.

In Rust 1.11 we *removed* a [`std::raw::Repr`] trait and a [`std::raw::Slice`] type
from the standard library.
`Slice` could be `transmute`d to a `&[U]` or `&mut [U]` reference to a slice
as it was guaranteed to have the same memory layout.
This was replaced with more specific and less wildly unsafe
`std::slice::from_raw_parts` and `std::slice::from_raw_parts_mut` functions,
together with `as_ptr` and `len` methods that extract each fat pointer component separatly.

For trait objects, where we still have an unstable `std::raw::TraitObject` type
that can only be used with `transmute`:

```rust
#[repr(C)]
pub struct TraitObject {
pub data: *mut (),
pub vtable: *mut (),
}
```

[`std::raw::Repr`]: https://doc.rust-lang.org/1.10.0/std/raw/trait.Repr.html
[`std::raw::Slice`]: https://doc.rust-lang.org/1.10.0/std/raw/struct.Slice.html
[`std::raw::TraitObjet`]: https://doc.rust-lang.org/1.30.0/std/raw/struct.TraitObject.html


# Motivation
[motivation]: #motivation

We now have APIs in Stable Rust to let unsafe code freely and reliably manipulate slices,
accessing the separate components of a fat pointers and then re-assembling them.
However `std::raw::TraitObject` is still unstable,
but it’s probably not the style of API that we’ll want to stabilize
as it encourages dangerous `transmute` calls.
This is a “hole” in available APIs to manipulate existing Rust types.

For example [this library][lib] stores multiple trait objects of varying size
in contiguous memory together with their vtable pointers,
and during iteration recreates fat pointers from separate data and vtable pointers.

The new `Thin` trait alias also expanding to [extern types] some APIs
that were unnecessarily restricted to `Sized` types
because there was previously no way to express pointer-thinness in generic code.

[lib]: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2015&gist=bbeecccc025f5a7a0ad06086678e13f3


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation


Let’s build generic type similar to `Box<dyn Trait>`,
but where the vtable pointer is stored in heap memory next to the value
so that the pointer is thin.
First, let’s get some boilerplate out of the way:

```rust
use std::marker::{PhantomData, Unsize};
use std::ptr::{self, DynMetadata};

trait DynTrait<Dyn> = Pointee<Metadata=DynMetadata<Dyn>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does Pointee come from here? Is it assumed to be a part of the prelude?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no plan to have it in the prelude. This is a mistake in example code, but then it’s just an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, just figured I might as well leave a note about these nits while I was reading through the RFC :)


pub struct ThinBox<Dyn: ?Sized + DynTrait<Dyn>> {
ptr: ptr::NonNull<WithMeta<()>>,
phantom: PhantomData<Dyn>,
}

#[repr(C)]
struct WithMeta<T: ?Sized> {
vtable: DynMetadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't DynMetadata need a generic parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forget to update this example when making that change.

value: T,
}
```

Since [unsized rvalues] are not implemented yet,
our constructor is going to “unsize” from a concrete type that implements our trait.
The `Unsize` bound ensures we can cast from `&S` to a `&Dyn` trait object
and construct the appopriate metadata.

[unsized rvalues]: https://github.com/rust-lang/rust/issues/48055

We let `Box` do the memory layout computation and allocation:

```rust
impl<Dyn: ?Sized + DynTrait> ThinBox<Dyn> {
pub fn new_unsize<S>(value: S) -> Self where S: Unsize<Dyn> {
let vtable = ptr::metadata(&value as &Dyn);
let ptr = NonNull::from(Box::leak(Box::new(WithMeta { vtable, value }))).cast();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use Box::into_raw instead? Box::leak produces a &, which it seems slightly weird to then pass to dealloc in Drop (which takes a *mut _).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. Using NonNull::from avoid the unwrap with NonNull::new or the unsafe with NonNull::unchecked.

leak returns &mut though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, you're right, my bad! Given the &mut, I think it's fine.

ThinBox { ptr, phantom: PhantomData }
}
}
```

(Another possible constructor is `pub fn new_copy(value: &Dyn) where Dyn: Copy`,
but it would involve slightly more code.)

Accessing the value requires knowing its alignment:

```rust
impl<Dyn: ?Sized + DynTrait> ThinBox<Dyn> {
fn data_ptr(&self) -> *mut () {
unsafe {
let offset = std::mem::size_of::<DynMetadata<Dyn>();
let value_align = self.ptr.as_ref().vtable.align();
let offset = align_up_to(offset, value_align);
(self.ptr.as_ptr() as *mut u8).add(offset) as *mut ()
}
}
}

/// <https://github.com/rust-lang/rust/blob/1.30.0/src/libcore/alloc.rs#L199-L219>
fn align_up_to(offset: usize, align: usize) -> usize {
offset.wrapping_add(align).wrapping_sub(1) & !align.wrapping_sub(1)
}

// Similarly Deref
impl<Dyn: ?Sized + DynTrait> DerefMut for ThinBox<Dyn> {
Copy link
Contributor

@jonhoo jonhoo Jan 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above refers to Deref, but this only shows DerefMut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "similarly" comment is a handwaving this away, leaving it as an exercise to the reader. The point of example code is to show the kind of thing that could be built on top of this RFC’s proposed APIs, not to provide a solid library you can actually copy-paste and use.

Anyway, this is a merged PR. If you feel strongly enough about fixing this example code feel free to submit another PR.

fn deref_mut(&mut self) -> &mut Dyn {
unsafe {
&mut *<*mut Dyn>::from_raw_parts(self.data_ptr(), *self.ptr.as_ref().vtable)
}
}
}
```

Finally, in `Drop` we may not be able to take advantage of `Box` again
since the original `Sized` type `S` is not statically known at this point.

```rust
impl<Dyn: ?Sized + DynTrait> Drop for ThinBox<Dyn> {
fn drop(&mut self) {
unsafe {
let layout = /* left as an exercise for the reader */;
ptr::drop_in_place::<Dyn>(&mut **self);
alloc::dealloc(self.ptr.cast(), layout);
}
}
}
```


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The APIs whose full definition is found below
are added to `core::ptr` and re-exported in `std::ptr`:

* A `Pointee` trait,
implemented automatically for all types
(similar to how `Sized` and `Unsize` are implemented automatically).
* A `Thin` [trait alias].
If this RFC is implemented before type aliases are,
uses of `Thin` should be replaced with its definition.
* A `metadata` free function
* A `DynMetadata` struct
* A `from_raw_parts` constructor for each of `*const T`, `*mut T`, and `NonNull<T>`.

The bounds on `null()` and `null_mut()` function in that same module
as well as the `NonNull::dangling` constructor
are changed from (implicit) `T: Sized` to `T: ?Sized + Thin`.
Similarly for the `U` type parameter of the `NonNull::cast` method.
This enables using those functions with [extern types].

The `Pointee` trait is implemented for all types.
This can be relied on in generic code,
even if a type parameter `T` does not have an explicit `T: Pointee` bound.
This is similar to how the `Any` trait can be used without an explicit `T: Any` bound,
only `T: 'static`, because a blanket `impl<T: 'static> Any for T {…}` exists.
(Except that `Pointee` is not restricted to `'static`.)

For the purpose of pointer casts being allowed by the `as` operator,
a pointer to `T` is considered to be thin if `T: Thin` instead of `T: Sized`.
This similarly includes extern types.

`std::raw::TraitObject` and `std::raw` are deprecated and eventually removed.

[trait alias]: https://github.com/rust-lang/rust/issues/41517
[extern types]: https://github.com/rust-lang/rust/issues/43467

```rust
/// This trait is automatically implemented for every type.
///
/// Raw pointer types and reference types in Rust can be thought of as made of two parts:
/// a data pointer that contains the memory address of the value, and some metadata.
///
/// For statically-sized types (that implement the `Sized` traits)
/// as well as for `extern` types,
/// pointers are said to be “thin”: metadata is zero-sized and its type is `()`.
///
/// Pointers to [dynamically-sized types][dst] are said to be “fat”
/// and have non-zero-sized metadata:
///
/// * For structs whose last field is a DST, metadata is the metadata for the last field
/// * For the `str` type, metadata is the length in bytes as `usize`
/// * For slice types like `[T]`, metadata is the length in items as `usize`
/// * For trait objects like `dyn SomeTrait`, metadata is [`DynMetadata<Self>`][DynMetadata]
/// (e.g. `DynMetadata<dyn SomeTrait>`).
///
/// In the future, the Rust language may gain new kinds of types
/// that have different pointer metadata.
Comment on lines +211 to +225
Copy link

@matthieu-m matthieu-m Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After experimenting with custom implementations of Box, I think there is a strong case for having strongly typed meta-data for all kinds of pointers.

The pre-allocator representation of Box is:

struct Box<T: ?Sized> { ptr: NonNull<T>, }

The post-allocator representation is very similar:

struct Box<T: ?Sized, A: Allocator = Global> {
    allocator: A,
    ptr: NonNull<T>,
}

Both automatically implements CoerceUnsized<Box<U>> where T: Unsize<U>, and all is well.

If one wants to make Box generic over its storage, then the representation becomes:

pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
    storage: S,
    handle: S::Handle<T>,
}

If S::Handle<T> == NonNull<T>, then Box is still coercible; however, in the case of inline storage, that is:

  • neither possible: when the Box is moved, so is the storage, and therefore any pointer into the storage is invalidated.
  • nor desirable: in the case of inline storage, the pointer is redundant, wasting 8 bytes.

Hence, in the case of inline storage, S::Handle<T> is best defined as <T as Pointee>::Metadata.

In order to have Box<T> : CoerceUnsized<Box<U>> where T: Unsize<U>:

  • We need: S::Handle<T>: CoerceUnsized<S::Handle<U>> where T: Unsize<U>,
  • Which means: <T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>> where T: Unsize<U>.

And of course, Box being coercible is very much desirable.


As a result, I believe a slight change of course is necessary:

  1. All metadata should be strongly typed -- be it Metadata<dyn Debug>, Metadata<[u8]> or Metadata<[u8; 3]> -- no more () or usize.
  2. The compiler should automatically implement Metadata<T>: CoerceUnsized<Metadata<U>> where T: Unsize<U>.

I would note that having a single Metadata<T> rather than SizedMetadata<T>, SliceMetadata<[T]>, DynMetadata<dyn T> is not necessary, only the coercion is, and since the compiler is generating those, it's perfectly free to create them "cross type".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is merged. Consider sending further discussion to the tracking issue: rust-lang/rust#81513

(That said, I don’t see a point in having a Box generic over storage. Especially Box<T, InlineStorage>, isn’t it pretty much the same as T?)

///
/// Pointer metadata can be extracted from a pointer or reference with the [`metadata`] function.
/// The data pointer can be extracted by casting a (fat) pointer
/// to a (thin) pointer to a `Sized` type with the `as` operator,
/// for example `(x: &dyn SomeTrait) as *const SomeTrait as *const ()`
/// or `(x: *const dyn SomeTrait).cast::<()>()`.
///
/// [dst]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts
#[lang = "pointee"]
pub trait Pointee {
/// The type for metadata in pointers and references to `Self`.
type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
}

/// Pointers to types implementing this trait alias are “thin”:
///
/// ```rust
/// fn this_never_panics<T: std::ptr::Thin>() {
/// assert_eq!(std::mem::size_of::<&T>(), std::mem::size_of::<usize>())
/// }
/// ```
pub trait Thin = Pointee<Metadata=()>;

/// Extract the metadata component of a pointer.
///
/// Values of type `*mut T`, `&T`, or `&mut T` can be passed directly to this function
/// as they implicitly coerce to `*const T`.
/// For example:
///
/// ```
/// assert_eq(std::ptr::metadata("foo"), 3_usize);
/// ```
///
/// Note that the data component of a (fat) pointer can be extracted by casting
/// to a (thin) pointer to any `Sized` type:
///
/// ```
/// # trait SomeTrait {}
/// # fn example(something: &SomeTrait) {
/// let object: &SomeTrait = something;
/// let data_ptr = object as *const SomeTrait as *const ();
/// # }
/// ```
pub fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {…}

impl<T: ?Sized> *const T {
pub fn from_raw_parts(data: *const (), meta: <T as Pointee>::Metadata) -> Self {…}
}

impl<T: ?Sized> *mut T {
pub fn from_raw_parts(data: *mut (), meta: <T as Pointee>::Metadata) -> Self {…}
}

impl<T: ?Sized> NonNull<T> {
pub fn from_raw_parts(data: NonNull<()>, meta: <T as Pointee>::Metadata) -> Self {
unsafe {
NonNull::new_unchecked(<*mut _>::from_raw_parts(data.as_ptr(), meta))
}
}
}

/// The metadata for a `DynTrait = dyn SomeTrait` trait object type.
///
/// It is a pointer to a vtable (virtual call table)
/// that represents all the necessary information
/// to manipulate the concrete type stored inside a trait object.
/// The vtable notably it contains:
///
/// * type size
/// * type alignment
/// * a pointer to the type’s `drop_in_place` impl (may be a no-op for plain-old-data)
/// * pointers to all the methods for the type’s implementation of the trait
///
/// Note that the first three are special because they’re necessary to allocate, drop,
/// and deallocate any trait object.
///
/// It is possible to name this struct with a type parameter that is not a `dyn` trait object
/// (for example `DynMetadata<u64>`) but not to obtain a meaningful value of that struct.
#[derive(Copy, Clone)]
pub struct DynMetadata<DynTrait: ?Sized> {
// Private fields
vtable_ptr: ptr::NonNull<()>,
phantom: PhantomData<DynTrait>
}

impl<DynTrait: ?Sized> DynMetadata<DynTrait> {
/// Returns the size of the type associated with this vtable.
pub fn size(self) -> usize { ... }

/// Returns the alignment of the type associated with this vtable.
pub fn align(self) -> usize { ... }

/// Returns the size and alignment together as a `Layout`
pub fn layout(self) -> alloc::Layout {
unsafe {
alloc::Layout::from_size_align_unchecked(self.size(), self.align())
}
}
}
```


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The status quo is that code (such as linked in [Motivation]) that requires this functionality
needs to transmute to and from `std::raw::TraitObject`
or a copy of it (to be compatible with Stable Rust).
Additionally, in cases where constructing the data pointer
requires knowing the alignment of the concrete type,
a dangling pointer such as `0x8000_0000_usize as *mut ()` needs to be created.
It is not clear whether `std::mem::align_of(&*ptr)` with `ptr: *const dyn SomeTrait`
is Undefined Behavior with a dangling data pointer.

A [previous iteration][2579] of this RFC proposed a `DynTrait`
that would only be implemented for trait objects like `dyn SomeTrait`.
There would be no `Metadata` associated type, `DynMetadata` was hard-coded in the trait.
In addition to being more general
and (hopefully) more compatible with future custom DSTs proposals,
this RFC resolves the question of what happens
if trait objects with super-fat pointers with multiple vtable pointers are ever added.
(Answer: they can use a different metadata type,
possibly like `(DynMetadata<dyn Trait>, DynMetadata<dyn OtherTrait>)`.)

[2579]: https://github.com/rust-lang/rfcs/pull/2579


# Prior art
[prior-art]: #prior-art

A previous [Custom Dynamically-Sized Types][cdst] RFC was postponed.
[Internals thread #6663][6663] took the same ideas
and was even more ambitious in being very general.
Except for `DynMetadata`’s methods, this RFC proposes a subset of what that thread did.

[cdst]: https://github.com/rust-lang/rfcs/pull/1524
[6663]: https://internals.rust-lang.org/t/pre-erfc-lets-fix-dsts/6663


# Unresolved questions
[unresolved-questions]: #unresolved-questions

* The name of `Pointee`. [Internals thread #6663][6663] used `Referent`.

* The location of `DynMetadata`. Is another module more appropriate than `std::ptr`?

* Should `DynMetadata` not have a type parameter?
This might reduce monomorphization cost,
but would force that the size, alignment, and destruction pointers
be in the same location (offset) for every vtable.
But keeping them in the same location is probaly desirable anyway to keep code size small.

* The name of `Thin`.
This name is short and sweet but `T: Thin` suggests that `T` itself is thin,
rather than pointers and references to `T`.

* The location of `Thin`. Better in `std::marker`?

* Should `Thin` be added as a supertrait of `Sized`?
Or could it ever make sense to have fat pointers to statically-sized types?

* Are there other generic standard library APIs like `ptr::null()`
that have an (implicit) `T: Sized` bound that unneccesarily excludes extern types?

* Should `<*mut _>::from_raw_parts` and friends be `unsafe fn`s?

* API design: free functions v.s. methods/constructors on `*mut _` and `*const _`?

* Add `into_raw_parts` that returns `(*const (), T::Metadata)`?
Using the `cast` method to a `Sized` type to extract the address as a thin pointer
is less discoverable.
Possibly *instead* of the metadata function?