RFC: Pointer metadata & VTable #2580

Merged Jan 29, 2021 (31 commits)
Changes from 23 commits
0bb757e
RFC: DynTrait and VTable
SimonSapin Oct 25, 2018
d4acf9b
Rename to vtable RFC to ptr-meta
SimonSapin Oct 26, 2018
8de354c
RFC: Pointer metadata
SimonSapin Oct 26, 2018
24a2ff8
Pointer metadata: add bounds on the associated type
kennytm Oct 28, 2018
dd3de2f
Pointer metadata: typo fix
oli-obk Oct 28, 2018
637647f
Pointer metadata: typo fix
oli-obk Oct 28, 2018
99fbed9
Pointer metadata: typo fix
oli-obk Oct 28, 2018
74e185c
Pointer metadata: typo fix
oli-obk Oct 28, 2018
65418d0
Pointer metadata: grammar
oli-obk Oct 28, 2018
ef9ee32
Pointer metadata: grammar
oli-obk Oct 28, 2018
f6d95d7
Pointer metadata: grammar
oli-obk Oct 28, 2018
e1aa0df
Pointer metadata: fix unfinished sentences
SimonSapin Oct 28, 2018
18f1fef
Pointer metadata: remove paragraph made redundant
SimonSapin Oct 28, 2018
3dd1484
Pointer metadata: remove `VTable::drop_in_place`?
SimonSapin Oct 31, 2018
8ede231
Pointer metadata: more NonNull API
SimonSapin Oct 31, 2018
fb6ebad
Pointer metadata: generic code can assume `T: Pointee`.
SimonSapin Oct 31, 2018
bf80163
Pointer metadata: rewrite Guide-level explanation with more useful ex…
SimonSapin Oct 31, 2018
2edbe38
Pointer metadata: more unresolved questions
SimonSapin Oct 31, 2018
bb5e09a
Pointer metadata: typo
SimonSapin Nov 14, 2018
bc2a2e7
Pointer metadata: add an `Metadata: Unpin` bound
SimonSapin Nov 14, 2018
7be2349
Pointer metadata RFC: remove `VTable::drop_in_place`
SimonSapin Jun 21, 2019
538b9aa
Pointer metadata RFC: replace `&'static VTable` with `DynMetadata`
SimonSapin Jun 21, 2019
e47aa6e
Pointer metadata RFC: mention extern types in doc-comment.
SimonSapin Jun 22, 2019
29c7547
Pointer metadata RFC: use `NonNull` instead of `&'static` for vtable …
SimonSapin Sep 13, 2020
1e5622f
Pointer metadata RFC: add unresolved question for parameterizing `Dyn…
SimonSapin Sep 13, 2020
04ba25f
Pointer metadata RFC: the Drop impl example written two years ago app…
SimonSapin Sep 14, 2020
f94f627
Pointer metadata: remove 'static bound on the Metadata associated type
SimonSapin Dec 29, 2020
236e657
Pointer metadata: add unresolved question for `into_raw_parts`
SimonSapin Dec 29, 2020
7b0175e
Pointer metadata: parameterize `DynMetadata` over its `dyn Trait` type
SimonSapin Dec 29, 2020
3ff39f0
Typo fix
SimonSapin Jan 23, 2021
50b567b
RFC 2580: Pointer metadata
KodrAus Jan 29, 2021
387 changes: 387 additions & 0 deletions text/0000-ptr-meta.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,387 @@
- Feature Name: `ptr-meta`
- Start Date: 2018-10-26
- RFC PR:
- Rust Issue:

# Summary
[summary]: #summary

Add generic APIs that allow manipulating the metadata of fat pointers:

* Naming the metadata’s type (as an associated type)
* Extracting metadata from a pointer
* Reconstructing a pointer from a data pointer and metadata
* Representing vtables, the metadata for trait objects, as a type with some limited API

This RFC does *not* propose a mechanism for defining custom dynamically-sized types,
but tries to stay compatible with future proposals that do.


# Background
[background]: #background

Typical high-level code doesn’t need to worry about fat pointers:
a reference `&Foo` “just works” whether or not `Foo` is a DST.
But unsafe code such as a custom collection library may want to access a fat pointer’s
components separately.

In Rust 1.11 we *removed* a [`std::raw::Repr`] trait and a [`std::raw::Slice`] type
from the standard library.
`Slice` could be `transmute`d to a `&[U]` or `&mut [U]` reference to a slice
as it was guaranteed to have the same memory layout.
This was replaced with more specific and less wildly unsafe
`std::slice::from_raw_parts` and `std::slice::from_raw_parts_mut` functions,
together with `as_ptr` and `len` methods that extract each fat pointer component separately.
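
As a minimal sketch of that slice API on Stable Rust, the following splits a slice into its two components and reassembles it. The helper name `into_parts` is hypothetical and exists only for this illustration.

```rust
use std::slice;

/// Hypothetical helper: splits a slice into its data pointer and its length.
fn into_parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let values = [10_u32, 20, 30];
    let (data, len) = into_parts(&values);

    // SAFETY: `data` and `len` were taken from a live slice borrowed from
    // `values`, which outlives `rebuilt`.
    let rebuilt: &[u32] = unsafe { slice::from_raw_parts(data, len) };
    assert_eq!(rebuilt, &values[..]);
}
```

No `transmute` is involved: each component is read and supplied through a dedicated function.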

For trait objects, however, we still have an unstable `std::raw::TraitObject` type
that can only be used with `transmute`:

```rust
#[repr(C)]
pub struct TraitObject {
    pub data: *mut (),
    pub vtable: *mut (),
}
```

[`std::raw::Repr`]: https://doc.rust-lang.org/1.10.0/std/raw/trait.Repr.html
[`std::raw::Slice`]: https://doc.rust-lang.org/1.10.0/std/raw/struct.Slice.html
[`std::raw::TraitObject`]: https://doc.rust-lang.org/1.30.0/std/raw/struct.TraitObject.html


# Motivation
[motivation]: #motivation

We now have APIs in Stable Rust to let unsafe code freely and reliably manipulate slices,
accessing the separate components of a fat pointer and then re-assembling them.
However, `std::raw::TraitObject` is still unstable,
and it’s probably not the style of API that we’ll want to stabilize
as it encourages dangerous `transmute` calls.
This is a “hole” in available APIs to manipulate existing Rust types.

For example [this library][lib] stores multiple trait objects of varying size
in contiguous memory together with their vtable pointers,
and during iteration recreates fat pointers from separate data and vtable pointers.

The new `Thin` trait alias also allows expanding to [extern types] some APIs
that were unnecessarily restricted to `Sized` types
because there was previously no way to express pointer-thinness in generic code.

[lib]: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2015&gist=bbeecccc025f5a7a0ad06086678e13f3


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation


Let’s build a generic type similar to `Box<dyn Trait>`,
but where the vtable pointer is stored in heap memory next to the value
so that the pointer is thin.
First, let’s get some boilerplate out of the way:

```rust
use std::marker::{PhantomData, Unsize};
use std::ptr::{self, DynMetadata, Pointee};

trait DynTrait = Pointee<Metadata=DynMetadata>;

pub struct ThinBox<Dyn: ?Sized + DynTrait> {
    ptr: ptr::NonNull<WithMeta<()>>,
    phantom: PhantomData<Dyn>,
}

#[repr(C)]
struct WithMeta<T: ?Sized> {
    vtable: DynMetadata,
    value: T,
}
```

Since [unsized rvalues] are not implemented yet,
our constructor is going to “unsize” from a concrete type that implements our trait.
The `Unsize` bound ensures we can cast from `&S` to a `&Dyn` trait object
and construct the appropriate metadata.

[unsized rvalues]: https://github.com/rust-lang/rust/issues/48055

We let `Box` do the memory layout computation and allocation:

```rust
impl<Dyn: ?Sized + DynTrait> ThinBox<Dyn> {
    pub fn new_unsize<S>(value: S) -> Self where S: Unsize<Dyn> {
        let vtable = ptr::metadata(&value as &Dyn);
        let ptr = Box::into_raw_non_null(Box::new(WithMeta { vtable, value })).cast();
        ThinBox { ptr, phantom: PhantomData }
    }
}
```

(Another possible constructor is `pub fn new_copy(value: &Dyn) where Dyn: Copy`,
but it would involve slightly more code.)

Accessing the value requires knowing its alignment:

```rust
use std::ops::DerefMut;

impl<Dyn: ?Sized + DynTrait> ThinBox<Dyn> {
    fn data_ptr(&self) -> *mut () {
        unsafe {
            // The value is stored after the vtable pointer,
            // at the next offset that satisfies its alignment.
            let offset = std::mem::size_of::<DynMetadata>();
            let value_align = self.ptr.as_ref().vtable.align();
            let offset = align_up_to(offset, value_align);
            (self.ptr.as_ptr() as *mut u8).add(offset) as *mut ()
        }
    }
}

/// <https://github.com/rust-lang/rust/blob/1.30.0/src/libcore/alloc.rs#L199-L219>
fn align_up_to(offset: usize, align: usize) -> usize {
    offset.wrapping_add(align).wrapping_sub(1) & !align.wrapping_sub(1)
}

// Similarly Deref
impl<Dyn: ?Sized + DynTrait> DerefMut for ThinBox<Dyn> {
    fn deref_mut(&mut self) -> &mut Dyn {
        unsafe {
            // `DynMetadata` is `Copy`, so the field is passed by value.
            &mut *<*mut Dyn>::from_raw_parts(self.data_ptr(), self.ptr.as_ref().vtable)
        }
    }
}
```
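
To make the offset computation concrete, here is a small worked sketch of the `align_up_to` helper from the block above. It assumes, for illustration only, that the stored vtable pointer occupies 8 bytes, and that `align` is always a power of two.

```rust
/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two.
fn align_up_to(offset: usize, align: usize) -> usize {
    offset.wrapping_add(align).wrapping_sub(1) & !align.wrapping_sub(1)
}

fn main() {
    // Assumed header size: one 8-byte vtable pointer.
    let header = 8;

    // Values aligned to 8 bytes or less start right after the header.
    assert_eq!(align_up_to(header, 1), 8);
    assert_eq!(align_up_to(header, 8), 8);

    // More strictly aligned values need padding after the header.
    assert_eq!(align_up_to(header, 16), 16);
    assert_eq!(align_up_to(header, 64), 64);

    // An offset that is not already aligned is rounded up.
    assert_eq!(align_up_to(9, 4), 12);
}
```

This padding is the reason `data_ptr` has to read the alignment from the vtable before it can locate the value.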

Finally, in `Drop` we can take advantage of `Box` again,
but this time by rebuilding a `Box<Dyn>` from the fat pointer and letting it drop:

```rust
impl<Dyn: ?Sized + DynTrait> Drop for ThinBox<Dyn> {
    fn drop(&mut self) {
        unsafe {
            Box::<Dyn>::from_raw(&mut **self);
        }
    }
}
```


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The APIs whose full definition is found below
are added to `core::ptr` and re-exported in `std::ptr`:

* A `Pointee` trait,
implemented automatically for all types
(similar to how `Sized` and `Unsize` are implemented automatically).
* A `Thin` [trait alias].
If this RFC is implemented before trait aliases are,
uses of `Thin` should be replaced with its definition.
* A `metadata` free function
* A `DynMetadata` struct
* A `from_raw_parts` constructor for each of `*const T` and `*mut T`

The bounds on the `null()` and `null_mut()` functions in that same module
as well as the `NonNull::dangling` constructor
are changed from (implicit) `T: Sized` to `T: ?Sized + Thin`.
Similarly for the `U` type parameter of the `NonNull::cast` method.
This enables using those functions with [extern types].

The `Pointee` trait is implemented for all types.
This can be relied on in generic code,
even if a type parameter `T` does not have an explicit `T: Pointee` bound.
This is similar to how the `Any` trait can be used without an explicit `T: Any` bound,
only `T: 'static`, because a blanket `impl<T: 'static> Any for T {…}` exists.
(Except that `Pointee` is not restricted to `'static`.)

For the purpose of pointer casts being allowed by the `as` operator,
a pointer to `T` is considered to be thin if `T: Thin` instead of `T: Sized`.
This similarly includes extern types.
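
As a rough illustration on Stable Rust, the sketch below uses the `as` operator to discard metadata from fat pointers, then compares pointer sizes. The helper name `data_address` is hypothetical. The size assertions reflect how current compilers lay out pointers and are an assumption of this sketch, not something the language guarantees.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical helper: discards any metadata and keeps only the data address.
fn data_address<T: ?Sized>(ptr: *const T) -> *const () {
    ptr as *const ()
}

fn main() {
    let array = [1_u8, 2, 3];
    let slice: &[u8] = &array;
    let object: &dyn Debug = &array;

    // Both fat pointers carry the same data address as the array itself.
    assert_eq!(data_address(slice), array.as_ptr() as *const ());
    assert_eq!(data_address(object), &array as *const [u8; 3] as *const ());

    // Thin pointers are one word wide; slice and trait object pointers are two.
    assert_eq!(size_of::<*const u8>(), size_of::<usize>());
    assert_eq!(size_of::<*const [u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<*const dyn Debug>(), 2 * size_of::<usize>());
}
```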

`std::raw::TraitObject` and `std::raw` are deprecated and eventually removed.

[trait alias]: https://github.com/rust-lang/rust/issues/41517
[extern types]: https://github.com/rust-lang/rust/issues/43467

```rust
/// This trait is automatically implemented for every type.
///
/// Raw pointer types and reference types in Rust can be thought of as made of two parts:
/// a data pointer that contains the memory address of the value, and some metadata.
///
/// For statically-sized types (that implement the `Sized` trait)
/// as well as for `extern` types,
/// pointers are said to be “thin”: metadata is zero-sized and its type is `()`.
///
/// Pointers to [dynamically-sized types][dst] are said to be “fat”
/// and have non-zero-sized metadata:
///
/// * For structs whose last field is a DST, metadata is the metadata for the last field
/// * For the `str` type, metadata is the length in bytes as `usize`
/// * For slice types like `[T]`, metadata is the length in items as `usize`
/// * For trait objects like `dyn SomeTrait`, metadata is [`DynMetadata`].
///
/// In the future, the Rust language may gain new kinds of types
/// that have different pointer metadata.
///
/// Pointer metadata can be extracted from a pointer or reference with the [`metadata`] function.
/// The data pointer can be extracted by casting a (fat) pointer
/// to a (thin) pointer to a `Sized` type with the `as` operator,
/// for example `(x: &dyn SomeTrait) as *const SomeTrait as *const ()`.
///
/// [dst]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts
#[lang = "pointee"]
pub trait Pointee {
Contributor

so... I'm assuming the compiler implements

default impl<T: ?Sized> Pointee for T {
    type Metadata = &'static Vtable;
}
impl<T: Sized> Pointee for T {
    type Metadata = ();
}
impl Pointee for str {
    type Metadata = usize;
}
impl<T: Sized> Pointee for [T] {
    type Metadata = usize;
}

Which means theoretically we could make Vtable generic over T allowing the drop_in_place method to take a raw pointer with the correct pointee type?

Contributor Author

These impls would be accurate in current Rust, but what I had in mind instead was that the compiler would automatically generate impls, similar to what it does for the std::marker::Unsize trait. As far as the standard library is concerned these impls would be "magic", not based on specialization.

Regardless, yes, making VTable generic with a type parameter for the trait object type is possible.

    /// The type for metadata in pointers and references to `Self`.
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin + 'static;
Member

Having to list Unpin here is concerning... does that mean that if we get another auto trait in libstd we'll want to add it here but cannot due to backwards compatibility?


Why are the trait bounds there in the first place? I can understand Copy and maybe Send + Sync, since you copy the metadata when you copy the pointer. But why does metadata need to be Ord or Hash? If some function really needs those, it could always use T: Pointee<Metadata: Ord> or similar.

Member

I've explained the rationale of this list in #2580 (comment) and Unpin in #2580 (comment).

@comex Because the following works on stable all the way back to 1.0:

fn is_before<T: ?Sized>(a: *const T, b: *const T) -> bool {
    a < b
}

Without introducing ?DynSized, the T: ?Sized bound has to match all types, including slices, trait objects, extern types and custom DSTs. The fat-pointer comparison is always implemented as (a.ptr, a.meta) < (b.ptr, b.meta) without requiring T::Metadata: PartialOrd in is_before. This means such a bound must be implicitly satisfied everywhere, and thus included in the list.

Every other bound is included for the same reason: *const T and/or &T unconditionally implement that trait.

@RalfJung we could tweak the auto trait rule so that it needs to check T::Metadata first before impl for *const T, *mut T, &T and &mut T, assuming auto-trait is not stabilized before Pointee is implemented.

Contributor Author

does that mean that if we get another auto trait in libstd we'll want to add it here but cannot due to backwards compatibility?

Does it?

This RFC proposes that this trait is automatically implemented for all types, so "manual" implementations of it cannot exist. (Unless some day the language grows types that cannot be used behind a pointer, but I can’t imagine how that would be useful. And even then, I assume we’d disallow manual impls.)

And it’s only those impls that would potentially be broken by adding a new bound, right?

Member

The fat-pointer comparison is always implemented as (a.ptr, a.meta) < (b.ptr, b.meta)

Is it really? I would expect it to be implemented as a.ptr < b.ptr, ignoring the metadata. That seems like a more reasonable implementation?

This RFC proposes that this trait is automatically implemented for all types, so "manual" implementations of it cannot exist.

I see. That leaves no room for custom DST though. I am aware those are out-of-scope, but they should still be possible to add in a future-compatible way I assume.

Member

I would expect it to be implemented as a.ptr < b.ptr

fn is_before<T: ?Sized>(a: *const T, b: *const T) -> bool {
    a < b
}
#[test]
fn test() {
    let a = &[1,2,3];
    assert!(is_before(&a[..1], &a[..2])); // because 1 < 2
    assert!(!is_before(&a[..2], &a[..1])); // because 2 > 1
}

Contributor Author

That leaves no room for custom DST though.

Oh, that’s a good point. I suppose when custom DSTs are added, then we may need to freeze those bounds.

The fat-pointer comparison is always implemented as (a.ptr, a.meta) < (b.ptr, b.meta)

Indeed, it looks like fat-pointer comparison is represented specifically in MIR, and lowered to LLVM integer-comparison instructions here:

https://github.com/rust-lang/rust/blob/1e2a73867/src/librustc_codegen_ssa/mir/rvalue.rs#L630-L642

It doesn’t call PartialOrd methods. In fact it’s the other way around: PartialOrd uses the < operator which is special-cased for raw pointers to not call the trait:

https://github.com/rust-lang/rust/blob/1e2a73867/src/libcore/ptr/mod.rs#L2930

So maybe custom DSTs would also need language-level changes in this area.


is_before is a very good example of why it needs to be comparable etc. But I'm missing why it needs to be Send and Sync. (I mean, if the DST type is Send + Sync then the metadata must be Send + Sync, too. But if not, I think it doesn't need to have those bounds either.)

Contributor Author

Today we have impl<T: ?Sized> !Send for *mut T {} (similarly Sync) in libcore, without a T: Send bound. For a wide pointer to be Send, the metadata needs to be Send as well.

}

/// Pointers to types implementing this trait alias are “thin”:
///
/// ```rust
/// fn this_never_panics<T: std::ptr::Thin>() {
///     assert_eq!(std::mem::size_of::<&T>(), std::mem::size_of::<usize>())
/// }
/// ```
pub trait Thin = Pointee<Metadata=()>;
@rustonaut Sep 10, 2020

I believe it's better to actually introduce thin pointers instead of using this trait alias.

The problem is that this trait alias doesn't necessarily play well with structs containing DSTs (which might again be "composed" DSTs). It should be possible to have a thin pointer to such a struct and get access to all of its fields without needing any unsafe code, except for the DST field, which when referenced would just return a thin reference to the DST. For many things it would make writing unsafe code around all kinds of DST types much easier if we actually had thin pointers.

For example we could have something like (sketch):

#[repr(transparent)]
#[lang = "thin_ptr"] //see below
#[fundamental] //maybe?? I forgot what that did
pub struct Thin<T: ?Sized>(T) //EDIT: Didn't work out see below, it's now `ThinRef<'a,T>` (as well as `ThinMut`,`ThinConstPtr`, `ThinMutPtr`

//override auto-impl
unsafe impl<T> Pointee for Thin<T> {
    type Metadata = ()
}

The reason why it's a lang item is because it has a bit special deref handling, mainly:

  • given struct Foo { f1: usize, f2: usize, dst: [u8] }
  • given let foo: &Thin<Foo> = ...,
  • then &foo.f1 is of type &usize
  • then &foo.f2 is of type &usize
  • then &foo.dst is of type &Thin<[u8]>

This (or something similar) has some benefits:

  • We have a direct representation for thin slices (with unknown length): &Thin<[T]>,&mut Thin<[T]>, *const Thin<[T]>, *mut Thin<[T]>
    • No need to use unsafe "hacks" to get thin pointer
    • This works with all potential future pointer types (e.g. *raw or similar) which we maybe might add
  • We can make from_raw_parts more type safe by accepting *const Thin<T> instead of a *const ()

By keeping the type instead of using *const () we should need less complex unsafe code to e.g. implement a ThinBox for trait objects shown above we would "just" get the thin pointer to the DST field and then pass it with the metadata to from_raw_parts.

Sure this doesn't really eliminate the data alignment fixing done in data_ptr it just moves it from the user written library into the implementation of from_raw_parts (a Thin<dyn Trait> would not be guaranteed to be correctly aligned as we simply can't know the alignment without knowing the type so the from_raw_parts impl. would need to do what data_ptr does).

Still, moving these tricky bits from user libraries to the core/std library seems to be a good idea, I think, and the improved type safety around from_raw_parts would be good too, as (I think) in many cases we don't handle completely erased types but just the erasure of the exact metadata used.

Though we probably still would need to have a way to create a *const Thin<dyn Trait> from a *const () for some FFI aspects.

(Note that a &Thin<dyn Trait> is kinda pointless but a &Thin<DSTContainingDynTrait> isn't as you still could access other fields safely).

@rustonaut Sep 11, 2020

An additional problem I found is that you can't have unaligned references in rust. It's UB.

So &Thin<dyn Trait> would need to "encapsulate" the unalignedness while not doing so for everything for which we know the alignment, i.e. another reason why Thin would be a lang item. Keeping with how Rust tends to do things, it might be more appropriate to make Thin a "special" syntax. But then we would need to have things like &thin X, &mut thin X, or similar. But having a Thin<T> just works very ergonomically with the rest of Rust.

(Solved below)

Another problem I found is that Thin<dyn Trait> or Thin<[u8]> has an unknown layout in the sense that:

  • We don't know the size for either.
  • We don't know the "internal" alignment/padding of the inner value in Thin<dyn Trait> (Thin<dyn Trait> is on the outside well but "unknown" aligned). As we can't move it in memory anyway due to not knowing the size, we can just report an alignment of 1, but between the start of Thin<dyn Trait> and the data of dyn Trait there is some unknown padding to "fix" the alignment of the internal field.

But Layout::for_value(&Thin<[u8]>) does still need to return a value.

So either we return a "not so useful" value like alignment 1 and size of usize::MAX, which I don't like or we need to directly have Thin pointers instead of a Thin wrapper around the T: Pointee type.

Like ThinRef<'a,T>, ThinMut<'a,T>, ThinConstPtr, ThinMutPtr. (as e.g. #[...] struct ThinMutPtr<T>(ptr: *mut (), PhantomData<T>))

Which is slightly less elegant but would still work otherwise like described for Thin above.

This also would fix the inner alignment problem much easier as e.g. ThinRef<'a, T> could be implemented as #[...] struct ThinRef<'a, T> { ptr: *const (), marker: PhantomData<&'a T> } which works just fine for a not-necessary-correctly aligned ThinRef<'a, dyn Trait>.

The only real problem would be if we point to not-allocated memory. But we are guaranteed to point to allocated memory.
The only potential wrong thing is the alignment if the pointer is to a DST field. In which case we might point into the padding in front of the field instead of the field directly. Which is fine and expected (see the data_ptr example and what I wrote previously about it).

Contributor Author

@SimonSapin Sep 13, 2020

I believe it's better to actually introduce thin pointers instead of using this trait alias.

The Thin trait alias proposed here describes an existing property of some existing types and the existing pointer/reference types to those. *const u32 is a thin pointer, *const str is not. Therefore u32: Thin holds, str: Thin does not.

The problem is that this trait alias doesn't necessarily play well with structs containing DSTs (which might again be "composed" DSTs). It should be possible to have a thin pointer to such a struct

In today’s Rust a struct that contains a DST field is itself also a DST and therefore pointers to it are wide, not thin.

Introducing thin pointers to DSTs (structs or not) is an entirely new language feature that is not part of this RFC. If you feel that feature should be pursued at the language level, consider writing a separate RFC for it.

Contributor Author

This RFC would enable a third-party library to implement a type like ThinBox that stores the metadata in the heap allocation and keeps the Box-like pointer thin.


In the end we have two kinds of DSTs: one which I referred to as composed DSTs, which have any number of non-DST fields and a trailing DST field, and what I will refer to as fundamental DSTs, which are currently [T] and dyn Trait.

The problem is that without thin pointers to DSTs which allow access to non DST fields any library ThinBox implementation would need to either create a fat-pointer or use a lot of unsafe code to access non DST fields.

This makes it harder to implement all kind of custom DST types like e.g. slice with the length prefixing the slice or similar.
Mainly a lot of unsafe code is needed which with thin pointers to DST can be moved into the from_raw_parts implementation in core/std.

Also, semantically the from_raw_parts function(s) take a thin pointer to a DST plus metadata and create a potentially non-thin pointer. But instead we represent it as taking an untyped pointer and metadata, making the functions more unsafe than they need to be IMHO (as you could put in a "wrong" thinned pointer).

While we seem to clearly disagree on this, thin pointers to DST are IMHO a fundamental part of the API around handling pointer metadata as they are the direct representation of fat-pointers without metadata.

But then a new RFC could also try to amend any non yet stabilized RFC so I guess putting this into a separate RFC is worth a try.

My goal is to long term make it possible to handle/write DST types without needing to know about all kinds of unsafe easy to get wrong without noticing it details like alignment fixing making custom DSTs available to any rust programmer.

I will see if I can find time to write a RFC for this in the next week or so.


/// Extract the metadata component of a pointer.
///
/// Values of type `*mut T`, `&T`, or `&mut T` can be passed directly to this function
/// as they implicitly coerce to `*const T`.
/// For example:
///
/// ```
/// assert_eq!(std::ptr::metadata("foo"), 3_usize);
/// ```
///
/// Note that the data component of a (fat) pointer can be extracted by casting
/// to a (thin) pointer to any `Sized` type:
///
/// ```
/// # trait SomeTrait {}
/// # fn example(something: &SomeTrait) {
/// let object: &SomeTrait = something;
/// let data_ptr = object as *const SomeTrait as *const ();
/// # }
/// ```
pub fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {…}

impl<T: ?Sized> *const T {
    pub fn from_raw_parts(data: *const (), meta: <T as Pointee>::Metadata) -> Self {…}
}

impl<T: ?Sized> *mut T {
    pub fn from_raw_parts(data: *mut (), meta: <T as Pointee>::Metadata) -> Self {…}
}

impl<T: ?Sized> NonNull<T> {
    pub fn from_raw_parts(data: NonNull<()>, meta: <T as Pointee>::Metadata) -> Self {
        unsafe {
            NonNull::new_unchecked(<*mut _>::from_raw_parts(data.as_ptr(), meta))
        }
    }
}

/// The metadata for a `dyn SomeTrait` trait object type.
///
/// It is a pointer to a vtable (virtual call table)
/// that represents all the necessary information
/// to manipulate the concrete type stored inside a trait object.
/// The vtable notably contains:
///
/// * type size
/// * type alignment
/// * a pointer to the type’s `drop_in_place` impl (may be a no-op for plain-old-data)
/// * pointers to all the methods for the type’s implementation of the trait
///
/// Note that the first three are special because they’re necessary to allocate, drop,
/// and deallocate any trait object.
///
/// The layout of vtables is still unspecified, so this type is a more-type-safe
/// convenience for accessing those 3 special values. Note however that `DynMetadata` does
/// not actually know the trait it’s associated with, indicating that, at the very least,
/// the location of `size`, `align`, and `drop_in_place` is identical for all
/// trait object vtables in a single program.
#[derive(Copy, Clone)]
pub struct DynMetadata {


One potential improvement I see with DynMetadata is that it does two levels of erasure: 1. the erasure from dyn and 2. the erasure from a specific dyn Trait to dyn *.

If we would keep the specific dyn Trait we might in the future add trait specific vtable based methods.

(This would mean struct DynMetadata<T:?Sized> e.g. DynMetadata<dyn Debug> )

But I'm not sure if this is worth the effort and added complexity.

@rustonaut Sep 10, 2020

This was discussed before (when DynMetadata still was VTable.)

One of the major arguments against it was the potential increase in compile time. (Me guessing: given that this is an auto trait, that might be non-negligible? Maybe?)

I wonder if we have a way to erase the exact type of DynMetadata having just impl DynMetadata instead. But I guess this would require (existential) impl Trait features we currently don't have to be doable and nicely usable.

Contributor Author

I’ve added an unresolved question.

vtable_ptr: &'static (),


Are you sure &() is valid?

The empty tuple () is a zero-sized type. Which means it can not be allocated in the classical sense which also implies that a &() with an arbitrary address can't be created in a safe way in rust and as such using &() as such the compiler should be able to assume (it currently doesn't) that all &() have the same compiler predetermined address. Or did I miss something?

Member

@RalfJung Sep 10, 2020

all &() have the same compiler predetermined address

That is certainly not true today, and it would be hard to make true. When you take a reference to the first field of a ((), i32), you'll get a reference that actually points to that memory where the pair is stored.


Resolved, see comments below it turns out responding by EMail doesn't work with "sub-threads".

Contributor Author

This is private field, so its exact type is an implementation detail that’s not too important for the purpose of this RFC. I’ve changed it to NonNull.

(I’m not sure whether making all &ZeroSizeType have a fixed address is realistic or what advantage that would bring, but that’s out of scope for this RFC.)

}

impl DynMetadata {
    /// Returns the size of the type associated with this vtable.
    pub fn size(self) -> usize { ... }

    /// Returns the alignment of the type associated with this vtable.
    pub fn align(self) -> usize { ... }

    /// Returns the size and alignment together as a `Layout`
    pub fn layout(self) -> alloc::Layout {
        unsafe {
            alloc::Layout::from_size_align_unchecked(self.size(), self.align())
        }
    }
}
```
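
As a rough illustration of what `layout` is meant to return, the following sketch computes the same information on stable Rust. It is not the proposed API: the helper name `layout_of_dyn` is hypothetical, and unlike `DynMetadata::layout` it needs a live reference to the value rather than only the vtable pointer.

```rust
use std::alloc::Layout;
use std::fmt::Debug;

/// Hypothetical stable stand-in for the proposed `DynMetadata::layout`.
/// It reads size and alignment through a reference to the value,
/// whereas the RFC's method would only need the pointer metadata.
fn layout_of_dyn(value: &dyn Debug) -> Layout {
    let size = std::mem::size_of_val(value);
    let align = std::mem::align_of_val(value);
    // For any Rust value, `align` is a non-zero power of two and `size`
    // is a multiple of it, so the checked constructor does not fail here.
    Layout::from_size_align(size, align).expect("valid layout")
}

fn main() {
    let a: &dyn Debug = &7_u64;
    let b: &dyn Debug = &[1_u8, 2, 3];
    // The dynamic layout matches the static layout of the concrete type.
    assert_eq!(layout_of_dyn(a), Layout::new::<u64>());
    assert_eq!(layout_of_dyn(b), Layout::new::<[u8; 3]>());
    // `Layout::for_value` performs the same computation.
    assert_eq!(layout_of_dyn(a), Layout::for_value(a));
}
```

The sketch uses the checked `from_size_align` constructor; the unchecked variant in the RFC text relies on the vtable always holding a valid size and alignment pair.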

Contributor

No drawbacks section...?

Contributor Author

I came up short trying to think of a reason not to do this at all (as opposed to doing it differently). Suggestions welcome.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The status quo is that code (such as that linked in [Motivation]) that requires this functionality
needs to transmute to and from `std::raw::TraitObject`
or a copy of it (to be compatible with Stable Rust).
Additionally, in cases where constructing the data pointer
requires knowing the alignment of the concrete type,
a dangling pointer such as `0x8000_0000_usize as *mut ()` needs to be created.
It is not clear whether `std::mem::align_of_val(&*ptr)` with `ptr: *const dyn SomeTrait`
is Undefined Behavior with a dangling data pointer.
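
For illustration, the transmute-based workaround looks roughly like the sketch below. The struct `RawTraitObject` and the helpers `split` and `join` are hypothetical names, not taken from the code linked in [Motivation]. The sketch assumes that a trait object pointer is laid out as a data pointer followed by a vtable pointer, which matches current compiler behavior but is not guaranteed by the language.

```rust
use std::fmt::Debug;
use std::mem;

/// Hypothetical copy of the unstable `std::raw::TraitObject`.
/// ASSUMPTION: a `*const dyn Trait` is (data pointer, vtable pointer),
/// in that order. The language does not guarantee this layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct RawTraitObject {
    data: *mut (),
    vtable: *mut (),
}

/// Splits a trait object pointer into its two components.
fn split(ptr: *const dyn Debug) -> RawTraitObject {
    // `transmute` rejects this at compile time if the sizes differ.
    unsafe { mem::transmute::<*const dyn Debug, RawTraitObject>(ptr) }
}

/// Reassembles a trait object pointer.
///
/// Safety: `raw.vtable` must be a vtable for `dyn Debug` that matches
/// the concrete type `raw.data` points to.
unsafe fn join(raw: RawTraitObject) -> *const dyn Debug {
    unsafe { mem::transmute::<RawTraitObject, *const dyn Debug>(raw) }
}

fn main() {
    let value = 42_u32;
    let ptr: *const dyn Debug = &value;

    let raw = split(ptr);
    // The data component is the address of the concrete value.
    assert_eq!(raw.data as *const u32, &value as *const u32);

    // Round-tripping yields a usable trait object pointer again.
    let back = unsafe { join(raw) };
    assert_eq!(format!("{:?}", unsafe { &*back }), "42");
}
```

Nothing in the type system ties `vtable` to `dyn Debug` here, which is part of what a typed metadata API would improve on.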

A [previous iteration][2579] of this RFC proposed a `DynTrait`
that would only be implemented for trait objects like `dyn SomeTrait`.
There would be no `Metadata` associated type; `DynMetadata` was hard-coded in the trait.
In addition to being more general
and (hopefully) more compatible with future custom DST proposals,
this RFC resolves the question of what happens
if trait objects with super-fat pointers with multiple vtable pointers are ever added.
(Answer: they can use a different metadata type like `[DynMetadata; N]`.)

`DynMetadata` could be made generic with a type parameter for the trait object type that it describes.
This would avoid forcing that the size, alignment, and destruction pointers
Contributor

How would that avoid forcing this? Can you elaborate?

Contributor Author

Without a type parameter, x.size() with x: &'static VTable necessarily executes the same code for any vtable. With a type parameter, x: &'static VTable<dyn Foo> and x: &'static VTable<dyn Bar> are different types and could execute different code (for example, a table lookup with different offsets). However, keeping the offset of size the same within all vtables might be desirable regardless of this API.

be in the same location (offset) for every vtable.
But keeping them in the same location is probably desirable anyway to keep code size small.

[2579]: https://github.com/rust-lang/rfcs/pull/2579


# Prior art
[prior-art]: #prior-art

A previous [Custom Dynamically-Sized Types][cdst] RFC was postponed.
[Internals thread #6663][6663] took the same ideas
and was even more ambitious in being very general.
Except for `DynMetadata`’s methods, this RFC proposes a subset of what that thread did.

[cdst]: https://github.com/rust-lang/rfcs/pull/1524
[6663]: https://internals.rust-lang.org/t/pre-erfc-lets-fix-dsts/6663


# Unresolved questions
[unresolved-questions]: #unresolved-questions

* The name of `Pointee`. [Internals thread #6663][6663] used `Referent`.

* The location of `DynMetadata`. Is another module more appropriate than `std::ptr`?

* The name of `Thin`.
This name is short and sweet but `T: Thin` suggests that `T` itself is thin,
rather than pointers and references to `T`.

* The location of `Thin`. Better in `std::marker`?

* Should `Thin` be added as a supertrait of `Sized`?
Or could it ever make sense to have fat pointers to statically-sized types?

* Are there other generic standard library APIs like `ptr::null()`
  that have an (implicit) `T: Sized` bound that unnecessarily excludes extern types?

* Should `<*mut _>::from_raw_parts` and friends be `unsafe fn`s?

* API design: free functions vs. methods/constructors on `*mut _` and `*const _`?