Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for thin_box #92791

Open
2 of 4 tasks
yaahc opened this issue Jan 11, 2022 · 33 comments
Open
2 of 4 tasks

Tracking Issue for thin_box #92791

yaahc opened this issue Jan 11, 2022 · 33 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@yaahc
Copy link
Member

yaahc commented Jan 11, 2022

Feature gate: #![feature(thin_box)]

This is a tracking issue for ThinBox for making heap allocated DSTs which only occupy 1 pointer on the stack.

Public API

Note: This API is expected to grow to match Box as necessary over time. The initial API is the minimum needed for basic usage for error handling.

pub struct ThinBox<T: ?Sized> { /* ... */ }

impl<T> ThinBox<T> {
    pub fn new(value: T) -> Self;
}

impl<Dyn: ?Sized> ThinBox<Dyn> {
    pub fn new_unsize<T>(value: T) -> Self
    where
        T: Unsize<Dyn>;
}

impl<T: ?Sized + Debug> Debug for ThinBox<T> { /* ... */ }
impl<T: ?Sized + Display> Display for ThinBox<T> { /* ... */ }
impl<T: ?Sized + Error> Error for ThinBox<T> { /* ... */ }
impl<T: ?Sized> Deref for ThinBox<T> { type Target = T; /* ... */ }
impl<T: ?Sized> DerefMut for ThinBox<T> { /* ... */ }
impl<T: ?Sized> Drop for ThinBox<T> { /* ... */ }

unsafe impl<T: ?Sized + Send> Send for ThinBox<T> {}
unsafe impl<T: ?Sized + Sync> Sync for ThinBox<T> {}

Steps / History

Unresolved Questions

  • Can ThinBox be made covariant? (comment)
  • Is const_allocate fleshed out and tested enough to justify it being used in thinbox's impl? Does T-lang agree we'll always have some way to procedurally generate allocations in const eval? (in contrast to declaratively via constants, statics or anonymous promoteds)
@yaahc yaahc added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Jan 11, 2022
@yaahc
Copy link
Member Author

yaahc commented Jan 15, 2022

Example PR using ThinBox as a substitute for Box<dyn Error + ...>: https://github.com/yaahc/pgx/pull/1/files

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Apr 9, 2022
@thomcc
Copy link
Member

thomcc commented Apr 21, 2022

One unfortunate thing about ThinBox is that it's invariant rather than covariant, like Box. For example, this playground fails to compile: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f6348153d6555716ce86926ac1928a4e.

This happens because of the <T as Pointee>::Metadata projection that is used internally. Projections like these currently require invariance. I'm not sure if there's a way to fix it, but it's unfortunate to differ from box in this way IMO. Arguably, this is more of an issue with the Pointee trait than with ThinBox itself, though, since it's not like ThinBox can fix it -- it also is less likely to matter in most of ThinBox's proposed use cases (although it would be nice if the version from the stdlib could somehow do better here).

@thomcc
Copy link
Member

thomcc commented Apr 21, 2022

A use case that's slightly more realistic for ThinBox is https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=e2659853940505c62b73b21deb079195:

#![feature(thin_box)]
use std::boxed::ThinBox;

pub trait Foo {}

pub fn add_foo(foos: &mut Vec<ThinBox<dyn Foo + '_>>, foo: ThinBox<dyn Foo + 'static>) {
    foos.push(foo);
}

This fails with a lifetime error, because '_ is not 'static. This is actually safe usage, though, and with Box it works fine: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dc3e919e611f4e91ac0567db47ac4d92.

@cuviper
Copy link
Member

cuviper commented Jun 27, 2022

FWIW, I also noted the invariance of <T as Pointee>::Metadata in #81513 (comment).

@cuviper
Copy link
Member

cuviper commented Jun 27, 2022

Send and Sync are also missing, compared to Box<T>, because NonNull<_> blocks them by default. I think it should be perfectly fine to justify manual impls for ThinBox<T> though.

@thomcc
Copy link
Member

thomcc commented Jun 27, 2022

Oh, it should definitely be Send and Sync.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jun 28, 2022
Implement `Send` and `Sync` for `ThinBox<T>`

Just like `Box<T>`, `ThinBox<T>` owns its data on the heap, so it should
implement `Send` and `Sync` when `T` does.

This extends tracking issue rust-lang#92791.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jun 28, 2022
Implement `Send` and `Sync` for `ThinBox<T>`

Just like `Box<T>`, `ThinBox<T>` owns its data on the heap, so it should
implement `Send` and `Sync` when `T` does.

This extends tracking issue rust-lang#92791.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 1, 2022
Make `ThinBox<T>` covariant in `T`

Just like `Box<T>`, we want `ThinBox<T>` to be covariant in `T`, but the
projection in `WithHeader<<T as Pointee>::Metadata>` was making it
invariant. This is now hidden as `WithOpaqueHeader`, which we type-cast
whenever the real `WithHeader<H>` type is needed.

Fixes the problem noted in <rust-lang#92791 (comment)>.
@Ultra-Gato
Copy link

Is it not supposed to implement Eq? because Box does.

@Ultra-Gato
Copy link

Ultra-Gato commented Sep 1, 2022

I was wondering, Because I get this error:

the trait Eq is not implemented for ThinBox<Vec<Item>>

@cuviper
Copy link
Member

cuviper commented Sep 1, 2022

Eq is not implemented yet, but it seems reasonable to add, and eventually mirror most of Box's API. For now, I think the initial implementation was intentionally kept light.

@cuviper
Copy link
Member

cuviper commented Sep 1, 2022

We should think at least about the fundamental traits before stabilization -- Fn, FnMut, FnOnce -- because they can't be added later. (That's why Rc and Arc don't implement them -- see #71570.)

Maybe ThinBox should also be a fundamental type like Box.

@cuviper
Copy link
Member

cuviper commented Sep 1, 2022

Another auto-trait consideration is Unpin -- Box<T> is unconditional, but ThinBox<T> is currently auto for T: Unpin. There's a big note about this on Box, so I'm not really sure whether ThinBox should get that same exception.

/* Nota bene
*
* We could have chosen not to add this impl, and instead have written a
* function of Pin<Box<T>> to Pin<T>. Such a function would not be sound,
* because Box<T> implements Unpin even when T does not, as a result of
* this impl.
*
* We chose this API instead of the alternative for a few reasons:
* - Logically, it is helpful to understand pinning in regard to the
* memory region being pointed to. For this reason none of the
* standard library pointer types support projecting through a pin
* (Box<T> is the only pointer type in std for which this would be
* safe.)
* - It is in practice very useful to have Box<T> be unconditionally
* Unpin because of trait objects, for which the structural auto
* trait functionality does not apply (e.g., Box<dyn Foo> would
* otherwise not be Unpin).
*
* Another type with the same semantics as Box but only a conditional
* implementation of `Unpin` (where `T: Unpin`) would be valid/safe, and
* could have a method to project a Pin<T> from it.
*/
#[stable(feature = "pin", since = "1.33.0")]
#[rustc_const_unstable(feature = "const_box", issue = "92521")]
impl<T: ?Sized, A: Allocator> const Unpin for Box<T, A> where A: 'static {}

@Ultra-Gato
Copy link

I always thought that Unpin was automatically implemented on all types. Does unconditional simply mean you can't opt out of Unpin?

@cuviper
Copy link
Member

cuviper commented Sep 1, 2022

The unconditional impl on Box<T> means it does not require T: Unpin, while ThinBox<T> does require that.

workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
Make `ThinBox<T>` covariant in `T`

Just like `Box<T>`, we want `ThinBox<T>` to be covariant in `T`, but the
projection in `WithHeader<<T as Pointee>::Metadata>` was making it
invariant. This is now hidden as `WithOpaqueHeader`, which we type-cast
whenever the real `WithHeader<H>` type is needed.

Fixes the problem noted in <rust-lang/rust#92791 (comment)>.
@ZhennanWu
Copy link

As a side question, is there any plan for a similar ThinArc? This would enable concurrent libraries to use new patterns like arc_swap::ArcSwapAny<ThinArc<dyn Trait>> due to atomicity of thin pointers.

@dead-claudia
Copy link

dead-claudia commented Jan 21, 2023

Another auto-trait consideration is Unpin -- Box<T> is unconditional, but ThinBox<T> is currently auto for T: Unpin. There's a big note about this on Box, so I'm not really sure whether ThinBox should get that same exception.

@cuviper The pointer to the T in both Box<T> and ThinBox<T> only differ by a constant offset (specifically, std::mem::size_of::<<T as Pointee>::Metadata>()). Since Box<T> implements Unpin and, for all constant n, a + n points to a stable address if and only if a does, ThinBox<T> could likewise safely implement it regardless of whether T itself is Unpin.

<T as Pointee>::Metadata would still need to be Unpin, but the Pointee trait already separately requires Metadata associated types to implement Unpin, so that's not a concern.

@cuviper
Copy link
Member

cuviper commented Jan 21, 2023

@dead-claudia It can be done safely, but there's a trade-off. Referring to the code comment that I linked, we can either have unconditional Unpin or pin projection, but not both.

  *  Another type with the same semantics as Box but only a conditional 
  *  implementation of `Unpin` (where `T: Unpin`) would be valid/safe, and 
  *  could have a method to project a Pin<T> from it. 

So the question is whether ThinBox should be "another type" like that, or should it mimic Box exactly.

@dead-claudia
Copy link

@cuviper Could you explain the difference between the two? I'd think that Pin::new(&mut my_box) would work basically the same as fn project(some_box: Pin<MyBox<T>>) -> Pin<&mut T>? Or am I missing something here, like it really being about whether it should be !Unpin as a lint similar to pointers being !Send and !Sync?

@cuviper
Copy link
Member

cuviper commented Jan 24, 2023

I still get confused about pinning myself, but hopefully this section of the docs will help you:
https://doc.rust-lang.org/std/pin/index.html#projections-and-structural-pinning

That said, I think we probably do want unconditional Unpin for ThinBox<T> (and not projection) for this same justification about trait objects, since unsized types are pretty much the whole point of ThinBox.

 - It is in practice very useful to have Box<T> be unconditionally 
   Unpin because of trait objects, for which the structural auto 
   trait functionality does not apply (e.g., Box<dyn Foo> would 
   otherwise not be Unpin). 

@tleibert
Copy link
Contributor

tleibert commented Apr 18, 2023

It would be very nice if this structure could support the fallible allocation operation API's provided by Box in #32838. I'll try to whip up an example PR.

Example diff as promised:
#110483

@clarfonthey
Copy link
Contributor

clarfonthey commented Nov 3, 2023

So, I'm wondering if there would be some way to recontextualize this in terms of an unsized struct instead of a thin box as-is. In other words, whether it would be possible to make ThinBox<T> simply Box<Thin<T>>, where Thin<T> is an unsized type that implements Deref<Target = T>.

My justification for this is for use cases such as unsized_vec, where you would want a nicer API to encapsulate all the logic of computing a combined layout for metadata + value, and perhaps the ability to coerce a pointer into a Thin<T> and then use that to compute its size.

I was also thinking of this from the perspective of creating an unsized_vec type data structure, except for storing trees inside a single buffer.

If this seems even remotely doable, I'm willing to sketch out a potential API for it and maybe a PR for it, but I don't know 100% what the reasoning for specifically going for a ThinBox here was.

@yaahc
Copy link
Member Author

yaahc commented Nov 3, 2023

My original motivation for working on this was use cases in error handling. I wanted to have the same thin pointer benefits that anyhow and eyre provide but generalized, and I was hoping that ThinBox could be made into a preferred error type over Box<dyn Error> and have it actually implement Error for ThinBox<dyn Error> rather than implementing From<E> where E: Error while still having ? work on all error types with some clever usage of FromResidual instead of From and a custom Result type.

In the end I gave up on it because I didn't think the UX would be adequate to justify all the extra types people would have to juggle and think about, especially with the whole thing requiring a second Result type to work. Feel free to cut it up however you want to find value in any of the work. I think the BiPtr abstraction internal to the implementation is particularly cool and might be close to the Thin type you're imagining already. Back when I wrote this I wanted to go back and update the Rc implementation to use the same stuff internally but never got around to it.

@clarfonthey
Copy link
Contributor

Makes sense! I have other stuff I want to get to first, so, I don't mind if someone ends up getting to this before me. Going to just dump my thoughts here on what I was thinking of for a design, although I haven't actually hashed it out much.

The main thing that came to mind for this is that we'd effectively end up having to resolve the DST issues that are preventing us from making CStr not have any pointer metadata: the size would have to be knowable from the value itself, not the metadata, or potentially not knowable at all in some cases.

I was thinking that it might be useful to have effectively three versions of the thin wrapper: one that's equivalent to the current implementation, but another where the metadata is stored after the value, and one where the metadata is stored redundantly on both sides. The main reasoning for this is that the "unsized slice" case where you're storing multiple of these objects in one buffer might want the metadata at the end to allow backwards iteration.

One particular case I was looking into is the ability to have a "type-safe" version of a tree storable in a buffer, primarily for parsing prefix/postfix arithmetic expressions. For prefix expressions, you'd have the metadata at the beginning of operands, whereas for postfix expressions, it would be at the end. And a more general "unsized vec" might just have them on both sides to allow easy double-sided iteration.

For the case where metadata is only at the end, it's an example of a DST whose size isn't knowable from a pointer alone. If you know the size of it already, you can find the metadata and reconstruct a fat pointer, but you need the size or a pointer past the end of it in order to do that. If it's at the beginning, you can just read the metadata to determine the layout.

@kmdreko
Copy link

kmdreko commented Dec 21, 2023

Just chiming in here since this came up as a potential solution to a problem but it seems there's no real way to create a ThinBox<str> should the user want it. Likewise only being able to create a ThinBox<[T]> from an array is pretty limiting compared to the ubiquity of Vec.

Can we get a From<String> for ThinBox<str> implementation and/or From<Vec<T>> for ThinBox<[T]> by chance? Those conversions are available for Box (among many others) and would be nice to have.

@clarfonthey
Copy link
Contributor

Presumably rust-lang/rfcs#3536 will help provide a solid path for this.

@NoahJelen
Copy link

What is the difference between this and a normal Box?

@Aandreba
Copy link

What is the difference between this and a normal Box?

A Box will store unsized types (slices, str, object traits, etc.) as a pointer to the data and some metadata related to the type (slices and strs store their length, and object traits store a pointer to a table with the function pointers to the trait's methods).

A ThinBox would always only store a single pointer on the stack, by writing the metadata that a regular Box would have on the stack alongside the data it points to.

This would make it easier to pass pointers to unsized types through FFI boundries.

Example of Box<str> v. ThinBox<str>
Box<str>
[ Metadata ][ Pointer to string data ]

ThinBox<str>
[ Pointer to data ] --> [ Metadata, string data ]

@orowith2os
Copy link

Is it guaranteed what the layout of Metadata will be? Would it be feasible to make a ThinBox FFI-safe?

If we can guarantee:

  • the final item in Metadata is a usize containing the length
  • length is immediately next to the first T
  • a ThinBox<[T]> is repr(transparent) over a *const ()
  • the pointer contained inside a ThinBox<[T]> points to the first T

a ThinBox<[T]> is safe to pass through FFI, and the FFI users can just offset backwards by a usize to get a pointer to the length. however, they couldn't free the ThinBox<[T]>, as they'd need to offset to the beginning of the entire buffer, where malloc would store the size of the buffer to free later on, unless, of course, Metadata has a guaranteed size and (semi-optionally) layout.

@clarfonthey
Copy link
Contributor

clarfonthey commented Jan 18, 2024

If we can guarantee:

  • the final item in Metadata is a usize containing the length

I was under the impression this is exactly the kind of thing we didn't want to have, since we want a path forward for types which are unsized but not length-prefixed, like CStr, and we don't want to guarantee exactly where the size field is stored in stuff like vtables.

@orowith2os
Copy link

In that case, perhaps it could be made optional somehow - the problem there would be how we make the length optional. It's valid to have a slice with a length of 0, and a pointer that's not NULL. You can also make a Box<[T]> from a Vec<T> with a length of 0. If we allocate an additional byte for a "does this type have an immediate length?", we can reference that instead.

@dead-claudia
Copy link

In that case, perhaps it could be made optional somehow - the problem there would be how we make the length optional. It's valid to have a slice with a length of 0, and a pointer that's not NULL. You can also make a Box<[T]> from a Vec<T> with a length of 0. If we allocate an additional byte for a "does this type have an immediate length?", we can reference that instead.

Worth noting that, for fat empty slices, the slice pointer is in practice still never null, to ensure that Some(&[]) and None can be told apart without needing an extra discriminant byte. Instead, the empty slice pointer is usually NonNull::<T>::dangling(), but not always, and can be just about any nom-zero value.

Not that you should ever depend on this for correctness, mind you. This is just the current implementation to my knowledge.

@clarfonthey
Copy link
Contributor

clarfonthey commented Jan 20, 2024

You can depend on it being non-zero because &T is guaranteed to non-zero, but another reason why you can't depend on an empty slice being equivalent to dangling is because you can easily create an empty slice from a non-empty slice, e.g. s[..0]. This is also what you would get from calling as_slice on an exhausted slice::Iter; the empty slice's pointer would be somewhere between the start and past-end of the valid slice, depending on the order of iteration.

Empty boxes might be able to optimise differently since you generally don't want to reference a valid allocation if it's empty, although there are also other cases that depend on pointer-equality being different even for empty allocations.

I would just generally not rely on pointer checking to determine size. The main case of a no-metadata DST is CStr, which is never an empty allocation; an empty string is [0]. Even if we statically allocate one "empty string" that we reuse, that would still be a valid pointer to a valid buffer, just the same one.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 11, 2024
Create try_new function for ThinBox

The `allocator_api` feature has proven very useful in my work in the FreeBSD kernel. I've found a few places where a `ThinBox` rust-lang#92791 would be useful, but it must be able to be fallibly allocated for it to be used in the kernel.

This PR proposes a change to add such a constructor for ThinBox.

ACP: rust-lang/libs-team#213
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Feb 12, 2024
Rollup merge of rust-lang#110483 - tleibert:thin-box-try-new, r=dtolnay

Create try_new function for ThinBox

The `allocator_api` feature has proven very useful in my work in the FreeBSD kernel. I've found a few places where a `ThinBox` rust-lang#92791 would be useful, but it must be able to be fallibly allocated for it to be used in the kernel.

This PR proposes a change to add such a constructor for ThinBox.

ACP: rust-lang/libs-team#213
@GKFX
Copy link
Contributor

GKFX commented Jun 2, 2024

More ways of making ThinBox slices would likely be helpful - e.g.:

  • a FromIterator implementation
  • a way of copying or cloning an existing slice
  • a ThinBox::new_uninit_slice(n: usize) for only allocating the slice

@dead-claudia
Copy link

dead-claudia commented Jun 3, 2024

Now that I'm thinking about this more, I'm starting to like the basic idea in #92791 (comment) more and more. Here's my intuition:

  • Unsizing types to fat pointers involves pairing a reference to the original value with generated metadata for the final field.
  • Unsizing types to thin pointers involves storing the generated metadata before the type, and then returning a pointer at the metadata's offset. It's different from the default, and the type system currently cannot represent changing unsizing placement like that.
  • In both cases, it's separated from allocation, and heap allocators do not need any knowledge of whether the pointer is thin or fat, only that it's a pointer.

I do feel it can't be fully done in userland, though, and it feels more like a dyn Trait than a Pin<P> in the way it modifies unsizing, rather than merely decorating it. Makes me feel that &thin T, Box<thin T>, and so on would provide a better, cleaner interface. For a thin trait object, you'd do &thin dyn Trait, and for a thin slice reference, it'd be &thin [Foo]. The compiler could just reserve that storage on the stack transparently if a &thin T is requested. And for allocators, std::mem::size_of_val(&thin value) returns the metadata size plus std::mem::size_of_val(&value).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests