
Custom Dynamically Sized Types for Rust #1524

Closed
wants to merge 11 commits into from

Conversation

strega-nil

I believe this fixes #813, and is a nicer, and far more powerful, solution than #709.

@dgrunwald
Contributor

&CStr was originally supposed to be compatible with *mut c_char for FFI purposes (with CStr being an unsized type).
It's only a fat pointer because of Rust's current limitations regarding DSTs.

@strega-nil
Author

@dgrunwald Ah, well, this RFC does support that.

struct CStr([libc::c_char]);

// `Dst` and `Sizeable` are the traits proposed by this RFC (hypothetical API).
impl Dst for CStr {
    type Meta = (); // no extra metadata, so &CStr stays a thin pointer
}
impl Sizeable for CStr {
    fn size_of_val(&self) -> usize {
        // The length is recomputed from the data itself rather than stored in the pointer.
        unsafe { libc::strlen(self.0.as_ptr()) }
    }
}
impl CStr {
    pub unsafe fn from_ptr<'a>(p: *const libc::c_char) -> &'a CStr {
        // either
        std::mem::transmute::<*const libc::c_char, &CStr>(p)
        // or (make_fat_ptr is another hypothetical helper from the RFC)
        std::ptr::make_fat_ptr(p as *const (), ())
    }
}

@Aatch
Contributor

Aatch commented Mar 3, 2016

I'm not sure about this. My main concern is around how actually useful this is compared to the complexity. There is a lot of code that assumes fat pointers are two words and two words only. This would completely change that. It's a very large increase in complexity.

Another issue is how this interacts with generics and unsizing. Currently Foo<[T; N]> will coerce into Foo<[T]> and everything will work properly because we know how to do that. There's also the case of people trying to use a generic Meta type; it's unclear how that might work.

Right now, my thought is that it's a nice feature (like most that get proposed), but not worth the increased complexity.

@strega-nil
Author

@Aatch We need something like this. Look at the documentation for CStr:

Note: This operation is intended to be a 0-cost cast but it is currently implemented with an up-front calculation of the length of the string.

This is just in the standard library. Out there in real code, we have to deal with C interop, where stuff like

typedef struct _TOKEN_GROUPS {
  DWORD              GroupCount;
  SID_AND_ATTRIBUTES Groups[ANYSIZE_ARRAY];
} TOKEN_GROUPS, *PTOKEN_GROUPS;

has to be translated into

struct TokenGroups {
    group_count: u32,
    groups: [SID_AND_ATTRIBUTES; 0],
}

And traits like Index and IndexMut only return references, so what if you have a 3D world which wants to return 2D slices into the world? You can't. You just can't. You have to define getters which return a

// (Rust identifiers can't start with a digit, so Slice2D rather than 2DSlice.)
use std::marker::PhantomData;

struct Slice2D<'a, T> {
    width: usize,
    height: usize,
    ptr: *const T,
    lifetime: PhantomData<&'a [T]>,
}
struct Slice2DMut<'a, T> {
    width: usize,
    height: usize,
    ptr: *mut T,
    lifetime: PhantomData<&'a mut [T]>,
}
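
For instance, a row accessor on the workaround type above ends up rebuilding an ordinary slice from the raw pointer plus the separately stored dimensions. A minimal sketch (the row method is illustrative, assuming ptr points to width * height initialized elements):

impl<'a, T> Slice2D<'a, T> {
    // Borrow one row as a plain &[T], taking the length from the stored width.
    fn row(&self, y: usize) -> &'a [T] {
        assert!(y < self.height, "row index out of bounds");
        unsafe { std::slice::from_raw_parts(self.ptr.add(y * self.width), self.width) }
    }
}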

@strega-nil
Author

@Aatch Unsizing isn't touched in this RFC. We really need integer generics to start doing stuff with unsizing. However, once we do get integer generics, it'll be quite easy to add that to this and we can have unsizing for everybody :)

@strega-nil
Author

@Aatch A generic Meta type isn't a problem, because of this error: "the type parameter T is not constrained by the impl trait, self type, or predicates".

@reem

reem commented Mar 3, 2016

@ubsan struct Whatever<T> { inner: Vec<T> } impl<T> Dst for Whatever<T> { type Meta = T; }?

@strega-nil
Author

@reem 1) That's not an unsized type, 2) that wouldn't actually be a problem because you know what Meta is (as long as Whatever<T: Copy>)

@bluss
Member

bluss commented Mar 3, 2016

@japaric Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

@thepowersgang
Contributor

How would indexing the inner [T] of a custom DST be handled? It doesn't have the element count metadata from a normal slice. Would it be an unsafe operation? Or would the normal impl [T] methods be removed/hidden by the compiler when the slice is accessed via a custom DST pointer.

Idea: Add fn element_count(&self) -> usize to the Dst trait

@strega-nil
Author

@thepowersgang It's an unsafe operation. The inner [T] isn't "real"; it's only there to get the base pointer of the unsized data.

@eddyb
Member

eddyb commented Mar 3, 2016

I originally proposed using [T; 0] for anything slice-like because it has the right alignment and can serve as a base pointer.

@Aatch Fat pointers being hardcoded to one pointer-sized piece of metadata is a mistake. There are important extensions to slices and trait objects we need to support, and that hardcoding has to go (which is easier with the MIR).
cc @nikomatsakis about his desire to have TraitA+TraitB work as a trait object with 2 vtable pointers.

@strega-nil
Author

I liked [T] because it seemed nicer; this should be up for bikeshedding.

@japaric
Member

japaric commented Mar 3, 2016

Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

Yes, I was working on it last year. I don't have time right now to analyze/comment on the design proposed in this RFC. So I'm just going to drop a few links here:

@Aatch
Contributor

Aatch commented Mar 3, 2016

@thepowersgang actually brings up a good point, if you can get at the [T], then you'd expect to be able to work with it like a regular [T]. I'd prefer a way of indicating "this is unsized" that doesn't use [T] in that case.

Either use a [T;0] or something else that isn't already a DST. I don't think this really needs a marker for the "unsized portion" anyway, since a more trait-like DST would probably want to use (). The trait itself should be enough. It might be possible to treat an implementation of Dst as an implied impl !Sized, which would handle most of the issues I can think of.

@strega-nil
Author

Hmm, yeah, I do think [T; 0] is better now that it's been discussed.

@strega-nil
Author

Although, I'm thinking about how it would work with integer generics and Unsizing coercions... I think that [T] is the best way, because we would have some sort of way of saying "This PascalStr is a Sized PascalStr (which can be unsized)", or whatever, which would look something like

struct PascalStr {
    len: usize,
    buf: [u8],
}
// &PascalStr == ptr -> { len, [buf; len] }
// &PascalStr<N> == ptr -> { N, [buf; N] }
// PascalStr<N> == { N, [buf; N] }

@Aatch
Contributor

Aatch commented Mar 3, 2016

@ubsan eh, I'm not sure that's really worth it here. My main concern is that the compiler now has to be aware of a lot more context when handling field access. The main advantage of [T; 0] (or a semantically-similar marker type) is that you can't safely access any data through a pointer to it, but it has all the relevant properties of T. I'm not sure I like the idea of &foo.bar being invalid just because foo implements Dst.

Unsizing is probably better handled via separate types. This is already the case: [T; N] might look similar to [T], but they're completely different types. It's more obvious when you consider trait object unsizing. I think the same is going to be true here: if you want unsizing, you're gonna have to have a separate Sized type like everybody else.

@strega-nil
Author

@Aatch but they're not completely different types. You need them to be exactly the same type, in fact, behind the pointer, for unsizing coercions to work.

@Aatch
Contributor

Aatch commented Mar 3, 2016

@ubsan no, they're the same representation, not the same type.

@eddyb
Member

eddyb commented Mar 3, 2016

struct PascalStrBuffer<B> {
    len: usize,
    buf: B
}
type PascalStrArray<const N: usize> = PascalStrBuffer<[u8; N]>;
type PascalStr = PascalStrBuffer<[u8]>;

This might be an okay compromise of sorts.

I've also considered having a single optional integer parameter, which denotes the minimal contained size and defaults to 0 (which would mean [u8; 0], i.e. "any size").
The problem with that, though, is you lose the sized aspect for which you wanted the flexibility in the first place, so it's probably pointless.
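
A self-contained version of this sketch that compiles on current Rust (with a ?Sized bound added and today's const-generics syntax; names follow the example above), just to show the last-field unsizing coercion it relies on:

struct PascalStrBuffer<B: ?Sized> {
    len: usize,
    buf: B,
}

type PascalStrArray<const N: usize> = PascalStrBuffer<[u8; N]>;
type PascalStr = PascalStrBuffer<[u8]>;

fn main() {
    let sized: PascalStrArray<5> = PascalStrBuffer { len: 5, buf: *b"hello" };
    // Built-in unsizing coercion through the last field:
    // &PascalStrBuffer<[u8; 5]> coerces to &PascalStrBuffer<[u8]>.
    let unsized_ref: &PascalStr = &sized;
    assert_eq!(unsized_ref.len, unsized_ref.buf.len());
}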

@strega-nil
Author

@Aatch fine; you can argue whether they're the same type or not. They are the same representation, definitely, and if you had to write separate Sized versions of each type, imagine the really big types:

#[repr(C)]
struct CLAIM_SECURITY_ATTRIBUTE_RELATIVE_V1 {
    Name: DWORD,
    ValueType: WORD,
    Reserved: WORD,
    Flags: DWORD,
    ValueCount: DWORD,
    // Pseudocode: Rust has no anonymous unions, and union fields can't be
    // unsized; the point is that the trailing data is a flexible array.
    union Values {
        pInt64: [DWORD],
        pUint64: [DWORD],
        ppString: [DWORD],
        pFqbn: [DWORD],
        pOctetString: [DWORD],
    }
}

imagine writing two of these types, and now for each of the Windows flexible array structs. It gets ridiculous.

@eddyb
Member

eddyb commented Mar 3, 2016

It would be almost impossible to ensure the same layout for two different definitions though.
And I'm not sure how well #[repr(C)] would work with the DST stuff, although I reckon it would be useful to have C interop defined for non-trait DSTs anyway.

@Aatch
Contributor

Aatch commented Mar 3, 2016

In that case how about we restrict this to slice-like DSTs and use @thepowersgang's suggestion of having an element_count method?

@strega-nil
Author

@Aatch that seems overly restrictive. For example, if you have a

// Sketch in terms of the RFC's hypothetical Dst/Sizeable traits.
struct Slice2D<T>([T]);
impl<T> Dst for Slice2D<T> {
    type Meta = (usize, usize); // (width, height)
    fn element_count(&self) -> usize {
        // would have to return meta.0 * meta.1, which can overflow usize
    }
}
impl<T> Sizeable for Slice2D<T> { /* ... */ }

fn main() {
    // A usize::MAX-by-usize::MAX slice of () still has total size zero,
    // but this is not possible with `element_count`.
    let x = Slice2D::<()>::new((std::usize::MAX, std::usize::MAX));
}

@Aatch
Contributor

Aatch commented Mar 3, 2016

@ubsan that's not possible anyway! You can't have something that large. And it's not like the same reasoning doesn't apply to the size_of_val case.

Oh, and something I just thought of: destructors/dropping. It's all well and good having something that only contains Copy types, but what if you have something non-Copy in your custom DST?

@strega-nil
Author

@Aatch yes you can :) look at the type inside!

You only have a pointer to the values, and aren't the owner. You could impl Drop to drop the inside values, and that would work for something like Box.

@Aatch
Contributor

Aatch commented Mar 3, 2016

@ubsan "not the owner"? Then who is? [T] owns the Ts it contains, sure, if you have a &[T] you don't own the [T] itself, but that's not the same.

Do I now have to explicitly implement Drop to get the correct behaviour for my custom DST? I mean, fine, if that's the case, that's the case, but it should be in the RFC.

So far everything has been focused on custom strings, N-dimensional arrays of Copy types and N-dimensional slices from... somewhere. I'd like to see this actually all worked out; you may have your use-cases, but you need to look at how this can be used, not just how you would use it.

@strega-nil
Author

I can no longer keep this open in good faith. Someone else can take it from me, but ... personal issues.

@nikomatsakis
Contributor

So, for some time now I've been wanting to write a followup to my previous comment sharing my thoughts about this work. Even though @ubsan has closed the RFC, I figured I should leave it anyway. Hopefully it is clear from my previous comments that I think the ideas in here are good in a technical sense, and I'd like to do some experimentation with them. However, it's probably also clear that I feel somewhat hesitant about this. My concerns here are not technical, but rather all about prioritization and motivation.

Rust is an ambitious language. Making it successful is a big job and there is a lot of work ahead for us. When I ask myself "how would Custom DST help Rust grow over the next year", I don't feel like I hear a convincing answer. The main use case that I see is helping make matrix libraries more ergonomic. Now, admittedly, this is somewhat compelling, but otoh lack of custom DST isn't, I don't think, blocking work on Matrix-like things, it's just a way to make them nicer to use. (I'd be interested to hear otherwise; are there any applications where the presence of custom DST would dramatically change the design in some way that it's not even worth doing it otherwise?)

Earlier I had thoughts, which I mentioned briefly in my comment, about the idea of trying to land this work in a very provisional way, to enable hacking. I am still interested in establishing a process for this. I think there are a number of things that might benefit from having a more "experimentation friendly" process (e.g., naked fns, inline asm, and interrupt ABIs come to mind).

But I haven't gone and publicly worked on defining such a process because I also have concerns: the fact is that any time there is significant churn in the codebase, it has the potential to take a lot of time for core developers who are trying to focus on other things. I think the work around ! has proven this point quite dramatically! By any measure, ! is not a core feature, but working out all the details on it and how it interacts with the unsafe code guidelines, match statements, and so forth has taken and will continue to take quite a lot of time and energy. (I think that much of the work around ! is work we would have to do anyway, as it happens, but perhaps with less urgency and more at a time of our choosing.) And even the most innocent and orthogonal of features can cause ICEs and regressions which are a big distraction.

(On the other hand, there are some areas where we are adopting a looser process. We did land naked fns in a "provisional state", for example, and that has largely been a non-event. There is some experimentation around the embedded area, particularly with ABIs. But all of these are quite narrow and targeted compared to custom dst, which affects the code that handles every single reference.)

Before @ubsan closed this RFC, I was working up my courage to move to postpone (close) -- but I kept hesitating, because I think it's a good idea and I like it. Yet still I have the feeling that on balance it just isn't a good investment of resources. I regret that hesitating perhaps sends a more frustrating message than actually moving to close. So @ubsan, I'm sorry about that.

@AtheMathmo

@nikomatsakis Thank you for your comment.

I agree with what you're saying - there are more pressing things to work on right now. I also wanted to comment on the following:

Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things

I think this is an accurate statement. There are a few things that I cannot do without custom DSTs, some of which don't even have ugly workarounds, but these are only mechanisms to tidy up the code and make things easier for users. Here are some example issues that we had to put off:

AtheMathmo/rulinalg#149 (I think custom DSTs help this)
AtheMathmo/rulinalg#14

It would be really nice to parallel the Vec<T>-&[T] relationship in the core library but Matrix-like things are still workable without this (to me).

@nikomatsakis
Contributor

@AtheMathmo interestingly, I was planning to come and post a comment here anyhow to mention that I myself ran into a use-case for this just yesterday when hacking on my NLL prototype, specifically the bitset code. Really it's just the "2-D matrix use case" in disguise, but in trying to play with it, I did appreciate that there are things that are hard to do "just with methods". Though maybe I didn't find the best answer.

In particular, I have a BitSet type that supports, for each graph node, a bit set of N bits. I wanted the ability to also create a BitBuf (one row of N bits) and a BitSlice (a view into either a BitSet or a BitBuf). This all worked ok, but it seemed like I had to copy-and-paste various bits of logic and couldn't easily eliminate the redundancies. I remember thinking that if I could have Deref and DerefMut impls I would have wound up with less redundancy. Sorry, not very specific; I'll have to revisit it and re-examine.

In any case, I definitely think that this conversation -- to what extent is this "just sugar" vs unlocking some fundamental capability -- is the critical one to deciding how to prioritize this change.
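
When the only metadata is a length, the str/String-style newtype-over-slice pattern already gives this kind of Deref layering today; a rough sketch (the BitBuf/BitSlice names and word-level representation here are assumptions, not the actual prototype code):

// Unsized view over a row of bits, stored one word at a time.
#[repr(transparent)]
struct BitSlice {
    words: [usize],
}

impl BitSlice {
    fn from_words(words: &[usize]) -> &BitSlice {
        // Sound because BitSlice is repr(transparent) over [usize].
        unsafe { &*(words as *const [usize] as *const BitSlice) }
    }

    fn get(&self, i: usize) -> bool {
        let bits = usize::BITS as usize;
        (self.words[i / bits] >> (i % bits)) & 1 == 1
    }
}

// Owned row of bits; Deref lets all the read-only logic live on BitSlice.
struct BitBuf {
    words: Vec<usize>,
}

impl std::ops::Deref for BitBuf {
    type Target = BitSlice;
    fn deref(&self) -> &BitSlice {
        BitSlice::from_words(&self.words)
    }
}

A matrix-style slice that also needs width/height (or stride) metadata is exactly what this pattern can't express, since today's fat pointer only has room for a single length.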

@nikomatsakis
Contributor

Another thing that I was thinking about was this question: is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

@AtheMathmo

is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

I cannot speak for the others who are looking to make use of this feature, but I, for one, would be making significant changes.

Right now we have the following types in rulinalg: Matrix, MatrixSlice, MatrixSliceMut. And we have BaseMatrix and BaseMatrixMut traits to let us be generic over these. If we had custom DSTs then we could use Deref and DerefMut and (probably) wouldn't want these traits any more - we just implement everything for the slice types and let Deref do the work.

Another particularly difficult area is with operator overloading. Right now we have an explosion of overloading implementations for all combinations of matrix types. I think that custom DSTs could help here too but I haven't explored the idea too much. (Hoping Deref lets us cover all &MatType op &OtherMatType etc. ops).

I'll try to spend some time thinking about whether there is anything else that I can add to this discussion. I've got quite used to working around the lack of custom DSTs and I don't think there are many features missing for the user. There are, however, quite a few ways we could make things nicer - not needing to import traits everywhere, being able to return &MatrixSlice from user code and from Index etc., and others.

@nikomatsakis
Contributor

nikomatsakis commented Feb 22, 2017

@AtheMathmo it seems like it would be a fruitful exercise to try and use e.g. the design from my comment and sketch out how the types in your library would work (in contrast to now) and throw it in a gist. (I'd like to see how you envision the operator overloading working, too.) Apologies if you've already done it and I missed it.

(The design from the RFC seems mostly fine too, I don't recall there being any major differences.)

@AtheMathmo

@nikomatsakis that sounds like a good idea!

I'm a little busy for the rest of this week but hopefully will be able to sketch something out early next week.

@nikomatsakis
Contributor

@AtheMathmo great, I'd love to see it. Since this RFC is closed, and it feels like we're still a bit in "design discussion" here, I'm going to open a thread on internals to carry on the conversation:

https://internals.rust-lang.org/t/custom-dst-discussion/4842

@ghost

ghost commented Jun 17, 2017

For some more motivation, here's what I need in Crossbeam:

  1. Heap-allocated Array<T>. The head consists of a usize (length), and the tail is [T].
  2. Heap-allocated SkiplistNode<K, V>. The head consists of (K, V, usize) (key, value, height), and the tail is [AtomicPtr<SkiplistNode<K, V>>].
  3. Other similar situations with heap-allocated objects. The head is some arbitrary data and the length. The tail is an array of that length.

Pointers to such objects must be thin because atomically manipulating multiple words is a pain to do portably and performantly. Some support from the language for such use of DSTs would be great to have.
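
A sketch of the workaround this implies today: the length lives in a header, the elements follow it in the same allocation, and the handle stays a single word that an AtomicPtr can swap (ArrayHeader and the [T; 0] marker are illustrative; the caller must have allocated and initialized len trailing elements):

use std::sync::atomic::AtomicPtr;

#[repr(C)]
struct ArrayHeader<T> {
    len: usize,
    data: [T; 0], // marks where the trailing elements begin
}

// Rebuild the element slice from a thin pointer; the length comes from the
// header rather than from fat-pointer metadata.
unsafe fn elements<'a, T>(header: *const ArrayHeader<T>) -> &'a [T] {
    std::slice::from_raw_parts((*header).data.as_ptr(), (*header).len)
}

// The whole array can then be published or replaced with one-word atomic ops.
struct Shared<T> {
    current: AtomicPtr<ArrayHeader<T>>,
}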

@SimonSapin
Contributor

Heap-allocating (HeaderIncludingLength, [Item]) (possibly with not all items initialized) is a common pattern that could use some more built-in support. But does it need to be language-level with DSTs, or could it be library-level?

@bergus

bergus commented Jun 14, 2018

@SimonSapin Looking for exactly that as well. Have you found a suitable workaround yet?

@SimonSapin
Contributor

@bergus Today, the best you can do is manual memory layout computation, raw allocation, and unsafe initialization. See for example:

https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L714-L727
https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L664-L680
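
The linked rc.rs code boils down to something like the following (a simplified sketch; HeaderSlice and alloc_header_slice are illustrative names, the layout is assumed non-empty, and real code also has to deallocate with the same layout):

use std::alloc::{alloc, Layout};
use std::ptr;

#[repr(C)]
struct HeaderSlice<H, T> {
    header: H,
    items: [T],
}

unsafe fn alloc_header_slice<H, T: Copy>(header: H, items: &[T]) -> *mut HeaderSlice<H, T> {
    // Manual layout computation: the header followed by items.len() elements.
    let (layout, items_offset) = Layout::new::<H>()
        .extend(Layout::array::<T>(items.len()).unwrap())
        .unwrap();
    let layout = layout.pad_to_align();

    // Raw allocation.
    let mem = alloc(layout);
    assert!(!mem.is_null(), "allocation failed");

    // Unsafe initialization: write the header, then copy the items after it.
    ptr::write(mem as *mut H, header);
    ptr::copy_nonoverlapping(items.as_ptr(), mem.add(items_offset) as *mut T, items.len());

    // Attach the slice length as fat-pointer metadata, then cast to the DST.
    ptr::slice_from_raw_parts_mut(mem as *mut T, items.len()) as *mut HeaderSlice<H, T>
}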

@eddyb
Member

eddyb commented Jun 15, 2018

There's also this pattern for a slice with a header, and !Sized thin pointers:
https://github.com/rust-lang/rust/blob/5205ae8bdc2991eecf3bfbb58ed8f56c0673e738/src/librustc/ty/mod.rs#L603-L706

@bergus

bergus commented Jun 15, 2018

@SimonSapin Thanks
@eddyb That's pretty much what I ended up with (it took me a while to figure out that #[repr(C)] was necessary…), though without the OpaqueSliceContents. I had seen extern type mentioned in #813, but the guide explanation from the RFC was not exactly helpful. Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

@eddyb
Member

eddyb commented Jun 15, 2018

Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

Yes, and if you need mutable references, it prevents someone from (mis-)using mem::swap.
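
Concretely, the extern-type part of the pattern looks roughly like this (a nightly-only sketch behind the extern_types feature; ThinSlice is an illustrative name):

#![feature(extern_types)] // nightly-only

extern "C" {
    // No size or alignment known to the compiler, so any struct that ends in
    // it is !Sized, yet references to it stay thin.
    type OpaqueSliceContents;
}

#[repr(C)]
struct ThinSlice<T> {
    len: usize,
    data: [T; 0], // base pointer / alignment for the trailing elements
    opaque: OpaqueSliceContents,
}

fn main() {
    // &ThinSlice<T> is one word, and ThinSlice values can only exist behind
    // pointers: no stack allocation, no mem::swap.
    assert_eq!(std::mem::size_of::<&ThinSlice<u64>>(), std::mem::size_of::<usize>());
}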

@velvia

velvia commented Jan 6, 2019

Hi folks, I'm new to the Rust community, but was really hoping to implement a cross-platform library for high-performance compressed vectors using this DST feature. These vectors would have, for compatibility reasons, a u32 size header and other header fields followed by something like data: [u8] (or u64, etc.). Kind of like what @stjepang and @SimonSapin are asking for. Having to work around this using unsafe thin-pointer coercion and tricks like [u8; 0] as markers is discouraging me a bit from porting it from the JVM (though there are certainly other reasons to port it, such as access to SIMD and things like that).

@eddyb
Member

eddyb commented Jan 30, 2019

@velvia This was superseded by #2594.
