Representation of Rust references (`&T`, `&mut T`) and raw pointers (`*const T`, `*mut T`)
#16
This is related to Stacked Borrows: References might, at least abstractly, have extra metadata attached to them, making transmutes to/from raw pointers potentially non-trivial to handle. |
Given that we are passing |
Even if we got rid of |
I agree that the low bits of a pointer, but not a reference, can be used for storage. That said, I would like it to be possible to write a type which does store values in the low bits that has an interface to safe Rust basically equivalent in functionality to a |
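As a rough illustration of storing values in the low bits of a raw pointer, here is a minimal sketch. The `Tagged` type and `tagged_roundtrip` function are hypothetical names invented for this example, and the sketch assumes the tag fits in the bits that alignment leaves zero. It only ever tags a raw pointer, never a reference.

```rust
use std::mem::align_of;

/// Hypothetical tagged raw pointer: keeps a small tag in the low
/// (alignment) bits of an aligned `*mut T`.
struct Tagged<T> {
    raw: *mut T,
}

impl<T> Tagged<T> {
    /// Bits that are always zero in a properly aligned `*mut T`.
    const MASK: usize = align_of::<T>() - 1;

    fn new(ptr: *mut T, tag: usize) -> Self {
        assert!(tag <= Self::MASK, "tag does not fit in the alignment bits");
        assert_eq!(ptr as usize & Self::MASK, 0, "pointer must be aligned");
        // Byte-wise offsetting keeps the original pointer's provenance.
        Tagged { raw: ptr.cast::<u8>().wrapping_add(tag).cast::<T>() }
    }

    /// Reads the tag back out of the low bits.
    fn tag(&self) -> usize {
        self.raw as usize & Self::MASK
    }

    /// Recovers the original, aligned pointer by subtracting the tag.
    fn ptr(&self) -> *mut T {
        self.raw.cast::<u8>().wrapping_sub(self.tag()).cast::<T>()
    }
}

fn tagged_roundtrip() -> (u32, usize) {
    let mut value: u32 = 7;
    let tagged = Tagged::new(&mut value as *mut u32, 3);
    // SAFETY: `ptr()` returns the original pointer to `value`, which is live.
    let read = unsafe { *tagged.ptr() };
    (read, tagged.tag())
}
```

The tagged value must stay a raw pointer: turning it into a `&T` while the tag bits are set would produce a misaligned reference.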
I do agree that we could teach the compiler to internally treat the pointer values of references as having holes. |
This is trickier than it sounds. The problem is that you can create a In general, to do more advanced optimizations of this kind, it would be useful to be able to mark an enum as not permitting "ref bindings", which sidesteps the problem. |
Does the " I ran into the following example in the wild today:

pub struct Wrapper([i32]);
impl Wrapper {
    pub fn new(x: &[i32]) -> &Self { unsafe { mem::transmute(x) } }
}

Somebody suggested that making

EDIT: I get the feeling that this is only for references to sized types. Is there a different issue for the layout of trait objects, DSTs, etc. or is it fine to discuss that here? |
I think that |
@alercah, @ubsan explained on discord that pointers to unsized types are (or can be) larger than normal pointers (I did not know this). For example, in a pointer to a slice, like

So in the example above, the following should work:

#[repr(transparent)]
pub struct Wrapper([i32]);
impl Wrapper {
    pub fn new(x: &[i32]) -> &Self {
        unsafe {
            &*(x as *const [i32] as *const Self)
        }
    }
}

Since there is no need to use the |
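To see the cast-based version in use, here is a small sketch that exercises it. The `len` and `first` accessors and the `wrapper_demo` function are additions made up for this illustration; the sketch assumes `#[repr(transparent)]` gives `Wrapper` the same layout as `[i32]`, so the cast keeps both the address and the length metadata.

```rust
#[repr(transparent)]
pub struct Wrapper([i32]);

impl Wrapper {
    pub fn new(x: &[i32]) -> &Self {
        // SAFETY: relies on `Wrapper` being `repr(transparent)` over `[i32]`;
        // the pointer cast preserves the address and the slice length.
        unsafe { &*(x as *const [i32] as *const Self) }
    }

    /// Hypothetical accessor: number of wrapped elements.
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Hypothetical accessor: first wrapped element, if any.
    pub fn first(&self) -> Option<i32> {
        self.0.first().copied()
    }
}

fn wrapper_demo() -> (usize, Option<i32>) {
    let data = [10, 20, 30];
    let w = Wrapper::new(&data);
    (w.len(), w.first())
}
```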
More generally, I think it'd be interesting to discuss what we want to say about fat pointers. For example (and I am not sure whether that is more relevant for data layout or validity discussions), I think that even raw pointers must have valid metadata: For slices, it must be an integer (not uninitialized memory) such that the slice is not too big, and for trait objects it must be an actual vtable. |
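A quick way to observe fat pointers and their metadata is sketched below. The function names are hypothetical, and the "two words" result reflects what current rustc does rather than anything this thread has agreed to guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;
use std::ptr;

/// Sizes of a thin pointer, a raw slice pointer, and a raw trait object pointer.
fn fat_pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*const u8>(),
        size_of::<*const [u8]>(),
        size_of::<*const dyn Debug>(),
    )
}

/// Builds a raw slice pointer from a thin pointer plus a length (the metadata).
fn raw_slice_roundtrip() -> usize {
    let data = [1u8, 2, 3, 4];
    let raw: *const [u8] = ptr::slice_from_raw_parts(data.as_ptr(), data.len());
    // SAFETY: `raw` points to `data`, which is live, and carries its true length.
    let slice: &[u8] = unsafe { &*raw };
    slice.len()
}
```

Note that `slice_from_raw_parts` is a safe function: creating the raw fat pointer requires an integer length, and only the dereference is `unsafe`.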
From experience, it seems that it'd be really nice if we defined the ABI for pointers to DSTs. I would love it if pointers to trait objects were exactly

struct TraitObject {
    void* object;
    TraitVtable* vt;
};

and pointers to slices were exactly

struct Slice {
    T* ptr;
    uintptr_t len;
};

for purposes of ABI ( |
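Until such an ABI is defined, code that crosses an FFI boundary can spell out the pair explicitly. The following is a sketch only: `RawSliceI32`, its methods, and `ffi_roundtrip` are hypothetical names, and the struct is a hand-written `repr(C)` mirror of the C `struct Slice` above, not the layout of `&[i32]` itself.

```rust
use std::slice;

/// Hypothetical FFI-safe mirror of the C `struct Slice`, specialized to `i32`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawSliceI32 {
    ptr: *const i32,
    len: usize,
}

impl RawSliceI32 {
    /// Decomposes a slice into its data pointer and length.
    pub fn from_slice(s: &[i32]) -> Self {
        RawSliceI32 { ptr: s.as_ptr(), len: s.len() }
    }

    /// Rebuilds a slice from the stored parts.
    ///
    /// # Safety
    /// `ptr` must point to `len` initialized `i32` values that stay alive
    /// and unmodified for the whole lifetime `'a`.
    pub unsafe fn as_slice<'a>(self) -> &'a [i32] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

fn ffi_roundtrip() -> i32 {
    let data = [1, 2, 3];
    let raw = RawSliceI32::from_slice(&data);
    // SAFETY: `raw` was built from `data`, which outlives `back`.
    let back = unsafe { raw.as_slice() };
    back.iter().sum()
}
```

Going through `as_ptr`/`len` and `from_raw_parts` avoids depending on the field order or size of the built-in fat pointer.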
I have no thoughts on this issue, but there is a lot of background and previous discussion about this. It took me a while to go through it; the following is in more or less chronological order:
From the point of view of the unsafe code guidelines, it may be necessary to know where the custom DST story is going in order to state what can and cannot be guaranteed about the layout of DSTs in general. It would be a bad idea to not guarantee anything at all about their layout, but it would be equally bad to guarantee something that prevents us from getting good custom DSTs in the future. It might be worth re-evaluating whether the priority/cost balance for custom DSTs has changed, whether it is something worth prioritizing after Rust 2018 is released, and whether to incorporate the unsafe code guidelines "motivation" into a revised version of the RFC. |
I would prefer if "fat pointers" were just "sugar" for We can fix the rest once we start discussing validity. E.g., |
@gnzlbg That also seems like a very reasonable approach. @nikomatsakis expressed similar preference. |
For fat pointers, do we want to mandate a particular representation, or are we good with just saying they're two words long, and the first word is a thin pointer? (so not specifying anything about the rest other than it being one word) |
I think with custom DSTs the rest might be differently sized even? |
@RalfJung but if we can't even say anything about the size, then we're back to what I believe is the current spec: fat pointers guarantee that the first word is a thin pointer, and nothing else. |
Seems fair to me? |
Another question... what do we want to say about |
@asajeffrey
#[repr(C)]
struct *[T] {
ptr: *(),
meta: usize,
}
#[repr(C)]
struct *dyn Trait {
ptr: *(),
meta: *(),
} |
@ubsan I'm not sure what you mean by "the same representation". Isn't Annoyingly, I added a note about (e.g.) |
What do you mean? References to ZSTs are still pointer-sized and ZSTs' alignment requirements matter.
There's no such thing. Types have one repr, period, and for tuples that is not the same as a repr(C) struct. |
@rkruppe Depends what we want to say about |
For starters, we are talking about representation here, not validity, and that &ZST is pointer sized is not just the overwhelming status quo, it's also recently been explicitly confirmed in the closing of RFC PR 2040. Moreover, even for the validity question, allowing only a single address as reference to ZSTs is not tenable IMO due to how much existing unsafe code uses ZSTs in place of |
Yes, it's not whether it's pointer sized that's the issue, it's whether any non-zero value is a valid representation of a |
@asajeffrey object representation and whether specific representations are valid, are very different concerns. |
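To make the ZST points above concrete, here is a small sketch. `AlignedZst` and `zst_reference_facts` are hypothetical names; the sketch assumes, as argued in this thread, that a reference to a ZST is pointer-sized and that any non-null, suitably aligned address can back one.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Hypothetical zero-sized type with a non-trivial alignment requirement.
#[repr(align(8))]
struct AlignedZst;

fn zst_reference_facts() -> (bool, bool, bool) {
    let a = AlignedZst;
    let r: &AlignedZst = &a;

    // A reference to a ZST still occupies a full pointer.
    let pointer_sized = size_of::<&AlignedZst>() == size_of::<usize>();

    // The ZST's alignment requirement still applies to the address.
    let aligned = (r as *const AlignedZst as usize) % align_of::<AlignedZst>() == 0;

    // `NonNull::dangling()` yields a non-null, well-aligned pointer.
    let dangling: NonNull<AlignedZst> = NonNull::dangling();
    // SAFETY: no bytes are read through a ZST reference; the pointer is
    // non-null and aligned.
    let r2: &AlignedZst = unsafe { dangling.as_ref() };
    let dangling_ok = size_of::<AlignedZst>() == 0
        && (r2 as *const AlignedZst as usize) % align_of::<AlignedZst>() == 0;

    (pointer_sized, aligned, dangling_ok)
}
```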
@ubsan Ah, perhaps we have a terminological issue here, about what is part of "validity" vs what is part of "representation". I am including things like being non-zero as part of representation, and you are thinking of it as part of validity? |
Correct. Non-zero is not part of representation, at least how we define representation - representation is the list of bits which map to a value; i.e., For example, |
@ubsan: I'm a bit lost with that defn, if I was thinking of representation as being the set |
@asajeffrey transitivity and reflexivity both make sense when α ≠ β. You create a morphism reflexive: transitive: symmetric: Also, that's not the common way of defining representation. That's the set of values. For example: The equivalence ~ between
since Note that this definition is unnecessarily strict, since we can't define a mapping from |
@ubsan hmm, that's not the definition of type-indexed equivalence that I'm used to, which is either the groupoid model, or more recently HoTT, but we are now getting seriously off-topic! |
I updated the draft at https://github.com/asajeffrey/unsafe-code-guidelines/blob/repr-pointers/reference/src/representation/pointers.md, to remove the specialness of ZST, and to add some questions:
An example of the latter is again,
However, if we like, we can then say that the representation of |
References are always properly aligned independently of whether they point to a ZST or not. I think it would be unnecessarily complicated to add an exception for DSTs. That means that the pointer in the Also, I think it is worth it to guarantee that EDIT: this is not the case for |
I don't have many thoughts along @ubsan's "representations" as that's not how I usually think about specifying low-level languages. It presumes the existence of some "higher-level representation" of data, higher-level than "sequence of bytes/bits", and I am not convinced that is something we need for all types. (To be more precise, I think it is useful for some types, like when defining arithmetic, but e.g. unnecessary for compound types.)

There is some overlap with validity in the sense that the set of valid bit sequences for a type would likely coincide with the set of bit sequences that "map to a representation of the type".

However, I don't think any of that is even needed to have this discussion. This is the layout discussion ("representation" as in

// `&?mut [T]` is like
#[repr(C)]
struct Slice {
    ptr: &?mut T, // with raw slices, this is a raw ptr
    len: usize,
}
// `&?mut dyn Trait` is like
#[repr(C)]
struct DynObject {
    data: &?mut T, // with raw dyn objects, this is a raw ptr
    vtable: &usize, // with raw dyn objects, this is a raw ptr
}

This is pretty much exactly what @ubsan wrote above. I used reference types to indicate non-nullness and alignedness, though that is already kind-of a validity invariant topic.

@gnzlbg Is there anything to say other than giving equivalent struct layouts and guaranteeing the |
No, I don't think so. Maybe this is something worth repeating in a note, but from the little that we guarantee about enum optimizations, and the difference in niches between |
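The niche difference between references and raw pointers can be observed through `Option`. This is a minimal sketch with a hypothetical `niche_sizes` function; it assumes only that references are never null while raw pointers may hold any bit pattern.

```rust
use std::mem::size_of;

/// Sizes of `&u32`, `Option<&u32>`, and `Option<*const u32>`.
fn niche_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u32>(),
        // `None` can reuse the null bit pattern, so no extra space is needed.
        size_of::<Option<&u32>>(),
        // A raw pointer may be null, so `None` needs a separate discriminant.
        size_of::<Option<*const u32>>(),
    )
}
```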
I updated the draft at https://github.com/asajeffrey/unsafe-code-guidelines/blob/repr-pointers/reference/src/representation/pointers.md, to make "same representation as" the defn of the representation of |
Oh, something we might want to add... the representation of |
Representation of Rust references:
Are `&T` and `&mut T` guaranteed to be a pointer?

Representation of raw pointers:

Other factors: