Representation of unions #13
Comments
Also, some relevant text from C11:
6.5.8.5:
6.7.2.1.16:
|
AFAIK the first paragraph you quoted (6.5.3.6.6) is solely about strict aliasing/TBAA, which Rust doesn't do. The other two quotes seem to practically guarantee that all members of the union start at the same offset, and offset 0 at that unless the |
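For illustration, here is a minimal sketch of the "all members start at the same offset" reading for a `#[repr(C)]` union. The `Example` type and the `field_offsets` helper are hypothetical names made up for this example, not anything from the thread or the standard library.

```rust
use std::ptr::addr_of;

// Hypothetical union for illustration; not taken from the thread.
#[repr(C)]
union Example {
    byte: u8,
    word: u32,
    wide: u64,
}

/// Returns the byte offset of each field relative to the start of the union.
#[allow(unused_unsafe)]
fn field_offsets() -> [usize; 3] {
    let u = Example { wide: 0 };
    let base = addr_of!(u) as usize;
    // Taking a field's address does not read the field. The `unsafe` block is
    // kept only because older compilers require it for union field places.
    unsafe {
        [
            addr_of!(u.byte) as usize - base,
            addr_of!(u.word) as usize - base,
            addr_of!(u.wide) as usize - base,
        ]
    }
}

fn main() {
    // For a repr(C) union every field is expected to sit at offset 0.
    println!("{:?}", field_offsets());
}
```

This only checks the `repr(C)` case; it says nothing about what a default-repr union is allowed to do.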
@rkruppe Yes. (Also, I think the "suitably converted" there exclusively means a type conversion, not a value change.) |
Seems like we might as well reserve the right for that, although I don't see much motivation. Maybe we should drill in to some of the more specific questions:
I am sort of tempted to do so, because I don't know that there is much practical utility to doing otherwise, but I'd be curious to hear of use cases. |
In the interests of full evaluation of alternatives: the only argument I've heard for doing otherwise would be if we could detect that all the variants in a I don't believe we should do either of those things, but I wanted to mention the arguments for doing so for completeness. |
That's an interesting line of thought! However, what could we actually use these "holes" for? I assume you're referring to padding. I'm not aware of any way to stash discriminants or other data in a contained type's padding. Any write or copy is allowed to omit or clobber padding bytes at random. For example, suppose |
Do we want We can then, at some point, re-consider adding We don't have to allow all kinds of types for all |
@gnzlbg We already have |
@rkruppe I'm not talking about padding. I'm talking about things like enums and If I have a Again, I don't think that we should do that, but people have suggested doing so. |
Ah, that makes more sense. I also don't think we should do this, though, I'm in favor of the "unions are bags of uninterpreted bits" approach that we seem to be slowly converging on (e.g. with the disposition to merge rust-lang/rfcs#2514). |
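As a rough sketch of what "bags of uninterpreted bits" means for niches: `Option<&T>` is guaranteed to reuse the null pointer value, whereas a union of two references does not (on current compilers, as far as I know) expose that niche to `Option`. `EitherRef` and `sizes` are hypothetical names for this example, and the size of `Option<EitherRef>` is observed behavior rather than a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical union of two references, for illustration only.
#[allow(dead_code)]
union EitherRef<'a> {
    narrow: &'a u8,
    wide: &'a u16,
}

/// Sizes of a reference, an optional reference, and an optional union of references.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<&u8>(),
        size_of::<Option<&u8>>(),
        size_of::<Option<EitherRef<'static>>>(),
    )
}

fn main() {
    let (reference, opt_reference, opt_union) = sizes();
    // Option<&u8> reuses the null niche, so it is pointer-sized.
    println!("&u8: {reference}, Option<&u8>: {opt_reference}");
    // The union is treated as opaque bits, so Option needs a separate tag.
    println!("Option<EitherRef>: {opt_union}");
}
```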
My question was more about: how is We don't really have to break any code for doing this change. We can just say that unions are |
I think this is part of the next discussion, the one about validity invariants: We first have to decide which bit patterns are valid for a union; once that is fixed, if the decision rules out some bit patterns as invalid, we can talk about layout optimizations. |
That's a good point, but it's also inconvenient since pretty much everything that there is to decide about union layout now depends on the validity invariant =/ |
Well, there are still the things @joshtriplett brought up above. But yeah, while for enums layout came first and now we retrofit the validity invariant, it seems more reasonable to me to proceed the other way around with unions. I guess one question we could discuss here, that could inform that validity discussion, is: Is there a strong need for layout optimizations on unions? We know for sure that some unions should not get layout optimized ( My personal stance on this is that if someone really wants their union layout optimized, we should provide attributes to let them do that -- attributes that could also be used on |
Which ones? The C standard quotes just more or less confirm what I think everyone agrees on for repr(C) layout, and as far as I can see all the points on repr(Rust) are driven by layout optimizations that interact with the validity invariant.
+1 |
My primary use case was |
So, this seems like a place where we can "describe the controversy", rather than laying out specific rules. I think I could summarize the comments like so thus far (let me know if I am missing something):
Personally, I lean towards the view that we should not assume anything about the bits of a union until we see an actual access (which then gives us the type we can use). This seems to imply that we can't do layout optimizations, because we can't be sure that the union is even initialized, and all layout optimizations rely on a notion of initialization. (But perhaps this is overly strict?) |
@nikomatsakis That would be my preference. And even then, the compiler can't necessarily keep assuming that type over time. |
I think the idea here is that you may be able to do layout optimizations provided that you can find some guarantee that applies to all members? I think the "bag of bits" thing is a bit of a red herring here, although it does depend on the exact decisions we make about uninitialized memory. For instance, given Now that I think about it more, I don't think there's a sensible way for the compiler to offset fields within the union in order to achieve this result more generally. The niches into which we perform enum optimizations are quite different from padding holes, which we have discussed already cannot be assumed to remain constant (and requiring that may force undersized assignments, which may be a pessimization). Without going into too much detail right now, I'm pretty sure we cannot make use of this unless, as Niko floated for enums, we have some way to insist that you cannot bind references to individual fields. Furthermore, I think we're converging on saying that a single-field struct always lays its field at offset 0. If that is the case, then the same should probably apply to unions, so we would say that all fields are laid out at offset 0. This leaves two questions unanswered, however: a) can a union be more strictly aligned than any of its fields and b) can a union have unnecessary trailing padding? As far as I can tell, C answers "yes" to both, though I do not know why. I think this implies that |
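To make questions (a) and (b) concrete, here is a small sketch using `#[repr(C)]` unions, where the answers are well defined. `Padded` and `OverAligned` are hypothetical types invented for this example. Note that the trailing padding shown is the kind required to round the size up to the alignment, and the extra alignment is explicitly requested; neither shows what the compiler may do unprompted for default-repr unions.

```rust
use std::mem::{align_of, size_of};

// Hypothetical: largest field is 5 bytes, strictest field alignment is that of u32.
#[allow(dead_code)]
#[repr(C)]
union Padded {
    bytes: [u8; 5],
    word: u32,
}

// Hypothetical: alignment explicitly raised above that of any field.
#[allow(dead_code)]
#[repr(C, align(16))]
union OverAligned {
    byte: u8,
}

fn main() {
    // The size is rounded up to a multiple of the alignment, so it exceeds
    // the 5 bytes of the largest field.
    println!("Padded: size {}, align {}", size_of::<Padded>(), align_of::<Padded>());
    // Requested alignment also forces the size up to a multiple of 16.
    println!(
        "OverAligned: size {}, align {}",
        size_of::<OverAligned>(),
        align_of::<OverAligned>()
    );
}
```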
I don't recall us saying anything about this in the struct chapter. Perhaps I just missed it. In any case, it seems like something we might want to add into #31. |
The latter part of your statement is not correct. We should allow things like this. The union is not valid for any of its variants after the assignment to So, if anything we'd have to do that union bytewise. I'd rather we didn't. Unions are bags of bits with names to access some offsets, end of story. That's already complicated enough to use that I wouldn't want to further complicate the story by adding layout optimizations to the mix. If people show compelling use-cases for layout-optimized unions, I'd suggest we work towards stabilizing something like rust-lang/rust#54032. That covers your union-of-two-references. |
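A minimal sketch of the "bags of bits with names to access some offsets" view: write through one field, read the same bytes back through another. `Bits` and `float_bits` are hypothetical names for this example; the standard library already offers `f32::to_bits` for this particular conversion.

```rust
// Hypothetical union for illustration: two 4-byte views of the same storage.
#[repr(C)]
union Bits {
    float: f32,
    int: u32,
}

/// Reinterprets the bytes of an f32 as a u32 by going through the union.
fn float_bits(x: f32) -> u32 {
    let b = Bits { float: x };
    // Reading a union field is unsafe: the compiler does not track which
    // field was last written. Every 4-byte pattern is a valid u32, so this
    // particular read is sound.
    unsafe { b.int }
}

fn main() {
    println!("{:#010x}", float_bits(1.0));
}
```

Reading in the other direction, or into types with invalid bit patterns such as `bool` or references, is where the validity discussion in this thread starts to matter.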
I heard people want me to do the writeup. Assigning to me then. I didn't hear of any deadline though... ;) |
What is the layout of union variants when the
union U {
    a: __m128, // repr(simd)
    b: (),
}
Currently, |
Memory layout is no different. Calling convention details are different (and that factors into the relatively superficial difference in the IR we produce observed over there in that PR), but I see no reason to specify those for repr(rust) unions, as they are irrelevant outside of FFI which one should use repr(C) for anyway. |
So the distinction between |
No, arrays vs vectors is a quite important distinction for the IR, but none of those differences except ABI lowering affect the sort of visible behavior we are documenting here. |
To summarize the discussion that happened here, the consensus seems to be that |
@RalfJung sounds great to me! Do you think you can get a write-up done by this Thursday? Would be good to have something by the meeting. =) |
Done: #39 Feels rather short, but what else is there to say? |
It would be useful to use |
Turns out regex relies on I bet they are not the only ones... |
Discussing how unions are laid out.
#[repr(C)] meaningful when applied to a union?