-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial design for unions #139
Conversation
|
||
``` | ||
struct SsoString { | ||
bitfield(1) var Bool: is_small; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should be more actively considering ways of expressing this in a way where the language is aware of the relationship between is_small
and which component of the union is active. This would open the door to zero-cost static analysis that the type is being used correctly, and lower-cost dynamic analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that there's no guarantee that the discriminator is in the same struct as the union, or is even reachable from the union. There are even quite plausible cases where there's no discriminator at all, and safety is achieved entirely through static reasoning.
We could consider something like that as an optional add-on, rather than something that's inherent in all unions, but because it's an add-on, I don't think it needs to be part of the initial unions design. Instead, I think we should consider it as part of the overall safety story, because until then it won't be clear how much benefit it actually provides. I've just added a proposal doc to this PR, which includes a rationale for why I want to proceed with this proposal even though the safety plan are unresolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there is no guarantee that you won't need to do something unsafe, but I don't think that is the common case we need to make easy, especially since it is the case where the language can give you little support. Rust is an example of a language aiming for high performance but provides a safe primitive here. If you really need something outside what the language can reason about, then use something like a byte array and casts.
I think the main thing as a team we should be doing at this stage is deciding between high level approaches. So even if we ultimately decide to go with the approach currently presented in this document, the job of this document should be to present and discuss the alternatives under consideration, and presenting rationale for one choice over others. I very much feel this document needs to at least address the safer design alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you added a blurb in the proposal text about discriminated unions -- I maintain that it requires more serious consideration since I think some of the concerns raised there are quite addressable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agreed with @josh11b. I feel like our safe algebraic data type facility should allow the type of micro-management of storage layout necessary for SsoString
presented here. Of course, there will be cases where the discriminator is not accessible easily, but I don't think that's the common case, and that type of code is highly unsafe anyway, so there's little the language can do to help.
…n matching, update the discussion of safety, and add some section headings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some quick comments -- I haven't gotten far, but it's the end of the day and I don't want to run into commit skew if you're making changes before I get back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finishing a pass of comments.
proposals/p0139.md
Outdated
|
||
### No type punning | ||
|
||
This proposal does not permit accessing an inactive union member. This is unlike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the designator, this behavior can't be detected, right? And so in opt builds, this probably won't be caught and would not be an error?
I wonder, is "error" an apt description for something that would only be caught in debug build modes? Is this something that we should make clear in our documentation, that "error" does not mean a program will consistently fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed by chat, I've switched to using the term "out of contract". I'm not sure I like the result; I'm open to alternative suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with "out of contract", but can you also replace "does not permit"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better?
proposals/p0139.md
Outdated
However, that's a false analogy: the defining characteristic of an uninitialized | ||
or indeterminate value is that we know nothing about its state, whereas we know | ||
everything there is to know about the state of an inactive union. In particular, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels very similar to me as if we simply said every struct defaults members to uninit, and thus we "know everything" and don't need require struct members be initialized.
(as I point out, this very much feels like an inconsistency -- I disagree with the false analogy claim)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the next sentence: when I say we "know nothing" about the state of an uninitialized value, what I mean is that if an operation's precondition is guaranteed to be true of an uninitialized value, then that precondition is also guaranteed to be true of every possible value. To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.
By contrast, there are preconditions that are satisfied by an inactive union, but not by an active union (in particular, the precondition of create:
), and the knowledge that a union is inactive can help you deduce that a program is correct (e.g. if that union is passed to create:
).
Does that make sense? If so, any suggestions for how to make this point clearer in the text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it also seems like a distinction without a difference.
To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.
Certainly it does. A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create
, regardless of whether it is a part of a union or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A storage location should be uninitialized before the allocation it is a part of is deallocated, and before
create
, regardless of whether it is a part of a union or not.
I agree, but I don't understand how that relates to what I said.
|
||
### `create` and `destroy` are union-only | ||
|
||
In this proposal, `create` and `destroy` can only be applied to union members, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these only apply to union members, perhaps union_create
and union_destroy
might be better names? I'm just wondering if we want to reserve these two words as keywords for such a narrow purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rephrased to try to make clear why I think that would be counterproductive. If you still want to change these keywords, can you elaborate on what problem you're trying to solve, i.e. what might go wrong if we stuck with the proposed names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are use cases for leaving local variables and struct fields uninitialized:
var Int32: x = uninitialized;
if (...) {
create x = 42;
} else {
create x = 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the proposed names, combined with reserved keywords it means that people can't write create
or destroy
identifiers. It's just not obvious to me that unions are common enough for that. shrug
Dmitri, for local variables, suppose you have:
var Int32: x = uninit;
if (...) {
x = 42;
}
(some body of code generating B)
if (B) {
x = 0;
}
If you require create, and it's enforced in the way described here, then you would need:
var Int32: x = uninit;
var bool: created = false;
if (...) {
create x = 42;
created = true;
}
(some body of code generating B)
if (B) {
if (created) {
x = 0;
} else {
create x = 0;
}
}
Or some other refactoring. I think this could get to be more complex, and would ideally not be required for most types. The link of requiring users to not just initialize, but also state that they expect this is the first initialization, seems like a burden that's maybe okay in the scope of unions (as a niche feature) but would likely be problematic taken broadly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of gets back to the discussion of uninit
vs. a union with no active member: I've been assuming that = uninit
is like default-initialization in C++: it creates the object, but leaves it in a partially-formed state (i.e. a state that can only be assigned to or destroyed). Consequently, it has to be an error to apply create
to that object, because you can't create an object that already exists.
I agree that there are important use cases for manually controlling the lifetime of a local variable, but I think you should have to opt into that as part of the variable's declaration, e.g.
struct ManualLifetimeInt32 {
union {
Int32 object;
}
}
...
var ManualLifetimeInt32: x;
if (...) {
create: x.object = 42;
} else {
create: x.object = 0;
}
...
destroy x.object;
ManualLifetimeInt32
could of course be a template, and might have a nicer API. One variant to explore would be a type where the underlying object starts out live, and is destroyed by the wrapper's destructor, but can be destroyed and recreated in the middle; that seems to be the more common pattern in C++, possibly because C++ makes it easier to write.
Regarding the proposed names, combined with reserved keywords it means that people can't write
create
ordestroy
identifiers.
I expect that to be the case regardless, since operator create
and operator destroy
are planned to be how we spell constructors and destructors.
at the end of a union, but it's not clear that there are any compelling use | ||
cases for it. In particular, packing at the end of a union is not necessary to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other doc, I think I've suggested two use cases when I saw this:
- Avoid asymmetric behavior that creates bitfield ordering surprises for users.
- Sequential bitfield unions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier comment, I suggested another one: packing a discriminator into the trailing bits of the union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't describe 1 as a "use case" (i.e. something a user might want to do), and as noted on your other comment, the surprises in question don't seem very serious to me. As for 2, I don't find it very compelling; it seems like a very rare corner case (note that C and C++ don't support it, and I'm not aware of any complaints about that), and there's a viable if somewhat annoying workaround (nest the second union inside the first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for packing into the trailing bits of the union, is there a reason the user couldn't pack the discriminator into the leading bits instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the example I gave before, with a PackedPtr
, where we'd be reusing the low-order (alignment) bits: The high order bits aren't necessarily available in general, so one ordering of the union versus the discriminator will use at most only a mask (and if the value of the discriminator is known, as it typically will be, can directly use the pointer value or ptr-1), whereas the other ordering will use bit-shifts. Which ordering is best will depend on endianness -- for little-endian, we do want to use the leading bits not the trailing ones, but it seems like that is us getting lucky rather than things working out by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 1 isn't something a user might intend to do, but it is something they might accidentally do from misunderstanding how byte packing of unions works (or just clumsy reordering of struct members). This gets to the thread on the design:
Dangerous how? As far as I can tell, if users mistakenly think that packing can happen at the end of a bitfield, the worst case scenario is that their objects are somewhat larger than they expected.
That can be pretty bad for the types of software Carbon wants to target.
To emphasize on Dmitri's point, the issue is that it would be larger than the developer expects, which would be a bug. That could mean a difficult to detect, difficult to debug issue. In goal terms, I believe this is problematic under "Adhere to the principle of least surprise," an understandability issue for a performance feature.
Regarding 2, and in general, I'm fine if you don't consider issues compelling. Please try to address them in the alternatives regardless. Simply saying "it's not clear that there are any compelling use cases for it" without offering any discussion of cost-benefit or pros-cons feels more dismissive of concerns, at least to me.
allowing packing at the end of a union would make it more difficult to tell when | ||
accesses to two separate fields can cause a data race. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already do this at the start of the boundary -- why is it measurably worse at the end of the boundary?
Maybe it'd be better to allow field groups at the struct level, to determine when layout packing applies, making it explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already do this at the start of the boundary -- why is it measurably worse at the end of the boundary?
Because the end of the boundary might also be the start of another boundary, which multiplies the number of fields that can conflict with each other.
Maybe it'd be better to allow field groups at the struct level, to determine when layout packing applies, making it explicit?
It sounds like you're suggesting an approach where adjacent fields would not be packed together unless they're in the same field group, but that would defeat the whole purpose of introducing field groups in the first place; for example it would mean that is_small
is not packed together with small.size
or large.size
.
The more fundamental problem is that we're trying to address cases where the packing boundaries cut across the logical field structure boundaries. For example, for packing purposes small.size
groups with is_small
but not with small.buffer
, but for purposes of which fields are live at the same time, small.size
groups with small.buffer
but not with is_small
. That means that at most one of those sets of boundaries can line up with open/close curly braces. C/C++ chose to make curly braces represent packing boundaries, but that means sometimes you have to put a field inside a union even though it's supposed to always be available. I'm proposing that Carbon should make curly braces represent logical field structure, but that means we can't use curly braces to define packing boundaries.
docs/design/unions.md
Outdated
apply `create` to a union that already has an active member, or apply `destroy` | ||
to a union that does not have an active member. `destroy` can be thought of as a | ||
unary operator, but a `create` statement has the syntax and semantics of a | ||
variable declaration, with `create` taking the place of `var`, and the field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable declaration, with `create` taking the place of `var`, and the field | |
variable declaration, with `create` taking the place of `var T`, where `T` is the type of the field, and the field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this design seems very promising to me. Treating union
and group
as not-exactly-type containers has some appeal, and I like that this lets us form efficient structure layouts that C++ makes painful or impossible to achieve.
Tying the possibility of a subobject having a different lifetime from its container to whether the subobject is a union member seems quite appealing. It'll be interesting to see how this plays out with (for example) vector, where we might not be able to expose raw pointers to an array of elements, because the elements are actually a union { var T: value; }
. But I think this is a good base to build from.
docs/design/unions.md
Outdated
have the same types, names, and order, and their names are part of the names of | ||
their fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have the same types, names, and order, and their names are part of the names of | |
their fields: | |
have the same types, names, and order. The name of a field group is part of the name of | |
each field in the group: |
Assert(str.small.size == 0); | ||
destroy str.small; | ||
create: str.large = (.size = 0, .capacity = 100, .buffer = MakeBuffer(100)); | ||
Assert(str.large.capacity == 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is str.large
here? Is it possible to refer to that without a following .field
? Does that expression have a type? Can I pass it to a function?
Not supposed to be leading questions -- I think it could possible be OK for the answer to be that you must always write .something
after str.large
, so it's really just as if large.
is a prefix of the member names. But that would potentially lead to some non-uniformity. I think it could also possibly be OK for the answer to be that you get some unspecified anonymous type -- but that would somewhat contract the statement below that field groups don't have types.
docs/design/unions.md
Outdated
However, field groups are not objects, and do not have types; like unions, field | ||
groups are a way of controlling the layout of the fields of a struct. For | ||
example, if `large` and `small` were structs, the bitfields would not save any | ||
space, and `is_small` would have to be followed by 63 bytes of padding in order | ||
to ensure proper alignment of `large`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible path through the "what is the type of str.large
?" question would be to generalize what we mean by "type" to include types whose representation spans some set of bits instead of some set of bytes, and possibly a non-contiguous set of bits -- or at least, a sequence of bits not starting at the first bit in their alignment unit and not ending at the last bit in such a unit.
docs/design/unions.md
Outdated
field group containing a single field. The ending offset of a union is the | ||
maximum ending offset of any field in any of its field groups, rounded up to the | ||
next whole byte. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forcing a byte boundary to the end of a union means we can't do
union {
bitfield(63) var Int63: x;
bitfield(63) var PackedPtr(Foo): y;
}
bitfield(1) var Int1: active;
Do we need to pad to the end of the byte here? What do we get in exchange?
docs/design/unions.md
Outdated
destruction of an active union member as also destroying any immediately | ||
preceding bitfields, and then recreating them with the same contents. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what this is acknowledging is that any store to a bitfield may actually also write to adjacent bitfield members, because the hardware doesn't support bit-level access, and that can result in a data race.
So in some sense yes, any store to a union is more expensive than a regular store, because we will first load the entire storage unit, including any unchanged members that are nearby, then change the bitfield in question with some bitwise operations, then store back to the entire storage unit. But I think the cost is the same for a destroy+create as it would be for a regular store.
nested subobjects), and such pointers are indistinguishable from pointers to any | ||
other object. Reliably sanitizing accesses via such pointers would require | ||
dynamically tracking which union member (if any) each pointer points to, | ||
propagating that information to subobject accesses, and instrumenting every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. We'd need to be able to represent pointers in such a way that we can invalidate all pointers that point to a union member when it becomes inactive. We could use pointer tagging for this, at the expense of not having the tag bits available for other purposes -- and that still leaves us instrumenting every access, but perhaps in a way we were going to anyway. Or we could track the set of pointers that point to the union member, which would likely require instrumenting at least every pointer assignment that we can't statically prove doesn't point to a union member.
I think a better baseline behavior would be to say that members (union or otherwise) cannot have their address taken by default. This is a desirable property for other reasons too. That still leaves open this problem for the case where unions are explicitly made addressable, but that's at least only a (hopefully small) subset of all union members.
docs/design/unions.md
Outdated
**TODO:** It looks very difficult to support exposing C++ unions to Carbon in an | ||
automated way, unless we are willing to allow type-punning via ordinary pointer | ||
reads, and allow assignment through a pointer to implicitly destroy and create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't entirely follow this. If a C++ union is exposed to Carbon semantically as a Carbon struct { union { ... } }
(with the proper layout enforced by the interoperability layer), the Carbon semantics would presumably apply to it in Carbon code, and the C++ semantics would presumably apply in C++ code -- Carbon wouldn't be able to change the active member by assignment but C++ would.
Is the concern here about migration rather than interoperability?
at the end of a union, but it's not clear that there are any compelling use | ||
cases for it. In particular, packing at the end of a union is not necessary to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier comment, I suggested another one: packing a discriminator into the trailing bits of the union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I haven't finished responding to comments)
nested subobjects), and such pointers are indistinguishable from pointers to any | ||
other object. Reliably sanitizing accesses via such pointers would require | ||
dynamically tracking which union member (if any) each pointer points to, | ||
propagating that information to subobject accesses, and instrumenting every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Clarification: there was a race between @zygoloid's comment and mine; I was responding to the original comment)
at the end of a union, but it's not clear that there are any compelling use | ||
cases for it. In particular, packing at the end of a union is not necessary to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for packing into the trailing bits of the union, is there a reason the user couldn't pack the discriminator into the leading bits instead?
} | ||
``` | ||
|
||
A union consists of zero or more _members_, at most one of which can be live at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bikeshed, but since you are already using the term "field" when referring to things contained in a struct, why not use the same term here for unions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to clarify why the two aren't synonymous.
docs/design/unions.md
Outdated
accessed during pattern matching, and has no effect on whether the pattern | ||
matches. If the pattern mentions a union member, the corresponding subpattern is | ||
matched against that member, which means that user code must ensure that pattern | ||
matching does not reach that point unless that member is active. Note that this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's a consequence that we would have to document exactly how pattern matching reads from memory and finds the best match, and that would become a part of language semantics. I personally don't think we want to do that given the optimization potential. I'd rather say that union members can't participate in pattern matching; given how subtle the code handling unions is, the code should spell out the exact memory access order for readability and making the intent explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to document it exactly, we just have to constrain it enough to enable some basic pattern matching usages. In particular, we have to constrain it enough that code for a user-defined type can ensure that pattern matching applies to the discriminator before it applies to the union members.
I'm extremely reluctant to say that unions can't participate in pattern matching, because I think it will actually make pattern matching a much more complicated feature. If unions can't participate in pattern matching, then we have to invent some way for pattern matching on a user-defined type to delegate to non-pattern-matching-based code (i.e. a pretty heavyweight pattern matching customization mechanism), or say that user-defined types that support pattern matching can't be implemented in terms of unions. On the other hand, if we allow pattern matching on unions, we can allow user-defined types to support pattern matching by delegating to pattern matching on their implementation details.
|
||
## Overview | ||
|
||
Fields of a struct can be grouped into _unions_. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can fields within a union have non-trivial copy, assignment, or destruction semantics?
Does a struct with a union get a compiler-generated copy, assign, or destroy operation? If yes, what does it do with the fields inside the union?
(Sorry, I can't find this point discussed anywhere in the proposal, so I'm commenting in an arbitrary place.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A union member can have any type, but structs with unions won't get compiler-generated operations. Updated the text to make those points explicit.
|
||
``` | ||
struct SsoString { | ||
bitfield(1) var Bool: is_small; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agreed with @josh11b. I feel like our safe algebraic data type facility should allow the type of micro-management of storage layout necessary for SsoString
presented here. Of course, there will be cases where the discriminator is not accessible easily, but I don't think that's the common case, and that type of code is highly unsafe anyway, so there's little the language can do to help.
space, and `is_small` would have to be followed by 63 bits of padding in order | ||
to ensure proper alignment of `large`. | ||
|
||
## Layout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that I'd like us to consider is to by default let the compiler choose the layout, but allow the user to specify the layout exactly. For example, the parts of a single bitfield should be tagged with the name of the bitfield to demand that they are packed together, members that should be laid out at specific offsets should be annotated as such, any padding must be explicitly inserted (or the following field must be annotated as unaligned) etc.
We could demand that code must always explicitly specify the layout in unions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think explicit layout control specifically needs to be part of this proposal, rather than a separate one?
We could demand that code must always explicitly specify the layout in unions.
Sure, but what problem would that solve?
|
||
Conversely, C++ is more restrictive in one respect: the members of a C++ union | ||
must be objects, and the union's alignment must conform to the alignment | ||
requirements of its member objects. For example, there doesn't seem to be any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced it is important in practice to expose Carbon unions to C++. Most unions of this kind in C++ are typically very well encapsulated (like SsoString
above), and expose an API that does not mention unions; so I expect the same to be true in Carbon as well. My suggestion is to represent Carbon unions as opaque byte blobs in C++ if the type does happen to de exported to C++.
Furthermore, since there's a semantic gap (like the fact that Carbon requires using special create
and destroy
operators to change the active member), I'm not sure that C++ union is really a good match, even if we tried hard to make it work. I think a more fruitful direction would be to expose Carbon unions to C++ as classes with no public data members, and only member functions to manipulate the data. The contract of those member functions would match Carbon's language semantics exactly.
proposals/p0139.md
Outdated
prevents the toolchain from detecting and diagnosing bugs, and may foreclose | ||
useful optimization opportunities. | ||
|
||
These restrictions also substantially simplifies the design of the language: we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These restrictions also substantially simplifies the design of the language: we | |
These restrictions also substantially simplify the design of the language: we |
proposals/p0139.md
Outdated
substantively, there are two cases in which union names seem like they might be | ||
useful. However, in both cases the actual utility seems limited. | ||
|
||
First, it's tempting to think of an inactive union as equivalent to an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, it's tempting to think of an inactive union as equivalent to an | |
First, it's tempting to think of an union without an active member as equivalent to an |
proposals/p0139.md
Outdated
expectation that an integer variable holds a meaningful value. There is no | ||
corresponding default expectation that unions will always have an active member, | ||
so there is no need for special syntax to opt out of such a default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK about that. Whether someone has such expectations depends on a person and their background. I, for example, was surprised that unions allow having no active member. In my opinion, it should be at least an opt-in feature (since not every union will need it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised this passage to hopefully make the argument clearer.
proposals/p0139.md
Outdated
However, that's a false analogy: the defining characteristic of an uninitialized | ||
or indeterminate value is that we know nothing about its state, whereas we know | ||
everything there is to know about the state of an inactive union. In particular, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it also seems like a distinction without a difference.
To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.
Certainly it does. A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create
, regardless of whether it is a part of a union or not.
|
||
### `create` and `destroy` are union-only | ||
|
||
In this proposal, `create` and `destroy` can only be applied to union members, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are use cases for leaving local variables and struct fields uninitialized:
var Int32: x = uninitialized;
if (...) {
create x = 42;
} else {
create x = 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed out some fixes I had in progress, but before I respond to the remaining comments I'm going to put together an overview proposal for the design direction this is part of, in order to address some of the feedback about how this fits together with discriminated unions.
|
||
## Overview | ||
|
||
Fields of a struct can be grouped into _unions_. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A union member can have any type, but structs with unions won't get compiler-generated operations. Updated the text to make those points explicit.
} | ||
``` | ||
|
||
A union consists of zero or more _members_, at most one of which can be live at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to clarify why the two aren't synonymous.
docs/design/unions.md
Outdated
accessed during pattern matching, and has no effect on whether the pattern | ||
matches. If the pattern mentions a union member, the corresponding subpattern is | ||
matched against that member, which means that user code must ensure that pattern | ||
matching does not reach that point unless that member is active. Note that this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to document it exactly, we just have to constrain it enough to enable some basic pattern matching usages. In particular, we have to constrain it enough that code for a user-defined type can ensure that pattern matching applies to the discriminator before it applies to the union members.
I'm extremely reluctant to say that unions can't participate in pattern matching, because I think it will actually make pattern matching a much more complicated feature. If unions can't participate in pattern matching, then we have to invent some way for pattern matching on a user-defined type to delegate to non-pattern-matching-based code (i.e. a pretty heavyweight pattern matching customization mechanism), or say that user-defined types that support pattern matching can't be implemented in terms of unions. On the other hand, if we allow pattern matching on unions, we can allow user-defined types to support pattern matching by delegating to pattern matching on their implementation details.
space, and `is_small` would have to be followed by 63 bits of padding in order | ||
to ensure proper alignment of `large`. | ||
|
||
## Layout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think explicit layout control specifically needs to be part of this proposal, rather than a separate one?
We could demand that code must always explicitly specify the layout in unions.
Sure, but what problem would that solve?
proposals/p0139.md
Outdated
However, that's a false analogy: the defining characteristic of an uninitialized | ||
or indeterminate value is that we know nothing about its state, whereas we know | ||
everything there is to know about the state of an inactive union. In particular, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A storage location should be uninitialized before the allocation it is a part of is deallocated, and before
create
, regardless of whether it is a part of a union or not.
I agree, but I don't understand how that relates to what I said.
proposals/p0139.md
Outdated
|
||
### No type punning | ||
|
||
This proposal does not permit accessing an inactive union member. This is unlike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better?
proposals/p0139.md
Outdated
expectation that an integer variable holds a meaningful value. There is no | ||
corresponding default expectation that unions will always have an active member, | ||
so there is no need for special syntax to opt out of such a default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised this passage to hopefully make the argument clearer.
|
||
### `create` and `destroy` are union-only | ||
|
||
In this proposal, `create` and `destroy` can only be applied to union members, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of gets back to the discussion of uninit
vs. a union with no active member: I've been assuming that = uninit
is like default-initialization in C++: it creates the object, but leaves it in a partially-formed state (i.e. a state that can only be assigned to or destroyed). Consequently, it has to be an error to apply create
to that object, because you can't create an object that already exists.
I agree that there are important use cases for manually controlling the lifetime of a local variable, but I think you should have to opt into that as part of the variable's declaration, e.g.
struct ManualLifetimeInt32 {
union {
Int32 object;
}
}
...
var ManualLifetimeInt32: x;
if (...) {
create: x.object = 42;
} else {
create: x.object = 0;
}
...
destroy x.object;
ManualLifetimeInt32
could of course be a template, and might have a nicer API. One variant to explore would be a type where the underlying object starts out live, and is destroyed by the wrapper's destructor, but can be destroyed and recreated in the middle; that seems to be the more common pattern in C++, possibly because C++ makes it easier to write.
Regarding the proposed names, combined with reserved keywords it means that people can't write
create
ordestroy
identifiers.
I expect that to be the case regardless, since operator create
and operator destroy
are planned to be how we spell constructors and destructors.
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please comment or remove the |
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it. |
Note: FIXME indicates something I intend to address before merging, but TODO indicates something I expect to be addressed by future PRs.