Initial design for unions #139

geoffromer · 2020-08-07T20:13:21Z

Note: FIXME indicates something I intend to address before merging, but TODO indicates something I expect to be addressed by future PRs.

docs/design/unions.md

josh11b · 2020-08-12T17:26:59Z

docs/design/unions.md

+
+```
+struct SsoString {
+  bitfield(1) var Bool: is_small;


It seems like we should be more actively considering ways of expressing this in a way where the language is aware of the relationship between is_small and which component of the union is active. This would open the door to zero-cost static analysis that the type is being used correctly, and lower-cost dynamic analysis.

The problem is that there's no guarantee that the discriminator is in the same struct as the union, or is even reachable from the union. There are even quite plausible cases where there's no discriminator at all, and safety is achieved entirely through static reasoning.

We could consider something like that as an optional add-on, rather than something that's inherent in all unions, but because it's an add-on, I don't think it needs to be part of the initial unions design. Instead, I think we should consider it as part of the overall safety story, because until then it won't be clear how much benefit it actually provides. I've just added a proposal doc to this PR, which includes a rationale for why I want to proceed with this proposal even though the safety plan are unresolved.

I agree that there is no guarantee that you won't need to do something unsafe, but I don't think that is the common case we need to make easy, especially since it is the case where the language can give you little support. Rust is an example of a language aiming for high performance but provides a safe primitive here. If you really need something outside what the language can reason about, then use something like a byte array and casts.

I think the main thing as a team we should be doing at this stage is deciding between high level approaches. So even if we ultimately decide to go with the approach currently presented in this document, the job of this document should be to present and discuss the alternatives under consideration, and presenting rationale for one choice over others. I very much feel this document needs to at least address the safer design alternative.

It looks like you added a blurb in the proposal text about discriminated unions -- I maintain that it requires more serious consideration since I think some of the concerns raised there are quite addressable.

Fully agreed with @josh11b. I feel like our safe algebraic data type facility should allow the type of micro-management of storage layout necessary for SsoString presented here. Of course, there will be cases where the discriminator is not accessible easily, but I don't think that's the common case, and that type of code is highly unsafe anyway, so there's little the language can do to help.

…n matching, update the discussion of safety, and add some section headings.

jonmeow

Adding some quick comments -- I haven't gotten far, but it's the end of the day and I don't want to run into commit skew if you're making changes before I get back.

docs/design/unions.md

jonmeow

Finishing a pass of comments.

docs/design/unions.md

jonmeow · 2020-08-13T16:50:12Z

proposals/p0139.md

+
+### No type punning
+
+This proposal does not permit accessing an inactive union member. This is unlike


Without the designator, this behavior can't be detected, right? And so in opt builds, this probably won't be caught and would not be an error?

I wonder, is "error" an apt description for something that would only be caught in debug build modes? Is this something that we should make clear in our documentation, that "error" does not mean a program will consistently fail?

As discussed by chat, I've switched to using the term "out of contract". I'm not sure I like the result; I'm open to alternative suggestions.

I'm fine with "out of contract", but can you also replace "does not permit"?

jonmeow · 2020-08-13T16:59:52Z

proposals/p0139.md

+However, that's a false analogy: the defining characteristic of an uninitialized
+or indeterminate value is that we know nothing about its state, whereas we know
+everything there is to know about the state of an inactive union. In particular,


This feels very similar to me as if we simply said every struct defaults members to uninit, and thus we "know everything" and don't need require struct members be initialized.

(as I point out, this very much feels like an inconsistency -- I disagree with the false analogy claim)

See the next sentence: when I say we "know nothing" about the state of an uninitialized value, what I mean is that if an operation's precondition is guaranteed to be true of an uninitialized value, then that precondition is also guaranteed to be true of every possible value. To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.

By contrast, there are preconditions that are satisfied by an inactive union, but not by an active union (in particular, the precondition of create:), and the knowledge that a union is inactive can help you deduce that a program is correct (e.g. if that union is passed to create:).

Does that make sense? If so, any suggestions for how to make this point clearer in the text?

To me it also seems like a distinction without a difference.

To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.

Certainly it does. A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create, regardless of whether it is a part of a union or not.

A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create, regardless of whether it is a part of a union or not.

I agree, but I don't understand how that relates to what I said.

jonmeow · 2020-08-13T17:05:07Z

proposals/p0139.md

+
+### `create` and `destroy` are union-only
+
+In this proposal, `create` and `destroy` can only be applied to union members,


If these only apply to union members, perhaps union_create and union_destroy might be better names? I'm just wondering if we want to reserve these two words as keywords for such a narrow purpose.

I've rephrased to try to make clear why I think that would be counterproductive. If you still want to change these keywords, can you elaborate on what problem you're trying to solve, i.e. what might go wrong if we stuck with the proposed names?

There are use cases for leaving local variables and struct fields uninitialized:

var Int32: x = uninitialized; if (...) { create x = 42; } else { create x = 0; }

Regarding the proposed names, combined with reserved keywords it means that people can't write create or destroy identifiers. It's just not obvious to me that unions are common enough for that. shrug

Dmitri, for local variables, suppose you have:

var Int32: x = uninit; if (...) { x = 42; } (some body of code generating B) if (B) { x = 0; }

If you require create, and it's enforced in the way described here, then you would need:

var Int32: x = uninit; var bool: created = false; if (...) { create x = 42; created = true; } (some body of code generating B) if (B) { if (created) { x = 0; } else { create x = 0; } }

Or some other refactoring. I think this could get to be more complex, and would ideally not be required for most types. The link of requiring users to not just initialize, but also state that they expect this is the first initialization, seems like a burden that's maybe okay in the scope of unions (as a niche feature) but would likely be problematic taken broadly.

This sort of gets back to the discussion of uninit vs. a union with no active member: I've been assuming that = uninit is like default-initialization in C++: it creates the object, but leaves it in a partially-formed state (i.e. a state that can only be assigned to or destroyed). Consequently, it has to be an error to apply create to that object, because you can't create an object that already exists.

I agree that there are important use cases for manually controlling the lifetime of a local variable, but I think you should have to opt into that as part of the variable's declaration, e.g.

struct ManualLifetimeInt32 { union { Int32 object; } } ... var ManualLifetimeInt32: x; if (...) { create: x.object = 42; } else { create: x.object = 0; } ... destroy x.object;

ManualLifetimeInt32 could of course be a template, and might have a nicer API. One variant to explore would be a type where the underlying object starts out live, and is destroyed by the wrapper's destructor, but can be destroyed and recreated in the middle; that seems to be the more common pattern in C++, possibly because C++ makes it easier to write.

Regarding the proposed names, combined with reserved keywords it means that people can't write create or destroy identifiers.

I expect that to be the case regardless, since operator create and operator destroy are planned to be how we spell constructors and destructors.

jonmeow · 2020-08-13T17:06:50Z

proposals/p0139.md

+at the end of a union, but it's not clear that there are any compelling use
+cases for it. In particular, packing at the end of a union is not necessary to


In the other doc, I think I've suggested two use cases when I saw this:

Avoid asymmetric behavior that creates bitfield ordering surprises for users.

Sequential bitfield unions.

In an earlier comment, I suggested another one: packing a discriminator into the trailing bits of the union.

I wouldn't describe 1 as a "use case" (i.e. something a user might want to do), and as noted on your other comment, the surprises in question don't seem very serious to me. As for 2, I don't find it very compelling; it seems like a very rare corner case (note that C and C++ don't support it, and I'm not aware of any complaints about that), and there's a viable if somewhat annoying workaround (nest the second union inside the first).

As for packing into the trailing bits of the union, is there a reason the user couldn't pack the discriminator into the leading bits instead?

Considering the example I gave before, with a PackedPtr, where we'd be reusing the low-order (alignment) bits: The high order bits aren't necessarily available in general, so one ordering of the union versus the discriminator will use at most only a mask (and if the value of the discriminator is known, as it typically will be, can directly use the pointer value or ptr-1), whereas the other ordering will use bit-shifts. Which ordering is best will depend on endianness -- for little-endian, we do want to use the leading bits not the trailing ones, but it seems like that is us getting lucky rather than things working out by design.

I agree 1 isn't something a user might intend to do, but it is something they might accidentally do from misunderstanding how byte packing of unions works (or just clumsy reordering of struct members). This gets to the thread on the design:

@geoffromer:

Dangerous how? As far as I can tell, if users mistakenly think that packing can happen at the end of a bitfield, the worst case scenario is that their objects are somewhat larger than they expected.

@gribozavr:

That can be pretty bad for the types of software Carbon wants to target.

To emphasize on Dmitri's point, the issue is that it would be larger than the developer expects, which would be a bug. That could mean a difficult to detect, difficult to debug issue. In goal terms, I believe this is problematic under "Adhere to the principle of least surprise," an understandability issue for a performance feature.

Regarding 2, and in general, I'm fine if you don't consider issues compelling. Please try to address them in the alternatives regardless. Simply saying "it's not clear that there are any compelling use cases for it" without offering any discussion of cost-benefit or pros-cons feels more dismissive of concerns, at least to me.

jonmeow · 2020-08-13T17:11:43Z

proposals/p0139.md

+allowing packing at the end of a union would make it more difficult to tell when
+accesses to two separate fields can cause a data race.


You already do this at the start of the boundary -- why is it measurably worse at the end of the boundary?

Maybe it'd be better to allow field groups at the struct level, to determine when layout packing applies, making it explicit?

You already do this at the start of the boundary -- why is it measurably worse at the end of the boundary?

Because the end of the boundary might also be the start of another boundary, which multiplies the number of fields that can conflict with each other.

Maybe it'd be better to allow field groups at the struct level, to determine when layout packing applies, making it explicit?

It sounds like you're suggesting an approach where adjacent fields would not be packed together unless they're in the same field group, but that would defeat the whole purpose of introducing field groups in the first place; for example it would mean that is_small is not packed together with small.size or large.size.

The more fundamental problem is that we're trying to address cases where the packing boundaries cut across the logical field structure boundaries. For example, for packing purposes small.size groups with is_small but not with small.buffer, but for purposes of which fields are live at the same time, small.size groups with small.buffer but not with is_small. That means that at most one of those sets of boundaries can line up with open/close curly braces. C/C++ chose to make curly braces represent packing boundaries, but that means sometimes you have to put a field inside a union even though it's supposed to always be available. I'm proposing that Carbon should make curly braces represent logical field structure, but that means we can't use curly braces to define packing boundaries.

zygoloid · 2020-08-13T22:43:27Z

docs/design/unions.md

+apply `create` to a union that already has an active member, or apply `destroy`
+to a union that does not have an active member. `destroy` can be thought of as a
+unary operator, but a `create` statement has the syntax and semantics of a
+variable declaration, with `create` taking the place of `var`, and the field


Suggested change

variable declaration, with `create` taking the place of `var`, and the field

variable declaration, with `create` taking the place of `var T`, where `T` is the type of the field, and the field

zygoloid

Generally, this design seems very promising to me. Treating union and group as not-exactly-type containers has some appeal, and I like that this lets us form efficient structure layouts that C++ makes painful or impossible to achieve.

Tying the possibility of a subobject having a different lifetime from its container to whether the subobject is a union member seems quite appealing. It'll be interesting to see how this plays out with (for example) vector, where we might not be able to expose raw pointers to an array of elements, because the elements are actually a union { var T: value; }. But I think this is a good base to build from.

zygoloid · 2020-08-13T22:48:38Z

docs/design/unions.md

+have the same types, names, and order, and their names are part of the names of
+their fields:


Suggested change

have the same types, names, and order, and their names are part of the names of

their fields:

have the same types, names, and order. The name of a field group is part of the name of

each field in the group:

docs/design/unions.md

zygoloid · 2020-08-13T22:52:03Z

docs/design/unions.md

+Assert(str.small.size == 0);
+destroy str.small;
+create: str.large = (.size = 0, .capacity = 100, .buffer = MakeBuffer(100));
+Assert(str.large.capacity == 100);


What is str.large here? Is it possible to refer to that without a following .field? Does that expression have a type? Can I pass it to a function?

Not supposed to be leading questions -- I think it could possible be OK for the answer to be that you must always write .something after str.large, so it's really just as if large. is a prefix of the member names. But that would potentially lead to some non-uniformity. I think it could also possibly be OK for the answer to be that you get some unspecified anonymous type -- but that would somewhat contract the statement below that field groups don't have types.

zygoloid · 2020-08-13T22:59:03Z

docs/design/unions.md

+However, field groups are not objects, and do not have types; like unions, field
+groups are a way of controlling the layout of the fields of a struct. For
+example, if `large` and `small` were structs, the bitfields would not save any
+space, and `is_small` would have to be followed by 63 bytes of padding in order
+to ensure proper alignment of `large`.


One possible path through the "what is the type of str.large?" question would be to generalize what we mean by "type" to include types whose representation spans some set of bits instead of some set of bytes, and possibly a non-contiguous set of bits -- or at least, a sequence of bits not starting at the first bit in their alignment unit and not ending at the last bit in such a unit.

zygoloid · 2020-08-13T23:10:00Z

docs/design/unions.md

+field group containing a single field. The ending offset of a union is the
+maximum ending offset of any field in any of its field groups, rounded up to the
+next whole byte.


Forcing a byte boundary to the end of a union means we can't do

union { bitfield(63) var Int63: x; bitfield(63) var PackedPtr(Foo): y; } bitfield(1) var Int1: active;

Do we need to pad to the end of the byte here? What do we get in exchange?

zygoloid · 2020-08-14T01:12:35Z

docs/design/unions.md

+destruction of an active union member as also destroying any immediately
+preceding bitfields, and then recreating them with the same contents. This


I think what this is acknowledging is that any store to a bitfield may actually also write to adjacent bitfield members, because the hardware doesn't support bit-level access, and that can result in a data race.

So in some sense yes, any store to a union is more expensive than a regular store, because we will first load the entire storage unit, including any unchanged members that are nearby, then change the bitfield in question with some bitwise operations, then store back to the entire storage unit. But I think the cost is the same for a destroy+create as it would be for a regular store.

docs/design/unions.md

zygoloid · 2020-08-14T01:22:40Z

docs/design/unions.md

+nested subobjects), and such pointers are indistinguishable from pointers to any
+other object. Reliably sanitizing accesses via such pointers would require
+dynamically tracking which union member (if any) each pointer points to,
+propagating that information to subobject accesses, and instrumenting every


Perhaps. We'd need to be able to represent pointers in such a way that we can invalidate all pointers that point to a union member when it becomes inactive. We could use pointer tagging for this, at the expense of not having the tag bits available for other purposes -- and that still leaves us instrumenting every access, but perhaps in a way we were going to anyway. Or we could track the set of pointers that point to the union member, which would likely require instrumenting at least every pointer assignment that we can't statically prove doesn't point to a union member.

I think a better baseline behavior would be to say that members (union or otherwise) cannot have their address taken by default. This is a desirable property for other reasons too. That still leaves open this problem for the case where unions are explicitly made addressable, but that's at least only a (hopefully small) subset of all union members.

zygoloid · 2020-08-14T01:29:10Z

docs/design/unions.md

+**TODO:** It looks very difficult to support exposing C++ unions to Carbon in an
+automated way, unless we are willing to allow type-punning via ordinary pointer
+reads, and allow assignment through a pointer to implicitly destroy and create


I don't entirely follow this. If a C++ union is exposed to Carbon semantically as a Carbon struct { union { ... } } (with the proper layout enforced by the interoperability layer), the Carbon semantics would presumably apply to it in Carbon code, and the C++ semantics would presumably apply in C++ code -- Carbon wouldn't be able to change the active member by assignment but C++ would.

Is the concern here about migration rather than interoperability?

zygoloid · 2020-08-14T01:31:12Z

proposals/p0139.md

+at the end of a union, but it's not clear that there are any compelling use
+cases for it. In particular, packing at the end of a union is not necessary to


In an earlier comment, I suggested another one: packing a discriminator into the trailing bits of the union.

geoffromer

(I haven't finished responding to comments)

docs/design/unions.md

geoffromer · 2020-08-15T00:43:14Z

docs/design/unions.md

+nested subobjects), and such pointers are indistinguishable from pointers to any
+other object. Reliably sanitizing accesses via such pointers would require
+dynamically tracking which union member (if any) each pointer points to,
+propagating that information to subobject accesses, and instrumenting every


(Clarification: there was a race between @zygoloid's comment and mine; I was responding to the original comment)

geoffromer · 2020-08-15T00:45:31Z

proposals/p0139.md

+at the end of a union, but it's not clear that there are any compelling use
+cases for it. In particular, packing at the end of a union is not necessary to


As for packing into the trailing bits of the union, is there a reason the user couldn't pack the discriminator into the leading bits instead?

gribozavr · 2020-08-17T11:25:04Z

docs/design/unions.md

+}
+```
+
+A union consists of zero or more _members_, at most one of which can be live at


A bikeshed, but since you are already using the term "field" when referring to things contained in a struct, why not use the same term here for unions as well?

Updated to clarify why the two aren't synonymous.

docs/design/unions.md

gribozavr · 2020-08-17T11:34:29Z

docs/design/unions.md

+accessed during pattern matching, and has no effect on whether the pattern
+matches. If the pattern mentions a union member, the corresponding subpattern is
+matched against that member, which means that user code must ensure that pattern
+matching does not reach that point unless that member is active. Note that this


It seems like there's a consequence that we would have to document exactly how pattern matching reads from memory and finds the best match, and that would become a part of language semantics. I personally don't think we want to do that given the optimization potential. I'd rather say that union members can't participate in pattern matching; given how subtle the code handling unions is, the code should spell out the exact memory access order for readability and making the intent explicit.

We don't have to document it exactly, we just have to constrain it enough to enable some basic pattern matching usages. In particular, we have to constrain it enough that code for a user-defined type can ensure that pattern matching applies to the discriminator before it applies to the union members.

I'm extremely reluctant to say that unions can't participate in pattern matching, because I think it will actually make pattern matching a much more complicated feature. If unions can't participate in pattern matching, then we have to invent some way for pattern matching on a user-defined type to delegate to non-pattern-matching-based code (i.e. a pretty heavyweight pattern matching customization mechanism), or say that user-defined types that support pattern matching can't be implemented in terms of unions. On the other hand, if we allow pattern matching on unions, we can allow user-defined types to support pattern matching by delegating to pattern matching on their implementation details.

gribozavr · 2020-08-17T11:58:01Z

docs/design/unions.md

+
+## Overview
+
+Fields of a struct can be grouped into _unions_. For example:


Can fields within a union have non-trivial copy, assignment, or destruction semantics?

Does a struct with a union get a compiler-generated copy, assign, or destroy operation? If yes, what does it do with the fields inside the union?

(Sorry, I can't find this point discussed anywhere in the proposal, so I'm commenting in an arbitrary place.)

A union member can have any type, but structs with unions won't get compiler-generated operations. Updated the text to make those points explicit.

gribozavr · 2020-08-17T12:09:15Z

docs/design/unions.md

+
+```
+struct SsoString {
+  bitfield(1) var Bool: is_small;


Fully agreed with @josh11b. I feel like our safe algebraic data type facility should allow the type of micro-management of storage layout necessary for SsoString presented here. Of course, there will be cases where the discriminator is not accessible easily, but I don't think that's the common case, and that type of code is highly unsafe anyway, so there's little the language can do to help.

gribozavr · 2020-08-17T12:23:06Z

docs/design/unions.md

+space, and `is_small` would have to be followed by 63 bits of padding in order
+to ensure proper alignment of `large`.
+
+## Layout


An alternative that I'd like us to consider is to by default let the compiler choose the layout, but allow the user to specify the layout exactly. For example, the parts of a single bitfield should be tagged with the name of the bitfield to demand that they are packed together, members that should be laid out at specific offsets should be annotated as such, any padding must be explicitly inserted (or the following field must be annotated as unaligned) etc.

We could demand that code must always explicitly specify the layout in unions.

Do you think explicit layout control specifically needs to be part of this proposal, rather than a separate one?

We could demand that code must always explicitly specify the layout in unions.

Sure, but what problem would that solve?

gribozavr · 2020-08-17T12:42:55Z

docs/design/unions.md

+
+Conversely, C++ is more restrictive in one respect: the members of a C++ union
+must be objects, and the union's alignment must conform to the alignment
+requirements of its member objects. For example, there doesn't seem to be any


I'm not convinced it is important in practice to expose Carbon unions to C++. Most unions of this kind in C++ are typically very well encapsulated (like SsoString above), and expose an API that does not mention unions; so I expect the same to be true in Carbon as well. My suggestion is to represent Carbon unions as opaque byte blobs in C++ if the type does happen to de exported to C++.

Furthermore, since there's a semantic gap (like the fact that Carbon requires using special create and destroy operators to change the active member), I'm not sure that C++ union is really a good match, even if we tried hard to make it work. I think a more fruitful direction would be to expose Carbon unions to C++ as classes with no public data members, and only member functions to manipulate the data. The contract of those member functions would match Carbon's language semantics exactly.

gribozavr · 2020-08-17T12:47:51Z

proposals/p0139.md

+prevents the toolchain from detecting and diagnosing bugs, and may foreclose
+useful optimization opportunities.
+
+These restrictions also substantially simplifies the design of the language: we


Suggested change

These restrictions also substantially simplifies the design of the language: we

These restrictions also substantially simplify the design of the language: we

gribozavr · 2020-08-17T13:01:47Z

proposals/p0139.md

+substantively, there are two cases in which union names seem like they might be
+useful. However, in both cases the actual utility seems limited.
+
+First, it's tempting to think of an inactive union as equivalent to an


Suggested change

First, it's tempting to think of an inactive union as equivalent to an

First, it's tempting to think of an union without an active member as equivalent to an

gribozavr · 2020-08-17T13:03:42Z

proposals/p0139.md

+expectation that an integer variable holds a meaningful value. There is no
+corresponding default expectation that unions will always have an active member,
+so there is no need for special syntax to opt out of such a default.


IDK about that. Whether someone has such expectations depends on a person and their background. I, for example, was surprised that unions allow having no active member. In my opinion, it should be at least an opt-in feature (since not every union will need it).

Revised this passage to hopefully make the argument clearer.

gribozavr · 2020-08-17T13:05:12Z

proposals/p0139.md

+However, that's a false analogy: the defining characteristic of an uninitialized
+or indeterminate value is that we know nothing about its state, whereas we know
+everything there is to know about the state of an inactive union. In particular,


To me it also seems like a distinction without a difference.

To put it another way, the knowledge that a variable is uninitialized never helps you deduce that a program is correct, or deduce anything about the behavior of a correct program.

Certainly it does. A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create, regardless of whether it is a part of a union or not.

gribozavr · 2020-08-17T13:22:52Z

proposals/p0139.md

+
+### `create` and `destroy` are union-only
+
+In this proposal, `create` and `destroy` can only be applied to union members,


There are use cases for leaving local variables and struct fields uninitialized:

var Int32: x = uninitialized; if (...) { create x = 42; } else { create x = 0; }

geoffromer

I've pushed out some fixes I had in progress, but before I respond to the remaining comments I'm going to put together an overview proposal for the design direction this is part of, in order to address some of the feedback about how this fits together with discriminated unions.

geoffromer · 2020-08-17T19:57:34Z

docs/design/unions.md

+
+## Overview
+
+Fields of a struct can be grouped into _unions_. For example:


A union member can have any type, but structs with unions won't get compiler-generated operations. Updated the text to make those points explicit.

geoffromer · 2020-08-17T19:58:09Z

docs/design/unions.md

+}
+```
+
+A union consists of zero or more _members_, at most one of which can be live at


Updated to clarify why the two aren't synonymous.

geoffromer · 2020-08-17T20:25:10Z

docs/design/unions.md

+accessed during pattern matching, and has no effect on whether the pattern
+matches. If the pattern mentions a union member, the corresponding subpattern is
+matched against that member, which means that user code must ensure that pattern
+matching does not reach that point unless that member is active. Note that this


We don't have to document it exactly, we just have to constrain it enough to enable some basic pattern matching usages. In particular, we have to constrain it enough that code for a user-defined type can ensure that pattern matching applies to the discriminator before it applies to the union members.

I'm extremely reluctant to say that unions can't participate in pattern matching, because I think it will actually make pattern matching a much more complicated feature. If unions can't participate in pattern matching, then we have to invent some way for pattern matching on a user-defined type to delegate to non-pattern-matching-based code (i.e. a pretty heavyweight pattern matching customization mechanism), or say that user-defined types that support pattern matching can't be implemented in terms of unions. On the other hand, if we allow pattern matching on unions, we can allow user-defined types to support pattern matching by delegating to pattern matching on their implementation details.

geoffromer · 2020-08-17T21:24:23Z

docs/design/unions.md

+space, and `is_small` would have to be followed by 63 bits of padding in order
+to ensure proper alignment of `large`.
+
+## Layout


Do you think explicit layout control specifically needs to be part of this proposal, rather than a separate one?

We could demand that code must always explicitly specify the layout in unions.

Sure, but what problem would that solve?

geoffromer · 2020-08-17T21:32:10Z

proposals/p0139.md

+However, that's a false analogy: the defining characteristic of an uninitialized
+or indeterminate value is that we know nothing about its state, whereas we know
+everything there is to know about the state of an inactive union. In particular,


A storage location should be uninitialized before the allocation it is a part of is deallocated, and before create, regardless of whether it is a part of a union or not.

I agree, but I don't understand how that relates to what I said.

geoffromer · 2020-08-25T18:47:30Z

proposals/p0139.md

+
+### No type punning
+
+This proposal does not permit accessing an inactive union member. This is unlike


geoffromer · 2020-08-25T18:48:58Z

proposals/p0139.md

+expectation that an integer variable holds a meaningful value. There is no
+corresponding default expectation that unions will always have an active member,
+so there is no need for special syntax to opt out of such a default.


Revised this passage to hopefully make the argument clearer.

geoffromer · 2020-08-25T19:08:11Z

proposals/p0139.md

+
+### `create` and `destroy` are union-only
+
+In this proposal, `create` and `destroy` can only be applied to union members,


This sort of gets back to the discussion of uninit vs. a union with no active member: I've been assuming that = uninit is like default-initialization in C++: it creates the object, but leaves it in a partially-formed state (i.e. a state that can only be assigned to or destroyed). Consequently, it has to be an error to apply create to that object, because you can't create an object that already exists.

I agree that there are important use cases for manually controlling the lifetime of a local variable, but I think you should have to opt into that as part of the variable's declaration, e.g.

struct ManualLifetimeInt32 { union { Int32 object; } } ... var ManualLifetimeInt32: x; if (...) { create: x.object = 42; } else { create: x.object = 0; } ... destroy x.object;

ManualLifetimeInt32 could of course be a template, and might have a nicer API. One variant to explore would be a type where the underlying object starts out live, and is destroyed by the wrapper's destructor, but can be destroyed and recreated in the middle; that seems to be the more common pattern in C++, possibly because C++ makes it easier to write.

Regarding the proposed names, combined with reserved keywords it means that people can't write create or destroy identifiers.

I expect that to be the case regardless, since operator create and operator destroy are planned to be how we spell constructors and destructors.

github-actions · 2021-07-31T01:48:50Z

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please comment or remove the inactive label.
This PR is labeled inactive because the last activity was over 90 days ago. This PR will be closed and archived after 14 additional days without activity.

github-actions · 2021-08-15T01:48:28Z

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it.
This PR was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

geoffromer added 3 commits August 3, 2020 09:57

Initial draft of unions design.

62fbdc3

Add discussion of safety and interop, and misc. cleanup.

dfbbd28

Merge remote-tracking branch 'upstream/trunk' into union

1cdb79e

geoffromer added the WIP label Aug 7, 2020

googlebot added the cla: yes PR meets CLA requirements according to bot. label Aug 7, 2020

geoffromer added the proposal A proposal label Aug 7, 2020

josh11b reviewed Aug 7, 2020

View reviewed changes

docs/design/unions.md Outdated Show resolved Hide resolved

josh11b reviewed Aug 12, 2020

View reviewed changes

geoffromer added 2 commits August 12, 2020 16:04

Add proposal document, expand discussion of initialization and patter…

092f159

…n matching, update the discussion of safety, and add some section headings.

Fix struct initializer syntax.

51884a0

geoffromer added proposal rfc Proposal with request-for-comment sent out and removed WIP labels Aug 12, 2020

jonmeow reviewed Aug 13, 2020

View reviewed changes

zygoloid reviewed Aug 13, 2020

View reviewed changes

zygoloid reviewed Aug 14, 2020

View reviewed changes

geoffromer added 2 commits August 14, 2020 15:03

Respond to reviewer comments.

96e53db

Respond to reviewer comments.

a7fe9fc

geoffromer commented Aug 15, 2020

View reviewed changes

gribozavr reviewed Aug 17, 2020

View reviewed changes

geoffromer added 2 commits August 25, 2020 11:40

Respond to reviewer comments.

8fcd122

Respond to reviewer comments.

54f3653

geoffromer commented Aug 25, 2020

View reviewed changes

Respond to reviewer comments.

7a11a60

geoffromer added WIP and removed proposal rfc Proposal with request-for-comment sent out labels Sep 11, 2020

jonmeow marked this pull request as draft April 20, 2021 16:19

jonmeow removed the WIP label Apr 20, 2021

github-actions bot added the inactive Issues and PRs which have been inactive for at least 90 days. label Jul 31, 2021

github-actions bot closed this Aug 15, 2021

github-actions bot added the proposal deferred Decision made, proposal deferred label Jul 25, 2022

This was referenced Aug 1, 2022

Equivalent of C++ (nameless) union with nameless structs #1849

Open

Consider using a discriminating "tracking variable" or "discriminating function" for untagged unions in testing/debug builds. #1896

Closed


		### No type punning

		This proposal does not permit accessing an inactive union member. This is unlike


		### `create` and `destroy` are union-only

		In this proposal, `create` and `destroy` can only be applied to union members,

		at the end of a union, but it's not clear that there are any compelling use
		cases for it. In particular, packing at the end of a union is not necessary to

		allowing packing at the end of a union would make it more difficult to tell when
		accesses to two separate fields can cause a data race.

	variable declaration, with `create` taking the place of `var`, and the field
	variable declaration, with `create` taking the place of `var T`, where `T` is the type of the field, and the field

		have the same types, names, and order, and their names are part of the names of
		their fields:

		destruction of an active union member as also destroying any immediately
		preceding bitfields, and then recreating them with the same contents. This


		## Overview

		Fields of a struct can be grouped into _unions_. For example:

	These restrictions also substantially simplifies the design of the language: we
	These restrictions also substantially simplify the design of the language: we

	First, it's tempting to think of an inactive union as equivalent to an
	First, it's tempting to think of an union without an active member as equivalent to an

Initial design for unions #139

Initial design for unions #139

Conversation

geoffromer commented Aug 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonmeow left a comment

Choose a reason for hiding this comment

jonmeow left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zygoloid Aug 15, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zygoloid Aug 13, 2020 • edited Loading

Choose a reason for hiding this comment

zygoloid left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffromer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffromer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jul 31, 2021

github-actions bot commented Aug 15, 2021

zygoloid Aug 15, 2020 •

edited

Loading

zygoloid Aug 13, 2020 •

edited

Loading