organize the concepts of enum and union #618

Closed
andrewrk opened this issue Nov 17, 2017 · 17 comments
Labels
accepted This proposal is planned. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. enhancement Solving this issue will likely involve adding new logic or components to the codebase.

@andrewrk
Member

andrewrk commented Nov 17, 2017

  • Right now we have enum which can be "dumb" enums where it's just named number values.
  • enum can also have associated types, which makes it a "tagged union". This is very close to unions; the only difference is that with unions the tag is a secret field only used by debug safety. It's silly that the initialization syntax for these kinds of enums is different from that of unions (and the union init syntax is better since it mirrors structs).
    • extern enum does not make sense in this case
  • union works like a C union, except we have debug safety to make sure you don't access the wrong field
    • extern union maps to a C union, and disables the safety field.

Bottom line: enum does too much and it acts too much like union. It violates "only one way to do things".

Solution:

  • enum is only for "dumb" enums. you can specify the tag type and tag values.
  • Add enum union which is a union that always has the tag field. extern enum union works, it makes (in C) struct Foo { enum { ... } tag; union { ... } payload; }. Init syntax is the same as union.
  • You can specify the integer tag type and integer tag values just like enum.
  • Allow switch to work with enum union, like how it currently does with enums with payloads.
  • Types cannot be left off of enum union. You have to use void when you want void.
  • Initialization of an enum union looks exactly like a union. You might have to use Foo { .field = {} } for void types.
  • enum union creates a sub-type which is a dumb enum type. It's the type of the tag. You get a value of this type if you do Foo.field.
  • An enum union can implicitly cast to its enum tag type. This means you can do e.g. foo == Foo.field.

Now there's only one way to do things. If you need a dumb enum, use enum. Otherwise if enum union fits your use case, use that. Otherwise, use the flexibility that union provides.
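
For readers who think in C, the layout described above for extern enum union can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the Zig compiler: the names Foo, FooTag, and foo_as_number are hypothetical, and the payload types are picked arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout: Foo, FooTag, foo_as_number. */

/* The "dumb" enum that serves as the tag type. */
enum FooTag { FOO_INT, FOO_FLOAT, FOO_NONE };

/* A tag followed by a payload union, as in the C layout quoted above. */
struct Foo {
    enum FooTag tag;
    union {
        int32_t as_int;
        double  as_float;
        /* FOO_NONE carries no payload (the `void` case). */
    } payload;
};

/* Switch on the tag, then read only the field the tag says is active. */
static double foo_as_number(struct Foo f) {
    switch (f.tag) {
    case FOO_INT:   return (double)f.payload.as_int;
    case FOO_FLOAT: return f.payload.as_float;
    case FOO_NONE:  return 0.0;
    }
    assert(false && "unknown tag");
    return 0.0;
}
```

In C the caller has to keep the tag and the payload in sync by hand, for example with struct Foo f = { .tag = FOO_INT, .payload = { .as_int = 16 } };. Under the proposal, a single initializer such as Foo { .field = value } would presumably set both at once.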

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. enhancement Solving this issue will likely involve adding new logic or components to the codebase. labels Nov 17, 2017
@andrewrk andrewrk added this to the 0.3.0 milestone Nov 17, 2017
@PavelVozenilek

union works like a C union, except we have debug safety to make sure you don't access the wrong field

Low-level data fiddling is a big use case for C unions. "Debug safety" would be harmful here. One example is using tagged pointers.

In some situations the tag selector is already present somewhere else.

@andrewrk
Member Author

Zig unions with safety are compatible with a tag selector being somewhere else. The safety tag is omitted in release-fast builds.

You're going to have to elaborate on exactly how safety would be harmful because it prevents only what would be undefined behavior. I suspect that the "low level data fiddling" you are thinking of would be a Type Based Alias Analysis violation.

@thejoshwolfe
Contributor

Or are we talking about a situation where the exact size and layout of the fields is important, in which case use a packed something.

@PavelVozenilek

I am guessing (it is not documented) that if there are safety checks one could not do the "casting via union" as in C.

Example: an advanced allocator with memory guards around the block and a lot of metadata inside. This can be manipulated by incrementing/decrementing a pointer, but as this way is closed in Zig, raw unions may be usable here.

Another example is easy extraction of the lowest bits from a tagged pointer and the true pointer value itself.

I think all of the above could be accomplished via @ptrToInt casting and back, but why make it more cumbersome than it needs to be?
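
For comparison, the integer-cast route for tagged pointers looks roughly like this in C, with uintptr_t playing the role of @ptrToInt and its inverse. This is a minimal sketch: TAG_MASK and the tagged_* helpers are hypothetical names, and it assumes the pointee is at least 8-byte aligned so that the low three address bits are free.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pointees are at least 8-byte aligned, so the low 3 bits
 * of any valid address are zero and can hold a small tag. */
#define TAG_MASK ((uintptr_t)0x7)

/* Pack a pointer and a tag (0..7) into one integer. */
static uintptr_t tagged_pack(void *ptr, unsigned tag) {
    uintptr_t bits = (uintptr_t)ptr;
    assert((bits & TAG_MASK) == 0);  /* pointer must be suitably aligned */
    assert(tag <= TAG_MASK);         /* tag must fit in the spare bits */
    return bits | (uintptr_t)tag;
}

/* Extract the lowest bits (the tag). */
static unsigned tagged_tag(uintptr_t tagged) {
    return (unsigned)(tagged & TAG_MASK);
}

/* Recover the true pointer value by clearing the tag bits. */
static void *tagged_ptr(uintptr_t tagged) {
    return (void *)(tagged & ~TAG_MASK);
}
```

No union is involved, so there is no question of reading an inactive field. The cost is the explicit casts at the boundaries.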

@andrewrk
Member Author

andrewrk commented Nov 17, 2017

"casting via union" as in C.

This is undefined behavior, at least in C99.

It's not more cumbersome than it needs to be. You modify which field of the union is active by assigning a new value to the entire union, rather than a specific field. The use case with an advanced allocator is handled, no problem.
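
As a rough C analog of that rule (hypothetical names Value, value_from_*, and value_get_*; not how Zig implements it), the active field changes only when the whole value is replaced, and reads go through a check. Zig's safety check would presumably trap in debug builds; this sketch reports failure through a bool instead, which is an assumption made to keep it easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: Value, ValueTag, value_from_*, value_get_*. */
enum ValueTag { VALUE_INT, VALUE_FLOAT };

struct Value {
    enum ValueTag active;            /* stands in for the hidden safety tag */
    union { int32_t i; double f; } u;
};

/* Constructors build a whole new value, setting tag and payload together. */
static struct Value value_from_int(int32_t i) {
    struct Value v = { .active = VALUE_INT, .u = { .i = i } };
    return v;
}

static struct Value value_from_float(double f) {
    struct Value v = { .active = VALUE_FLOAT, .u = { .f = f } };
    return v;
}

/* Checked reads: refuse to touch a field that is not active. */
static bool value_get_int(struct Value v, int32_t *out) {
    assert(out);
    if (v.active != VALUE_INT) return false;
    *out = v.u.i;
    return true;
}

static bool value_get_float(struct Value v, double *out) {
    assert(out);
    if (v.active != VALUE_FLOAT) return false;
    *out = v.u.f;
    return true;
}
```

Switching from the int field to the float field is then v = value_from_float(1.5); rather than a write to one member.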

@PavelVozenilek

I'll probably go the @ptrToInt / @intToPtr way, for allocators and containers. It still looks like the easiest one.

In this thread I got concerned by the emphasis on safety at the cost of expressibility. Things like leaks, runaway pointers or nulls can be caught trivially; complex code is harder to change.


Couple of ideas about the enums:

  1. Enums could be defined inline, like:
fn foo() -> enum { This, That}  { .. }
var x : foo.return_type = foo();

fn bar( x : enum { A, B, C }) { ... }
var x : bar.x  = ...
bar(x);

No need to invent yet another name, and the enum definition is next to supposed use, not somewhere far away, prone to misuse.

  2. Enums could include numbers and/or numeric ranges:
fn get_bits_size() -> enum { 8, 16, 32, 64, NOT_APPLICABLE } { ... }

fn amazonian_indian_counting() -> enum { 1 .. 3, MANY, DONT_KNOW } { ... }

@kyle-github

Could someone provide some examples? I think I get it, but... I have been wrong more than once :-)

Other languages show sum types with different syntax:

const MyType = i32 | f64 | foo;

So the solution above from @andrewrk is going to be:

const MyType = enum union {
       i_val: i32;
       f_val: f64;
       foo_val: foo;
};

var a: MyType = { .i_val = 16 };
var b: MyType = { .f_val = 3.14159 };

a = b; // <--- fails to compile?

const ASTNode = enum union {
      name: NameNode;
      number:NumberNode;
      ....
};

Is that close?

How does switch work with this? The ASTNode example is pretty common.

@PavelVozenilek you have some good examples there. I have made use of tagged pointers in some code (particularly when implementing VMs such as in Smalltalk or Java). While it is not portable to use unions, since the code is supposed to be very, very tied to a specific machine type, this is not a problem in practice.

One thing I would like to see is the ability to convert something into a bit array such that I can use it to access specific bits directly. Then tagged pointers would be a breeze.

@hasenj

hasenj commented Nov 23, 2017

What will happen to the implicit error union types, e.g. %u8?

@andrewrk
Member Author

andrewrk commented Dec 1, 2017

Instead of enum union, we'll have this:

const Letter = enum {
    A,
    B,
    C,
};
const Payload = union(Letter) {
    A: i32,
    B: f64,
    C: bool,
};

This gives the union a tag field, which has the type of the enum given.

Like a switch statement on an enum value, if you fail to enumerate all the enum fields in a union declaration, it is a compile error.

union(T) implicitly casts to T.

So,

What will happen to the implicit error union types, e.g. %u8?

Error unions are independent from this issue.

@PavelVozenilek

In the example:

const Letter = enum {
    A,
    B,
    C,
};
const Payload = union(Letter) {
    A: i32,
    B: f64,
    C: bool,
};

Does the Letter type have some separate use, such as extracting the tag value, switching on the union's tag (not on the data type), or getting and using the tag's data type?

@jido

jido commented Dec 1, 2017

Does the enum have to be defined ahead of time or can it be inferred from the union definition, as in:

const Payload = union(enum) {
    A: i32,
    B: f64,
    C: bool,
}

Also, this syntax does not give names to the alternatives; is that intentional? Can you do
const x: Payload = 5.35; const y: f64 = x;
?

@andrewrk
Member Author

andrewrk commented Dec 1, 2017

The enum type has a separate use:

  • You can explicitly cast the union type to the enum type to get the tag value.
  • You can check for tag equality.
  • You can assign custom tag values in the enum.
const x = Payload { .B = 5.35 };
const y = x.B;
assert(Letter(x) == Letter.B);

At least for now, the enum will have to be defined ahead of time.

@andrewrk andrewrk added the accepted This proposal is planned. label Dec 3, 2017
@andrewrk
Member Author

andrewrk commented Dec 3, 2017

Also make it so that enums support signed integer tag types.

@andrewrk
Member Author

andrewrk commented Dec 3, 2017

On second thought, I'm going to accept @jido's proposal. So you can do:

const Payload = union(enum) {
    A: i32,
    B: f64,
    C: bool,
}

And this automatically creates the enum. You can also do:

const Payload = union(enum(u32)) {
    A: i32 = 3,
    B: f64 = 10,
    C: bool = 100,
}

And now you've configured the integer tag type of the enum, and specified the tag values. Then you can access the automatically created enum type with @TagType(Payload), and you can access the integer type u32 with @TagType(@TagType(Payload)).

A union(enum) with all fields void is equivalent to enum.
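
A rough C counterpart of the union(enum(u32)) example, for illustration only. C before C23 cannot fix an enum's underlying type, so this sketch stores the tag in a uint32_t field. The names PayloadTag, payload_tag, and payload_tag_value are hypothetical; the tag values 3, 10, and 100 are taken from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tag values from the example above: A = 3, B = 10, C = 100. */
enum PayloadTag { PAYLOAD_A = 3, PAYLOAD_B = 10, PAYLOAD_C = 100 };

struct Payload {
    uint32_t tag;                    /* fixed-width tag, mirroring enum(u32) */
    union { int32_t a; double b; bool c; } data;
};

/* Rough counterpart of casting the union to its enum tag type. */
static enum PayloadTag payload_tag(const struct Payload *p) {
    assert(p);
    return (enum PayloadTag)p->tag;
}

/* Rough counterpart of reading the integer value behind the tag. */
static uint32_t payload_tag_value(const struct Payload *p) {
    assert(p);
    return p->tag;
}
```

With struct Payload x = { .tag = PAYLOAD_B, .data = { .b = 5.35 } };, the check payload_tag(&x) == PAYLOAD_B plays the role of the tag-equality comparison described earlier in the thread.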

@jido

jido commented Dec 3, 2017

Ha, and what if you want to give a default value to the members of the union? That is what it looks like with the latter syntax. It is confusing.

@PavelVozenilek

PavelVozenilek commented Dec 3, 2017

This syntax

const Payload = union(enum(u32)) {
    A: i32 = 3,
    B: f64 = 10,
    C: bool = 100,
}

feels really strange.

Edit: the visual pattern

name : type = value

should be reserved for variable/const definition with initialization.

@andrewrk
Member Author

andrewrk commented Dec 3, 2017

Ha, and what if you want to give a default value to the members of the union?

You can't do that. Neither structs nor unions support giving a default value to a field.

That is what it looks like with the latter syntax. It is confusing.

The other option we have is: if you want to change the enum tag values, you would have to specify the enum separately.

andrewrk added a commit that referenced this issue Dec 4, 2017
andrewrk added a commit that referenced this issue Dec 4, 2017

6 participants