Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discriminant bits #2684

Closed
wants to merge 12 commits into from
Closed

Discriminant bits #2684

wants to merge 12 commits into from

Conversation

skade
Copy link
Contributor

@skade skade commented Apr 15, 2019

Summary

This RFC proposes to expose the minimum size necessary to encode the discriminant of an enum, without exposing the exact encoding itself. This can be useful to write bitlevel collections.

Thanks @joshtriplett @Manishearth @eddyb and VLQC for early review <3.

Some details, especially naming, are very much bikesheddable.

text/0000-discriminant-bits.md Outdated Show resolved Hide resolved
text/0000-discriminant-bits.md Outdated Show resolved Hide resolved
Copy link
Contributor

@sgrif sgrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth calling out how this interacts with explicit values that are larger than the smallest possible values, as well as explicit reprs. e.g. if I write

enum Foo {
    Bar = 7,
    Baz = 8,
}

technically the compiler still only needs 1 bit for the discriminant. We only need to convert to the larger values on cast. What's the return value of these methods in this case? Similarly if I write #[repr(u8)], is bit_size always 8?


Adding the proposed functions probably entails adding a new compiler intrinsic `discriminant_size`.

Empty enums are of size 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about enums with a single variant?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously size zero.

As I see it, empty enums are specified in the proposal simply as an acknowledgement that they were taken into account. From a mathematical standpoint, they take NEG_INFINITY bytes. This proposal has chosen not to do this, and to treat them as ZSTs (which indeed aligns with some other parts of the language).

[reference-level-explanation]: #reference-level-explanation

The feature may interact with non-exaustive enums.
In this case, still, the currently used discriminant size should be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this allow people to couple themselves to the number of variants in an enum which was explicitly requested to be non-exhaustive (to avoid that coupling)?

@SimonSapin
Copy link
Contributor

I think when talking about all this it is useful to separate:

  • The discriminant: the return value of std::mem::discriminant, a value that uniquely identifies a variant within an enum type. This has a dedicated Discriminant type that, so far, intentionally reveals very little. Not even that this is an integer.

  • The tag: this is part of the memory representation of #[repr(C)] and #[repr($Int)] enums. It may or may not be part of the memory representation of other enums. (E.g. it is not when the niche optimization is used, like in Option<&str>.)

  • The conversion to integer with as, for enums without any variant field (a.k.a “C-like” enum).

As an example of these things being separate, we have precedent in the bool type of a documented gurantee that conversion with as produces 1 for true and 0 for false, while making no guarantee about the memory representation.

I think it's worth calling out how this interacts with explicit values that are larger than the smallest possible values

So far the Variant = 7 in enum declaration is used for conversion to integer with as, and for the tag (with #2363).

This RFC should expand on the motivation for Discriminant::bit_size and Discriminant::into_bits and decide: should they be consistent with the (memory representation’s) tag? With the integer conversion with as?

text/0000-discriminant-bits.md Outdated Show resolved Hide resolved
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
If it were possible to read the exact necessary size and the bit representation of the discriminant, we could define a type `PackedBits<T>` that uses exactly as much space as necessary.

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a method then self is needed somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a static method as far as I can tell. The runtime value of Discriminant is not needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; It's not clear from the context that it is using the type variable T from Discriminant<T>; this should be made clear. Moreover, it is not a method because there's no such thing as a "static method" anymore. These are associated non-method functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've suggested to make this an associated constant, btw.

text/0000-discriminant-bits.md Outdated Show resolved Hide resolved

## Disciminant data

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent an enum's discriminant.

- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why isn't it? As aforementioned there may be specific guarantees and non-guarantees we want to make which makes From and Into less apt in terms of communication of those guarantees to/with users.

- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
- Alternatively, `from/into_bits` could return a `Bits<T>` type with a richer interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would that richer interface be like?

# Future possibilities
[future-possibilities]: #future-possibilities

The feature is self-contained and I don't see direct extensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The feature is self-contained and I don't see direct extensions.
The feature is self-contained and there are no direct extensions.


Using these enums in collections is wasteful, as each instance reserves at least 1 byte of space.
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is an implementation that works, but it is less intuitive; is that important? And is a more intuitive implementation sufficient justification to guarantee things I've mentioned below for all time?

Is PackedBits<T> more efficient than fixedbitset -- substantially so?


If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.

This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for creating an index of all discriminant values present in collection.

@Centril Centril added A-discriminant Discriminant related proposals & ideas A-repr #[repr(...)] related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Apr 15, 2019
@joshtriplett
Copy link
Member

joshtriplett commented Apr 15, 2019 via email

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a static method as far as I can tell. The runtime value of Discriminant is not needed

const fn bit_size() -> usize { }
```

This number is not subject to optimisation, so e.g. `Option<&str>` reports a bitsize of `1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While being called "enum layout optimization", it's not really an optimization, but a clear set of rules of how the discriminant is represented.

What is the use-case for knowing the bitsize for enums whose variants have fields? I'm wondering if it would make more sense to have bit_size return Option<usize> in order to only return a bit_size where an actual tag field exists.

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure usize is the best. Other places use u32 when dealing with bit counts.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Disciminant data
Copy link

@sftim sftim Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Disciminant data
## Discriminant data

`Discriminat<T>` gains the methods `into_bits` and `from_bits`:

```rust
fn into_bits(&self) -> u128
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first saw this, I had to wonder what crazy kind of code would have an enum with more than 2^64 variants. Maybe #[repr(u128)] should be mentioned as justification?

Copy link
Contributor

@Centril Centril Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind it's not just about having > 2^64 variants. You can have much fewer and use enum Foo { ..., VariantN = DiscrimExprN, ... } to use up a whole lot of bits.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my point. I had to think for a while before I remembered that explicit discriminants can be assigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb had actually recommended using that size, as that would be the final width of the internal Discriminant value.

@ExpHP
Copy link

ExpHP commented Apr 17, 2019

This should mention the semantics for negative discriminants. I see three options for into_bits():

  1. Sign extension: -1i8 becomes -1i8 as i128 as u128, or 0xFFFF_FFFF_..._FFFF_FFFF..
  2. Zero extension of repr: -1i8 becomes (-1i8).into_bits() as u128, or 0x0000_0000_..._0000_00FF.
  3. Zero extension of bit_size bits: Like above, but we only keep bit_size bits. -1i8 becomes 0b00000000_..._00000111 if bit_size() == 3.

Similarly, there is a question for from_bits:

  • It could be strict and require the higher order bits to match into_bits().
  • It could be permissive and ignore the bits beyond bit_size.

My thoughts are:

  • Sign extension (option 1) is consistent with the behavior of Enum::Variant as u128.
  • Consider the use case of packing several enums into a bitset. Options 1 and 2 share a footgun when encoding a bunch of discriminants into a packed form. (one must truncate the higher order bits).
    • This makes option 2 worthless compared to option 3, so I won't consider it further.
  • If from_bits is strict, then option 1 also has an even bigger footgun when decoding a bunch of discriminants from a packed form. (one must manually sign-extend. Yikes!)
  • If from_bits is permissive, this may come with a performance cost.

@ExpHP
Copy link

ExpHP commented Apr 17, 2019

I thought of an option 4, which seems to be most in spirit with the RFC:

  • into_bits() is opaque even for C-like enums. There is no guarantee that the value returned corresponds in any way to the integral constants assigned to the enum variants. However, it is guaranteed that only the bit_size most insignificant bits are nonzero.

With this option, from_bits should be strict.

This would basically use option 3 as the implementation, but without explicitly specifying it.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 17, 2019

There is no guarantee that the value returned corresponds in any way to the integral constants assigned to the enum variants

We do have a VariantIdx in the compiler (which is a u32 starting at 0). This would be much more condensed, but the implementation of into_bits would not be trivial anymore, making into_bits and from_bits require loads of logic and probably lookup tables.

@bill-myers
Copy link

Seems like this might be better handled by a "bit-packed serialization trait" implemented via custom or built-in derive that would give you the number of bits required to serialize a type and allow to (de)serialize a value into a bit stream.

In particular, such an approach would work for non-C-like enums and would also allow to recursively pack data.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 20, 2019

Seems like this might be better handled by a "bit-packed serialization trait" implemented via custom or built-in derive that would give you the number of bits required to serialize a type and allow to (de)serialize a value into a bit stream.

The problem with that is that it's very expensive. I believe the motivation of this RFC is to have an essentially zero cost operation (read one integer from memory) that gives you the discriminant.

@clarfonthey
Copy link
Contributor

There was a mention in my generic integers RFC about the concept of "bit sizes" for types, which could be used to make bit fields work. This seems like a special case of that to enums which is a bit odd to me.

@pnkfelix
Copy link
Member

Hi @skade

I know I've let this lie untouched for a while.

There has been a fair amount of feedback written in the comments here. The feedback includes a number of questions about the semantics of this feature, especially when it is combined with enums that assign explicit values to the discriminant.

Do you have plans to update the RFC text to incorporate the above feedback? if not, should we see if any other community member wants to drive this effort forward?

@skade
Copy link
Contributor Author

skade commented Nov 4, 2019

@pnkfelix Thanks for asking! I have plans for updating the RFC, but currently lacking a bit of bandwidth. I would love to have another community member as a peer to drive the process.

skade and others added 4 commits November 25, 2019 13:59
Co-Authored-By: Joe Clay <27cupsofcoffee@gmail.com>
Co-Authored-By: Joe Clay <27cupsofcoffee@gmail.com>
Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>
Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>
@skade
Copy link
Contributor Author

skade commented Nov 29, 2019

I spoke to @regexident and some others this week and the realisation is that a lot of people would like info from enums, but not consistently the same.

For that reason, I'm going to close this RFC and maybe redraft it if I manage to get a clear design that fits multiple peoples needs.

@skade skade closed this Nov 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-discriminant Discriminant related proposals & ideas A-repr #[repr(...)] related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.