Discriminant bits #2684
Conversation
I think it's worth calling out how this interacts with explicit values that are larger than the smallest possible values, as well as explicit reprs. e.g. if I write
```rust
enum Foo {
    Bar = 7,
    Baz = 8,
}
```
technically the compiler still only needs 1 bit for the discriminant. We only need to convert to the larger values on cast. What's the return value of these methods in this case? Similarly, if I write `#[repr(u8)]`, is `bit_size` always 8?
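The point being debated can be illustrated with a small, runnable sketch. `min_discriminant_bits` is a hypothetical helper (not part of the RFC): the minimal number of bits needed depends only on how many distinct discriminants must be distinguished, not on the explicit values assigned to them.

```rust
// Hypothetical helper: minimal bits needed to distinguish `n` discriminant
// values, regardless of the values themselves (explicit values like 7 and 8
// only matter when casting with `as`).
fn min_discriminant_bits(n_variants: u32) -> u32 {
    match n_variants {
        0 | 1 => 0, // empty and single-variant enums need no tag bits
        n => 32 - (n - 1).leading_zeros(), // ceil(log2(n))
    }
}

fn main() {
    // `enum Foo { Bar = 7, Baz = 8 }` has two variants -> 1 bit suffices.
    assert_eq!(min_discriminant_bits(2), 1);
    assert_eq!(min_discriminant_bits(3), 2);
    assert_eq!(min_discriminant_bits(256), 8);
}
```

Under this reading, `#[repr(u8)]` would constrain the stored size, while the information-theoretic minimum stays 1 bit; which of the two `bit_size` should report is exactly the open question above.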
Adding the proposed functions probably entails adding a new compiler intrinsic `discriminant_size`.
Empty enums are of size 0.
How about enums with a single variant?
Obviously size zero.
As I see it, empty enums are specified in the proposal simply as an acknowledgement that they were taken into account. From a mathematical standpoint, they take `NEG_INFINITY` bytes. This proposal has chosen not to do this, and to treat them as ZSTs (which indeed aligns with some other parts of the language).
[reference-level-explanation]: #reference-level-explanation

The feature may interact with non-exhaustive enums.
Even in this case, the currently used discriminant size should be used.
Doesn't this allow people to couple themselves to the number of variants in an enum which was explicitly requested to be non-exhaustive (to avoid that coupling)?
I think when talking about all this it is useful to separate:
As an example of these things being separate, we have precedent in the
So far the This RFC should expand on the motivation for
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.
If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
Suggested change:
```diff
-If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
+If it were possible to read the exact necessary size and the bit representation of the discriminant, we could define a type `PackedBits<T>` that uses exactly as much space as necessary.
```
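For illustration, here is a minimal runnable sketch of the `PackedBits`-style idea for the 1-bit case, e.g. a Game-of-Life `Cell` enum. The type and method names are hypothetical, not part of the RFC or any existing API:

```rust
// Hypothetical packed container: one boolean "cell" per bit
// instead of one per byte, as a `Vec<Cell>` would use.
struct PackedBits {
    words: Vec<u64>,
    len: usize,
}

impl PackedBits {
    fn new(len: usize) -> Self {
        PackedBits { words: vec![0; (len + 63) / 64], len }
    }

    fn set(&mut self, i: usize, value: bool) {
        assert!(i < self.len);
        let (word, bit) = (i / 64, i % 64);
        if value {
            self.words[word] |= 1 << bit;
        } else {
            self.words[word] &= !(1 << bit);
        }
    }

    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        self.words[i / 64] >> (i % 64) & 1 == 1
    }
}

fn main() {
    let mut cells = PackedBits::new(100);
    cells.set(3, true);
    cells.set(99, true);
    assert!(cells.get(3) && cells.get(99) && !cells.get(4));
    // 100 cells fit in two u64 words (16 bytes) instead of 100 bytes.
    assert_eq!(cells.words.len(), 2);
}
```

A generic `PackedBits<T>` would need exactly what the RFC proposes: the bit width of `T`'s discriminant and a way to convert discriminants to and from raw bits.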
text/0000-discriminant-bits.md
Outdated
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
```rust
const fn bit_size() -> usize { }
```
If it is a method, then `self` is needed somewhere.
It's a static method as far as I can tell. The runtime value of `Discriminant` is not needed.
OK; it's not clear from the context that it is using the type variable `T` from `Discriminant<T>`; this should be made clear. Moreover, it is not a method because there's no such thing as a "static method" anymore. These are associated non-method functions.
I've suggested to make this an associated constant, btw.
## Disciminant data

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
Suggested change:
```diff
-`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
+`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent an enum's discriminant.
```
- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
And why isn't it? As aforementioned, there may be specific guarantees and non-guarantees we want to make which make `From` and `Into` less apt in terms of communication of those guarantees to/with users.
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
- Alternatively, `from/into_bits` could return a `Bits<T>` type with a richer interface
What would that richer interface be like?
# Future possibilities
[future-possibilities]: #future-possibilities

The feature is self-contained and I don't see direct extensions.
Suggested change:
```diff
-The feature is self-contained and I don't see direct extensions.
+The feature is self-contained and there are no direct extensions.
```
text/0000-discriminant-bits.md
Outdated
Using these enums in collections is wasteful, as each instance reserves at least 1 byte of space.
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.
So there is an implementation that works, but it is less intuitive; is that important? And is a more intuitive implementation sufficient justification to guarantee things I've mentioned below for all time?
Is `PackedBits<T>` more efficient than `fixedbitset` -- substantially so?
If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
Suggested change:
```diff
-This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
+This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for creating an index of all discriminant values present in collection.
```
On Mon, Apr 15, 2019 at 09:07:42AM -0700, Sean Griffin wrote:
+Empty enums are of size 0.
How about enums with a single variant?
Enums with a single variant should have a 0-bit discriminant as well.
text/0000-discriminant-bits.md
Outdated
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
```rust
const fn bit_size() -> usize { }
```
It's a static method as far as I can tell. The runtime value of `Discriminant` is not needed.
```rust
const fn bit_size() -> usize { }
```

This number is not subject to optimisation, so e.g. `Option<&str>` reports a bitsize of `1`.
While being called "enum layout optimization", it's not really an optimization, but a clear set of rules of how the discriminant is represented.
What is the use-case for knowing the bitsize for enums whose variants have fields? I'm wondering if it would make more sense to have `bit_size` return `Option<usize>` in order to only return a `bit_size` where an actual `tag` field exists.
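The layout "optimization" discussed above can be observed directly today: `Option<&str>` stores `None` in the null-pointer niche and needs no separate tag byte, whereas `Option<usize>` has no niche and does.

```rust
use std::mem::size_of;

fn main() {
    // Niche optimization: `None` reuses the never-null pointer value,
    // so no extra tag byte is needed.
    assert_eq!(size_of::<Option<&str>>(), size_of::<&str>());

    // `usize` has no invalid bit pattern, so a tag must be stored.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());
}
```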
text/0000-discriminant-bits.md
Outdated
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
```
Not sure `usize` is the best. Other places use `u32` when dealing with bit counts.
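The precedent referred to is visible in the standard library's integer APIs, where bit counts are `u32` rather than `usize`:

```rust
fn main() {
    let n: u64 = 0b1011;

    // `count_ones` returns u32, not usize.
    let ones: u32 = n.count_ones();
    assert_eq!(ones, 3);

    // The associated `BITS` constant is also u32.
    let bits: u32 = u64::BITS;
    assert_eq!(bits, 64);
}
```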
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Disciminant data
Suggested change:
```diff
-## Disciminant data
+## Discriminant data
```
`Discriminat<T>` gains the methods `into_bits` and `from_bits`:

```rust
fn into_bits(&self) -> u128
```
When I first saw this, I had to wonder what crazy kind of code would have an enum with more than `2^64` variants. Maybe `#[repr(u128)]` should be mentioned as justification?
In my mind it's not just about having > `2^64` variants. You can have much fewer and use `enum Foo { ..., VariantN = DiscrimExprN, ... }` to use up a whole lot of bits.
Yes, that was my point. I had to think for a while before I remembered that explicit discriminants can be assigned.
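As a concrete example of the point above, a two-variant enum can be given explicit discriminants spanning the full width of its repr (the `Flag` enum here is illustrative, not from the RFC):

```rust
// Two variants, but the assigned values span 64 bits.
#[repr(u64)]
enum Flag {
    Low = 1,
    High = 1 << 63,
}

fn main() {
    assert_eq!(Flag::Low as u64, 1);
    assert_eq!(Flag::High as u64, 1 << 63);
    // Only casts need the wide value; one bit of information still
    // suffices to distinguish the two variants.
    assert_eq!(std::mem::size_of::<Flag>(), 8);
}
```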
@eddyb had actually recommended using that size, as that would be the final width of the internal `Discriminant` value.
This should mention the semantics for negative discriminants. I see three options for
Similarly, there is a question for
My thoughts are:
I thought of an option 4, which seems to be most in spirit with the RFC:
With this option, this would basically use option 3 as the implementation, but without explicitly specifying it.
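For reference, negative discriminants are legal today; how their bits should be exposed is exactly what the options above are debating. The `Signed` enum here is illustrative, not from the RFC:

```rust
// A signed repr with a negative discriminant.
#[repr(i8)]
enum Signed {
    Neg = -1,
    Zero = 0,
}

fn main() {
    assert_eq!(Signed::Neg as i8, -1);
    // Reinterpreted as raw unsigned bits, -1i8 is 0xFF -- one possible
    // answer to "what bits does a negative discriminant have?".
    assert_eq!(Signed::Neg as i8 as u8, 0xFF);
    assert_eq!(Signed::Zero as i8, 0);
}
```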
We do have a
Seems like this might be better handled by a "bit-packed serialization trait" implemented via custom or built-in derive that would give you the number of bits required to serialize a type and allow to (de)serialize a value into a bit stream. In particular, such an approach would work for non-C-like enums and would also allow to recursively pack data.
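A rough sketch of that alternative trait design, for a C-like enum; all names (`BitPack`, `BIT_SIZE`, `pack`, `unpack`) are hypothetical, not an existing or proposed API:

```rust
// Hypothetical bit-packed serialization trait: a derive could generate
// this for any enum, including non-C-like ones (recursively).
trait BitPack: Sized {
    const BIT_SIZE: u32;
    fn pack(&self) -> u128;
    fn unpack(bits: u128) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
enum Cell {
    Dead,
    Alive,
}

impl BitPack for Cell {
    const BIT_SIZE: u32 = 1;

    fn pack(&self) -> u128 {
        match self {
            Cell::Dead => 0,
            Cell::Alive => 1,
        }
    }

    fn unpack(bits: u128) -> Option<Self> {
        match bits {
            0 => Some(Cell::Dead),
            1 => Some(Cell::Alive),
            _ => None, // reject patterns outside the valid range
        }
    }
}

fn main() {
    assert_eq!(Cell::BIT_SIZE, 1);
    assert_eq!(Cell::Alive.pack(), 1);
    assert_eq!(Cell::unpack(0), Some(Cell::Dead));
    assert_eq!(Cell::unpack(2), None);
}
```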
The problem with that is that it's very expensive. I believe the motivation of this RFC is to have an essentially zero cost operation (read one integer from memory) that gives you the discriminant.
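For context, the cheap discriminant read already exists in stable Rust as `std::mem::discriminant`; what it returns is opaque, exposing neither its size nor its bits, which is the gap this RFC aimed to fill:

```rust
use std::mem;

enum Cell {
    Dead,
    Alive,
}

fn main() {
    // `mem::discriminant` is a cheap read yielding an opaque,
    // comparable `Discriminant<Cell>` value.
    let a = mem::discriminant(&Cell::Alive);
    let b = mem::discriminant(&Cell::Alive);
    let c = mem::discriminant(&Cell::Dead);
    assert_eq!(a, b);
    assert_ne!(a, c);
    // There is no stable way to ask for its bit size or raw bits.
}
```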
There was a mention in my generic integers RFC about the concept of "bit sizes" for types, which could be used to make bit fields work. This seems like a special case of that to enums which is a bit odd to me.
Hi @skade I know I've let this lie untouched for a while. There has been a fair amount of feedback written in the comments here. The feedback includes a number of questions about the semantics of this feature, especially when it is combined with enums that assign explicit values to the discriminant. Do you have plans to update the RFC text to incorporate the above feedback? If not, should we see if any other community member wants to drive this effort forward?
@pnkfelix Thanks for asking! I have plans for updating the RFC, but am currently lacking a bit of bandwidth. I would love to have another community member as a peer to drive the process.
Co-Authored-By: Joe Clay <27cupsofcoffee@gmail.com>
Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>
I spoke to @regexident and some others this week and the realisation is that a lot of people would like info from enums, but not consistently the same. For that reason, I'm going to close this RFC and maybe redraft it if I manage to get a clear design that fits multiple people's needs.
Summary
This RFC proposes to expose the minimum size necessary to encode the discriminant of an enum, without exposing the exact encoding itself. This can be useful for writing bit-level collections.
Thanks @joshtriplett @Manishearth @eddyb and VLQC for early review <3.
Some details, especially naming, are very much bikesheddable.