Proposal (HARD MODE): Bit vector type (bag-of-bits) #8388
Comments
This is true, but why do you want to do this? `undefined` is about eliding copies, and copies are not expensive here.
See also #7512, #7605 (and possibly #8196). I like most of this proposal. I think having rotations and bit operations defined on them, like inserting and extracting a set of bits based on a bitmap, or extension by zero or by the last bit, is a great idea (RISC-V has put a lot of effort into coming up with a finite set of sensible ones: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-draft.pdf), but I think they should be builtin functions like `@rotl` and `@rotr`. Using `<<` and `>>` for rotations seems just confusing, and shifting bitmasks is a fairly common thing to do. One could define shifts efficiently such that the shift amount of `<<` on `b32` only takes the low 5 bits into account, i.e. is effectively an integer mod 32, the shift amount of `<<` on `b64` only takes the low 6 bits into account, i.e. is effectively an integer mod 64, and so on. I think enums based on `bXX` whose values can be ORed together (using `|`) make perfect sense.
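A minimal sketch of the comment's suggestion in today's Zig (written for Zig 0.11 or later, which has the one-argument `@truncate`): plain shifts on a 32-bit value that keep only the low 5 bits of the shift amount, with rotations kept as separate named operations through the standard library's `std.math.rotl`/`std.math.rotr`. The helper names `shlMod32`, `shrMod32`, `rotl32` and `rotr32` are hypothetical, and since `b32` does not exist, `u32` stands in for it.

```zig
const std = @import("std");

// Hypothetical helpers: `u32` stands in for the proposed `b32`.
// Truncating the amount to a u5 keeps only its low 5 bits (amount mod 32).
fn shlMod32(x: u32, amount: u32) u32 {
    const s: u5 = @truncate(amount);
    return x << s;
}

fn shrMod32(x: u32, amount: u32) u32 {
    const s: u5 = @truncate(amount);
    return x >> s;
}

// Rotations stay separate, named operations, in the spirit of the
// suggested @rotl/@rotr builtins.
fn rotl32(x: u32, amount: u32) u32 {
    return std.math.rotl(u32, x, amount);
}

fn rotr32(x: u32, amount: u32) u32 {
    return std.math.rotr(u32, x, amount);
}

test "shift amounts wrap mod 32" {
    try std.testing.expectEqual(@as(u32, 0x10), shlMod32(1, 36)); // 36 mod 32 == 4
    try std.testing.expectEqual(@as(u32, 0x8000_0001), rotl32(0xC000_0000, 1));
}
```

Truncating the amount to a `u5` is what makes it "effectively an integer mod 32": the shift can never reach the bit width, so no range check is needed.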
I really like the bit types and I think they are worth it. However, I have some reservations about … If we need something else, let's define something else. There exists a generic shift operation that covers both regular shifts and rotations, but also has some extra use cases: the "funnel shift" (a term mainly used by Nvidia). Here is some strawman syntax to explain:
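As an illustration of the funnel shift mentioned in the comment above, here is a minimal sketch (the names `fshl` and `fshr` are hypothetical, loosely following LLVM's intrinsic naming; `u32` stands in for `b32`; written for Zig 0.11 or later). The two 32-bit operands are concatenated as `hi:lo`, shifted, and one half of the result is kept: passing the same value in both halves gives a rotation, and passing 0 in the other half gives an ordinary shift.

```zig
const std = @import("std");

// Hypothetical funnel shifts on 32-bit halves (`u32` stands in for `b32`).
// The shift amount is taken mod 32 by keeping its low 5 bits.

/// Shift the 64-bit concatenation hi:lo left by n and return the upper half.
fn fshl(hi: u32, lo: u32, n: u32) u32 {
    const s: u6 = @as(u5, @truncate(n));
    const wide: u64 = (@as(u64, hi) << 32) | lo;
    return @truncate((wide << s) >> 32);
}

/// Shift the 64-bit concatenation hi:lo right by n and return the lower half.
fn fshr(hi: u32, lo: u32, n: u32) u32 {
    const s: u6 = @as(u5, @truncate(n));
    const wide: u64 = (@as(u64, hi) << 32) | lo;
    return @truncate(wide >> s);
}

test "funnel shift special cases" {
    const x: u32 = 0x8000_0001;
    // Same value in both halves: a rotation.
    try std.testing.expectEqual(std.math.rotl(u32, x, @as(u32, 4)), fshl(x, x, 4));
    // Zero in the other half: an ordinary shift.
    try std.testing.expectEqual(@as(u32, 0x0800_0000), fshr(0, x, 4));
}
```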
What about …
This is functionally identical to the existing unsigned API, but better indicates intent for values that are known to be bitmasks. It would also align with any future Zig support for raw bit types: see ziglang/zig#7512 and ziglang/zig#8388.
It seems like this issue is working to improve two distinct functionality gaps:
I like the motivation for each of these, but I wonder if the first issue would be better solved with broader language support for bit-level alignments and packed arrays/vectors. For example:
Either of these would remove the need for … If we had a natural way to represent a packed boolean vector, the operations proposed here would seem a lot more natural, and, as a nice bonus, they wouldn't require introducing any new types.
To clarify the intended semantics a bit:
This is equivalent to …
Extracted from discussion in #7693 (such an idea has also shown up in other discussions). cc @daurnimator in particular.
There are two things which Zig is currently unable to represent cleanly:
- … `packed struct` for known-length values, but is incredibly verbose and does not scale)

The second point could be solved with a builtin, but the first goes much deeper, effectively requiring a new type. I propose exactly this, or rather a new type family, in the signed/unsigned integer tradition: the bit vector, a bag of bits with no arithmetic structure, written as `bXX` (vector of `XX` bits, up to 65535), `bsize` (vector of {word width} bits), or `bbyte` (vector of {byte length} bits; see #7693). Such a type may be `@bitCast` to an integer type of at least the same length, or from an integer type of at most the same length (an integer type has a canonical method of extension, whereas a bit vector type does not); it will not coerce either way, see below for explanation.

A bit vector may be used with the bitwise operators `&`, `|`, `^`, `~`, but not with the arithmetic operators `+`, `-`, `*`, `/`. The shift operators `>>` and `<<` take an integer on the right and are interpreted as rotations, i.e. bits shifted off one end are shifted onto the other [1]. The bits of such a type may be defined or undefined independently: `(@as(b8, undefined) & 0xf0) & 0x0f` evaluates to 0, rather than to `undefined` as the analogous expression with `u8` would.
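To make the per-bit definedness rule concrete, the sketch below models a `b8` in today's Zig as a value plus a mask of defined bits. The `B8` type and its methods are hypothetical and only emulate the proposed semantics (written for Zig 0.11 or later). An AND result bit is defined when both inputs are defined or when either input is a defined 0; an OR result bit is defined when both inputs are defined or when either input is a defined 1; rotation carries the definedness mask along with the bits.

```zig
const std = @import("std");

/// Hypothetical model of a `b8` whose bits are defined or undefined
/// independently. A 1 in `defined` marks a defined bit; undefined bits are
/// kept at 0 in `bits` so that results are deterministic.
const B8 = struct {
    bits: u8,
    defined: u8,

    fn undef() B8 {
        return .{ .bits = 0, .defined = 0 };
    }

    fn of(value: u8) B8 {
        return .{ .bits = value, .defined = 0xff };
    }

    /// A result bit is defined if both inputs are, or if either is a defined 0.
    fn bitAnd(a: B8, b: B8) B8 {
        const known_zero = (a.defined & ~a.bits) | (b.defined & ~b.bits);
        const defined = (a.defined & b.defined) | known_zero;
        return .{ .bits = a.bits & b.bits & defined, .defined = defined };
    }

    /// A result bit is defined if both inputs are, or if either is a defined 1.
    fn bitOr(a: B8, b: B8) B8 {
        const known_one = (a.defined & a.bits) | (b.defined & b.bits);
        const defined = (a.defined & b.defined) | known_one;
        return .{ .bits = (a.bits | b.bits) & defined, .defined = defined };
    }

    /// `<<` as rotation: value and definedness move together.
    fn rotl(a: B8, n: usize) B8 {
        return .{
            .bits = std.math.rotl(u8, a.bits, n),
            .defined = std.math.rotl(u8, a.defined, n),
        };
    }

    fn isFullyDefined(a: B8) bool {
        return a.defined == 0xff;
    }
};

test "(@as(b8, undefined) & 0xf0) & 0x0f is a defined 0" {
    const r = B8.undef().bitAnd(B8.of(0xf0)).bitAnd(B8.of(0x0f));
    try std.testing.expect(r.isFullyDefined());
    try std.testing.expectEqual(@as(u8, 0), r.bits);
}
```

Each emulated `&` here costs several machine operations instead of one, which is the kind of overhead a native bit vector type would avoid.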
A bit vector may also be indexed or sliced, for bit test/set and packed fields, but only with comptime-known indices: `bXX[n]` (single index) is an assignable `bool`, and `bXX[n..m]` is an assignable `b{m - n}`. Concatenation/repetition is also possible [2], for instance to construct a repeating bit mask; in this case the bit vector operands need not be comptime-known, but the multiplier must be. Bits are numbered in integer significance order, that is, `v[0]` is the LSB of `@bitCast(usize, v)` [3].
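A rough emulation of the proposed indexing, slicing and repetition using plain unsigned integers might look like the following (the helpers `getBit`, `setBit`, `slice` and `repeat` are hypothetical; written for Zig 0.11 or later). Indices are comptime parameters, mirroring the comptime-known-index rule, and bit 0 is always the least significant bit of the value, so the numbering does not depend on the target's byte order.

```zig
const std = @import("std");

// Hypothetical stand-ins for `v[n]`, `v[n..m]` and repetition, using plain
// unsigned integers. Bit 0 is the least significant bit of the value.

fn getBit(v: u32, comptime n: u5) bool {
    return ((v >> n) & 1) == 1;
}

fn setBit(v: u32, comptime n: u5, b: bool) u32 {
    const mask: u32 = @as(u32, 1) << n;
    return if (b) v | mask else v & ~mask;
}

/// Bits n (inclusive) to m (exclusive), as an (m - n)-bit value.
fn slice(v: u32, comptime n: u5, comptime m: u6) std.meta.Int(.unsigned, m - n) {
    return @truncate(v >> n);
}

/// Repeat a `width`-bit pattern `count` times; `count` must be
/// comptime-known, but `pattern` need not be.
fn repeat(
    comptime width: u16,
    pattern: std.meta.Int(.unsigned, width),
    comptime count: u16,
) std.meta.Int(.unsigned, width * count) {
    const R = std.meta.Int(.unsigned, width * count);
    var result: R = 0;
    inline for (0..count) |i| {
        result |= @as(R, pattern) << (i * width);
    }
    return result;
}

test "indexing, slicing and repetition" {
    try std.testing.expect(getBit(0b100, 2));
    try std.testing.expectEqual(@as(u8, 0xBC), slice(0xABCD, 4, 12));
    try std.testing.expectEqual(@as(u8, 0b0101_0101), repeat(2, 0b01, 4));
}
```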
Peer type resolution works in reverse from integers: instead of automatically upcasting, a bit vector will automatically downcast; that is, `b5 & b3` will produce a `b3` (bits are matched by index: `(b5 & b3)[0] == b5[0] and b3[0]`, and so forth). This is because, unlike signed and unsigned integers, there is no obvious way to extend a bit vector [4].

Real Use Cases?
Currently, we use `[*]u8` to represent raw memory, which has two major issues:

1. It assumes 8-bit bytes, and will break on machines where this is not the case (see #7693, `ubyte`/`ibyte` for smallest addressable unit of memory).
2. `undefined` granularity only extends to the byte level, which causes all kinds of problems for packed data (bit fields, the various uses of `PackedIntArray`).

Discussions in multiple places have touched on the idea of a bag-of-bits type to address this issue; here it is.
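To see why byte-level `undefined` granularity is a problem for packed data, consider how a bit field is written into byte memory today. In this minimal sketch (the helpers `writeField` and `readField` are hypothetical; written for Zig 0.11 or later), storing 3 bits requires reading and rewriting the whole byte, so the surrounding bits are read as well; if that byte was `undefined`, current Zig has no way to express that only the written bits are now defined.

```zig
const std = @import("std");

// Hypothetical helpers for a packed bit field inside a single byte.

/// Write the low `width` bits of `value` at bit offset `shift` of `byte.*`.
/// This necessarily reads the whole byte, including bits outside the field.
fn writeField(byte: *u8, comptime shift: u3, comptime width: u4, value: u8) void {
    comptime std.debug.assert(@as(u8, shift) + width <= 8);
    const field_mask: u8 = (1 << width) - 1;
    const mask: u8 = field_mask << shift;
    const old = byte.*; // read-modify-write of all 8 bits
    byte.* = (old & ~mask) | ((value & field_mask) << shift);
}

/// Read the `width`-bit field at bit offset `shift`.
fn readField(byte: u8, comptime shift: u3, comptime width: u4) u8 {
    const field_mask: u8 = (1 << width) - 1;
    return (byte >> shift) & field_mask;
}

test "write then read a 3-bit field" {
    var b: u8 = 0b1010_1010;
    writeField(&b, 2, 3, 0b101);
    try std.testing.expectEqual(@as(u8, 0b101), readField(b, 2, 3));
}
```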
But Why A New Feature?
There is no way, no how, no dice to represent a scalable bit-level-defined value in current Zig. It simply cannot be done. Integers are scalable, but only in bytes (unless you want to pack them, into...), and go all undefined together; packed structs can have definition on any level you like, but are incredibly verbose for this use case and not scalable by any means except perhaps `@Type` jiggery-pokery. This is a tangible, useful feature, exactly suited to Zig's problem domain, and there's just no way to do it without a new feature.

On a deeper level, the use of bit vectors will typically produce code on the order of single machine instructions. Working around the lack of such hardware-level features with existing features could perhaps be done, but even if each emulated operation takes only 3 steps, that's a 3-fold performance hit. The purpose of Zig is to generate machine code; if there are certain basic machine operations that cannot be represented, that's a deficiency.
Why Not Just Rework `&`, `|`, `^`, `~`?

Because then every integer will have to track defined/undefined bits in safe modes, and we still don't have rotations.
This Proposal Breaks:
Noisily
- `bXX`, `bsize`, `bbyte` as identifiers
Silently
- Nothing.
Musings
Footnotes
[1] I chose `>>` and `<<` to be interpreted as rotations as I believe these to be the only meaningful shift operations on non-numeric bit data. The goal of maintaining commutativity with `@bitCast`, as well as consistency of bit indices between targets, is the primary motivation to index bits in significance order.
[2] I was on the fence about including these, but again, they have legitimate use cases and add no language complexity.
[3] In the first draft of this proposal I defined index order in line with platform endianness, i.e. on a big-endian machine `v[0]` would instead be the most significant bit of `@bitCast(usize, v)`. This would match packed struct behaviour, and increasing indices would be stored at non-decreasing byte addresses; however, there would then be horrific inconsistencies with rotations and concatenations, see below.
[4] This, together with the lack of a meaningful way for a bit vector value to "overflow", means there is no need for a `comptime_bit_vector` type.