more math friendly definition of shift operators #7605
This proposal gives up a lot of safety for absolutely no gain. Making the shift direction change depending on the operand sign is another crazy idea. And performance-wise, the extra checks aren't exactly free.
Otherwise I think the generic shift operations should be fairly "stupid" and fast. I don't think they should be clever and try to fix things.
If you are compiling in a mode with runtime safety, an out-of-range shift amount is caught by a safety check. The check of course disappears if you compile in a mode with no safety checking, but if you are doing that then you are throwing away your math safety checking anyway, so whatever. The only downside I see of removing the explicit @intCast is losing the visible hint that the shift amount is being range-checked.
The unease about the definition of << (and >>) stems from the schizophrenic interpretation of status-quo integers as simultaneously arithmetic integers, bit strings, and modular integers, which give different answers as to what the "right" definition is.

TL;DR: There is no mathematically "better" definition of the shift operators. Different interpretations of what a bit string means give different answers. Restricting the shifts to operations on bit strings only allows for a definition without undefined behaviour that is easy on modern hardware. For the common use of shifts as multiplication by a power of 2, one should introduce a power operation and prefer an explicit multiplication by a power of 2 over a shift (using ^ for "to the power").
The RISC-V instruction SLLW rd, rs1, rs2 (i.e. rd = as(u32, rs1 << (rs2 & 31))) is defined such that the shift amount n (in register rs2) only depends on n mod 32, so there is no problem with n (rs2) being too large; it will happily ignore bits 32 to 63 (which in RV are sign-extended from bit 31).
Edit: there is a sense in which a "clean sweep" shift (i.e. math.shl) is better mathematically even for bit vectors: it satisfies (x << n) << m == x << (n + m).
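A quick check of that identity using std.math.shl, which already has the clean-sweep semantics (a sketch, not code from the thread):

```zig
const std = @import("std");

test "clean sweep shifts compose" {
    const x: u32 = 0xffff_ffff;
    // With clean-sweep semantics (x << n) << m == x << (n + m) holds even
    // when n + m >= 32, because oversized shift amounts simply yield 0.
    const two_steps = std.math.shl(u32, std.math.shl(u32, x, 20), 20);
    const one_step = std.math.shl(u32, x, 40);
    try std.testing.expectEqual(one_step, two_steps); // both are 0
}
```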
@RogierBrussee The number representation, shift operations, and modulus map quite directly to hardware instructions. If you write a power of 2 instead, this is no longer directly and explicitly readable: the base of the power could (1) be a power of two or (2) not be a power of two, so this would require an extra lookup of the potential base, which is against the Zig zen.
So one does want the common, efficient hardware instructions to be explicit in the code, because Zig is a systems programming language (i.e. about doing things efficiently).
@matu3ba I don't really understand what you mean. If your intent is to use a as a number and express multiplication by a power of 2, then the expression a * 2^n is visibly multiplication by a power of 2 (not a power of 3 nor any other number). IMHO it communicates intent rather better than a << n, even though the programmer is expected (but not required) to know that the compiler will optimise it to a shift. Thus the expression a * 2^n is in no way slower than doing the compiler's trick of using the shift instruction yourself, at least in optimised builds; you don't risk falling into the trap of the low operator precedence of <<, and the vagaries of undefined behaviour are for the compiler to sort out. If on the other hand the intent is to manipulate a as a string of bits which should be shifted n places, then there is a: b32 and the expression a << n is entirely appropriate, well defined, and without undefined behaviour. Finally, if you really, really feel there is value in mingling arithmetic and bit operations on the same datatype (e.g. for porting code from C), I proposed c_uint32 for exactly that.
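To illustrate the two intents in today's Zig (a sketch; since Zig has no ^ power operator, a constant power of two stands in for 2^n):

```zig
// Arithmetic intent: "multiply a by a power of 2". Optimizing backends
// lower this to a shift instruction anyway, so nothing is lost.
fn scaled(a: u32) u32 {
    return a * 16;
}

// Bit-string intent: "move the bits of a left by n places".
fn shifted(a: u32, n: u5) u32 {
    return a << n;
}
```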
I would like to add my 3 cents to this discussion. I agree with @RogierBrussee: numbers and bits are different things.
I just came across this as a new user trying out Zig for Advent of Code puzzles. I found it extremely un-ergonomic to have to cast to a non-standard integer bitwidth in order to perform a shift. My "not overthinking it" proposal would be to allow shifting a u32 by another u32 and have it be undefined behavior (i.e. runtime panic by default) to shift by a value greater than or equal to the bitwidth. I'd note that this is already effectively the behavior for non-power-of-two bitwidths on the left hand side:
This code results in a runtime panic:
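The snippet itself was lost in this copy of the thread; a minimal reconstruction of the kind of example being described (using a non-power-of-two width, where the shift-amount type can already exceed the valid range) might be:

```zig
pub fn main() void {
    var x: u7 = 1;
    var n: u3 = 7; // a u7 shift amount has type u3, which can hold 7, one past the valid 0..6
    x = x << n; // safety-checked illegal behavior: panics at runtime in safe build modes
}
```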
If you are unsure about stuff, ask in one of the community channels.
If you want to do things optimally on the hardware, you need to understand how they are represented and operated on in memory.
This is not possible. You can't do formula rewriting to generate optimal code if overflow of intermediate results is well defined.
Can you please elaborate on how this justifies the added complexity in relation to my arguments above? The number representation is identical to u32, so the remaining point is convenience (to differentiate between "I don't want this type to use shifts (<<)" and "go fast"). The same effect could however be reached if one makes RHS expressions be adjusted by =SAFETYOPTION, as I suggested here.
At some point you want to mix both convenience types and you get the same problem again with yet more complexity.
A hypothetical to explain my argument: with the logic of the current shift operator, we would also have to forbid operand combinations for other arithmetic operators that we happily allow today. My point here is that the shift operators are being held to an inconsistently strict standard. I'll note that I wouldn't mind having other shift operator variants to help communicate other intents, similar to having the +%/-%/*% variants.
Agreed.
"Always" is inevitably a vast oversimplification; in particular, see your own words below.
First I quote from the blog post you cite: "I fully agree with Yodaiken that C has a problem, and that reliably writing C has become incredibly hard since undefined behavior is so difficult to avoid. It is certainly worth reducing the amount of things that can cause UB in C, and developing practical tools to detect more advanced kinds of UB such as strict aliasing violations."

The argument that undefined behaviour may be sort of unavoidable in some situations does not make it good.

Now more to the point of the question. Whether shifts should have undefined behaviour depends on the interpretation of your 32 bits; that is the point of my remarks on 13 March. Distinct datatypes for distinct interpretations (signed or non-negative integers, modular (aka wrapping) integers like C's unsigned int, or a bitfield) give different interpretations of what a shift over more than (bitlength - 1) should be.

In particular I agree with @bnprks that if u32 is interpreted as a poor man's approximation of a non-negative integer, you may as well make it undefined behaviour to shift over 32 or more bits, and so it is OK (up to undefined behaviour!!!!) to say (a << b) == (a << (b & 31)). The right hand side is what essentially all hardware supports natively (Arm32 would prefer (a << b) == (a << (b & 255))). I would still prefer to write a * 2^b (here 2^b is 2 to the power b, NOT 2 xor b), with operator precedence rules and notation that one can expect from high school, and visible overflow that reflects the intended use.

If you interpret u32 as a modular (aka wrapping) integer, there is only one sane way to interpret a << b without any undefined behaviour: multiplication by 2^b mod 2^bitlength, aka the clean sweep shift: (a << b) == (b < 32) ? (a << (b & 31)) : 0 (== (-(b < 32)) & (a << (b & 31))). I, and it seems @bnprks, think one should use a different notation for this operation, e.g. (a <<% b). However, I would prefer even more to have a separate type m32 for a modular (aka wrapping) integer mod 2^32 and write a *% (2^%b) (which again is neither less nor more efficient than the above).

If you interpret u32 as a bitfield, the bitfield equivalents of a <<% b (i.e. clean sweep) and a << (b & 31) are both sane and without undefined behaviour, but the former is slightly saner and the latter slightly more efficient. I would just prefer to have a separate type bitfield32 or b32 or whatever.

I think talk about complexity is misguided here. Things are equally efficient but conceptually simpler, especially since it clears up undefined behaviour.
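For concreteness, a sketch of the clean-sweep definition in Zig (the function name is mine, not from the thread):

```zig
const std = @import("std");

// Clean-sweep left shift: a shift amount >= the bit width gives 0 instead of
// undefined behavior. This matches std.math.shl for unsigned types.
fn cleanShl(a: u32, b: u32) u32 {
    return if (b < 32) a << @intCast(u5, b) else 0;
}

test "clean sweep shift" {
    try std.testing.expectEqual(@as(u32, 8), cleanShl(1, 3));
    try std.testing.expectEqual(@as(u32, 0), cleanShl(1, 32));
    try std.testing.expectEqual(std.math.shl(u32, 1, 40), cleanShl(1, 40));
}
```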
@bnprks Your wish for consistency is planned to be done. @SpexGuy, my additional question:
I don't understand the problem. You can use std.math.shl.
I think this is the way Zig should go. I see #3806 as a prerequisite though. I think we should lock this issue until #3806 is either accepted+implemented or rejected.
I have no clue what 'a <<| b' is supposed to mean. If "wrap around shifting" is another name for rotating, it is a very useful operation (on bitfields, not on numbers, not on modular numbers!!) available natively on nearly all CPUs, but using the notation <<% for it is inconsistent with +%, -%, and *%. If for x: u32, n: u32 the expression (x << n) can be interpreted as multiplication by a power of 2, then (x <<% n) should mean the clean sweep shift, because if % means "use the corresponding operation but interpret it as being done using modular (aka wrapping) arithmetic (like your CPU does)", the sensible meaning is (x <<% n) == "(x times 2 to the power n) modulo 2 to the power 32". NOTE1: defined in this way there is no undefined behaviour in <<%, and it has a clear mathematical meaning. This is also why I would prefer to write (x << n) for numbers in u32 as x * 2^n (x times (2 to the power n)) and (x <<% n) as x *% (2^%n).
Quick quiz: what should be the undefined behaviour and the proper semantics of (y << n) and (y <<% n) for y: i32, n: u32?
100 times this. Please don't use <<% for "wrap around shifting". If my understanding of how the Zig tokenizer works is correct, it would be possible to pick basically anything else.
I think that there are some cases where bitwise operations can be mixed with arithmetic operations and make a lot of sense. The use case I'm thinking of is related to hardware modeling and simulation. I would really like to be able to use Zig for that task as an alternative to C, and Zig's integral types have a lot going for them that makes them WAY more ergonomic than C's. Example: it's pretty common to have to read a handful of bits out of several different signal facilities, where each one contains some different subset of a memory address. Those values have to be shifted and then bitwise OR'd together to form the address. That address is very likely something that math is done with, to determine things like its offset into a cache line or page. My point here is not that this would become impossible with separate bitwise types, but rather to provide a counter-example to the assertion that bitwise and arithmetic operations categorically shouldn't mix.
@tsmanner Do you have a blog entry or elaboration on how you would envision such things? Hardware modeling is very broad in scope and has several dedicated DSLs. Operator overloading will likely not get accepted, nor will operations on types that are not aligned at a byte boundary (at least at one bound; see the exotic integers PR #10858), but I may be wrong on this. I suspect that you will likely end up with a DSL (type representation and operations) if you want very ergonomic behavior for the use case. However, I think any (comptime) interface description (or even comptime codegen) to such a DSL from Zig would be highly appreciated.
As a more C-like low-level language, Zig intends to provide a fast "overview of the underlying memory representation" and an ergonomic, type-checked "close" representation of underlying hardware operations.
@matu3ba I don't have a blog post or anything, just a lot of practical experience doing hardware simulations directly in C and C++, no DSLs. There are a few cases where operator overloading in C++ can be helpful, but we use it very sparingly. The ergonomics I'm talking about have to do with signal width. It's very rare, in my experience, that a signal is 8, 16, 32, or 64 bits wide. Getting data from the hardware model, and putting data into it, requires some extra runtime safety checking to make sure that the high-order bits beyond the signal width are zero. My interest in zig for this is mostly centered around [nearly] arbitrary unsigned integer widths, especially when extracting fields from a very wide hardware signal (e.g. a latch bank with 118 bits in it) that cross a word boundary. Example: consider a processor with 64-bit addresses and an L1 cache like this.
To check cache reads and writes, we can set an expectation for the address and cache location that should be accessed based on an instruction's operand address. Looking at the actual hardware model, we have to reconstruct the address from 3 pieces of information in there:
```cpp
uint8_t setid = fCacheReadWriteSetId.get(); // get the Set ID the hardware model is accessing
uint8_t index = fCacheReadWriteIndex.get(); // get the line index the hardware model is accessing
// This is two values, because the TLB entry is larger than 64 bits. It must contain both
// the logical address bits down to the cache index, and the full translated page address
// for the cache line.
uint64_t tlbEntry[2] = { fTlb[setid][index][0].get(), fTlb[setid][index][1].get() }; // get the TLB entry
// Some messy bitwise AND and shift operations to isolate the logical address bits from
// the larger-than-64-bit TLB entry.
uint64_t addr = (extractLogicalAddressBits(tlbEntry) << (6 + 7)) // shift the most significant bits of the address from the TLB into place
    | (static_cast<uint64_t>(index) << 6)                        // cast and shift and OR the index bits into place
    | static_cast<uint64_t>(fOperandByteOffset.get());           // cast and OR the byte offset into place
expectEqual(expected_address, addr); // Check it
```

The same check in Zig:

```zig
const setid: u3 = fCacheReadWriteSetId.get();
const index: u7 = fCacheReadWriteIndex.get();
// 103 bits because 64 - 6 - 7 for the logical address is 51 bits in the TLB, and the absolute
// is probably translated on a 4k page boundary, which means the 12 least significant bits
// are always the same as the logical ones, leaving 52 translated bits.
const tlbEntry: u103 = (@intCast(u103, fTlb[setid][index][0].get()) << 64) | @intCast(u103, fTlb[setid][index][1].get());
const addr: u64 = @intCast(u64, (tlbEntry & cTlbLogAddrMask) >> (52 - 6 - 7))
    | (@intCast(u64, index) << 6)
    | @intCast(u64, fOperandByteOffset.get());
```

The reconstructed address, addr, is something I can then do arithmetic (addition and subtraction at least) and ordering compares with, to check things like distances between operands and whether or not the address lies within a special range (e.g. microcode reserved address ranges). I'm not aware of any common hardware instructions that do both bitwise and arithmetic operations at once. Having separate bitwise types and arithmetic types would add steps between those operations though, and my point was that the same value can reasonably be used both ways. I hope that answers your question, or at least provides some insight into my motivations.
@tsmanner As I understand it, it is a common use case to use bit operations to splice/combine integers, and then to do arithmetic on the combined result, not just bitops. For the following questions I assume that you are not blocked by, for example, #10684 and #10920. If you are, please write into the comments (or contribute to fixing it).

1. Did you try to implement comptime generation and validation of the necessary masks (if the bit offset is not byte-aligned) for a simple use case? I am a bit surprised that there is no tooling to verify/validate/(auto-)generate the masks from the hardware description, as this sounds like a very common task.
2. Did you think about making a zig library, i.e. some examples for a few architectures? Usually it is simpler to generalize the problem(s) with the existence/annoyance of workarounds in a more complete (even if toy) application.
3. Did you play with comptime-generating the necessary stuff for extractLogicalAddressBits and writing it as a c/c++ file at runtime?
Testing implementations against one another sounds very useful for checking the correctness of exotic integers + comptime, and also for checking your hand-crafted stuff.
@matu3ba you are correct, I am not blocked by anything. I was only trying to provide a concrete example of a user wanting to do both bitops and arithmetic with the same value. I like your ideas about how to go about implementing them; my own are similar. The code base I work with is pretty old, and most of the people working on it have expertise in hardware design, not software design. The hand-rolled stuff is, unfortunately, very common in there still.
Of course people splice together bits to integers and use them as indexes in arrays, but your example does not actually _use_ arithmetic (+, -, *) and bitops (&, |, ^, <<, rotl, ...) together.

I think your example would be written something like this (assuming bxxx is an xxx-bit bitfield, and @zeroext zero-extends a bitfield or the bits of an integer with a given number of bits):

```zig
const setid: u3 = fCacheReadWriteSetId.get();
const index: u7 = fCacheReadWriteIndex.get();
// 103 bits because 64 - 6 - 7 for the logical address is 51 bits in the TLB, and the absolute
// is probably translated on a 4k page boundary, which means the 12 least significant bits
// are always the same as the logical ones, leaving 52 translated bits.
const tlbEntry: b103 = (@zeroext(b103, fTlb[setid][index][0].get()) << 64)
    | @zeroext(b103, fTlb[setid][index][1].get());
const addr: u64 = @as(u64,
    @intcast(b64, (tlbEntry & cTlbLogAddrMask) >> 52 - 6 - 7) // no _need_ for parenthesised (52 - 6 - 7)
    | (@zeroext(b64, index) << 6)
    | @zeroext(b64, fOperandByteOffset.get())
);
```

It would also make sense to write @intcast(b64, (tlbEntry & cTlbLogAddrMask) >> 52 - 6 - 7) as @bitfield(b103, tlbEntry & cTlbLogAddrMask, 52 - 6 - 7, 103).

In my opinion explicitly marking where the bit representation is used makes things clearer, YMMV.
That is correct, which is why my post was about ergonomics and not the ability to accomplish the task. Requiring casting between bitwise and arithmetic types just so that the shift+or construction of values in my example works doesn't lend any expressiveness to the program that isn't already present in the selection of the type (e.g. u64 versus a bitfield type like b64).
I think another reason why it's important to keep the cast explicit is Zig doesn't implicitly integer-promote the LHS of a shift operator but C/C++ do. In C/C++ it's always legal to left-shift by values < 32, but some of these are nonsense in Zig and I think it's useful to have a compiler error to say so. For example these are legal shift operations in C:

```c
#include <stdint.h>

uint32_t shift1() {
    uint8_t lhs = 0xab;
    uint8_t rhs = 24;
    return lhs << rhs; // 0xab000000u
}

uint32_t shift2() {
    uint8_t lhs = 0xab;
    return lhs << 24; // 0xab000000u
}
```

In Zig you don't get automatic integer promotion like this, which means I think both of these programs should remain compilation errors lest we give C/C++ programmers more ways to shoot themselves in the foot:

```zig
export fn shift1() u32 {
    var lhs: u8 = 0xab;
    var rhs: u8 = 24;
    // error: expected type 'u3', found 'u8'
    // note: unsigned 3-bit int cannot represent all possible unsigned 8-bit values
    return lhs << rhs;
}

export fn shift2() u32 {
    const lhs: u8 = 0xab;
    // error: type 'u3' cannot represent integer value '24'
    return lhs << 24;
}
```
Currently, if you shift with a comptime-known RHS, everything is easy peasy lemonboy squeezy:
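For instance, something like this (an illustrative sketch, since the original snippet did not survive in this copy):

```zig
const x: u32 = 0b1010;
const y = x >> 1; // RHS is comptime-known: no cast needed, just works
```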
However if the RHS is runtime known, it requires an often-awkward cast:
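Again illustrative rather than the original snippet; with a runtime RHS the cast to the Log2 type falls on the caller:

```zig
fn shiftRight(x: u32, amt: u32) u32 {
    return x >> @intCast(u5, amt); // awkward: asserts amt < 32, safety-checked in safe builds
}
```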
There are 2 choices to resolve the situation:

- Handle any RHS value, the way std.math.shr and std.math.shl do.
- Assert the RHS is in range with @intCast.

This proposal is to modify the >> and << operators to match what we have in std.math.shr and std.math.shl. This allows the RHS to be any integer type, even signed, and there is no possibility of illegal behavior; the entire range of all integers is well defined.

The reason for status quo is performance, to force people to think about the tradeoff when choosing between these two strategies. This proposal argues that this mathematical definition of bit shifting is more sensible. For code that wants to avoid the extra possible instructions arising from making this definition of shifting work on each platform, there are some pretty reasonable ways to accomplish this, so much so that they will often happen by accident: when the RHS is already a small-enough type (as with the @intCast above), the operations can lower to better machine instructions.

The interesting thing here is that both definitions of bit shifting operations can be implemented in terms of the other in the standard library, so the question is: which one should we be pushing on users with the convenient shifting syntax? This proposal answers firmly with "the mathematically clean one". I do believe this matches other precedents, such as how @intCast works by preserving the mathematical integer value.

Related proposals: