Skip to content

Proposal: Explicit Shift Operators #5220

Open
@ghost

Description

Explicit Shift Operators

This proposal suggests making bit-shifting operations explicit and type-agnostic.

But why?

Currently, the behavior of >> and shrExact depend on the user knowing the type being worked on.

const warn = @import("std").debug.warn;

pub fn main() void {
    const signed = @bitCast(u8, @intCast(i8, 1) << 7 >> 7);
    warn("Signed:   {0b:0>8}\n", .{signed});

    const unsigned = @bitCast(u8, @intCast(u8, 1) << 7 >> 7);
    warn("Unsigned: {0b:0>8}\n", .{unsigned});

    const signedExact = @bitCast(u8, @shrExact(@intCast(i8, 1) << 7, 7));
    warn("SignedExact:   {0b:0>8}\n", .{signedExact});

    const unsignedExact = @bitCast(u8, @shrExact(@intCast(u8, 1) << 7, 7));
    warn("UnsignedExact: {0b:0>8}\n", .{unsignedExact});
}

which outputs:

Signed:   11111111
Unsigned: 00000001
SignedExact:   11111111
UnsignedExact: 00000001

Needing to know the type to know what an operator does is a common argument against operator overloading, so by this logic Zig's current shr's behavior is inconsistent with it's opinions.

Furthermore, this implicit operator overloading makes it difficult to port shift-heavy bitwise algorithms between signed and unsigned types, as the behavior is now completely different. For example, take this conversation between intellectual gentlemen:

<DrJensen> https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cfb576205954af096255c96e8a9dcc24
<DrJensen> comparing my software division output to the hardware division
           output
<DrJensen> "hard scalar" is correct
<DrRichmond> Oh, the easy solution is just promote all your i8s to i16s to do
             the division
<DrJensen> "soft scalar" outputs 0 instead of -42 when dividing -128 by 3
<DrRichmond> https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4e0c5a75f93bbf42890647f78415cfeb
<DrJensen> but where is it overflowing?
<DrJensen> dividend.wrapping_abs() I guess
<DrJensen> oh probably `((dividend & (1 << i)) >> i)`
<DrJensen> which means if I turn it into a u8..
<DrJensen> https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3cd9a6633c124d1612d0d7400ae22fe9 :D
* DrJensen CLAPS
<DrJensen> ajajajajajaja
<DrRichmond> wut the fug
<DrRichmond> why does the signedness matter? they both have 8 bits
<DrJensen> ofc it does, `<< i >> i` will do a signed rightshift if it's an i8
<DrJensen> which floods with 1's
<DrJensen> converting to u8 will flood 0's
<DrRichmond> what
<DrRichmond> fuck that noise
<DrJensen> mfw DrRichmond is too used to >>>= from all his java programming
<DrRichmond> I just think bit shift should actually operate at the bit level
             and not have a different meaning depending on whether your number
             is signed or not
<DrJensen> ye maybe
<DrJensen> I don't really care tbh
<DrRichmond> maybe you should because your code would have been correct the
             first time if rust didn't break convention here)
<DrJensen> except...it doesn't
<DrJensen> "When shifting an unsigned value, the >> operator in C is a logical
           shift. When shifting a signed value, the >> operator is an arithmetic
           shift."
<DrJensen> https://stackoverflow.com/questions/7622/are-the-shift-operators-arithmetic-or-logical-in-c
<DrJensen> or technically, "implementation defined" ;)
<DrRichmond> :|
<DrJensen> would you like me to teach you C++, DrRichmond?
<DrJensen> It's okay, I can make time.
<DrMiller> you can make time but can you make game
<DrRichmond> fuckitdood the more you know

The amount of ignorance and confusion in this conversation spans multiple dimensions. We could be smug about knowing arcane rules, but there's clearly something wrong about this simple operator. They almost halved their throughput! There is unnecessary cognitive overhead when working with right-shift.

Ergo! We take (the only?) good idea from a certain other language and break them into separate operators, with some differences.

"We should toss in some more operators, it responded well to that."

In Zig, there are four bitshifting operators.

Left Bitshift:         `<<`,    `<<=`,   `shlExact`,       `shlWithOverflow`
Right Bitshift:        `>>`,    `>>=`,   `shrExact`,       `shrWithOverflow`
Sticky Left Bitshift:  `<<<`,  `<<<=`,   `shlStickyExact`, `shlStickyWithOverflow`
Sticky Right Bitshift: `>>>`,  `>>>=`,   `shrStickyExact`, `shrStickyWithOverflow`

<< and >> operators are 'zero-flooding', i.e, they move the bit-buffer in a direction and leave 0's in its wake. However, it is sometimes convenient to use the 'sticky' bitshifts, which fills its wake with the value of the leftmost bit (in >>>) or rightmost bit (in <<<).

Bitshifts are sometimes used to simulate arithmetic operations using simpler hardware. For example, in some situations, left-shifting a number is equivalent to multiplying it by a power of 2. However, this is only consistent for unsigned integers fixedpoints, as the left-shift may occupy the Two's Complement's MSB:

const a: u8 = 1 << 3; // 8 = 1 * 2^3
const b: i8 = 2 << 5; // 32 = 2 * 2^5
const c: i8 = 64 << 1; // -128 != 64 * 2^1!

Similarly, a right shift may, in some situations, be used to divide by a power of 2, but this is only consistent for unsigned integers fixedpoints, as shifted signed numbers do not follow division's rounding rules:

const a: u8 = 128 >> 1; // 64 = 128 / 2^1
const b_incorrect: i8 = -128 >> 1; // 64 != -128 /  2^1!
const b_correct: i8  = -128 >>> 1; // -64 = -128 / 2^1!
const b_what: i8 = -1 >> 1; // 127 != -1 / 2^1!
const b_ohno: i8 = -1 >>> 1; // -1 != -1 / 2^1

For this reason, manually representing multiplication or division via shifts is a choice one should consider very carefully. Keep in mind the compiler will automatically convert multiplies and divides to shifts whenever it has enough information to do so.

Wiggling, Pulsating Innards

Implementation is fairly straightforward, but some gotchas are likely to arise.

Drawbacks

  • Different. As far as the author is aware, no other language has defined shifting like this, although it is arguably less strange than what Java and Javascript define.
  • Basically the opposite of Java and Javascript's definitions, so it's likely to knot up caffeinated greybeards' muscle memory.

Rationale and alternatives

C's shifting is defined as follows:

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
-- http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

This undefined and implementation-defined behavior should be taken in context with the era:

Arithmetic right shifts for negative numbers are equivalent to division using rounding towards 0 in one's complement representation of signed numbers as was used by some historic computers, but this is no longer in general use.
-- https://en.wikipedia.org/wiki/Arithmetic_shift#Non-equivalence_of_arithmetic_right_shift_and_division

So, for one's complement machines, having n << p and n >> p be synonymous with n * 2^p and n / 2^p is sensible for both signed and unsigned numbers. In two's complement, this is no longer sensible. C's indecisive solution is to essentially allow the hardware engineers to overload the << and >> operators.

By definition, we can not rely on this. Other languages have tried to imitate C's indecisiveness in this matter, allowing language behavior to be defined by historical curiosity and ambivalent waffling.

By defining separate operators for shifts, in terms of zero vs sticky, we sidestep the arithmetic context altogether and treat them exactly as they are: bitwise operators that move and flood. Any arithmetic context is a convenient side-effect.

Alternative Implementation: Chop off its hands

We could have only two bitshift operators, << and >>, and remove sticky shifting altogether, requiring any equivalent functionality to be recreated using ~(mask << a) and ~(mask >> a). However this is less desirable, as the mask created by sticky shifting is conditional on the MSB or LSB of the bit-buffer, requiring a rewrite to involve a if / else branch, which is much less zen-inducing than <<< and >>>

Prior art

Unresolved questions

Does defining shifts in terms of "Flood-zero's" and "Sticky" make users expect a "Flood-one's" operator? Does the lack of one make programming less convenient?

Future possibilities

Shifts have always been in a strange situation, not quite a bitwise operation, not quite an arithmetic one, and weighed down by historic waffling in computing hardware. We are in a position to clarify the purpose of these fundamental tools, decreasing the number of dumb bugs people deal with every day, and increasing their Zen of Zig, one bit at a time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    proposalThis issue suggests modifications. If it also has the "accepted" label then it is planned.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions