Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__rust_u128_mulo aborts when rhs is zero #367

Closed
bjorn3 opened this issue Jul 13, 2020 · 12 comments
Closed

__rust_u128_mulo aborts when rhs is zero #367

bjorn3 opened this issue Jul 13, 2020 · 12 comments

Comments

@bjorn3
Copy link
Member

bjorn3 commented Jul 13, 2020

if self > Self::max_value().aborting_div(other) {

This is not a problem when using cg_llvm, as it is never called. However unlike cg_llvm, cg_clif does use it for the checked u128 multiplication implementation.

cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/1063

@AaronKutch
Copy link
Contributor

I'm going to try fixing this and improving performance.

@AaronKutch
Copy link
Contributor

AaronKutch commented Aug 18, 2020

edit: With some help, I found the answer to be adding some bounds to the associated types inside the associated types, such as type H: HInt<D = Self>; and type D: DInt<H = Self>;.

@nagisa I was having trouble using LargeInt to implement functions using generics when I had a revelation. u128 does not have any primitives larger than it, and u8 does not have any primitives smaller than it. Some functions need to split a larger integer into smaller ones, while others need to use widening operations. If there is only one trait such as LargeInt, then that trait will be fundamentally biased depending on if an impl such as impl LargeInt for u128 exists or if impl LargeInt for u8 exists. The single trait cannot do both at the same time, and that asymmetry will propagate up in subtle ways when implementing functions. What I am trying to do instead is replace LargeInt by two traits:

/// Trait for integers twice the bit width of another integer. This is implemented for all
/// primitives except for `u8`, because there is not a smaller primitive.
pub(crate) trait DInt: Int {
    /// Integer that is half the bit width of the integer this trait is implemented for
    type H: HInt;

    /// Returns the low half of `self`
    fn lo(self) -> Self::H;
    /// Returns the high half of `self`
    fn hi(self) -> Self::H;
    /// Returns the low and high halves of `self` as a tuple
    fn lo_hi(self) -> (Self::H, Self::H);
}

/// Trait for integers half the bit width of another integer. This is implemented for all
/// primitives except for `u128`, because it there is not a larger primitive.
pub(crate) trait HInt: Int {
    /// Integer that is double the bit width of the integer this trait is implemented for
    type D: DInt;

    /// Widens (zero extends or sign extends) the integer to have double bit width
    fn widen(self) -> Self::D;
    /// Widens the integer to have double bit width and shifts the integer into the higher bits
    fn widen_hi(self) -> Self::D;
    /// Constructs a double bit width integer using lower and higher parts
    fn from_lo_hi(lo: Self, hi: Self) -> Self::D;
    /// Widening multiplication. This cannot overflow.
    /// We use `widen_mul` to avoid colliding with `widening_mul` in case RFC PR #2417 is
    /// implemented.
    fn widen_mul(self, rhs: Self) -> Self::D;
}

In practice, these work wonders such as converting this incomprehensible mess:

trait Mul: LargeInt {
    fn mul(self, other: Self) -> Self {
        let half_bits = Self::BITS / 4;
        let lower_mask = !<<Self as LargeInt>::LowHalf>::ZERO >> half_bits;
        let mut low = (self.low() & lower_mask).wrapping_mul(other.low() & lower_mask);
        let mut t = low >> half_bits;
        low &= lower_mask;
        t += (self.low() >> half_bits).wrapping_mul(other.low() & lower_mask);
        low += (t & lower_mask) << half_bits;
        let mut high = Self::low_as_high(t >> half_bits);
        t = low >> half_bits;
        low &= lower_mask;
        t += (other.low() >> half_bits).wrapping_mul(self.low() & lower_mask);
        low += (t & lower_mask) << half_bits;
        high += Self::low_as_high(t >> half_bits);
        high += Self::low_as_high((self.low() >> half_bits).wrapping_mul(other.low() >> half_bits));
        high = high
            .wrapping_add(self.high().wrapping_mul(Self::low_as_high(other.low())))
            .wrapping_add(Self::low_as_high(self.low()).wrapping_mul(other.high()));
        Self::from_parts(low, high)
    }
}

into this (note: I haven't checked this one for translation bugs because of compilation problems, but I have verified the two traits in other simpler generic implementations)

// This generic implementation uses small multiplications to generate large multiplications, hence the `DInt` bound
trait Mul: Int + DInt {
    fn mul(self, other: Self) -> Self {
        let mut tmp0 = self.lo().widen_mul(other.lo());
        let mut sum = tmp0.lo();
        sum += self.hi().wrapping_mul(other.lo());
        tmp0 += sum.widen_hi();
        let mut tmp1 = sum.widen_hi();
        sum = tmp0.hi();
        tmp0 &= lower_mask;
        sum += other.hi().wrapping_mul(self.lo());
        tmp0 += sum.widen_hi();
        tmp1 += sum.hi();
        tmp1 += self.hi().wrapping_mul(other.hi()).widen_hi();
        tmp1 = tmp1
            .wrapping_add(self.hi().wrapping_mul(other.hi()))
            .wrapping_add(self.lo().wrapping_mul(other.hi()).widen_hi());
        tmp0 + tmp1
    }
}

There is one serious problem that I don't know how to get around, however. I run into this error when I do some computation on a different sized integer and transform it back for the return value:

fn f<I: Int + DInt>(x: I) -> I {
    let t = x.lo();
    // ...
    t.widen() // expected type parameter `I`
              found associated type `<<I as int::DInt>::H as int::HInt>::D`
}

fn g<I: Int + HInt>(x: I) -> I {
    let t = x.widen();
    // ...
    t.lo() // expected type parameter `I`
              found associated type `<<I as int::HInt>::D as int::DInt>::H`
}

The type checker doesn't understand that we went in a circle such as DInt -> HInt -> DInt or HInt -> DInt -> HInt. Does someone with deep knowledge of typechecking shenanigans know the best solution to this problem?

The only alternative I know is using a single bikeshedded version of LargeInt that cannot support impl LargetInt for u8 because the vast majority of cases in compiler-builtins would use DInt anyway, and I could manually implement the remaining cases.

@amosonn
Copy link

amosonn commented Sep 1, 2020

You need to add bounds on the associated types, i.e.

trait DInt: Int {
    type H: HInt<D=Self>;
    ...
}

trait HInt: Int {
    type D: DInt<H=Self>;
    ...
}

Toy example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7d22241eb541e4db9500b3e9bbc12e43

@AaronKutch
Copy link
Contributor

I almost have the divisionless algorithm working, just that the signed versions are giving me extreme trouble with iX::MIN cases. I have run out of the time I can allocate to rust for this week however, so this will need to wait

@nagisa
Copy link
Member

nagisa commented Sep 13, 2020

I suggest you to copy the algorithm LLVM uses to lower the operation as I implemented in https://reviews.llvm.org/rG73e8a784e62f945a51363c8b5ec4eaedcf9f87e8.

@nagisa
Copy link
Member

nagisa commented Sep 13, 2020

In particular (iNh being half-size, iN being original sized unsigned integer):

%0 = %LHS.HI != 0 && %RHS.HI != 0
%1 = { iNh, i1 } @umul.with.overflow.iNh(iNh %LHS.HI, iNh %RHS.LO)
%2 = { iNh, i1 } @umul.with.overflow.iNh(iNh %RHS.HI, iNh %LHS.LO)
%3 = mul nuw iN (%LHS.LOW as iN), (%RHS.LOW as iN)
%4 = add iN (%1.0 as iN) << Nh, (%2.0 as iN) << Nh
%5 = { iN, i1 } @uadd.with.overflow.iN( %4, %3 )
%res = { %5.0, %0 || %1.1 || %2.1 || %5.1 }

(edit: ah, signed, yeah I had trouble with them too!)

@AaronKutch
Copy link
Contributor

Is it doing a widening multiplication, and checking if the high part is not zero? That is the fastest way to do it, but only up to the largest widening multiplication supported on an architecture. I have a algorithm based on leading_zeros that I am developing for u128, that doesn't require full u128 by u128 widening multiplication, which I think should be faster.

@nagisa
Copy link
Member

nagisa commented Sep 14, 2020

Is it doing a widening multiplication, and checking if the high part is not zero?

No, none of the operations in this are “widening” in a sense that there are no 256-bit operations involved for a 128-bit UMULO (but there are some simpler 128-bit operations). Here's the algorithm above with types substituted for u128 umulo, maybe it'll be clearer:

%0 = %LHS.HI != 0 && %RHS.HI != 0
%1 = { i64, i1 } @umul.with.overflow.i64(i64 %LHS.HI, i64 %RHS.LO)
%2 = { i64, i1 } @umul.with.overflow.i64(i64 %RHS.HI, i64 %LHS.LO)
%3 = mul nuw i128 (%LHS.LOW as i128), (%RHS.LOW as i128)          ; note: regular multiplication, can't wrap
%4 = add i128 (%1.0 as i128) << 64, (%2.0 as i128) << 64
%5 = { i128, i1 } @uadd.with.overflow.i128(i128 %4, i128 %3)
%res = { %5.0, %0 || %1.1 || %2.1 || %5.1 }                                             ; res is { i128, i1 }

Do note that this is the unsigned version, I couldn't figure out one for signed integers.

@AaronKutch
Copy link
Contributor

Oh I see what is happening. I will use that instead.

@AaronKutch
Copy link
Contributor

This is fixed in master now.

@AaronKutch
Copy link
Contributor

@alexcrichton can a new compiler-builtins version be released? It would be nice to have all the changes I made make it into Rust 1.50.

@alexcrichton
Copy link
Member

Thanks for the ping! Looks like Amanieu just took care of it a few hours ago, so I'll go ahead and close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants