-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking Issue for bigint helper methods #85532
Comments
I'd like a mul_mod, as shown in #85017, because I think you can't implement it efficiently without asm and it's a basic block for power_mod and other things. |
Another set of methods that could be useful that I'll probably offer implementations for at some point: /// `(self << rhs) | carry`
fn carrying_shl(self, rhs: u32, carry: Self) -> (Self, Self);
/// `(self >> rhs) | carry`
fn borrowing_shr(self, rhs: u32, carry: Self) -> (Self, Self);
/// `self << rhs`
fn widening_shl(self, rhs: u32) -> (Self, Self);
/// `self >> rhs`
fn widening_shr(self, rhs: u32) -> (Self, Self); Essentially, return the two halves of a rotation, i.e. |
From @scottmcm in the original PR:
|
Why don't we add |
Mostly effort implementing them efficiently. In the meantime, you can do it with four calls to the |
I was very confused by this function name at first, since borrowing in Rust usually refers to references. I am not a native speaker, but I do formal mathematical work in English professionally, and yet I never before heard the term "borrowing" in the context of subtraction. So I think this, at least, needs some explanation in the docs. (I would have expected something like The current docs for some of the other methods could probably also be improved: they talk about not having the "ability to overflow", which makes it sound like not overflowing is a bad thing. |
The word borrow here comes from the terminology for a full subtractor. I am thinking that maybe the borrowing_sub function could be removed altogether. The same effect that borrowing_sub has can be obtained from carrying_add by making the first carrying_add in the chain have a set carry bit, and then bitnot every rhs. This fact could be put in the documentation of carrying_add. |
Considering how the primary goal of these methods is to be as efficient as possible, usually optimising down to a single instruction, I don't think it'd be reasonable to just get rid of subtraction in favour of telling everyone to use addition instead. Definitely open to changing the name, though. |
These helper methods will not be very useful to me unless they are implemented for every kind of integer. Here is an implementation for a widening multiplication-addition for u128:
I have tested this with my crate edit: There is a version of this that uses the Karatsuba trick to use 3 multiplications instead of 4, but it incurs extra summations, branches, and is not as parallel. For typical desktop processors the above should be the fastest. |
I would make a PR for that. |
Some alternative signatures include |
I would also change up the documentation headers for the
I specifically note |
|
But unsigned overflow and signed overflow are different. For example, on x86_64, while unsigned and signed integers share addition and subtraction instructions, unsigned overflow is detected using the carry flag while signed overflow is detected using the overflow flag. As a concrete example: Edit: I think I had misread your comment and thought the middle part of your comment was the current doc, not your suggestion, so it looks like I completely misinterpreted your final comment. |
Yes signed and unsigned overflow are different, but the |
I think all of these are good suggestions, and like mentioned earlier, these changes definitely should go in a PR if you have the time. I think one important thing to note is that so far the APIs here seem good, but the documentation definitely could use some work. Although if there's a bigger case for changing the subtraction behaviour to be more in line with what's expected (the existing behaviour is mostly modelled after the x86 instructions adc and sbb), then I'm for that. That said, the main goal is to make it relatively painless to write correct code that compiles down to the right instructions in release mode, so, I would say we should make sure that happens regardless of what's done. I would have added an explicit test for that but I honestly don't know how. |
…riplett Add more text and examples to `carrying_{add|mul}` `feature(bigint_helper_methods)` tracking issue rust-lang#85532 cc `@clarfonthey`
…riplett Add more text and examples to `carrying_{add|mul}` `feature(bigint_helper_methods)` tracking issue rust-lang#85532 cc ``@clarfonthey``
…riplett Add more text and examples to `carrying_{add|mul}` `feature(bigint_helper_methods)` tracking issue rust-lang#85532 cc ```@clarfonthey```
…riplett Add more text and examples to `carrying_{add|mul}` `feature(bigint_helper_methods)` tracking issue rust-lang#85532 cc ````@clarfonthey````
…riplett Add more text and examples to `carrying_{add|mul}` `feature(bigint_helper_methods)` tracking issue rust-lang#85532 cc `````@clarfonthey`````
Multiplication, and carry-less multiplication, are inherently a widening operation. Unfortunately, at the time of writing, the types in Rust don't capture this well, being built around fixed-width wrapping multiplication. Rust's stdlib can rely on compiler-level optimizations to clean up performance issues from unnecessarily-wide multiplications, but this becomes a bit of an issue for our library, especially for u64 types, since we rely on intrinsics, which may be hard for compilers to optimize around. This commit adds widening_mul, based on a proposal to add widening_mul to Rust's primitive types: rust-lang/rust#85532 As well as several other tweaks to how xmul is provided, moving more arch-level details into xmul, but still limiting when it is emitted.
It turns out that rustc was able to optimize the current implementation of |
Opened #133663 to add an intrinsic for |
Replace (Classic problem that Fixed in #133674 |
…nieu Fix chaining `carrying_add`s Something about the MIR lowering for `||` ended up breaking this, but it's fixed by changing the code to use `|` instead. I also added an assembly test to ensure it *keeps* being [`adc`](https://www.felixcloutier.com/x86/adc). cc rust-lang#85532 (comment), which noticed this.
…nieu Fix chaining `carrying_add`s Something about the MIR lowering for `||` ended up breaking this, but it's fixed by changing the code to use `|` instead. I also added an assembly test to ensure it *keeps* being [`adc`](https://www.felixcloutier.com/x86/adc). cc rust-lang#85532 (comment), which noticed this.
Rollup merge of rust-lang#133674 - scottmcm:chain-carrying-add, r=Amanieu Fix chaining `carrying_add`s Something about the MIR lowering for `||` ended up breaking this, but it's fixed by changing the code to use `|` instead. I also added an assembly test to ensure it *keeps* being [`adc`](https://www.felixcloutier.com/x86/adc). cc rust-lang#85532 (comment), which noticed this.
Add a compiler intrinsic to back `bigint_helper_methods` cc rust-lang#85532 This adds a new `carrying_mul_add` intrinsic, to implement `wide_mul` and `carrying_mul`. It has fallback MIR for all types -- including `u128`, which isn't currently supported on nightly -- so that it'll continue to work on all backends, including CTFE. Then it's overridden in `cg_llvm` to use wider intermediate types, including `i256` for `u128::carrying_mul`.
Add a compiler intrinsic to back `bigint_helper_methods` cc rust-lang#85532 This adds a new `carrying_mul_add` intrinsic, to implement `wide_mul` and `carrying_mul`. It has fallback MIR for all types -- including `u128`, which isn't currently supported on nightly -- so that it'll continue to work on all backends, including CTFE. Then it's overridden in `cg_llvm` to use wider intermediate types, including `i256` for `u128::carrying_mul`.
Rollup merge of rust-lang#133663 - scottmcm:carrying_mul_add, r=Amanieu Add a compiler intrinsic to back `bigint_helper_methods` cc rust-lang#85532 This adds a new `carrying_mul_add` intrinsic, to implement `wide_mul` and `carrying_mul`. It has fallback MIR for all types -- including `u128`, which isn't currently supported on nightly -- so that it'll continue to work on all backends, including CTFE. Then it's overridden in `cg_llvm` to use wider intermediate types, including `i256` for `u128::carrying_mul`.
Add a compiler intrinsic to back `bigint_helper_methods` cc rust-lang#85532 This adds a new `carrying_mul_add` intrinsic, to implement `wide_mul` and `carrying_mul`. It has fallback MIR for all types -- including `u128`, which isn't currently supported on nightly -- so that it'll continue to work on all backends, including CTFE. Then it's overridden in `cg_llvm` to use wider intermediate types, including `i256` for `u128::carrying_mul`.
Tidy up bigint multiplication methods This tidies up the library version of the bigint multiplication methods after the addition of the intrinsics in rust-lang#133663. It follows [this summary](rust-lang#85532 (comment)) of what's desired for these methods. Note that, if `2H = N`, then `uH::MAX * uH::MAX + uH::MAX + uH::MAX` is `uN::MAX`, and that we can effectively add two "carry" values without overflowing. For ease of terminology, the "low-order" or "least significant" or "wrapping" half of multiplication will be called the low part, and the "high-order" or "most significant" or "overflowing" half of multiplication will be called the high part. In all cases, the return convention is `(low, high)` and left unchanged by this PR, to be litigated later. ## API Changes The original API: ```rust impl uN { // computes self * rhs pub const fn widening_mul(self, rhs: uN) -> (uN, uN); // computes self * rhs + carry pub const fn carrying_mul(self, rhs: uN, carry: uN) -> (uN, uN); } ``` The added API: ```rust impl uN { // computes self * rhs + carry1 + carry2 pub const fn carrying2_mul(self, rhs: uN, carry: uN, add: uN) -> (uN, uN); } impl iN { // note that the low part is unsigned pub const fn widening_mul(self, rhs: iN) -> (uN, iN); pub const fn carrying_mul(self, rhs: iN, carry: iN) -> (uN, iN); pub const fn carrying_mul_add(self, rhs: iN, carry: iN, add: iN) -> (uN, iN); } ``` Additionally, a naive implementation has been added for `u128` and `i128` since there are no double-wide types for those. Eventually, an intrinsic will be added to make these more efficient, but rather than doing this all at once, the library changes are added first. ## Justifications for API The unsigned parts are done to ensure consistency with overflowing addition: for a two's complement integer, you want to have unsigned overflow semantics for all parts of the integer except the highest one. This is because overflow for unsigned integers happens on the highest bit (from `MAX` to zero), whereas overflow for signed integers happens on the second highest bit (from `MAX` to `MIN`). Since the sign information only matters in the highest part, we use unsigned overflow for everything but that part. There is still discussion on the merits of signed bigint *addition* methods, since getting the behaviour right is very subtle, but at least for signed bigint *multiplication*, the sign of the operands does make a difference. So, it feels appropriate that at least until we've nailed down the final API, there should be an option to do signed versions of these methods. Additionally, while it's unclear whether we need all three versions of bigint multiplication (widening, carrying-1, and carrying-2), since it's possible to have up to two carries without overflow, there should at least be a method to allow that. We could potentially only offer the carry-2 method and expect that adding zero carries afterword will optimise correctly, but again, this can be litigated before stabilisation. ## Note on documentation While a lot of care was put into the documentation for the `widening_mul` and `carrying_mul` methods on unsigned integers, I have not taken this same care for `carrying_mul_add` or the signed versions. While I have updated the doc tests to be more appropriate, there will likely be many documentation changes done before stabilisation. ## Note on tests Alongside this change, I've added several tests to ensure that these methods work as expected. These are alongside the codegen tests for the intrinsics.
Tidy up bigint multiplication methods This tidies up the library version of the bigint multiplication methods after the addition of the intrinsics in #133663. It follows [this summary](rust-lang/rust#85532 (comment)) of what's desired for these methods. Note that, if `2H = N`, then `uH::MAX * uH::MAX + uH::MAX + uH::MAX` is `uN::MAX`, and that we can effectively add two "carry" values without overflowing. For ease of terminology, the "low-order" or "least significant" or "wrapping" half of multiplication will be called the low part, and the "high-order" or "most significant" or "overflowing" half of multiplication will be called the high part. In all cases, the return convention is `(low, high)` and left unchanged by this PR, to be litigated later. ## API Changes The original API: ```rust impl uN { // computes self * rhs pub const fn widening_mul(self, rhs: uN) -> (uN, uN); // computes self * rhs + carry pub const fn carrying_mul(self, rhs: uN, carry: uN) -> (uN, uN); } ``` The added API: ```rust impl uN { // computes self * rhs + carry1 + carry2 pub const fn carrying2_mul(self, rhs: uN, carry: uN, add: uN) -> (uN, uN); } impl iN { // note that the low part is unsigned pub const fn widening_mul(self, rhs: iN) -> (uN, iN); pub const fn carrying_mul(self, rhs: iN, carry: iN) -> (uN, iN); pub const fn carrying_mul_add(self, rhs: iN, carry: iN, add: iN) -> (uN, iN); } ``` Additionally, a naive implementation has been added for `u128` and `i128` since there are no double-wide types for those. Eventually, an intrinsic will be added to make these more efficient, but rather than doing this all at once, the library changes are added first. ## Justifications for API The unsigned parts are done to ensure consistency with overflowing addition: for a two's complement integer, you want to have unsigned overflow semantics for all parts of the integer except the highest one. This is because overflow for unsigned integers happens on the highest bit (from `MAX` to zero), whereas overflow for signed integers happens on the second highest bit (from `MAX` to `MIN`). Since the sign information only matters in the highest part, we use unsigned overflow for everything but that part. There is still discussion on the merits of signed bigint *addition* methods, since getting the behaviour right is very subtle, but at least for signed bigint *multiplication*, the sign of the operands does make a difference. So, it feels appropriate that at least until we've nailed down the final API, there should be an option to do signed versions of these methods. Additionally, while it's unclear whether we need all three versions of bigint multiplication (widening, carrying-1, and carrying-2), since it's possible to have up to two carries without overflow, there should at least be a method to allow that. We could potentially only offer the carry-2 method and expect that adding zero carries afterword will optimise correctly, but again, this can be litigated before stabilisation. ## Note on documentation While a lot of care was put into the documentation for the `widening_mul` and `carrying_mul` methods on unsigned integers, I have not taken this same care for `carrying_mul_add` or the signed versions. While I have updated the doc tests to be more appropriate, there will likely be many documentation changes done before stabilisation. ## Note on tests Alongside this change, I've added several tests to ensure that these methods work as expected. These are alongside the codegen tests for the intrinsics.
A nice thing I noticed while writing the demo in #135750: Manually writing naïve quadratic multiplication of a #[no_mangle]
pub fn quadratic_mul(a: u128, b: u128) -> u128 {
const N: usize = 2;
let a: [u64; N] = unsafe { std::mem::transmute(a) };
let b: [u64; N] = unsafe { std::mem::transmute(b) };
let mut out = [0; N];
for j in 0..N {
let mut carry = 0;
for i in 0..(N - j) {
(out[j + i], carry) = u64::carrying_mul_add(a[i], b[j], out[j + i], carry);
}
}
unsafe { std::mem::transmute(out) }
} gives this assembly: quadratic_mul:
mov r8, rdx
mov rax, rdx
mul rdi
imul r8, rsi
add rdx, r8
imul rcx, rdi
add rdx, rcx
ret Which is essentially identical to what you get from using ordinary_mul_128:
mov r8, rdx
mov rax, rdx
mul rdi
imul rsi, r8
add rdx, rsi
imul rcx, rdi
add rdx, rcx
ret |
that's because LLVM expands wide multiplications into a bunch of word-sized operations using nearly the same algorithm: https://github.com/llvm/llvm-project/blob/84c89d0aa4beff4a4d6c36eda125278c48e41128/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L10822 |
It absolutely makes sense, I'm just glad to see that we're emitting something good enough that the extra steps optimize away. |
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 `@clarfonthey`
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 ``@clarfonthey``
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 ```@clarfonthey```
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 ````@clarfonthey````
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 `````@clarfonthey`````
Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 ``````@clarfonthey``````
Rollup merge of rust-lang#135750 - scottmcm:cma-example, r=cuviper Add an example of using `carrying_mul_add` to write wider multiplication Just the basic quadratic version that you wouldn't actually use for really-big integers, but it's nice and short so is useful as for a demonstration of why you might find `carrying_mul_add` useful :) cc rust-lang#85532 ``````@clarfonthey``````
Feature gate:
#![feature(bigint_helper_methods)]
This is a tracking issue for the following methods on integers:
carrying_add
borrowing_sub
carrying_mul
carrying_mul_add
widening_mul
These methods are intended to help centralise the effort required for creating efficient big integer implementations, by offering a few methods which would otherwise require special compiler intrinsics or custom assembly code in order to do efficiently. They do not alone constitute big integer implementations themselves, but are necessary building blocks for a larger implementation.
Public API
Steps / History
widening_mul
RFC: widening_mul rfcs#2417carrying_add
,borrowing_sub
,carrying_mul
, andwidening_mul
Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers #85017carrying_add
andborrowing_sub
on signed types: Reimplementcarrying_add
andborrowing_sub
for signed integers. #93873bigint_helper_methods
#133663("without the ability to overflow" can be confusing.)
Unresolved Questions
bigint_helper_methods
#133663widening_mul
that simply returns the next-larger type? What would we do foru128
/i128
?The text was updated successfully, but these errors were encountered: