Binary GCD #755

erik-3milabs · 2025-01-30T15:06:08Z

This PR introduces an implementation of Optimized Binary GCD. Ref: Pornin, Algorithm 2.

Upsides to this technique:

- It is up to 27x faster than the gcd algorithm currently implemented in crypto_bigint (see below); really, it just has a different complexity bound.
- It does not need UNSAT_LIMBS.
- It is actually constant time, unlike the current implementation, which is sneakily vartime in the maximum of the bitsizes of the two operands.
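To make the "constant time" point concrete, here is a minimal sketch of a branch-free binary GCD loop on a single `u64`. It is an illustration only: the function name `bingcd_odd` is hypothetical, it assumes the second operand is odd, and it has not been audited (a compiler is free to turn the comparison into a branch).

```rust
/// Binary GCD of `a` and an odd `b`, written without data-dependent branches
/// in the loop body. Hypothetical helper for illustration; this is not the
/// crypto_bigint API.
pub fn bingcd_odd(mut a: u64, mut b: u64) -> u64 {
    assert!(b & 1 == 1, "b must be odd");
    // The loop always runs 2 * 64 - 1 times, regardless of the operand values.
    for _ in 0..(2 * u64::BITS - 1) {
        // All-ones mask when `a` is odd, all-zeros otherwise.
        let a_odd = (a & 1).wrapping_neg();
        // Swap only when `a` is odd and smaller than `b`.
        let do_swap = a_odd & ((a < b) as u64).wrapping_neg();
        let t = (a ^ b) & do_swap;
        a ^= t;
        b ^= t;
        // Subtract `b` from `a` when `a` is odd; the difference is even.
        a = a.wrapping_sub(b & a_odd);
        // `a` is even here, so halving is exact; `b` stays odd throughout.
        a >>= 1;
    }
    // Once `a` reaches zero it stays zero, and `b` holds the gcd.
    b
}
```

The iteration count depends only on the operand width, which is what distinguishes this shape of loop from one that exits as soon as `a` becomes zero.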

Benchmark results

Word = u64

| limbs | gcd (vt) | gcd (ct) | new_gcd (ct) |
|---|---|---|---|
| 2 | 10.687 µs | 20.619 µs | 3.6090 µs |
| 4 | 29.121 µs | 56.433 µs | 7.1124 µs |
| 8 | 99.819 µs | 195.02 µs | 16.184 µs |
| 16 | 359.39 µs | 710.26 µs | 44.294 µs |
| 32 | 1.6804 ms | 3.3097 ms | 136.49 µs |
| 64 | 6.9717 ms | 13.028 ms | 494.16 µs |
| 128 | 29.099 ms | 57.325 ms | 2.3335 ms |
| 256 | 143.22 ms | 244.89 ms | 8.7722 ms |

dolevmu

Looks great overall. I find the implementation to be constant time. After going over the code, I left a few annotation and documentation suggestions to clarify it, and suggested some changes to boost performance. I think they are worth reading and considering, but I approve this.

Great work, thanks @erik-3milabs!

dolevmu · 2025-02-05T15:30:05Z

src/uint/bingcd/gcd.rs

+        // Todo: tweak this threshold
+        if LIMBS < 8 {


Good question, but it looks like you have already answered it: the problem is that the best threshold may depend on the machine. Suppose I gave you an answer; would you know what to do with it in the code? For example, if I said "this value is optimal on 64-bit architectures and that one on 32-bit"? If so, let's benchmark on another machine.
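One way the machine-dependent answer could be expressed in code is a `cfg`-selected constant. The sketch below is an assumption about how that might look: the constant name, the helper `use_classic_bingcd`, and both threshold values are hypothetical placeholders, not measurements from this PR.

```rust
/// Hypothetical limb threshold below which the classic loop would be used.
/// Both values are placeholders, not benchmark results.
#[cfg(target_pointer_width = "32")]
pub const CLASSIC_BINGCD_LIMB_THRESHOLD: usize = 16;
#[cfg(not(target_pointer_width = "32"))]
pub const CLASSIC_BINGCD_LIMB_THRESHOLD: usize = 8;

/// Decide at compile time which variant a given `LIMBS` would take.
pub const fn use_classic_bingcd<const LIMBS: usize>() -> bool {
    LIMBS < CLASSIC_BINGCD_LIMB_THRESHOLD
}
```

Because the decision is a `const fn` over a const generic, the untaken branch can be removed at compile time for every concrete `LIMBS`.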

dolevmu · 2025-02-06T07:05:23Z

src/uint/bingcd/xgcd.rs

+            b = Uint::select(&b, &b.wrapping_sub(&a), b_odd);
+            matrix.conditional_subtract_top_row_from_bottom(b_odd);
+
+            // Div b by two and double the top row of the matrix when a, b ≠ 0.


According to Algorithm 2, you should always multiply by 2, but I suppose you know what you are doing and the paper is wrong on this edge case.

Also, Algorithm 2 doubles the second row, (f1, g1), not the first one.

This is a modification I made to the algorithm. It prevents integer overflows in an upcoming PR. I've updated the docstring of this function to indicate this.

Yes, but this code has a and b swapped, so I should be doubling the first row ;-)

Note: that doubling problem was fixed because I reverted the (a, b) swap.

Turns out, this is wrong: this leads to buggy behavior for specific pairs of numbers. I'm refactoring some code to bypass this issue.

Fixed in #761
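For readers following the row-doubling discussion, here is a small variable-time sketch of the unmodified inner loop as the comments above describe it: `a` is halved on every iteration and the second row (f1, g1) is always doubled. It is not the code from this PR (which swaps the roles of `a` and `b` and doubles conditionally); the names `Matrix` and `partial_binxgcd` are hypothetical, and it assumes `b` is odd and at most 31 iterations so the `i64` entries cannot overflow.

```rust
/// Update matrix with rows (f0, g0) and (f1, g1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix {
    pub f0: i64,
    pub g0: i64,
    pub f1: i64,
    pub g1: i64,
}

/// Run `iterations` binary-GCD steps on (a, b) while tracking the update matrix.
/// After `n` iterations, with (a0, b0) the inputs:
///   f0 * a0 + g0 * b0 == a * 2^n  and  f1 * a0 + g1 * b0 == b * 2^n.
/// Variable-time illustration only.
pub fn partial_binxgcd(mut a: u64, mut b: u64, iterations: u32) -> (u64, u64, Matrix) {
    assert!(b & 1 == 1, "b must be odd");
    assert!(iterations <= 31, "keeps the matrix entries well inside i64");
    let mut m = Matrix { f0: 1, g0: 0, f1: 0, g1: 1 };
    for _ in 0..iterations {
        if a & 1 == 1 {
            if a < b {
                // Swapping the operands swaps the two matrix rows as well.
                core::mem::swap(&mut a, &mut b);
                core::mem::swap(&mut m.f0, &mut m.f1);
                core::mem::swap(&mut m.g0, &mut m.g1);
            }
            // a >= b here and both are odd, so the difference is even.
            a -= b;
            m.f0 -= m.f1;
            m.g0 -= m.g1;
        }
        // Halving `a` scales the invariant by 2, so the `b` row is doubled.
        a >>= 1;
        m.f1 <<= 1;
        m.g1 <<= 1;
    }
    (a, b, m)
}
```

When the roles of `a` and `b` are exchanged, as in the reviewed code, the same invariant requires doubling the first row instead, which is the point made in the reply above.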

dolevmu · 2025-02-06T07:28:36Z

src/uint/cmp.rs

@@ -25,6 +25,12 @@ impl<const LIMBS: usize> Uint<LIMBS> {
        Uint { limbs }
    }

+    /// Swap `a` and `b` if `c` is truthy, otherwise, do nothing.
+    #[inline]
+    pub(crate) const fn conditional_swap(a: &mut Self, b: &mut Self, c: ConstChoice) {


In light of my previous comment, it is worth considering whether we can efficiently select between two vectors or matrices, and similarly swap columns/rows efficiently (which you did implement).

@tarcieri do you have any ideas on improving the performance of this and other swapping operations?
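As a point of reference for the swapping question, a masked XOR swap over plain limb arrays can be sketched as follows. This is a simplified stand-in that assumes `[u64; LIMBS]` operands and a `bool` choice; the actual function in the diff operates on `Uint` and takes a `ConstChoice`.

```rust
/// Swap `a` and `b` limb by limb when `choice` is true, touching every limb
/// either way. Simplified illustration on raw `u64` arrays.
pub fn conditional_swap<const LIMBS: usize>(
    a: &mut [u64; LIMBS],
    b: &mut [u64; LIMBS],
    choice: bool,
) {
    // All-ones when `choice` is true, all-zeros otherwise.
    let mask = (choice as u64).wrapping_neg();
    for (x, y) in a.iter_mut().zip(b.iter_mut()) {
        // `t` is the limb difference when swapping and zero otherwise.
        let t = (*x ^ *y) & mask;
        *x ^= t;
        *y ^= t;
    }
}
```

This costs three XORs and one AND per limb pair, compared with two independent selects if the swap were written as `(select(a, b), select(b, a))`.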

erik-3milabs · 2025-02-07T10:04:49Z

src/uint/bingcd/gcd.rs

+            Uint::conditional_swap(&mut a, &mut b, do_swap);
+
+            // subtract b from a when a is odd
+            a = a.wrapping_sub(&Uint::select(&Uint::ZERO, &b, a_odd));


@tarcieri what do you think of this line? Previously, it was like this:

a = Uint::select(&a, &a.wrapping_sub(&b), a_odd);

The current code is 10-25% faster for Uints with few limbs (1, 2, 3, etc.).

I'm surprised there's that much of a difference. Are you sure it's always faster or is it faster depending on a?

I'm pretty confident it is always faster.

The reason I think this is faster is that we are now selecting between a constant and a variable, instead of between two variables. Given that select is const and loves to be inlined, the compiler can now optimize the select operation.

Recall, Uint::select calls Limb::select, which in turn calls

impl ConstChoice {
    /// Return `b` if `self` is truthy, otherwise return `a`.
    #[inline]
    pub(crate) const fn select_word(&self, a: Word, b: Word) -> Word {
        a ^ (self.0 & (a ^ b))
    }
}

When a is the constant ZERO, this can be optimized as:

self.0 & b

saving two XOR operations, or two-thirds of this operation.
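The simplification can be checked on plain words. In this sketch the mask is passed as a raw `u64` (all-ones or all-zeros) rather than as a `ConstChoice`, and `select_zero_or` is a hypothetical name for the reduced form.

```rust
/// Return `b` if `mask` is all-ones, `a` if it is all-zeros.
pub const fn select_word(mask: u64, a: u64, b: u64) -> u64 {
    a ^ (mask & (a ^ b))
}

/// The same selection specialised to `a == 0`: the two XORs disappear.
pub const fn select_zero_or(mask: u64, b: u64) -> u64 {
    mask & b
}
```

With `a == 0`, `a ^ b` is just `b` and the outer XOR with zero is the identity, so only the AND remains.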

Returning to the gcd subroutine, this select is in the hot loop of this algorithm. In total, each loop iteration executes:

- Uint::is_odd (1 op),
- Uint::lt (2 ops/word),
- ConstChoice::and (1 op),
- Uint::wrapping_sub (4 ops/word),
- Uint::select (3 ops/word -> 1 op/word),
- Uint::shr (3 ops/word).

So, the per-word cost drops from 12 to 10 ops/word, a 17% improvement.
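The two formulations of the conditional subtraction compared in this thread compute the same value. The sketch below checks that on little-endian `u64` limb arrays; all helper names are hypothetical and the mask is a raw all-ones or all-zeros word rather than a `ConstChoice`.

```rust
/// Multi-limb subtraction modulo 2^(64 * N), limbs in little-endian order.
pub fn wrapping_sub<const N: usize>(a: &[u64; N], b: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    let mut borrow = 0u64;
    for i in 0..N {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(borrow);
        out[i] = d2;
        // At most one of the two subtractions can underflow.
        borrow = (b1 | b2) as u64;
    }
    out
}

/// Limb-wise select: `b` when `mask` is all-ones, `a` when it is all-zeros.
pub fn select<const N: usize>(a: &[u64; N], b: &[u64; N], mask: u64) -> [u64; N] {
    core::array::from_fn(|i| a[i] ^ (mask & (a[i] ^ b[i])))
}

/// Previous form: compute `a - b`, then select between `a` and the difference.
pub fn sub_then_select<const N: usize>(a: &[u64; N], b: &[u64; N], mask: u64) -> [u64; N] {
    select(a, &wrapping_sub(a, b), mask)
}

/// Current form: select between zero and `b`, then subtract the result from `a`.
pub fn select_then_sub<const N: usize>(a: &[u64; N], b: &[u64; N], mask: u64) -> [u64; N] {
    wrapping_sub(a, &select(&[0u64; N], b, mask))
}
```

Both forms perform one subtraction and one select per call; the second only changes what the select operates on, which is where the claimed saving comes from.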

> When a is the constant ZERO

Aah, ok

tarcieri · 2025-02-14T19:33:05Z

@erik-3milabs I just set master to v0.7.0-pre in #765

This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like Gcd and InvMod to use bingcd instead

erik-3milabs · 2025-03-11T14:57:12Z

> @erik-3milabs I just set master to v0.7.0-pre in #765
>
> This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like Gcd and InvMod to use bingcd instead

@tarcieri While I could still modify the Gcd trait to use this algorithm, this PR does not yet introduce the tools necessary to replace InvMod.

Aside from that, what else would be required to see this PR merged?

tarcieri · 2025-03-11T15:13:13Z

Aah, lack of invmod support would definitely be a problem. Is it something you plan on addressing eventually? My understanding is, like safegcd, that invmod is a big part of binary GCD's intended usage.

It seems a little weird to have multiple implementations of GCD algorithms which effectively do the same thing, though to completely replace safegcd in addition to invmod you'd also need support for boxed types (though with const_mut_refs stable it should be a lot easier to share an implementation).

Also seems it needs a rebase due to upstream changes.

erik-3milabs · 2025-03-14T08:31:43Z

> Aah, lack of invmod support would definitely be a problem. Is it something you plan on addressing eventually? My understanding is, like safegcd, that invmod is a big part of binary GCD's intended usage.

You're right. This PR only introduces the gcd algorithm; PR #761 extends the algorithm into xgcd. Stripping some things from the xgcd algorithm gives invmod. Given that I don't need invmod myself, I am not too keen on implementing it 🙈
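To illustrate the remark that invmod is xgcd with some bookkeeping stripped away, here is a textbook-style, variable-time binary inversion for an odd modulus on `u64`. It tracks only one Bézout coefficient, reduced modulo `m`. It is a sketch under those assumptions, with a hypothetical name, and is not the algorithm from this PR or from #761.

```rust
/// Inverse of `x` modulo an odd `m`, or `None` when gcd(x, m) != 1.
/// Variable-time illustration only.
pub fn inv_mod_odd(x: u64, m: u64) -> Option<u64> {
    assert!(m & 1 == 1, "modulus must be odd");
    // Halve `u` modulo the odd modulus `m`, for `u < m`, without overflow.
    let half = |u: u64| if u & 1 == 0 { u >> 1 } else { (u >> 1) + (m >> 1) + 1 };
    // Invariants: a == u * x (mod m) and b == v * x (mod m).
    let (mut a, mut u) = (x % m, 1 % m);
    let (mut b, mut v) = (m, 0u64);
    while a != 0 {
        if a & 1 == 1 {
            if a < b {
                core::mem::swap(&mut a, &mut b);
                core::mem::swap(&mut u, &mut v);
            }
            // a >= b and both are odd, so the difference is even.
            a -= b;
            // u - v modulo m, with u, v < m.
            u = if u >= v { u - v } else { u + (m - v) };
        }
        // `a` is even here; dividing by 2 modulo m keeps the invariant.
        a >>= 1;
        u = half(u);
    }
    // b now holds gcd(x, m); v is the inverse exactly when that gcd is 1.
    if b == 1 { Some(v) } else { None }
}
```

Compared with a full xgcd, the coefficient for `m` is never computed, and the tracked coefficient stays below `m` throughout.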

> It seems a little weird to have multiple implementations of GCD algorithms which effectively do the same thing, though to completely replace safegcd in addition to invmod you'd also need support for boxed types (though with const_mut_refs stable it should be a lot easier to share an implementation).

I agree that having two algorithms is overkill. Let me see about implementing this for Boxed<X> as well.

> Also seems it needs a rebase due to upstream changes.

Yeah, you're right. Let me address that right away.

erik-3milabs added 30 commits January 27, 2025 10:20

Impl new_inv_mod_odd

e8d8f4f

Modify new_inv_mod_odd algorithm

f46213d

Make as_limbs_mut const

378e2ee

Introduce const conditional_swap

45fc11d

Improve Int::checked_mul notation

4ba745c

Introduce new_gcd

02ceb4f

Get bingcd working

c6891f4

Fix fmt

b9fb154

65mus U1024::gcd

ffe7bb2

Clean up

dc6f517

Clean

a3253d8

Remove DOUBLE requirement

6b95681

Extract restricted xgcd.

b5c9951

Introduce const_min and const_max

09b9ee7

Clean up summarize

fbd39e6

Clean up compact

a7f8dae

Update ExtendedInt

41d32f6

Impl Matrix

e445ceb

Make new_odd_gcd constant time

30aabf1

Replace shr by proper div_2k

51b93f0

Remove ExtendedInt::abs

714d608

Update restricted_extended_gcd

6d9a3fe

Annotate new_gcd

32b8e9f

Refactor IntMatrix

cf064a4

Refactor ExtendedInt into ExtendedInt and ExtendeUint

e4f4359

Fix bug

4b84597

Inline ExtendedUint and ExtendedInt

e822db1

Expand Uint::gcd benchmarking

a1ff0a8

Expand Uint::new_gcd testing

46211cf

Annotate new_gcd.rs

a824a4d

erik-3milabs added 5 commits February 3, 2025 15:58

Align gcd return values with their type

39a4e88

Remove sneaky swap operation

0897439

Expand bingcd testing

4007347

Refactor bingcd test suite

3f28258

Minor optimization; bingcd can always divide b by two; a is always non-zero.

9a941da

dolevmu approved these changes Feb 6, 2025

erik-3milabs added 12 commits February 6, 2025 12:01

Improve bingcd annotation

d1347e3

Split optimized_bingcd in two parts.

87d8ee7

Tune optimized_bingcd parameters

23a8dd5

Make compact generic in K

2d3df09

Annotate the use of shl_vartime and shr_vartime

30189d1

Take iterations out of the optimized_bingcd_ loop

2e93021

Indicate restricted_extended_gcd as _vartime in iterations

d464a02

Revert "Remove sneaky swap operation"

579d93e

* This reverts commit 0897439 * This adds further annotation

Annotate partial_binxgcd_vartime

89d4d29

Fix clippy

fc686d1

Fix docstring Matrix::conditional_double_bottom_row

b782790

Optimize conditional sub operation in classic_bingcd

859bbb9

erik-3milabs commented Feb 7, 2025

erik-3milabs mentioned this pull request Feb 14, 2025

Binary XGCD #761

Draft

tarcieri mentioned this pull request Feb 28, 2025

core-only "heapless" support RustCrypto/RSA#51

Open

tarcieri mentioned this pull request Mar 12, 2025

Implement safegcd-bounds #634

Open

erik-3milabs added 3 commits March 14, 2025 09:51

Merge branch 'master' into bingcd

1cd2c65

Fix fmt

b8a8844

Fix clippy

4700594

tarcieri mentioned this pull request Mar 18, 2025

Move away from traits implemented for specific Uint sizes? #793

Open
