Higher radix multiplier encoding #7991
Codecov Report

| | develop | #7991 | +/- |
|---|---|---|---|
| Coverage | 79.09% | 78.63% | -0.46% |
| Files | 1699 | 1701 | +2 |
| Lines | 196512 | 196571 | +59 |
| Hits | 155428 | 154575 | -853 |
| Misses | 41084 | 41996 | +912 |
We were still repeatedly writing `bv[bv.size() - 1]` inline, where `sign_bit(bv)` adds clarity and avoids having to understand encoding details. Also, use `.back()` to avoid unnecessary repeated arithmetic (which the compiler may or may not optimise away).
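A minimal sketch of what such a helper could look like. The `example_literal` and `example_bv` types are hypothetical stand-ins, not the project's actual classes, and the sketch assumes the sign bit is stored as the last element of the vector.

```cpp
#include <vector>

// Hypothetical stand-in for a solver literal; not the project's real type.
struct example_literal
{
  int var_no;
  bool negated;
};

using example_bv = std::vector<example_literal>;

// Assumption: bits are stored least significant first, so the sign bit of a
// two's-complement value is the last element. Requires a non-empty vector.
inline const example_literal &sign_bit(const example_bv &bv)
{
  // .back() names the intent directly and avoids spelling out size() - 1.
  return bv.back();
}
```

Call sites then read `sign_bit(bv)` instead of indexing with `bv.size() - 1`, so the encoding detail lives in one place.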
We can avoid lowering, and eventually re-use this as part of other algorithms.
1. Duplicate some code to specialise it for signed/unsigned and, thereby, add clarity.
2. De-duplicate code to handle all cases directly in lt_or_le.
3. Make sure we constant-propagate whatever is possible in unsigned comparison to avoid introducing unnecessary fresh literals.
The Radix-4 multiplier pre-computes products for x * 0 (zero), x * 1 (x), x * 2 (left-shift x), x * 3 (x * 2 + x) to reduce the number of sums by a factor of 2. Radix-8 extends this up to x * 7 (reducing sums by a factor of 3), and Radix-16 up to x * 15 (reducing sums by a factor of 4). This modified approach to computing partial products can be freely combined with different (tree) reduction schemes.

Benchmarking results can be found at https://tinyurl.com/multiplier-comparison (shortened URL for https://docs.google.com/spreadsheets/d/197uDKVXYRVAQdxB64wZCoWnoQ_TEpyEIw7K5ctJ7taM/). The data suggests that combining Radix-8 partial product pre-computation with Dadda's reduction can yield substantial performance gains while not substantially regressing in other benchmark/solver pairs.
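The Radix-4 pre-computation described above can be illustrated on plain machine integers. This is only a sketch of the arithmetic idea: the function name `radix4_multiply` is hypothetical, it works on concrete 32-bit values rather than vectors of literals, and it sums the partial products sequentially instead of using a tree reduction.

```cpp
#include <array>
#include <cstdint>

// Multiply x by y modulo 2^32, consuming y two bits at a time (Radix-4).
std::uint32_t radix4_multiply(std::uint32_t x, std::uint32_t y)
{
  // Pre-computed multiples of x: x*0, x*1, x*2 (a shift), x*3 (x*2 + x).
  const std::array<std::uint32_t, 4> multiples = {
    0u, x, x << 1, (x << 1) + x};

  std::uint32_t result = 0;

  // Each 2-bit digit of y selects one pre-computed multiple, so there are
  // 16 additions rather than the 32 of a bit-by-bit shift-and-add.
  for(unsigned shift = 0; shift < 32; shift += 2)
  {
    const std::uint32_t digit = (y >> shift) & 3u;
    result += multiples[digit] << shift;
  }

  return result;
}
```

Radix-8 and Radix-16 follow the same pattern with 3-bit and 4-bit digits and tables of 8 and 16 multiples, trading a larger table for fewer additions.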
Uses an extra sign bit to keep bit widths small.
Implements the algorithm of Section 4 of "Further Steps Down The Wrong Path: Improving the Bit-Blasting of Multiplication" (see https://ceur-ws.org/Vol-2908/short16.pdf).
Prints a vector of literals in human-readable form, including a decimal representation when all literals are constants.
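A rough sketch of such a printer, under stated assumptions: `printable_literal` and `format_bv` are hypothetical names, bits are assumed to be stored least significant first, the decimal value is interpreted as unsigned, and it is only shown for vectors of at most 64 bits. The actual output format in the pull request may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical literal: either a Boolean constant or a numbered variable.
struct printable_literal
{
  bool is_constant;
  bool value;      // only meaningful when is_constant is true
  unsigned var_no; // only meaningful when is_constant is false
};

// Formats the bits most significant first; appends the unsigned decimal
// value in parentheses when every literal is a constant.
std::string format_bv(const std::vector<printable_literal> &bv)
{
  std::ostringstream out;
  bool all_constant = true;
  std::uint64_t value = 0;

  for(std::size_t i = bv.size(); i-- > 0;)
  {
    const printable_literal &l = bv[i];
    if(l.is_constant)
    {
      out << (l.value ? '1' : '0');
      if(l.value && i < 64)
        value |= std::uint64_t{1} << i;
    }
    else
    {
      out << "[v" << l.var_no << ']';
      all_constant = false;
    }
  }

  if(all_constant && !bv.empty() && bv.size() <= 64)
    out << " (" << value << ')';

  return out.str();
}
```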
Implements a symbolic version of the algorithm proposed by Schönhage and Strassen in "Schnelle Multiplikation großer Zahlen", Computing, 7, 1971.