-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
double-and-add algorithm #11
Comments
noted! thanks Tal. cc: @DmytroTym, @mickeyasa |
Sure! It's a stripped down version from cuda-fixnum (extended modular arithmetic library) built by Hamish from Polygon Zero. It's an older repository, but it's performant: https://github.com/data61/cuda-fixnum. |
I feel like two different things are at play here: modular multiplication, which is what Montgomery multiplication does, and single-scalar multiplication, which is what double-and-add does. Admittedly, we have room to improve in both. For modular multiplication, we want to see if our scheme based on Barrett reduction can be competitive with Montgomery CIOS, and if not, we'll switch to CIOS. We only have basic Barrett implemented for now. As for double-and-add, you are also right that it's suboptimal, we're planning on moving to more efficient algorithms in the near future. |
To comment on this: we haven't implemented a more-efficient single-scalar multiplication than double-and-add yet, but we don't really have any use-cases for it so it's still low priority. |
The addition operation in projective coordinates implements operator overloading that computes a scalar multiplication using a naive double-and-add algorithm. Would be worth specializing the operation by switching it with Montgomery multiplication (CIOS) for more efficient multiplications.
Here's an efficient implementation: https://github.com/MinaProtocol/gpu-groth16-prover-3x/blob/master/multiexp/arith.cu#L289. It uses cooperative groups and thread ranks in cuda. For your codebase, it would simply require changing out the 'fixnum' calls to equivalent PTX instructions you implement in 'ptx.cuh'.
The text was updated successfully, but these errors were encountered: