Optimize ScalarMult with NAF #10
@jimmysong Thanks for rebasing. I was planning to get these merged before the btcec repo is merged into btcd.
@jimmysong Can you rebase this again? There is probably going to be a conflict since the PrintBytePoints stuff has been changed. I'll be pushing to get this and the other PR in next week.
@davecgh Rebased and ready to go.
This implements a speedup to ScalarMult using the endomorphism available to secp256k1. Note the constants lambda, beta, a1, b1, a2 and b2 are from here: https://bitcointalk.org/index.php?topic=3238.0

Preliminary tests indicate a speedup of 17-20% (BenchScalarMult). More speedup can probably be achieved once splitK uses something more like what fieldVal uses. Unfortunately, the prime for this math is the order of G (N), not P. Note that the NAF optimization was deliberately left out, as that is the purview of another issue.

Changed both ScalarMult and ScalarBaseMult to take advantage of curve.N to reduce k. This results in an 80% speedup for large values of k in ScalarBaseMult. The new benchmark BenchmarkScalarBaseMultLarge can be used to check that speedup number.

This closes btcsuite#1
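As a rough illustration of the "use curve.N to reduce k" step in the commit message above: since N is the order of G, multiplying by k and by k mod N gives the same point, so an oversized scalar can be shrunk before any point arithmetic. The sketch below does this with math/big. The names `reduceScalar` and `curveN` are hypothetical and are not taken from btcec; only the value of N is the standard secp256k1 group order.

```go
package main

import (
	"fmt"
	"math/big"
)

// curveN is the order of the secp256k1 base point G.
// The variable name is an assumption made for this sketch.
var curveN, _ = new(big.Int).SetString(
	"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

// reduceScalar (hypothetical helper) interprets k as a big-endian unsigned
// integer and returns k mod N as big-endian bytes. Scalars already below N
// are returned without performing the division.
func reduceScalar(k []byte) []byte {
	kInt := new(big.Int).SetBytes(k)
	if kInt.Cmp(curveN) >= 0 {
		kInt.Mod(kInt, curveN)
	}
	return kInt.Bytes()
}

func main() {
	// 2^256 is one bit longer than N, so it gets reduced.
	k := make([]byte, 33)
	k[0] = 1
	fmt.Printf("reduced scalar: %x\n", reduceScalar(k))
}
```

The reduced scalar is never longer than 32 bytes, which bounds the number of doublings the multiplication loop has to perform regardless of the input length.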
I haven't narrowed it down to this PR or the endomorphism one, but I suspect it's this one that is the cause. The memory usage has skyrocketed. I let btcd run with
Ok, I've verified this PR is the culprit. I've been running the endomorphism PR for a couple of hours now and memory usage is stable and very similar to master.
@davecgh I've implemented the speedup to NAF as you've asked. I put this in a separate commit so you don't have to figure out what's changed. But basically, I used your suggestion to use byte arrays instead of a large int array.
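To make the "byte arrays instead of a large int array" idea concrete, here is a minimal sketch that stores the NAF of a scalar as two packed bit strings: one marking the +1 digits and one marking the -1 digits. This is not the code from this PR. It swaps in a well-known shift-and-XOR identity computed with math/big rather than a byte-wise carry loop, and the name `nafBytes` is hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// nafBytes (hypothetical helper) returns the non-adjacent form of the
// big-endian scalar k as two equal-length big-endian byte slices.
// A set bit in pos is a +1 digit and a set bit in neg is a -1 digit,
// so k == pos - neg when both are read as unsigned integers.
func nafBytes(k []byte) (pos, neg []byte) {
	x := new(big.Int).SetBytes(k)
	xh := new(big.Int).Rsh(x, 1)   // floor(x / 2)
	x3 := new(big.Int).Add(x, xh)  // x + floor(x / 2)
	c := new(big.Int).Xor(xh, x3)  // positions of the non-zero NAF digits
	np := new(big.Int).And(x3, c)  // +1 digits
	nm := new(big.Int).And(xh, c)  // -1 digits

	// The NAF can be one bit longer than k, so size the output for
	// BitLen()+1 bits, rounded up to whole bytes.
	n := (x.BitLen() + 8) / 8
	return np.FillBytes(make([]byte, n)), nm.FillBytes(make([]byte, n))
}

func main() {
	pos, neg := nafBytes([]byte{0xff})
	fmt.Printf("pos=%08b neg=%08b\n", pos, neg)
}
```

Packing eight digits per byte keeps the representation close to the size of the scalar itself, whereas one machine integer per digit costs many times that per multiplication.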
// P1 below is P in the equation, P2 below is ϕ(P) in the equation
p1x, p1y := curve.bigAffineToField(Bx, By)
// For NAF, we need the negative point
p1yNeg := new(fieldVal).Set(p1y).Negate(1)
Minor optimization here. If you use NegateVal you can negate and set the value in one operation without having to copy it first with Set.
p1yNeg := new(fieldVal).NegateVal(p1y, 1)
// non-zero.
// The algorithm here is from Guide to Elliptical Cryptography 3.30 (ref above)
// Essentially, this makes it possible to minimize the number of operations
// since the resulting ints returned will be at least 50% 0's.
This comment is no longer accurate.
Should be fixed. Not sure why it isn't marked as outdated yet.
Feel free to squash the two commits. I'm done reviewing the changes. Thanks for splitting them out for review as it made it easier! I've been running on the new NAF code since this afternoon and memory usage is now stable and similar to the memory usage on master. I also noticed the speed increase. Nicely done!
Use Non-Adjacent Form (NAF) of large numbers to reduce ScalarMult computation times. Preliminary results indicate around an 8-9% speed improvement according to BenchmarkScalarMult. The algorithm used is 3.77 from Guide to Elliptic Curve Cryptography by Hankerson, et al. This closes btcsuite#3
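For readers unfamiliar with why NAF reduces the work in ScalarMult, the sketch below shows the general idea on a toy group: plain integers under addition stand in for curve points, so "doubling" is `result += result` and "adding the negative point" is a subtraction. It uses the textbook digit-by-digit NAF recoding, not the code from this PR, and `nafDigits` and `scalarMulNAF` are hypothetical names.

```go
package main

import "fmt"

// nafDigits (hypothetical helper) returns the non-adjacent form of k,
// least significant digit first. Every digit is -1, 0 or 1 and no two
// neighbouring digits are both non-zero. Assumes k < 2^63.
func nafDigits(k uint64) []int8 {
	var digits []int8
	for k > 0 {
		var d int8
		if k&1 == 1 {
			// +1 when k mod 4 == 1, -1 when k mod 4 == 3.
			d = int8(2 - int64(k&3))
			k = uint64(int64(k) - int64(d))
		}
		digits = append(digits, d)
		k >>= 1
	}
	return digits
}

// scalarMulNAF computes k*p in the toy group of integers under addition
// and reports how many add/subtract steps were needed.
func scalarMulNAF(k uint64, p int64) (result int64, adds int) {
	digits := nafDigits(k)
	for i := len(digits) - 1; i >= 0; i-- {
		result += result // "point doubling"
		switch digits[i] {
		case 1:
			result += p // "add P"
			adds++
		case -1:
			result -= p // "add -P"
			adds++
		}
	}
	return result, adds
}

func main() {
	// 255 is 11111111 in binary (eight additions in plain double-and-add).
	r, adds := scalarMulNAF(255, 3)
	fmt.Println(r, adds)
}
```

On a real curve the negative point is cheap to obtain (negate the y coordinate once up front), so trading additions for subtractions costs almost nothing while the number of non-zero digits drops.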
Squashed.