Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ScalarMult with NAF #10

Merged
merged 2 commits into from
Feb 5, 2015
Merged

Conversation

jimmysong
Copy link
Contributor

Use Non-Adjacent Form (NAF) of large numbers to reduce ScalarMult computation times.

Preliminary results indicate around a 8-9% speed improvement according to BenchmarkScalarMult.

The algorithm used is 3.77 from Guide to Elliptical Curve Crytography by Hankerson, et al.

This closes #3

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) when pulling d8b15e8 on jimmysong:3 into d694428 on conformal:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.37%) when pulling c4bb9ca on jimmysong:3 into d694428 on conformal:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.17%) when pulling c9b455f on jimmysong:3 into d694428 on conformal:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.38%) when pulling 867aabe on jimmysong:3 into d694428 on conformal:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.37%) when pulling 94680e9 on jimmysong:3 into d694428 on conformal:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 97.82% when pulling 99aa45e on jimmysong:3 into f9365fd on btcsuite:master.

@davecgh
Copy link
Member

davecgh commented Jan 22, 2015

@jimmysong Thanks for rebasing. I was planning to get these merged before the btcec repo is merged into btcd.

@davecgh
Copy link
Member

davecgh commented Feb 1, 2015

@jimmysong Can you rebase this again? There is probably going to be a conflict since the PrintBytePoints stuff has been changed. I'll be pushing to get this and the other pr in next week.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 97.32% when pulling 26217f9 on jimmysong:3 into 9535058 on btcsuite:master.

@jimmysong
Copy link
Contributor Author

@davecgh rebased and ready to go.

@jimmysong jimmysong force-pushed the 3 branch 2 times, most recently from e71e1b8 to e750009 Compare February 3, 2015 15:59
@coveralls
Copy link

Coverage Status

Coverage increased (+0.42%) to 97.13% when pulling e750009 on jimmysong:3 into 46829e8 on btcsuite:master.

@davecgh
Copy link
Member

davecgh commented Feb 3, 2015

$ golint ./...
btcec.go:666:9: if block ends with a return statement, so drop this else and outdent its block

This implements a speedup to ScalarMult using the endomorphism available to secp256k1.

Note the constants lambda, beta, a1, b1, a2 and b2 are from here:

https://bitcointalk.org/index.php?topic=3238.0

Preliminary tests indicate a speedup of between 17%-20% (BenchScalarMult).

More speedup can probably be achieved once splitK uses something more like what fieldVal uses. Unfortunately, the prime for this math is the order of G (N), not P.

Note the NAF optimization was specifically not done as that's the purview of another issue.

Changed both ScalarMult and ScalarBaseMult to take advantage of curve.N to reduce k.
This results in a 80% speedup to large values of k for ScalarBaseMult.
Note the new test BenchmarkScalarBaseMultLarge is how that speedup number can
be checked.

This closes btcsuite#1
@davecgh
Copy link
Member

davecgh commented Feb 4, 2015

I haven't narrowed it down to this PR or the endomorphism one, but I suspect it's this one that is the cause. The memory usage has skyrocketed. I let btcd run with --nocheckpoints to force all of the script validation and ecc to run and it was over 1.6GB after a few hours. Without these PRs it's aroud 200MB. I'm going to let it run with just the endomorphism to verify.

@davecgh
Copy link
Member

davecgh commented Feb 4, 2015

Ok, I've verified this PR is the culprit. I've been running the endomorphism PR for a couple of hours now and memory usage is stable and very similar to master.

@jimmysong
Copy link
Contributor Author

@davecgh I've implemented the speedup to NAF as you've asked. I put this in a separate commit so you don't have to figure out what's changed. But basically, I used your suggestion to use byte arrays instead of a large int array.

// P1 below is P in the equation, P2 below is ϕ(P) in the equation
p1x, p1y := curve.bigAffineToField(Bx, By)
// For NAF, we need the negative point
p1yNeg := new(fieldVal).Set(p1y).Negate(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor optimization here. If you use NegateVal you can negate and set the value in one operation without having to copy it first with Set.

p1yNeg := new(fieldVal).NegateVal(p1y, 1)

// non-zero.
// The algorithm here is from Guide to Elliptical Cryptography 3.30 (ref above)
// Essentially, this makes it possible to minimize the number of operations
// since the resulting ints returned will be at least 50% 0's.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be fixed. not sure why it's not outdated yet.

@jimmysong jimmysong force-pushed the 3 branch 2 times, most recently from d42f2eb to 7a364d7 Compare February 5, 2015 05:21
@davecgh
Copy link
Member

davecgh commented Feb 5, 2015

Feel free to squash the two commits. I'm done reviewing the changes. Thanks for splitting them out for review as it made it easier!

I've been running on the new NAF code since this afternoon and memory usage is now stable and similar to the memory usage on master. I also noticed the speed increase. Nicely done!

Use Non-Adjacent Form (NAF) of large numbers to reduce ScalarMult computation times.

Preliminary results indicate around a 8-9% speed improvement according to BenchmarkScalarMult.

The algorithm used is 3.77 from Guide to Elliptical Curve Crytography by Hankerson, et al.

This closes btcsuite#3
@jimmysong
Copy link
Contributor Author

Squashed.

@conformal-deploy conformal-deploy merged commit 6c36218 into btcsuite:master Feb 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize ScalarMult with NAF + Windowing
4 participants