Reduce the number of modular reduction calls in point add/double #12
Another low-ish hanging fruit for #10.
When two integers of a given maximum bit size are added together, the result needs at most one extra bit to store. This property can be exploited so that the modular reduction after adding two prime field elements can mostly be skipped in the point addition and doubling formulas.
The unreduced value is brought back into range the next time it goes through a multiplication, since the multiplication reduces its output anyway. This brings some quite nice performance improvements.
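The idea can be illustrated with a toy example. This is a minimal sketch, not the crate's code: it assumes a single-`u64` field element over the Mersenne prime 2^61 - 1 instead of the P-521 prime, and the function names `add_reduced`, `add_lazy` and `mul_reduced` are hypothetical. It shows that skipping the conditional subtraction in an addition gives the same result once the value has passed through a reducing multiplication, provided the multiplication accepts inputs one bit wider than the modulus.

```rust
// Toy modulus (hypothetical stand-in for the P-521 prime): 2^61 - 1.
const P: u64 = (1u64 << 61) - 1;

/// Fully reduced addition: the result is always in [0, P).
fn add_reduced(a: u64, b: u64) -> u64 {
    let s = a + b; // a, b < 2^61, so s < 2^62 and cannot overflow a u64
    if s >= P { s - P } else { s }
}

/// Lazy addition: the conditional subtraction is skipped.
/// For a, b < 2^61 the sum is < 2^62, i.e. at most one extra bit.
fn add_lazy(a: u64, b: u64) -> u64 {
    a + b
}

/// Multiplication followed by a full reduction.
/// Inputs may be up to 2^62 (one bit wider than P): the 128-bit
/// product is then below 2^124 and fits, so `%` still reduces correctly.
fn mul_reduced(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn main() {
    let x = P - 5;
    let y = P - 7;
    let z = 123_456_789;

    // Eager: reduce after the addition, then again after the multiplication.
    let eager = mul_reduced(add_reduced(x, y), z);
    // Lazy: leave the sum unreduced and let the multiplication fix it.
    let lazy = mul_reduced(add_lazy(x, y), z);

    assert_eq!(eager, lazy);
    println!("eager = {eager}, lazy = {lazy}");
}
```

The sketch only allows one lazy addition before a multiplication, matching the "at most one extra bit" argument; whether a real limb-based multiplication tolerates the wider input depends on how much headroom its representation leaves, which is the property the change relies on.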
That amounts to something on the order of 70% faster point addition and point-scalar multiplication for P-521, with slightly smaller improvements for the other Weierstrass curves.
I also took the time to do the same for the other curves, but their new behaviour hasn't been tested, so feel free to let me know if you want me to revert those specific changes.
PS: here is a Criterion report highlighting the performance gains on my system:
criterion.zip