Skip to content
This repository has been archived by the owner on Dec 9, 2021. It is now read-only.

Issue 68: Refactor existing modular arithmetic code #80

Merged
merged 24 commits into from
Sep 27, 2020

Conversation

unzvfu
Copy link
Collaborator

@unzvfu unzvfu commented Sep 20, 2020

This PR addresses the easy parts of #68. Almost all of the functions that were specialised to 4 or 6 limbs have been replaced with a single version that takes a const generic parameter N that can be any positive integer. There are a lot of changes, but they are all very repetitive and straight forward I think.

I say "almost all" of the functions because it became clear that more significant restructuring will be necessary to refactor all the repeated montgomery_multiplication implementations. Specifically, the dependence on the representation is more explicit in that function (though it is of course there in the other arithmetic functions). I have a solution in mind for tidying this all up that I will implement eventually (it's already partly done), but at this stage I think it's a better idea for me to do some actual performance work on the arithmetic. :)

Speaking of performance, it's important that we don't introduce any performance regressions with these refactoring PRs. I couldn't detect any performance issues with this PR with my setup, but the reviewer should double-check this on their setup!

(The commit log incorporates all the commits from my previous PR, because this one was built on top of that; the messages can be ignored, it will merge fine. Sorry for the noise; I'll rebase properly next time.)

@unzvfu unzvfu self-assigned this Sep 20, 2020
@unzvfu unzvfu marked this pull request as ready for review September 20, 2020 00:13
Copy link
Contributor

@dlubarov dlubarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great otherwise! I'll just do a quick cargo bench u64 once I'm able to build, to double check for perf changes as you suggested, though it seems like there shouldn't be any.

Copy link
Contributor

@dlubarov dlubarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I ran the u64 benches a few times on your branch & master, and saw some 5-10% swings in both directions. I don't think performance was actually affected; most likely my laptop is just being inconsistent.

@unzvfu
Copy link
Collaborator Author

unzvfu commented Sep 22, 2020

So after all that, I am now somewhat doubtful that rustc optimises const generic code properly. I finished implementing Montgomery squaring last night and saw that, with const generics the code was almost twice as slow as Montgomery multiplication, whereas when I simply replaced the const generic parameter with the number 4 (for the Tweedle* curves) in the same code, it took only about 60% of the time of Montgomery multiplication which is much more in line with expectations (and a pretty great result itself!).

I'll have to investigate a bit further to see if other people have had performance problems with const generics; I assumed they would work much like non-type template parameters in C++, but there's no way we should be seeing such huge performance differences if that were the case. Our options might be (i) return to the previous situation with explicit specialisations to limb length 4 and 6, or (ii) use macros instead. Let's discuss further at the next catch-up.

@dlubarov
Copy link
Contributor

Interesting! Sorry I didn't think of this in advance, but I think the issue may be with #[unroll_for_loops]. (It's from this crate, not something officially supported by Rust devs. There's a ticket for official support, but it probably won't happen soon.) I think it only looks for for loops with integer literal bounds.

Maybe we could build a proc macro similar to #[unroll_for_loops], which replaced a for loop like

#[unroll_const_generics(max = 6)]
for i in 0..n {
  body(i);
}

with

if 0 < n { body(0); }
if 1 < n { body(1); }
if 2 < n { body(2); }
if 3 < n { body(3); }
if 4 < n { body(4); }
if 5 < n { body(5); }
if 6 < n { body(6); }
// Fallback in case a loop has more than 6 iterations.
for i in 6..n {
  body(i);
}

Then I think we should end up with the unrolled code we want, after the compiler eliminated the dead code and the always-true conditionals. Admittedly it would be a huge hack, and it might slow down compilation (though probably not that bad for small-ish max values like 6).

@dlubarov
Copy link
Contributor

We actually started with macros, but eventually got rid of them here. As I recall we ran into a couple little annoyances. One was that some of our macros would produce dead code given certain parameters, e.g. in subtraction, we had a sometimes-empty for loop to check limbs which only existed in one operand. (Of course we could have written a separate macro for subtraction with different limb counts, or suppressed the warnings.)

Another little thing was that we ended up with lists of macro invocations like

sub_symmetric!(sub_4_4, 4);
sub_symmetric!(sub_6_6, 6);
sub_asymmetric!(sub_7_6, 7, 6);
sub_asymmetric!(sub_5_4, 5, 4);

which were slightly annoying to maintain.

Nothing that major though, so we could definitely consider returning to that approach.

Perhaps another option to consider would be just removing the BLS curve so that we only need to deal with ~256 bit fields. There's a chance that we might go back to BLS (Aztec and Matter Labs are going that direction), but I think it's relatively low.

@dlubarov
Copy link
Contributor

I think this seems OK to merge. I was a bit confused since our main method (cargo run --release) seems to fail earlier on this branch than on master, but after digging into the error, it looks like it's just due to a recent fix on master (e30ffb8). (There are still other errors even on master though.)

@unzvfu
Copy link
Collaborator Author

unzvfu commented Sep 27, 2020

I investigated a bit more on this topic. It seems rustc does unroll loops with const generic parameters to some extent, but for some reason my squaring code does not pass the compiler's internal threshold for code that's suitable to unroll. I'm fairly confident that the simple functions like cmp, add, etc. will be unrolled correctly with const generics (so this PR can indeed be merged), but anything more complicated will have to be dealt with manually.

Indeed, the unroll_for_loops procedural macro doesn't touch loops with const generic bounds, so we would have to resolve this in a manner similar to what you mentioned above. At this stage, I think it will be more advantageous to simply focus on specialising to 4 limbs; then we can adapt to 6 or some other number of limbs if and when the need arises. My sense is that this foray into writing const generic versions of the code has been more of a distraction than helpful unfortunately.

@unzvfu unzvfu merged commit a676a71 into master Sep 27, 2020
@unzvfu unzvfu deleted the issue-68-refactor-existing-modular-arithmetic-code branch September 27, 2020 04:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants