Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regressions #16

Closed
jeffro256 opened this issue Dec 10, 2024 · 1 comment
Closed

Performance regressions #16

jeffro256 opened this issue Dec 10, 2024 · 1 comment

Comments

@jeffro256
Copy link

jeffro256 commented Dec 10, 2024

I've believe that I've bisected a commit which introduce performance regressions. Here's a list of timings of I've compiled of the length of a single FCMP membership verify function call by commit and date:

May 13 - e0bb5b37bfe919805c4ae99a7309ed5b866ee441 - 25ms
May 13 - 0c776ae6b71cc1308cf7e4ac0211b51e0cd77b29 - 33ms
May 13 - 88e60c7d75680f4492777d0d1d2f7795c92d264e - 33ms
May 14 - b2742e86f3d18155fd34dd1ed69cb8f79b900fce - 33ms
May 23 - a17500708f5c6d79ec9cc33d53c771149db152c3 - 33ms
May 23 - a11db094aac0384b9e62930f6f9f0d062d436897 - 25ms
May 24 - 2e6c63d9863028333b15883585f16ff1713246dc - 25ms
May 25 - 534b837ef5fd8e3243166d0616b86bd9e67716d9 - 29ms
May 26 - c07d9cf1eb3f7891ce9c06af0bae918d9c2ea94f - 42ms, 32ms
May 26 - 00242acf9a1402d6361f057e51deecebf73e17b4 - 42ms, 32ms
July 18 - 5766902409b1740d8e612447035cc47ea49316c1 - 42ms, 32ms
July 18 - f584c284f38a765e87facc11742a8bbe7e07d385 - tests failed
July 18 - 7b17c717c7673b02676d32e77c8f5fd650c9539f - 138ms, 121ms, 127ms
July 19 - 378670be01e4329ccefdccd164c5279c2c9202ec - 97ms
July 20 - e92631affe98a24ec8f98653740b726dde853e00 - 90ms
December 8 - cbb5ffa04143f862a651cedb5f040c0ee29d5510 - 120ms, 137ms

Theses samples were collected by running the command cargo test --release -- --show-output after checking out the respective commit. They ran on a 13th Gen Intel(R) Core(TM) i7-1355U running Linux Mint 21.3,

The biggest performance regression appears in either commit f584c28 ("Remove dependency on flexible-transcript for GBPs, FCMPs") or 7b17c71 ("Further GBP cleanup"), former didn't pass. It introduced a slowdown of ~200%.

@kayabaNerve
Copy link
Owner

In July (specifically in the commit for which jeffro observed test failures), the proofs were redefined from decompressed proof objects into opaque byte blobs. This moved point decompression time into the proof itself, extending the benchmark from proof verification to decompression and verification.

914b80b and 7d972e2 removed unnecessary serialization calls from the proof, as it deserializes.

abca01f replaces the challenge fn introduced in the same commit. The old challenge fn had an extraneous hash. The introduced challenge fn removed the unnecessary hash but performed conversion to a scalar extremely inefficiently. The new challenge fn utilizes a new API to perform conversion to a scalar efficiently.

@jeffro256 noted an ~18% performance difference between modern Rust and the targeted Rust (which was moved to in a few commits in May, hence that regression). cc @tobtoht, monero-project/monero#9440. The issue is Ubuntu 20.04 is still supported by Monero yet has linker errors with Rust > 1.69 (at least, as used in Monero's cross-compilation pipeline). rust-lang/rust#71515 tracks Rust using its own linker on Ubuntu 20.04 (it does on nightly but not on stable yet) which may resolve this.

I believe that comments on all the regressions and successfully optimizes past the performance cost of decompression inside of verify. The new bench is 28ms for the FCMP with the targeted Rust version, outside of a batch, and 8.54ms for each in a batch of 100.

Thank you to @jeffro256 for their identification and help resolving this.

I'll finally note our discrete log challenge evaluation is ~15% of the verify function (not including the final multiexps with the batch verifier). This function does a bunch of short muls (8-bit coefficients) but does them by promoting the 8-bit coefficients to 256-bit scalars. This adds 248 doubles and adds.

I did try implementing a custom double-and-add ladder on the dlog-short-mul branch, but the 8 doubles, 8 adds, 16 reductions wasn't cheaper than the 256 doubles, 256 adds, 1 wide reduction. We could add a dedicated API to our Ciphersuite definition, and to our Helioselene library, for notable performance savings but the underlying crypto-bigint library doesn't expose an amenable function. This would have to be done after the Helioselene arithmetic is replaced with bespoke arithmetic, if/when that occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants