-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regressions #16
Comments
In July (specifically in the commit for which jeffro observed test failures), the proofs were redefined from decompressed proof objects into opaque byte blobs. This moved point decompression time into the proof itself, extending the benchmark from proof verification to decompression and verification. 914b80b and 7d972e2 removed unnecessary serialization calls from the proof, as it deserializes. abca01f replaces the challenge fn introduced in the same commit. The old challenge fn had an extraneous hash. The introduced challenge fn removed the unnecessary hash but performed conversion to a scalar extremely inefficiently. The new challenge fn utilizes a new API to perform conversion to a scalar efficiently. @jeffro256 noted an ~18% performance difference between modern Rust and the targeted Rust (which was moved to in a few commits in May, hence that regression). cc @tobtoht, monero-project/monero#9440. The issue is Ubuntu 20.04 is still supported by Monero yet has linker errors with Rust > 1.69 (at least, as used in Monero's cross-compilation pipeline). rust-lang/rust#71515 tracks Rust using its own linker on Ubuntu 20.04 (it does on nightly but not on stable yet) which may resolve this. I believe that comments on all the regressions and successfully optimizes past the performance cost of decompression inside of verify. The new bench is 28ms for the FCMP with the targeted Rust version, outside of a batch, and 8.54ms for each in a batch of 100. Thank you to @jeffro256 for their identification and help resolving this. I'll finally note our discrete log challenge evaluation is ~15% of the verify function (not including the final multiexps with the batch verifier). This function does a bunch of short muls (8-bit coefficients) but does them by promoting the 8-bit coefficients to 256-bit scalars. This adds 248 doubles and adds. I did try implementing a custom double-and-add ladder on the dlog-short-mul branch, but the 8 doubles, 8 adds, 16 reductions wasn't cheaper than the 256 doubles, 256 adds, 1 wide reduction. We could add a dedicated API to our Ciphersuite definition, and to our Helioselene library, for notable performance savings but the underlying crypto-bigint library doesn't expose an amenable function. This would have to be done after the Helioselene arithmetic is replaced with bespoke arithmetic, if/when that occurs. |
I've believe that I've bisected a commit which introduce performance regressions. Here's a list of timings of I've compiled of the length of a single FCMP membership verify function call by commit and date:
Theses samples were collected by running the command
cargo test --release -- --show-output
after checking out the respective commit. They ran on a 13th Gen Intel(R) Core(TM) i7-1355U running Linux Mint 21.3,The biggest performance regression appears in either commit f584c28 ("Remove dependency on flexible-transcript for GBPs, FCMPs") or 7b17c71 ("Further GBP cleanup"), former didn't pass. It introduced a slowdown of ~200%.
The text was updated successfully, but these errors were encountered: