Performance regressions #16

jeffro256 · 2024-12-10T07:21:14Z

I've believe that I've bisected a commit which introduce performance regressions. Here's a list of timings of I've compiled of the length of a single FCMP membership verify function call by commit and date:

May 13 - e0bb5b37bfe919805c4ae99a7309ed5b866ee441 - 25ms
May 13 - 0c776ae6b71cc1308cf7e4ac0211b51e0cd77b29 - 33ms
May 13 - 88e60c7d75680f4492777d0d1d2f7795c92d264e - 33ms
May 14 - b2742e86f3d18155fd34dd1ed69cb8f79b900fce - 33ms
May 23 - a17500708f5c6d79ec9cc33d53c771149db152c3 - 33ms
May 23 - a11db094aac0384b9e62930f6f9f0d062d436897 - 25ms
May 24 - 2e6c63d9863028333b15883585f16ff1713246dc - 25ms
May 25 - 534b837ef5fd8e3243166d0616b86bd9e67716d9 - 29ms
May 26 - c07d9cf1eb3f7891ce9c06af0bae918d9c2ea94f - 42ms, 32ms
May 26 - 00242acf9a1402d6361f057e51deecebf73e17b4 - 42ms, 32ms
July 18 - 5766902409b1740d8e612447035cc47ea49316c1 - 42ms, 32ms
July 18 - f584c284f38a765e87facc11742a8bbe7e07d385 - tests failed
July 18 - 7b17c717c7673b02676d32e77c8f5fd650c9539f - 138ms, 121ms, 127ms
July 19 - 378670be01e4329ccefdccd164c5279c2c9202ec - 97ms
July 20 - e92631affe98a24ec8f98653740b726dde853e00 - 90ms
December 8 - cbb5ffa04143f862a651cedb5f040c0ee29d5510 - 120ms, 137ms

Theses samples were collected by running the command cargo test --release -- --show-output after checking out the respective commit. They ran on a 13th Gen Intel(R) Core(TM) i7-1355U running Linux Mint 21.3,

The biggest performance regression appears in either commit f584c28 ("Remove dependency on flexible-transcript for GBPs, FCMPs") or 7b17c71 ("Further GBP cleanup"), former didn't pass. It introduced a slowdown of ~200%.

The text was updated successfully, but these errors were encountered:

kayabaNerve · 2025-01-06T22:01:21Z

In July (specifically in the commit for which jeffro observed test failures), the proofs were redefined from decompressed proof objects into opaque byte blobs. This moved point decompression time into the proof itself, extending the benchmark from proof verification to decompression and verification.

914b80b and 7d972e2 removed unnecessary serialization calls from the proof, as it deserializes.

abca01f replaces the challenge fn introduced in the same commit. The old challenge fn had an extraneous hash. The introduced challenge fn removed the unnecessary hash but performed conversion to a scalar extremely inefficiently. The new challenge fn utilizes a new API to perform conversion to a scalar efficiently.

@jeffro256 noted an ~18% performance difference between modern Rust and the targeted Rust (which was moved to in a few commits in May, hence that regression). cc @tobtoht, monero-project/monero#9440. The issue is Ubuntu 20.04 is still supported by Monero yet has linker errors with Rust > 1.69 (at least, as used in Monero's cross-compilation pipeline). rust-lang/rust#71515 tracks Rust using its own linker on Ubuntu 20.04 (it does on nightly but not on stable yet) which may resolve this.

I believe that comments on all the regressions and successfully optimizes past the performance cost of decompression inside of verify. The new bench is 28ms for the FCMP with the targeted Rust version, outside of a batch, and 8.54ms for each in a batch of 100.

Thank you to @jeffro256 for their identification and help resolving this.

I'll finally note our discrete log challenge evaluation is ~15% of the verify function (not including the final multiexps with the batch verifier). This function does a bunch of short muls (8-bit coefficients) but does them by promoting the 8-bit coefficients to 256-bit scalars. This adds 248 doubles and adds.

I did try implementing a custom double-and-add ladder on the dlog-short-mul branch, but the 8 doubles, 8 adds, 16 reductions wasn't cheaper than the 256 doubles, 256 adds, 1 wide reduction. We could add a dedicated API to our Ciphersuite definition, and to our Helioselene library, for notable performance savings but the underlying crypto-bigint library doesn't expose an amenable function. This would have to be done after the Helioselene arithmetic is replaced with bespoke arithmetic, if/when that occurs.

kayabaNerve closed this as completed Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regressions #16

Performance regressions #16

jeffro256 commented Dec 10, 2024 •

edited

Loading

kayabaNerve commented Jan 6, 2025

Performance regressions #16

Performance regressions #16

Comments

jeffro256 commented Dec 10, 2024 • edited Loading

kayabaNerve commented Jan 6, 2025

jeffro256 commented Dec 10, 2024 •

edited

Loading