Poly1305 AVX2 backend #49
Currently a draft PR because there are still several bugs to fix:
Welp, somewhere along the way during my refactoring and documenting process, I've managed to make the AVX2 implementation incredibly slow 😑 When I first got the code running, it was speeding
Huh, strange, I would assume that LLVM's SSA transform would render these equivalent, at least if the changes were purely superficial...
Regarding runtime selection of backends: we've largely eschewed it so far for a few different reasons. @newpavlov's suggestion has been to push that selection higher up into e.g. the ChaCha20Poly1305 AEAD construction. You might check out this (unmerged) PR for runtime detection of PCLMULQDQ for POLYVAL for the past discussion: #11
The changes were all just reorganising the existing AVX2 intrinsics to leverage the Rust type system. Next weekend I'll try swapping back in the un-refactored code to confirm it was indeed faster, and then I'll start trying to figure out how the underlying assembly is being altered.
That makes sense to me; we should be consistently selecting the AVX2 backends for the ChaCha20 and Poly1305 components together. I do want auto-detection somewhere though, as it's very inconvenient both for an application developer to be forced to generate separate AVX2 binaries, and for a user to need to care about exactly which link to click to get performance. If someone does want smaller AVX2-only binaries though, it should be possible to obtain them.
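A minimal sketch of what detection "higher up" could look like, assuming the AEAD layer probes the CPU once and hands the same choice to both components. The `Backend` enum and both functions are hypothetical names for illustration, not part of any RustCrypto crate; only `std::is_x86_feature_detected!` is a real standard-library macro.

```rust
/// Hypothetical backend label, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Soft,
    Avx2,
}

/// Probe the CPU at runtime; fall back to the software backend
/// on non-x86 targets or when AVX2 is unavailable.
pub fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    Backend::Soft
}

/// Hypothetical AEAD-level selection: detect once, then use the same
/// backend for the ChaCha20 and Poly1305 components.
pub fn select_aead_backends() -> (Backend, Backend) {
    let backend = detect_backend();
    (backend, backend)
}
```

Detecting once at the AEAD level keeps the two components consistent and keeps the check out of the per-block path.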
Hmm... it turns out that the AVX2 backend is only ridiculously slow (on my laptop) when compiled without the
Force-pushed from dff6ae7 to d7c38e3.
I've force-pushed to switch to compile-time backend selection. Unfortunately this means removing the comparative fuzzer; I will rework the crashes it found into regular test cases, so we can still figure them out. As for the speed issue... it looks to me like the AVX2 backend just generates more assembly than the software backend. It would be useful for someone else to try running this branch to see what their experience is.
FWIW, the https://github.com/RustCrypto/universal-hashes/blob/master/polyval/src/field.rs Kinda gross, but I did it for the same reason (at least initially): equivalence testing/fuzzing between the two
Codecov Report
Coverage Diff

| | master | #49 | +/- |
|---|---|---|---|
| Coverage | 69.21% | 34.20% | -35.01% |
| Files | 11 | 13 | +2 |
| Lines | 471 | 953 | +482 |
| Hits | 326 | 326 | |
| Misses | 145 | 627 | +482 |
I've rebased this onto current
Looking again at the failures I reported earlier:
This test vector was removed by @tarcieri; was it just an invalid vector?
I re-added the fuzzer helper, and the eight test cases it found previously. Seven of them are still failing.
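To illustrate the kind of equivalence check a comparative fuzzer helper performs, here is a small sketch under stated assumptions. It is not Poly1305 and not the PR's code: it uses a hypothetical toy polynomial hash modulo 2^61 - 1, a scalar reference that absorbs one word at a time, and a "wide" variant that absorbs four words per step using precomputed powers of `r`. The helper `first_mismatch` (also a made-up name) runs both on every prefix of an input and reports the first length where they disagree.

```rust
/// Toy modulus (2^61 - 1). An assumption for illustration; Poly1305 uses 2^130 - 5.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % (P as u128)) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (P as u128)) as u64
}

/// Reference: absorb one word at a time, h = (h + m) * r.
pub fn hash_reference(r: u64, msg: &[u64]) -> u64 {
    let r = r % P;
    let mut h = 0u64;
    for &m in msg {
        h = mul(add(h, m % P), r);
    }
    h
}

/// "Wide" variant: absorb four words per step,
/// h = (h + m0) * r^4 + m1 * r^3 + m2 * r^2 + m3 * r,
/// then finish any leftover words one at a time.
pub fn hash_wide(r: u64, msg: &[u64]) -> u64 {
    let r1 = r % P;
    let r2 = mul(r1, r1);
    let r3 = mul(r2, r1);
    let r4 = mul(r2, r2);
    let mut h = 0u64;
    let mut chunks = msg.chunks_exact(4);
    for c in chunks.by_ref() {
        let hi = add(mul(add(h, c[0] % P), r4), mul(c[1] % P, r3));
        let lo = add(mul(c[2] % P, r2), mul(c[3] % P, r1));
        h = add(hi, lo);
    }
    for &m in chunks.remainder() {
        h = mul(add(h, m % P), r1);
    }
    h
}

/// Differential check: return the first prefix length on which the two
/// implementations disagree, or None if they agree on every prefix.
pub fn first_mismatch(r: u64, msg: &[u64]) -> Option<usize> {
    (0..=msg.len()).find(|&len| hash_reference(r, &msg[..len]) != hash_wide(r, &msg[..len]))
}
```

Checking every prefix length exercises the hand-off between the wide loop and the leftover words, which is a common place for this kind of bug to hide. A mismatching input can then be saved as a regular test case.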
Force-pushed from 16fc637 to 938a710.
Originally derived from Goll and Gueron's AVX2 C code. The logic has been extensively rewritten and documented, and several bugs in the original C code were fixed.
The crash test cases were found by fuzzing with AFL after the AVX2 backend had been reviewed, refactored, and documented.
I've rebased the PR (so it now contains all the Poly1305 test vectors), and split out my (likely inefficient) fix for I'm working on
Force-pushed from 3c481a6 to c1131db.
I found the bug! All tests and previous fuzzer crashes now pass 🎉
@str4d want to mark it as ready for review?
Fixes `donna_self_test1` and `fuzz::crash_6`.
No changes to the logic or AVX2 instructions. Includes documentation for `fuzz::crash_3`.
Fixes all remaining fuzzer crashes and test vectors.
I had a couple of cleanups to do, and I wanted to throw the fuzzer at it again for a few minutes (to shake out any other low-hanging crashes). Done the first, and
Found from the included new fuzzer crash.
That last crash is now fixed. I found it after 3 minutes of fuzzing, and didn't find anything else after an additional 20 minutes. I'll run some more fuzzing today, but for now I think this is ready to review 😄
For use with afl.rs: https://rust-fuzz.github.io/book/afl/tutorial.html
The AVX2 backend can be selected at compile time with `RUSTFLAGS="-Ctarget-feature=+avx2"`.
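As a rough sketch of how compile-time selection of this kind can be wired up, assuming the crate gates its backends on the `target_feature` cfg (the module layout and `backend_name` below are hypothetical, not the PR's actual code):

```rust
// Compiled only when the target enables AVX2, e.g. via
// RUSTFLAGS="-Ctarget-feature=+avx2" on an x86/x86_64 target.
#[cfg(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "avx2"
))]
mod backend {
    pub const NAME: &str = "avx2";
}

// Fallback: portable software backend for every other configuration.
#[cfg(not(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "avx2"
)))]
mod backend {
    pub const NAME: &str = "soft";
}

/// Report which (hypothetical) backend module was compiled in.
pub fn backend_name() -> &'static str {
    backend::NAME
}
```

With this shape only one backend module exists in a given build, so no runtime check is needed, but an AVX2 build will not run on CPUs without AVX2.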