-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chacha20: Add full NEON backend. #310
Conversation
Observed performance changes on an Apple M1 Air: ``` name chacha-soft ns/iter chacha-neon ns/iter diff ns/iter diff % speedup chacha12_bench1_16b 34 (470 MB/s) 28 (571 MB/s) -6 -17.65% x 1.21 chacha12_bench2_256b 477 (536 MB/s) 132 (1939 MB/s) -345 -72.33% x 3.61 chacha12_bench3_1kib 1,897 (539 MB/s) 503 (2035 MB/s) -1,394 -73.48% x 3.77 chacha12_bench4_16kib 30,811 (531 MB/s) 7,914 (2070 MB/s) -22,897 -74.31% x 3.89 chacha20_bench1_16b 51 (313 MB/s) 47 (340 MB/s) -4 -7.84% x 1.09 chacha20_bench2_256b 777 (329 MB/s) 212 (1207 MB/s) -565 -72.72% x 3.67 chacha20_bench3_1kib 3,088 (331 MB/s) 821 (1247 MB/s) -2,267 -73.41% x 3.76 chacha20_bench4_16kib 50,251 (326 MB/s) 13,001 (1260 MB/s) -37,250 -74.13% x 3.87 chacha8_bench1_16b 26 (615 MB/s) 19 (842 MB/s) -7 -26.92% x 1.37 chacha8_bench2_256b 335 (764 MB/s) 92 (2782 MB/s) -243 -72.54% x 3.64 chacha8_bench3_1kib 1,328 (771 MB/s) 344 (2976 MB/s) -984 -74.10% x 3.86 chacha8_bench4_16kib 21,184 (773 MB/s) 5,371 (3050 MB/s) -15,813 -74.65% x 3.94 ```
Welp, didn’t realize that |
As it were we have another PR open to add ARM support to the You can use Unfortunately there's no M1-specific solution until this lands: github/roadmap#528
FWIW we'd be fine with a nightly-only feature. We have similar nightly-only ARM features in the
Sure looking forward to the eventual stabilization of |
FWIW, this code compiles and passes tests on my M1 Air using |
Also regarding this specifically:
There are various stable workarounds, such as using I'm guessing that's not the only nightly-only NEON intrinsics support you need, though. |
Maybe all the intrinsics you need are stabilized on 1.64? If that's the case, you can just feature-gate support to prevent MSRV breakages. It'd be good to know the actual MSRV of the feature. |
Ok, looks like |
Enabled CI (using the existing but commented-out I think this is good to go. Let me know if anything else needs changing. |
Cool, will try to review this week sometime |
@@ -32,6 +32,7 @@ hex-literal = "0.3.3" | |||
[features] | |||
std = ["cipher/std"] | |||
zeroize = ["cipher/zeroize"] | |||
neon = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth considering using a cfg!
attribute here /cc @newpavlov
I think the only reason to have a gated implementation at all is MSRV compatibility. We can bump to MSRV 1.61 in the next release (bumping to 1.60 has some nice additions like weak feature activation).
Semi-related issue: RustCrypto/sponges#24 (comment)
FWIW, we use cfg!
attributes to gate aes
and pmull
intrinsics on ARMv8
This otherwise seems like a fairly straightforward port of |
Going to merge this and follow up on how it should be gated |
Could you make a new release with this PR? |
Unfortunately I haven’t added the gating I suggested which should be implemented prior to a release |
I ported the NEON implementation of ChaCha from Crypto++ (public domain) to Rust with aarch64 intrinsics for a significant performance boost.
Observed performance changes on an Apple M1 Air:
Closes #287.
I’m not entirely certain I’ve got the flag/feature/cpuid token stuff right, and would appreciate any guidance about that. At this point the best I’ve got is that it very definitely works on my machine, an M1 Air. Also, no idea how one runs GitHub Actions on a aarch64/NEON platform. QEMU?