std.crypto.chacha: support larger vectors on AVX2 and AVX512 targets #15809

jedisct1 · 2023-05-21T22:47:37Z

Ryzen 7 7700, ChaCha20/8 stream, long outputs:

Generic: 3268 MiB/s
AVX2   : 6023 MiB/s
AVX512 : 8086 MiB/s

Apple M1 CPUs also seem to benefit from a 4-way implementation in micro-benchmarks, but only with full-round ChaCha, and I'm not sure this generalizes to other aarch64 CPUs or to real workloads.

~~So, enable this only on x86_64 for now.~~ Update: this was verified to also improve performance on a Cortex A72, so it is now enabled on aarch64 too.
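To illustrate what an "N-way" implementation buys, here is a minimal Python sketch that computes several ChaCha blocks in lockstep, one block per lane, with only the counter differing between lanes. It is not the code from this PR and does not mirror the vector layout used in `lib/std/crypto/chacha20.zig`; the names `chacha_blocks`, `vadd`, `vxor` and `vrotl` are made up for the example, and the state layout assumed is the IETF one (32-bit counter, 96-bit nonce).

```python
MASK = 0xFFFFFFFF

# Lane-wise helpers: each argument is a list with one 32-bit word per block.
def vadd(x, y):
    return [(p + q) & MASK for p, q in zip(x, y)]

def vxor(x, y):
    return [p ^ q for p, q in zip(x, y)]

def vrotl(x, n):
    return [((p << n) | (p >> (32 - n))) & MASK for p in x]

def quarter_round(s, a, b, c, d):
    # Same quarter round as scalar ChaCha, applied to every lane at once.
    s[a] = vadd(s[a], s[b]); s[d] = vrotl(vxor(s[d], s[a]), 16)
    s[c] = vadd(s[c], s[d]); s[b] = vrotl(vxor(s[b], s[c]), 12)
    s[a] = vadd(s[a], s[b]); s[d] = vrotl(vxor(s[d], s[a]), 8)
    s[c] = vadd(s[c], s[d]); s[b] = vrotl(vxor(s[b], s[c]), 7)

def chacha_blocks(key_words, counter, nonce_words, lanes=4, rounds=20):
    """Return `lanes` consecutive blocks, each as a list of 16 words.

    key_words: 8 little-endian 32-bit words; nonce_words: 3 words.
    """
    consts = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    words = consts + list(key_words) + [counter] + list(nonce_words)
    # Row w holds word w of every block; only the counter row differs.
    init = [[w] * lanes for w in words]
    init[12] = [(counter + i) & MASK for i in range(lanes)]
    s = [row[:] for row in init]
    for _ in range(rounds // 2):
        # Column round
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal round
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    out = [vadd(a, b) for a, b in zip(s, init)]
    # Transpose from out[word][lane] to blocks[lane][word].
    return [[out[w][i] for w in range(16)] for i in range(lanes)]
```

Calling it with `lanes=4` yields the same blocks as four single-lane calls with counters `c`, `c+1`, `c+2`, `c+3`; the work per round is identical, it is just applied to wider rows, which is what larger vector registers make cheap.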

Bump the rand.chacha buffer a tiny bit to take advantage of this. More than 8 blocks doesn't seem to make any measurable difference.
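The reason a larger buffer helps can be sketched as follows: a generator that refills its buffer several blocks at a time gives a multi-block keystream routine a full batch per call. This is a hypothetical Python illustration, not the `rand.chacha` code; `BufferedStream` and `toy_refill` are invented names, the 8-block (512-byte) size is inferred from the sentence above rather than taken from the diff, and `toy_refill` is a stand-in that is not ChaCha.

```python
class BufferedStream:
    """Hands out bytes from a buffer that is refilled several blocks at a time."""

    def __init__(self, refill, blocks=8):
        self.refill = refill      # callable(first_block_index, n_blocks) -> bytes
        self.blocks = blocks      # blocks requested per refill
        self.buf = b""
        self.pos = 0
        self.next_block = 0
        self.refills = 0          # counts how often the keystream routine ran

    def read(self, n):
        out = bytearray()
        while len(out) < n:
            if self.pos == len(self.buf):
                # Buffer exhausted: ask for a whole batch of blocks at once.
                self.buf = self.refill(self.next_block, self.blocks)
                self.next_block += self.blocks
                self.pos = 0
                self.refills += 1
            take = min(n - len(out), len(self.buf) - self.pos)
            out += self.buf[self.pos:self.pos + take]
            self.pos += take
        return bytes(out)


def toy_refill(first, count):
    # Placeholder keystream (NOT ChaCha): 64 copies of the block index per block,
    # just so the sketch runs and the block boundaries are visible.
    return b"".join(bytes([(first + i) & 0xFF]) * 64 for i in range(count))
```

Reading 1024 bytes with `blocks=8` triggers 2 refills, versus 16 with `blocks=1`, while producing the same bytes. Past the widest batch the block function can process in one go, a bigger buffer only saves call overhead, which is consistent with the observation that more than 8 blocks makes no measurable difference.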

ChaChaPoly also gets a small performance boost from this, although Poly1305 remains the bottleneck.

Generic:  707 MiB/s
AVX2   :  981 MiB/s
AVX512 : 1202 MiB/s
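For reference, the two sets of numbers above can be expressed as speedups over the generic build. A small Python snippet doing the arithmetic (the dictionary and function names are just for this example; the figures are the ones reported above):

```python
stream = {"Generic": 3268, "AVX2": 6023, "AVX512": 8086}   # ChaCha20/8 stream, MiB/s
aead = {"Generic": 707, "AVX2": 981, "AVX512": 1202}       # ChaChaPoly, MiB/s

def speedups(results, baseline="Generic"):
    # Ratio of each throughput to the baseline, rounded to two decimals.
    base = results[baseline]
    return {name: round(value / base, 2) for name, value in results.items()}

print(speedups(stream))
print(speedups(aead))
```

That works out to roughly 1.84x (AVX2) and 2.47x (AVX512) for the raw stream, and 1.39x and 1.70x for ChaChaPoly, where the Poly1305 share of the work is untouched by this change.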

Verified on Apple Silicon, but also on a Cortex A72.

lib/std/crypto/chacha20.zig

jedisct1 added 2 commits May 22, 2023 00:45

aarch64 appears to generally benefit from 4-way vectorization

ec7a3e9

jedisct1 added the "standard library" label May 22, 2023

jedisct1 merged commit 5af89b3 into ziglang:master May 22, 2023

recursivetree reviewed May 22, 2023

lib/std/crypto/chacha20.zig

jedisct1 deleted the chacha-vec-2-4 branch May 22, 2023 18:46
