argon2: allocate blocks as a single chunk of bytes
While investigating the scaling performance of the parallel
implementation, I noticed a substantial chunk of time taken on
block allocation in `hash_password_into`.
The issue lies in `vec![Block::default(); ...]`, which clones the supplied
block into every element. This happens because the standard library lacks
a suitable specialization that can be used with `Block` (or, for that
matter, `[u64; 128]`).
Therefore, let's instead allocate a big bag of bytes and then transmute
it, or more precisely a mutable slice into it, to produce the slice of
blocks to pass into `hash_password_into_with_memory`.
One point to pay attention to is that `Block` currently specifies
64-byte alignment, while a byte slice has an alignment of 1.
Luckily, `slice::align_to_mut` is particularly well suited for this. It
is also cleaner and less error prone than other unsafe alternatives I
tried (a couple of them using `MaybeUninit`).
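For illustration, here is a minimal sketch of the approach, not the actual
patch. The `Block` type below is a stand-in assumed to match the crate's
layout (128 `u64` words, 64-byte alignment), and `alloc_block_bytes` and
`as_blocks` are hypothetical helper names. The byte buffer is over-allocated
by `align_of::<Block>() - 1` bytes so that an aligned run of `count` blocks
fits regardless of where the allocation starts.

```rust
use core::mem::{align_of, size_of};

/// Stand-in for the crate's `Block` (assumed layout: 1 KiB, 64-byte aligned).
#[derive(Clone, Copy)]
#[repr(align(64))]
struct Block([u64; 128]);

/// Allocates zeroed bytes for `count` blocks, plus slack for alignment.
/// (Hypothetical helper; overflow of the size computation is not handled.)
fn alloc_block_bytes(count: usize) -> Vec<u8> {
    // A zeroed byte vector does not need a per-element clone of a 1 KiB value.
    vec![0u8; count * size_of::<Block>() + align_of::<Block>() - 1]
}

/// Reinterprets the aligned middle part of `bytes` as a slice of blocks.
fn as_blocks(bytes: &mut [u8]) -> &mut [Block] {
    // SAFETY: `Block` only wraps `u64` words, so every bit pattern
    // (including all zeroes) is a valid value.
    let (_prefix, blocks, _suffix) = unsafe { bytes.align_to_mut::<Block>() };
    // Note: `align_to_mut` is not strictly required to make the middle slice
    // as long as possible, so real code should check `blocks.len()`.
    blocks
}
```

The resulting `&mut [Block]` is what would then be handed to
`hash_password_into_with_memory`; the unaligned prefix and suffix bytes are
simply left unused.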
This patch passes Miri on:
reference_argon2i_v0x13_2_8_2
reference_argon2id_v0x13_2_8_2
And the performance gains are considerable:
    argon2id V0x13 m=2048 t=8 p=4
        time:   [3.3493 ms 3.3585 ms 3.3686 ms]
        change: [-6.1577% -5.7842% -5.4067%] (p = 0.00 < 0.05)
        Performance has improved.

    argon2id V0x13 m=32768 t=4 p=4
        time:   [24.106 ms 24.253 ms 24.401 ms]
        change: [-9.8553% -8.9089% -7.9745%] (p = 0.00 < 0.05)
        Performance has improved.

    argon2id V0x13 m=1048576 t=1 p=4
        time:   [181.68 ms 182.96 ms 184.35 ms]
        change: [-28.165% -27.506% -26.896%] (p = 0.00 < 0.05)
        Performance has improved.
(These gains apply only to users that don't allocate the blocks
themselves.)