argon2: add workaround for big 64-byte aligned allocations? #573

newpavlov · 2025-03-05T13:18:31Z

It was previously discussed in #566.

The claim is that for big allocations we can get higher performance by allocating len + 64 bytes with 1-byte alignment, and then manually constructing a 64-byte aligned region of size len inside the allocated memory, than by directly allocating len bytes with 64-byte alignment.
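For illustration, the workaround could look roughly like the sketch below. It is not code from the argon2 crate: `AlignedBuf`, its fields, and `ALIGN` are hypothetical names, and it only assumes the standard `std::alloc` API. It requests `len + 64` zeroed bytes with 1-byte alignment and then skips forward to the first 64-byte aligned address.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Alignment wanted for the usable region (hypothetical constant).
const ALIGN: usize = 64;

/// Hypothetical helper: a zeroed buffer whose usable region is 64-byte
/// aligned, even though the underlying allocation only asks for 1-byte
/// alignment.
struct AlignedBuf {
    base: NonNull<u8>, // pointer returned by the allocator
    layout: Layout,    // layout used for the allocation (len + ALIGN bytes, align 1)
    offset: usize,     // bytes skipped to reach the first aligned address
    len: usize,        // size of the usable aligned region
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        let size = len.checked_add(ALIGN).expect("length overflow");
        let layout = Layout::from_size_align(size, 1).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size because ALIGN > 0.
        let raw = unsafe { alloc_zeroed(layout) };
        let base = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        // Distance from the base address to the next multiple of ALIGN (0..=63).
        let addr = base.as_ptr() as usize;
        let offset = (ALIGN - addr % ALIGN) % ALIGN;
        Self { base, layout, offset, len }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: offset + len < layout.size(), the memory was zero-initialized,
        // and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.base.as_ptr().add(self.offset), self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `base` and `layout` are exactly what the allocator was given.
        unsafe { dealloc(self.base.as_ptr(), self.layout) };
    }
}
```

Whether this is actually faster than requesting 64-byte alignment directly depends on the allocator, so it would need to be benchmarked on the target platform.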

cc @jonasmalacofilho


jonasmalacofilho · 2025-03-05T13:44:14Z

> than by directly allocating len bytes with 16-byte alignment.

Small correction: the issue is with allocations using alignment greater than 16 (or, more generally, more than the maximum alignment supported by calloc).

In particular, the case we care about is allocating Blocks, which are 64-byte aligned.
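As a rough illustration of the direct case being discussed, the sketch below defines a 64-byte aligned block type and asks for its array layout. `DemoBlock` and `block_layout` are hypothetical stand-ins, not the crate's actual definitions; the 1 KiB size follows the Argon2 block size.

```rust
use std::alloc::Layout;

/// Hypothetical stand-in for a 1 KiB Argon2 block with 64-byte alignment.
/// This is an assumption for illustration, not the argon2 crate's own type.
#[repr(align(64))]
#[derive(Clone, Copy)]
struct DemoBlock([u64; 128]);

/// Layout the allocator is asked for when `n` blocks are allocated directly.
fn block_layout(n: usize) -> Layout {
    Layout::array::<DemoBlock>(n).expect("layout overflow")
}

/// Direct allocation: the Vec's buffer inherits the 64-byte alignment
/// of the element type.
fn direct_alloc(n: usize) -> Vec<DemoBlock> {
    vec![DemoBlock([0; 128]); n]
}
```

Here the alignment request of 64 exceeds the 16 bytes mentioned above, which is the case the issue is about.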

newpavlov · 2025-03-05T13:55:40Z

Oh, you are right. Fixed.

BTW, do we really need the 64-byte alignment in the first place? IIUC, this alignment is stricter than SIMD vectors require, and it looks like an optimization that accounts for cache line size.

jonasmalacofilho · 2025-03-05T14:09:23Z

Yes, (I think) it's more about cache line size and, specifically, preventing false sharing. It gives a ~5% improvement over 16-byte alignment, if I recall correctly.

I also tried 128-byte alignment, which in theory makes sense for modern 64-bit architectures (including x86-64), and is the value adopted by most general solutions for false sharing (e.g. crossbeam::utils::CachePadded) on these platforms, but any further improvements were offset by more instructions being generated, at least when I tested it (c0ce7f9).

It probably also matters that in Argon2 false sharing can't affect just any block, only blocks on the boundaries of the slices. False sharing here isn't as much of an issue as it can be in other cases.

newpavlov · 2025-03-05T14:16:43Z

Have you tried to directly mmap the memory?

jonasmalacofilho · 2025-03-05T14:18:07Z

No, I haven't.

newpavlov changed the title ~~argon2: add workaround for big 16-byte aligned allocations?~~ argon2: add workaround for big 64-byte aligned allocations? Mar 5, 2025
