Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync Argon2 implementation with latest upstream #2738

Open
solardiz opened this issue Sep 8, 2017 · 5 comments
Open

Sync Argon2 implementation with latest upstream #2738

solardiz opened this issue Sep 8, 2017 · 5 comments

Comments

@solardiz
Copy link
Member

solardiz commented Sep 8, 2017

Since we got the upstream Argon2 code into jumbo, there's been a contribution to upstream adding AVX2 support, which we lack:

https://github.com/P-H-C/phc-winner-argon2/pull/208/files

We should update, primarily to get this. (I expect that AVX-512 might follow, needing another sync.)

When we update, we should also try dropping the workaround we applied for #2650. I expect that the issue had been fixed upstream, and we must check this, and if not fixed (unlikely) then notify upstream that the issue is still unfixed.

We should also try to make it easier to sync to latest upstream. Can we possibly have all of Argon2 upstream code, even if with our changes, in a subdirectory, and keep the upstream's filenames? I understand this may be tricky to combine with our use of *_plug.c filenames. Maybe we need a generic alternative to those? Like also checking for *_plug.a static archives, which could be created from sets of object files built in subdirectories. Or we can treat Argon2 specially.

@solardiz
Copy link
Member Author

solardiz commented Sep 9, 2017

Just found that this fork of upstream https://github.com/WOnder93/argon2 already has not only AVX2, but also AVX-512 (and we can/should easily add MIC, and test), and support for use of preallocated memory (something I think we're adding with our own changes relative to upstream). Unfortunately(?), it also adds CPUID stuff (probably runtime CPU dispatching), which would be nice if we had it in JtR consistently, but is unneeded in our current way of doing things (with fallback binaries).

I think ideally someone (we?) should extract and contribute AVX-512 (and MIC) and preallocated memory to the main upstream project, and use that. So that our differences from upstream are smaller, and so that more projects benefit from this functionality.

There's also https://github.com/yandex/argon2, but it's C++ (and again, we don't need the runtime CPU dispatching - at least not for Argon2 separately), so unsuitable for us.

@solardiz
Copy link
Member Author

solardiz commented Sep 9, 2017

A related issue (should we track it separately?) is the potential optimization for 2i and 2id, where the data-independent indices can be reused across hash computations, rather than recomputed each time like upstream does. In our current hack of older upstream code, we're already passing the pseudo_rands array from the application, yet somehow we don't appear to be making this optimization. In latest upstream code, this array is gone - as far as I can tell, the data-independent indices are being calculated in smaller portions, which makes sense for that approach - but we'll probably need to reintroduce an equivalent of the array (as an option), to be written-to (if for the first time or when invoked with higher parameters) and reused (on subsequent calls with same or lower parameters) where the new upstream code normally does these things (so that we won't deviate from upstream too much). This probably means in next_addresses().

I think https://gitlab.com/omos/argon2-gpu already has this optimization (for GPU) - grep it for "precompute".

@solardiz
Copy link
Member Author

solardiz commented Oct 5, 2017

Meanwhile, AVX-512 support (not MIC, though) has been added upstream, so we'll get it for free when we sync.

@solardiz solardiz added this to the 1.8.0-jumbo-2 milestone Oct 21, 2017
@solardiz
Copy link
Member Author

Regarding "use of preallocated memory", I realized this can be implemented on top of current upstream code as-is using its memory (de)allocation callbacks, which we'll override. In our deallocator, we'll save the pointer and size and mark them as free in our own per-thread variables instead of calling free(). In our allocator, we'll check if the size is same as or smaller than the saved size and if our previous allocation is marked free (in our own Boolean variable) and if so reuse the old allocation, or realloc() as necessary, or error out if the previous allocation is not marked free (shouldn't happen). Better yet, we should reuse (y)escrypt's alloc_region() in there, simplifying some of those checks (alloc_region() already has similar checks) and taking advantage of its explicit huge pages support. We'd only call free_region() from the format's done().

However, we also need to override and disable upstream Argon2's memory cleansing. If they implement a callback for that as well (as I requested in P-H-C/phc-winner-argon2#231) we'd use that, otherwise this is trivial to patch (introduce a return; at the beginning of secure_wipe_memory(). This has major performance impact.

@solardiz
Copy link
Member Author

Looks unrealistic we'll make relevant changes and properly re-test them on all platforms before the release. Changed milestone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants