Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possibility of re-enabling threads? #52

Closed
TomMettam opened this issue Sep 12, 2020 · 16 comments
Closed

Investigate possibility of re-enabling threads? #52

TomMettam opened this issue Sep 12, 2020 · 16 comments

Comments

@TomMettam
Copy link

Hello!

In issue #15 threads were disabled. My understanding is that this means that generating a hash with a parallelism value of > 1 becomes exponentially more expensive for the defender but cheaper for an attacker using a different library.

WebAssembly Threads are a lot more mature now, being enabled by default in Chrome since v74. Could the possibility of re-enabling threads be investigated?

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

Hi!
Well, it should be not exponentially, but linearly more expensive: N threads * X operations = N * X operations in one thread.
It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i: -t 128 -m 14.

Here's the time I get, while results are absolutely the same:

Native WASM
1 thread 1.0s 5.0s
2 threads 1.1s 5.1s
4 threads 1.2s 5.1s
8 threads 1.7s 5.1s
16 threads 2.0s 5.1s
128 threads 4.2s 5.2s
1024 threads 26s 7.3s

I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

More iterations, less memory, Argon2i, Iterations=10000, Memory=1024 KiB: -t 10000 -m 10 -i (results are the same again).

Native WASM
1 thread 4.1s 22s
2 threads 6.8s 23s
4 threads 9.5s 23s
8 threads 16s 24s

I'm not sure I get the point of having threading support in Argon2 🤔 when do we need it?

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

If you're interested, I've just built a version with SIMD, check it out, it's 2x faster: https://antelle.net/argon2-browser/
(works only in Chrome with a flag enabled; not for production)
For threading support – I need some input here. I don't see it being helpful right now, but I'll play more with it, maybe I'll figure it out.

Just noticed how the native version behaves without threading support:

echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type:		Argon2i
Iterations:	10000
Memory:		1024 KiB
Parallelism:	4
Hash:		93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded:	$argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
1.446 seconds
Verification ok

And with threading support:

echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type:		Argon2i
Iterations:	10000
Memory:		1024 KiB
Parallelism:	4
Hash:		93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded:	$argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
6.834 seconds
Verification ok

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

Okay I see, it reports CPU time, not wallclock: P-H-C/phc-winner-argon2#204.

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

Threaded version is actually quite easy to build, however it doesn't give any improvement while the file size grows significantly (2x wasm and 4x js), you can also try doing it by adding -s USE_PTHREADS=1:

 39K argon2.wasm
 60K argon2.js

At the moment I don't think we should release it, but I'll keep an eye on its progress.

@glihm
Copy link

glihm commented Sep 13, 2020

Hi!
Well, it should be not exponentially, but linearly more expensive: N threads * X operations = N * X operations in one thread.
It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i: -t 128 -m 14.

Here's the time I get, while results are absolutely the same:

Native WASM
1 thread 1.0s 5.0s
2 threads 1.1s 5.1s
4 threads 1.2s 5.1s
8 threads 1.7s 5.1s
16 threads 2.0s 5.1s
128 threads 4.2s 5.2s
1024 threads 26s 7.3s
I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.

I confirm, in my implementation I had tried too and the results are very closed even with more threads.

I do think this is completely wanted. Argon2 is designed to limit hardware brute-forcing.
Check this article guys, very instructive: https://eprint.iacr.org/2016/104.pdf
In the section 2: "Security Requirements of Password-Hashing Schemes".

From my understanding, the parallelism is here to enforce the hardware to use more resources, even if this is not speeding up the process.

So, activating the multi-threading is important in a way that you enforce someone trying to attack you to use more resources.

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

@glihm it depends how calculations are made: either we set a predefined complexity and then threading support should make it faster (that's how I assume it works), or complexity grows with increasing parallelism (then single-threaded implementation would take more time for higher parallelism values).

@TomMettam
Copy link
Author

With the native implementation, increasing the amount of parallelism is intended to keep the time constant but increase complexity.

It's very strange to me that using a single thread with a larger parallelism value would not incur a larger time penalty. Perhaps the threading isn't being utilised correctly (so it's always using one thread)?

@TomMettam
Copy link
Author

I will do some experiments and research on this topic.

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

increasing the amount of parallelism is intended to keep the time constant but increase complexity

Is it stated anywhere in docs? I see that it's the opposite: increasing parallelism allows implementations to calculate the hash using several threads, then how they make use of this possibility it's up to them. For example, in WASM we're not using threads at all, so the performance remains constant. The native tool uses them with different degree of success.

@antelle
Copy link
Owner

antelle commented Sep 13, 2020

Found it in docs:
image

So, it's like I describe it: it's number of chains that can be run. The complexity is exactly the same, however required work can be splitted.

@antelle
Copy link
Owner

antelle commented Sep 14, 2020

@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow 🐌 (≈2x) and big 🥴 (≈+40k).

@antelle
Copy link
Owner

antelle commented Sep 14, 2020

I believe this is now resolved and we can close the issue. The branch is here, and I'll return to it once a year or if there's any news about threads in WASM.

@quexten
Copy link

quexten commented Feb 21, 2023

@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow snail (≈2x) and big woozy_face (≈+40k).

I realize this is not an active topic anymore, I just want to note that in my compiled test, the threaded version is significantly faster than the non-threaded version.

On latest master branch, vs threaded branch on latest emscripten docker image and using Firefox I get ~10 seconds, vs ~2.5 seconds on parallelism = 16, iterations = 10, memory = 1GiB on a 4 core, 8 thread system.

@Nantris
Copy link
Contributor

Nantris commented Mar 23, 2023

Any chance of re-enabling threads?

@quexten how's testing been on your branch? Are you using it in any projects?

@Nantris
Copy link
Contributor

Nantris commented Apr 7, 2023

Friendly bump.

@antelle is this project still maintained? I noticed there haven't been any commits in a while but sometimes that's normal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants