Remove sha256crypt ($5$) and sha512crypt ($6$) from strong set. #35

Closed
zackw wants to merge 1 commit from zack/sha2-not-strong

Conversation

@zackw (Collaborator) commented Sep 10, 2018

These are the strongest hashes supported by glibc’s libcrypt, but
they have no particular protection against hardware parallel
attacks (e.g. using GPUs to iterate SHA-2) and should probably
only be used for new hashes in mixed environments where yescrypt
and/or bcrypt aren’t universally supported.

It’s difficult to find information about how difficult these hashes
specifically are to brute-force; the big online services whose user
databases got dumped seem to have skipped from SHA-1 or worse
straight to bcrypt. I think what I said in NEWS is accurate as far as
it goes, but it might be nice to add more concrete numbers and advice
to crypt.5.

@zackw requested a review from besser82 September 10, 2018 16:34
@rfc1036 commented Sep 10, 2018

https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40 is a good benchmark, even if a bit old, and it shows that in practice sha512crypt can be cracked 10x faster than bcrypt.
So in the general case, sha512crypt should be about as good as bcrypt as long as roughly 10x more rounds are used; the glibc default is 5000.
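
For illustration, raising the rounds value is straightforward with libxcrypt's crypt_gensalt(); a minimal sketch (the 50000 figure is an arbitrary example, not a tuned recommendation):

```c
/* Generate a sha512crypt setting string with a higher rounds value
 * than the historical default of 5000.
 * Compile with: cc example.c -lcrypt */
#include <stdio.h>
#include <crypt.h>

int main(void)
{
    /* NULL/0 for the random-bytes arguments tells libxcrypt to
       gather entropy from the OS itself. */
    char *setting = crypt_gensalt("$6$", 50000, NULL, 0);
    if (!setting) {
        perror("crypt_gensalt");
        return 1;
    }
    printf("setting: %s\n", setting);  /* $6$rounds=50000$<salt> */

    char *hash = crypt("example passphrase", setting);
    if (!hash || hash[0] == '*') {     /* libxcrypt failure token */
        fputs("crypt failed\n", stderr);
        return 1;
    }
    printf("hash:    %s\n", hash);
    return 0;
}
```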

@besser82 (Owner) left a comment:

If we don't consider SHA-2 crypt strong anymore, we should do the same for bcrypt ($2*$), since the same reasoning applies there too, just with FPGAs instead of GPUs.

See: http://www.openwall.com/lists/john-users/2017/06/25/1

@solardiz (Collaborator):

All of these hashes are tunable, so ideally any benchmarks we would consider here should be for defensive performance vs. cracking performance, or for cracking performance at similar and currently relevant defensive throughput (unless we assume that anyone using SHA-crypt would use the default of rounds=5000 or else migrate to another hash type? maybe). The benchmarks cited above are cracking-only at tunable settings that are historically used for benchmarking, but that have somewhat different defensive throughput (SHA-crypt's default of rounds=5000, and bcrypt at cost 5 unless stated otherwise), so direct/exact comparisons of these performance numbers between different hash types are at best not ideal.

Besides the performance aspects, a drawback of SHA-crypt (and especially sha256crypt) is its running time being highly dependent on password length. For unusually high lengths, it's O(length^2). For low lengths, in sha256crypt there are very noticeable (tens of percent) differences in running time for realistic and attack-relevant password length ranges - e.g., for 7 vs. 8 or 11 vs. 12 characters, with the exact thresholds varying by salt length. (For sha512crypt, similar slowdowns occur at much higher lengths - tens of characters - so not practically relevant as an infoleak.) This also limits the cost setting a sysadmin or distro vendor could realistically use - would have to consider the worst case (longest) allowed passwords, and as a result have unnecessarily high throughput (including for cracking) for more realistic lengths.
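
The length steps are easy to observe directly; a rough timing sketch (the password lengths, fixed salt, and iteration count here are arbitrary demonstration choices):

```c
/* Rough demonstration of sha256crypt's password-length sensitivity:
 * time batches of hashes at lengths straddling the thresholds noted
 * above (exact thresholds vary with salt length).
 * Compile with: cc demo.c -lcrypt */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <crypt.h>

static void time_password(const char *pw, int count)
{
    struct crypt_data data;
    memset(&data, 0, sizeof data);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++)
        crypt_r(pw, "$5$saltstring", &data);  /* default rounds=5000 */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                 (t1.tv_nsec - t0.tv_nsec)) / 1e6 / count;
    printf("length %2zu: %.3f ms/hash\n", strlen(pw), ms);
}

int main(void)
{
    const char *pws[] = { "aaaaaaa", "aaaaaaaa",            /* 7 vs. 8   */
                          "aaaaaaaaaaa", "aaaaaaaaaaaa" };  /* 11 vs. 12 */
    for (size_t i = 0; i < sizeof pws / sizeof pws[0]; i++)
        time_password(pws[i], 1000);
    return 0;
}
```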

There's also a relatively tiny timing leak (in both sha256crypt and sha512crypt) of 8 bits of an intermediate hash value (independent of password length), so precise timing measurements would allow an attacker to rule out 255 out of 256 possible passwords (or better if also considering lengths), but I think this attack will remain impractical and thus only theoretical (although it can be demo'ed).

bcrypt, scrypt, and yescrypt are not ideal in that they're cache-timing unsafe. That's a tough security trade-off (avoiding this property makes the password hash weaker in other ways). But they avoid "worse than cache" timing leaks, whereas SHA-crypt literally has conditional branching based on sensitive data (not limited to password length - also on that 8-bit intermediate hash).

@solardiz (Collaborator):

The proposed change to NEWS (indirectly) calls bcrypt "newer", whereas it actually pre-dates SHA-crypt by a decade. For the same reason, bcrypt wasn't deliberately hardened against GPUs - those were quite different and were not used for password cracking at the time bcrypt was introduced. It just happened to be GPU-unfriendly (at first very unfriendly, but newer GPUs have gradually become better at it).

I am just sharing what I know and notice. I do not yet vote for/against a change like what's proposed here.

@solardiz (Collaborator):

For our current results at SHA-crypt on FPGA, see:

https://www.openwall.com/lists/john-users/2018/07/23/1 (sha512crypt)
https://www.openwall.com/lists/john-users/2018/08/27/11 (sha256crypt)

These are much worse than the corresponding latest GPU speeds per chip, but are on par with latest GPUs per Watt. However, they're on FPGAs from many years ago (45nm, circa 2011).

@zackw (Collaborator, Author) commented Sep 19, 2018

I had been under the impression that sha{256,512}crypt were known to be significantly weaker than bcrypt at any cost parameter level -- it seems I was wrong. Let's close this PR for now and make a note that we need to collect solid data to base any decision on.

The branch for #26 now contains a sketch implementation of a program called crypt-tune-costs that's meant to select appropriate cost parameters for the machine it's run on, but all it does is measure what cost parameter makes each hash take some amount of wall-clock time (defaulting to 250ms). I'm not even sure this is the right thing for the stated purpose of per-installation tuning, but perhaps it can inform this discussion a little? On my laptop, these are the numbers I get:

hash         cost     elapsed
----         ----     -------
yescrypt     10       316.05ms
scrypt       9        355.83ms
bcrypt       13       404.67ms
sha512crypt  293877   250.23ms
sha256crypt  266405   253.79ms
sha1crypt    270947   252.51ms
md5_sun      175569   251.48ms
des_bsd      1966043  251.21ms

(If you run the program yourself: pass --enabled=all, expect the output in a slightly different format, and don't worry about what "enabled" vs. "legacy" means.)
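
For the curious, the core selection step amounts to something like this (my own simplified sketch, not the actual crypt-tune-costs code; for an exponential-cost hash it picks the smallest cost whose one-shot wall-clock time meets or exceeds the target):

```c
/* Simplified sketch of the cost-tuning idea.
 * Compile with: cc tune.c -lcrypt */
#include <stdio.h>
#include <time.h>
#include <crypt.h>

static double time_one_ms(const char *setting)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    crypt("dummy passphrase", setting);  /* result discarded */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / 1e6;
}

static unsigned long tune_exponential(const char *prefix, double target_ms)
{
    for (unsigned long cost = 4; cost <= 31; cost++) {  /* bcrypt range */
        char *setting = crypt_gensalt(prefix, cost, NULL, 0);
        if (!setting)
            break;
        if (time_one_ms(setting) >= target_ms)
            return cost;
    }
    return 0;  /* target not reached within the supported range */
}

int main(void)
{
    printf("bcrypt cost for ~250 ms: %lu\n",
           tune_exponential("$2b$", 250.0));
    return 0;
}
```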

@zackw closed this Sep 19, 2018
@zackw deleted the zack/sha2-not-strong branch September 19, 2018 01:27
@solardiz (Collaborator):

> https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40 is a good benchmark, even if a bit old

We don't have to use that one; Jeremi has since posted a newer one, for 8x1080Ti and a newer hashcat:

https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505

He also posted bcrypt benchmarks on 4 different recent GPUs:

https://gist.github.com/epixoip/9d9b943fd580ff6bfa80e48a0e77520d

@solardiz (Collaborator) commented Sep 19, 2018

Tuning for same defensive running time on one logical CPU on an idle system isn't ideal. A more important figure (limiting the maximum cost setting that can be used on a server) is throughput (hashes computed per second) under full CPU utilization (all logical CPUs) by concurrent hash computations. But the differences between these two metrics are usually within 2x (of course, first taking into consideration that they're inversely proportional), so I'll just use @zackw's results below. Also, I don't know what password length and salt these benchmarks are for - and this matters a lot for sha256crypt - but again usually within a factor of 2. hashcat's are for password length 7, salt length 8.

Finally, I think 250ms on a modern CPU is excessive as a distro's default (more useful is the range up to 100ms), and with our current single-parameter input to yescrypt it would consume more memory (great for security) than a distro would dare to use for this task by default (considering it's per each concurrent authentication attempt, so it can add up to multiple GB total). Thankfully, for the 4 hash types I reference below this isn't a problem as it relates to scaling the existing results.

Based on Jeremi's 8x1080Ti benchmarks I referenced above, we can expect these speeds at the cost settings from @zackw's comment above:

hash         8x1080Ti cracking h/s per 250ms on @zackw's laptop
----         --------------------------------------------------
bcrypt       1.2k
sha512crypt  31.4k
sha256crypt  89.6k
bsdicrypt    7.1k (scaled from actual) to 145k (potential)

No directly comparable data for other hash types @zackw listed.

Scaling formulas I used:

184.8*1000*(2^5*1042+585)/(2^13*1042+585)*404.67/250 = 1188.9
1849.1*1000*5000/293877 = 31460
4774.6*1000*5000/266405 = 89612
19333.5*1000*725/1966043 = 7129
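
To make the first formula explicit: it rescales the benchmarked cost-5 bcrypt speed by the ratio of Blowfish key-setup operation counts (modeled as 2^cost * 1042 + 585), then normalizes the 404.67ms laptop time to the 250ms target. The same arithmetic as a small C sketch:

```c
/* Rescale a benchmarked bcrypt speed between cost settings.
 * Compile with: cc scale.c -lm */
#include <stdio.h>
#include <math.h>

static double bcrypt_rescale(double hps, int from_cost, int to_cost)
{
    double from_work = ldexp(1042.0, from_cost) + 585.0;  /* 1042 * 2^c + 585 */
    double to_work   = ldexp(1042.0, to_cost)   + 585.0;
    return hps * from_work / to_work;
}

int main(void)
{
    double at_cost13 = bcrypt_rescale(184.8e3, 5, 13);  /* 184.8 kH/s at cost 5 */
    printf("%.1f h/s\n", at_cost13 * 404.67 / 250.0);   /* prints ~1188.9 */
    return 0;
}
```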

Per these results, SHA-crypt and even bsdicrypt certainly do appear a lot cheaper than bcrypt to attack on GPUs. However, per the speed for bsdicrypt I guess hashcat's implementation of it is not bitsliced (sorry, too lazy to check the code). Using the number for descrypt (which I know is bitsliced) instead we'd get:

11414.2*10^6*25/1966043 = 145142

as the potential speed for a properly optimized implementation. This needs to be scaled down somewhat since bsdicrypt will require extra registers for the pointers to the extra 12 bits of salt (24-bit vs. 12-bit salt), and register pressure is often the limiting factor for performance of bitslice implementations. So maybe 100k, which is on par with sha256crypt.

So bcrypt does stand out.

On FPGAs, the results are very different (see comment below), but GPUs are still far more accessible (this might change somewhat as/if we proceed to implement bcrypt on AWS F1).

@solardiz (Collaborator) commented Sep 19, 2018

hash         one ZTEX 1.15y board (4 FPGAs) cracking h/s per 250ms on @zackw's laptop
----         ------------------------------------------------------------------------
bcrypt       747
sha512crypt  936
sha256crypt  2.5k
bsdicrypt    10k (potential)

Scaling formulas I used:

106000*(2^5*1042+585)/(2^13*1042+585)*404.67/250 = 682 # scaling from cost 5 at 141 MHz
461.6*404.67/250 = 747 # actual run at cost 13 at 150 MHz
55000*5000/293877 = 936 # 160 MHz (actual measured at rounds=293877 is up to 880; when we try to buffer more work we hit communication timeouts - looks like something for us to improve)
133000*5000/266405 = 2496 # 160 MHz (actual measured at rounds=266405 is 2520)
800*10^6*25/1966043 = 10173 # 220 MHz descrypt cores, 160 MHz comparators

We don't have an implementation of bsdicrypt on FPGA, so I used scaling from descrypt - it's similar enough.

It looks like bcrypt wins even at this test.

Note: these Spartan-6 LX150 FPGAs are many years older than the 1080Ti GPUs used for the previous comparison, and they were not high-end even when they were new (unlike these GPUs) - they were merely at the end of the budget Spartan-6 line (whereas high-end ones were the Virtex-6 line). Also, this board consumes 30W to 40W total for the 4 FPGAs vs. 1 kW to 2 kW for the 8 GPUs. So these performance figures are given here for comparison of the different hash types (on these two kinds of hardware when the hashes are tuned for same performance on CPU), not for a comparison of GPUs vs. FPGAs.

@zackw (Collaborator, Author) commented Sep 19, 2018

Here's what I get for target times of 100ms, on the same computer:

hash         cost     elapsed
----         ----     -------
yescrypt     9        157.85ms
scrypt       8        176.88ms
bcrypt       11       100.22ms
sha512       118781   100.40ms
sha256       105841   100.03ms
sha1         107239   100.09ms
md5_sun      67433    100.08ms
des_bsd      789397   100.21ms

and 50ms:

hash         cost     elapsed
----         ----     -------
yescrypt     8        78.11ms
scrypt       7        89.30ms
bcrypt       11       100.11ms
sha512       59723    50.31ms
sha256       53743    50.96ms
sha1         54495    50.91ms
md5_sun      10001    50.17ms
des_bsd      392947   50.11ms

For exponential cost parameters, it chooses the smallest number producing a time above the target.

I will think about revising the program to estimate throughput instead, but I'm not sure when I will have time to implement it.

@solardiz (Collaborator):

Thanks @zackw. These results scale for the 4 hash types I included in my comparison as expected, so I don't need to redo my calculations.

@solardiz (Collaborator):

> I will think about revising the program to estimate throughput instead, but I'm not sure when I will have time to implement it.

While throughput is more relevant for defense vs. attack comparisons, it makes sense to keep tuning for elapsed time as a feature as well. Ideally, it should be possible to tune for either throughput or elapsed time (command-line options?), and have the other metric output as well. Or maybe it'll be more metrics to output: also elapsed time when running 1 thread vs. when running max threads. Could also output min/avg/max like I do in yescrypt's userom.c program (which I also hope to find time to rewrite one day):

Benchmarking 1 thread ...
100 c/s real, 100 c/s virtual (255 hashes in 2.54 seconds)
Benchmarking 56 threads ...
2314 c/s real, 41 c/s virtual (14280 hashes in 6.17 seconds)
min 13.440 ms, avg 24.049 ms, max 25.467 ms

(In this example, "min" is the lowest seen when running 56 threads, and it just happens to be similar to what would be typical for 1 thread. We should probably also have explicit min/avg/max output for 1 thread.)
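
A minimal pthreads sketch of that kind of measurement (the thread count, per-thread hash count, and the fixed bcrypt setting string are arbitrary choices for illustration, not the actual userom.c code):

```c
/* Multi-threaded throughput + per-hash latency benchmark sketch.
 * Compile with: cc bench.c -lcrypt -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <crypt.h>

#define NTHREADS   8    /* a real tool would detect logical CPUs */
#define PER_THREAD 32

static double lat_ms[NTHREADS][PER_THREAD];

static void *worker(void *arg)
{
    long id = (long)arg;
    struct crypt_data data;
    memset(&data, 0, sizeof data);
    for (int i = 0; i < PER_THREAD; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* fixed bcrypt setting at cost 11, chosen only for the demo */
        crypt_r("dummy passphrase", "$2b$11$abcdefghijklmnopqrstuv", &data);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        lat_ms[id][i] = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                         (t1.tv_nsec - t0.tv_nsec)) / 1e6;
    }
    return NULL;
}

int main(void)
{
    pthread_t th[NTHREADS];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double min = 1e300, max = 0.0, sum = 0.0;
    for (int i = 0; i < NTHREADS; i++)
        for (int j = 0; j < PER_THREAD; j++) {
            double v = lat_ms[i][j];
            if (v < min) min = v;
            if (v > max) max = v;
            sum += v;
        }
    int n = NTHREADS * PER_THREAD;
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f c/s real (%d hashes in %.2f seconds)\n", n / secs, n, secs);
    printf("min %.3f ms, avg %.3f ms, max %.3f ms\n", min, sum / n, max);
    return 0;
}
```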

@solardiz (Collaborator):

Maybe it makes sense to start by removing only sha256crypt ($5$) from the strong set? Per the results I posted here, it's 2.5x to 3x faster to attack than sha512crypt on GPUs and FPGAs when the two hash types are tuned for the same speed on CPU. It's also more sensitive to password length, even within the commonly used range. And it's relatively rarely used.

Also, with libxcrypt's MD5, SHA-256, and SHA-512 implementations having just been replaced with permissively licensed ones, perhaps @zackw can use this opportunity to re-run the benchmarks on the same system and post the results in here. This will help ensure there's no significant performance regression (and ideally a speedup) from that change, with other benchmarked hash types (those that don't use these primitives) serving as a control group.

@zackw (Collaborator, Author) commented Oct 25, 2018

It looks like the permissively-licensed SHA-256 and -512 may be slightly faster (needing more iterations for the same target time) and MD5 might be slightly slower. But I don't know whether any of the changes are statistically significant -- that would require me to do more complicated testing than I have time for today (or in the near future).

hash         cost     elapsed
----         ----     -------
yescrypt     10       301.61ms
scrypt       9        340.82ms
bcrypt       13       385.78ms
sha512crypt  327849   254.29ms
sha256crypt  303227   252.29ms
sha1crypt    263921   251.55ms
md5_sun      166999   252.10ms
des_bsd      1819753  254.74ms

@solardiz (Collaborator):

Thanks, @zackw. Those changes don't appear to be statistically significant - there's also a similar change for des_bsd, which isn't supposed to be affected by the code changes. At least we know the differences are within 10% or so.

Also, this reminds me: we're probably wasting time on context zeroization on every iteration of sha*crypt and SunMD5, because it's done in the primitives' *_Final(). We might want to look into that and fix it under a separate issue... or not, if we keep those for backwards compatibility only.

@solardiz (Collaborator):

6fc8102 should indeed hurt performance at MD5 and MD4 a lot. Any chance we can revert it? @besser82

@zackw (Collaborator, Author) commented Oct 25, 2018 via email

@solardiz (Collaborator):

OK, it's your call. I understand that's an aliasing violation, as I acknowledge in the comment. It's prevented from turning into a problem now by the translation unit boundary, which might not work that way forever. To avoid the violation yet not incur the performance hit, we'd have to modify the context API to accept a union and modify the caller accordingly. I guess this is in fact not worth the bother for the legacy uses like SunMD5.
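
For concreteness, the union-taking variant could look something like this (hypothetical names and layouts; a sketch of the idea, not the actual libxcrypt interfaces, whose context types are private and larger):

```c
#include <stdio.h>
#include <string.h>

struct md5_ctx    { unsigned int h[4]; unsigned char buf[64]; };
struct sha256_ctx { unsigned int h[8]; unsigned char buf[64]; };

/* The caller always declares this union, and every init/update/final
 * function takes a pointer to it, so all accesses go through one
 * common effective type and the strict-aliasing rule is respected. */
union hash_ctx {
    struct md5_ctx    md5;
    struct sha256_ctx sha256;
};

/* Sketch of an init function in the union-taking style. */
static void md5_init_ctx_u(union hash_ctx *ctx)
{
    memset(&ctx->md5, 0, sizeof ctx->md5);
    ctx->md5.h[0] = 0x67452301u;  /* first word of the standard MD5 IV */
}

int main(void)
{
    union hash_ctx ctx;  /* caller allocates the union, not a bare struct */
    md5_init_ctx_u(&ctx);
    printf("md5 h[0] = %#x\n", ctx.md5.h[0]);
    return 0;
}
```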

@zackw (Collaborator, Author) commented Oct 25, 2018 via email
