Remove sha256crypt ($5$) and sha512crypt ($6$) from strong set. #35
Conversation
These are the strongest hashes supported by glibc’s libcrypt, but they have no particular protection against hardware parallel attacks (e.g. using GPUs to iterate SHA-2) and should probably only be used for new hashes in mixed environments where yescrypt and/or bcrypt aren’t universally supported. It’s difficult to find information about how difficult these hashes specifically are to brute-force; the big online services whose user databases got dumped seem to have skipped from SHA-1 or worse straight to bcrypt. I think what I said in NEWS is accurate as far as it goes, but it might be nice to add more concrete numbers and advice to crypt.5.
https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40 is a good benchmark, even if a bit old, and it shows that in practice sha512crypt can be cracked 10x faster than bcrypt.
If we don't consider SHA-2 crypt strong anymore, we should do the same for bcrypt (
All of these hashes are tunable, so ideally any benchmarks considered here should compare defensive performance vs. cracking performance, or cracking performance at similar and currently relevant defensive throughput (unless we assume that anyone using SHA-crypt would use the default of rounds=5000 or else migrate to another hash type? maybe). The benchmarks cited above are cracking-only, at tunable settings historically used for benchmarking but with somewhat different defensive throughput (SHA-crypt's default of rounds=5000, and bcrypt at cost 5 unless stated otherwise), so direct/exact comparisons of these performance numbers between different hash types are at best not ideal.

Besides the performance aspects, a drawback of SHA-crypt (and especially sha256crypt) is that its running time depends heavily on password length. For unusually high lengths, it's O(length^2). For low lengths, sha256crypt shows very noticeable (tens of percent) differences in running time across realistic and attack-relevant password length ranges - e.g., for 7 vs. 8 or 11 vs. 12 characters, with the exact thresholds varying by salt length. (For sha512crypt, similar slowdowns occur at much higher lengths - tens of characters - so not practically relevant as an infoleak.) This also limits the cost setting a sysadmin or distro vendor could realistically use - one would have to consider the worst-case (longest) allowed passwords, and as a result accept unnecessarily high throughput (including for cracking) at more realistic lengths.

There's also a relatively tiny timing leak (in both sha256crypt and sha512crypt) of 8 bits of an intermediate hash value (independent of password length), so precise timing measurements would allow an attacker to rule out 255 out of 256 possible passwords (or better if also considering lengths), but I think this attack will remain impractical and thus only theoretical (although it can be demo'ed).

bcrypt, scrypt, and yescrypt are not ideal in that they're cache-timing unsafe. That's a tough security trade-off (avoiding this property makes the password hash weaker in other ways). But they avoid "worse than cache" timing leaks, whereas SHA-crypt literally has conditional branching based on sensitive data (not limited to password length - also on that 8-bit intermediate hash).
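The length dependence can be illustrated with a rough model that counts SHA-256 compression-function calls per round. This is only a sketch of the round structure from the SHA-crypt specification (most rounds hash the password twice plus the salt and the 32-byte running digest; every third round skips the salt), not libxcrypt code; the function names are mine.

```python
import math

def sha256_blocks(msg_len):
    """Number of 64-byte SHA-256 compression calls for a message of
    msg_len bytes (padding adds a 0x80 byte plus an 8-byte length)."""
    return math.ceil((msg_len + 9) / 64)

def salted_round_blocks(pw_len, salt_len):
    """Worst-case per-round cost in sha256crypt: rounds that hash
    password + salt + password + the 32-byte running digest."""
    return sha256_blocks(2 * pw_len + salt_len + 32)

def salt_free_round_blocks(pw_len):
    """Every third round skips the salt and hashes
    password + password + the 32-byte running digest."""
    return sha256_blocks(2 * pw_len + 32)
```

With the 8-byte salts used in the hashcat benchmarks, this model predicts the per-round cost of salted rounds jumping between 7 and 8 password characters, and of salt-free rounds between 11 and 12 - consistent with the thresholds mentioned above.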
The proposed change to NEWS (indirectly) calls bcrypt "newer", whereas it actually pre-dates SHA-crypt by a decade. For the same reason, bcrypt wasn't deliberately hardened against GPUs - GPUs were quite different at the time bcrypt was introduced and were not yet used for password cracking. It just happened to be GPU-unfriendly (at first very unfriendly, though newer GPUs have gradually become better at it). I am just sharing what I know and notice; I do not yet vote for or against a change like what's proposed here.
For our current results for SHA-crypt on FPGA, see: https://www.openwall.com/lists/john-users/2018/07/23/1 (sha512crypt). These are much worse than the corresponding latest GPU speeds per chip, but on par with latest GPUs per Watt. However, they're on FPGAs from many years ago (45nm, circa 2011).
I had been under the impression that sha{256,512}crypt were known to be significantly weaker than bcrypt at any cost parameter level -- it seems I was wrong. Let's close this PR for now and make a note that we need to collect solid data to base any decision on. The branch for #26 now contains a sketch implementation of a program called
(if you run the program yourself, pass
We don't have to use that one; Jeremi has since posted a newer one for 8x1080Ti and newer hashcat: https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505 He also posted bcrypt benchmarks on 4 different recent GPUs: https://gist.github.com/epixoip/9d9b943fd580ff6bfa80e48a0e77520d
Tuning for the same defensive running time on one logical CPU on an idle system isn't ideal. A more important figure (limiting the maximum cost setting that can be used on a server) is throughput (hashes computed per second) under full CPU utilization (all logical CPUs) by concurrent hash computations. But the differences between these two metrics are usually within 2x (taking into consideration, of course, that they're inversely proportional), so I'll just use @zackw's results below. Also, I don't know what password length and salt these benchmarks are for - and this matters a lot for sha256crypt - but again it's usually within a factor of 2. hashcat's are for password length 7, salt length 8.

Finally, I think 250ms on a modern CPU is excessive as a distro's default (more useful is the range up to 100ms), and with our current single-parameter input to yescrypt it would consume more memory (great for security) than a distro would dare to use for this task by default (considering it's per each concurrent authentication attempt, so can be multiple GB total). Thankfully, for the 4 hash types I reference below this isn't a problem as it relates to scaling the existing results.

Based on Jeremi's 8x1080Ti benchmarks I referenced above, we can expect these speeds at the cost settings from @zackw's comment above:
No directly comparable data for other hash types @zackw listed. Scaling formulas I used:
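The usual way to scale such benchmark numbers can be sketched as follows (an illustrative model, not necessarily the exact formulas that were posted here): cracking speed is inversely proportional to the iteration count for linear-cost hashes like SHA-crypt and bsdicrypt, and halves with each bcrypt cost increment. The function names and the numbers in the example are hypothetical.

```python
def scale_linear(speed, bench_rounds, target_rounds):
    """Scale a cracking speed measured at bench_rounds to target_rounds
    for hashes whose cost is a linear iteration count (sha*crypt,
    bsdicrypt): time per hash is proportional to the round count."""
    return speed * bench_rounds / target_rounds

def scale_exponential(speed, bench_cost, target_cost):
    """Scale a bcrypt speed: work doubles with each cost increment."""
    return speed * 2 ** (bench_cost - target_cost)

# Hypothetical example: a speed of 1,000,000 H/s at sha512crypt's
# default rounds=5000 scales to 50,000 H/s at rounds=100000.
```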
Per these results, SHA-crypt and even bsdicrypt certainly do appear a lot cheaper than bcrypt to attack on GPUs. However, judging by the speed for bsdicrypt, I guess hashcat's implementation of it is not bitsliced (sorry, too lazy to check the code). Using the number for descrypt (which I know is bitsliced) instead, we'd get:
as the potential speed for a properly optimized implementation. This needs to be scaled down somewhat, since bsdicrypt requires extra registers for the pointers to the extra 12 bits of salt (24-bit vs. 12-bit salt), and register pressure is often the limiting factor for the performance of bitslice implementations. So maybe 100k, which is on par with sha256crypt. So bcrypt does stand out. On FPGAs, the results are very different (see comment below), but GPUs are still far more accessible (this might change somewhat as/if we proceed to implement bcrypt on AWS F1).
Scaling formulas I used:
We don't have an implementation of bsdicrypt on FPGA, so I used scaling from descrypt - it's similar enough. It looks like bcrypt wins even at this test.

Note: these Spartan-6 LX150 FPGAs are many years older than the 1080Ti GPUs used for the previous comparison, and they were not high-end even when they were new (unlike those GPUs) - they were merely at the end of the budget Spartan-6 line (whereas the high-end ones were the Virtex-6 line). Also, this board consumes 30W to 40W total for the 4 FPGAs vs. 1 kW to 2 kW for the 8 GPUs. So these performance figures are given here for a comparison of the different hash types (on these two kinds of hardware, with the hashes tuned for the same performance on CPU), not for a comparison of GPUs vs. FPGAs.
Here's what I get for target times of 100ms, on the same computer:
and 50ms:
For exponential cost parameters, it chooses the smallest number producing a time above the target. I will think about revising the program to estimate throughput instead, but I'm not sure when I will have time to implement it.
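The selection rule just described - the smallest exponential cost producing a time above the target - can be sketched as follows. This is not the actual program: `hashlib.pbkdf2_hmac` stands in for crypt(), and the function names are hypothetical.

```python
import hashlib
import time

def time_hash(cost):
    """Time one hash computation at an exponential cost setting
    (2**cost iterations), using PBKDF2 as a stand-in for crypt()."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"password", b"saltsalt", 2 ** cost)
    return time.perf_counter() - start

def pick_exponential_cost(target_seconds, lo=4, hi=31):
    """Return the smallest cost whose measured time exceeds the
    target, or hi if none does within the allowed range."""
    for cost in range(lo, hi + 1):
        if time_hash(cost) > target_seconds:
            return cost
    return hi
```

For linear cost parameters (e.g. SHA-crypt's rounds), one would instead extrapolate from a few timed samples, since stepping one round at a time is impractical.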
Thanks @zackw. These results scale for the 4 hash types I included in my comparison as expected, so I don't need to redo my calculations.
While throughput is more relevant for defense-vs-attack comparisons, it makes sense to keep tuning for elapsed time as a feature as well. Ideally, it should be possible to tune for either throughput or elapsed time (command-line options?), and have the other metric output as well. Or maybe there will be more metrics to output: also elapsed time when running 1 thread vs. when running max threads. Could also output min/avg/max like I do in yescrypt's
(In this example, "min" is the lowest seen when running 56 threads, and it just happens to be similar to what would be typical for 1 thread. We should probably also have explicit min/avg/max output for 1 thread.)
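A minimal sketch of measuring both metrics - min/avg/max elapsed time for a single thread, and aggregate throughput with all logical CPUs busy - again with PBKDF2 standing in for crypt() and with hypothetical names. (CPython's `pbkdf2_hmac` releases the GIL, so a thread pool does exercise all cores.)

```python
import concurrent.futures
import hashlib
import os
import time

ITERS = 50_000  # stand-in work factor, not a recommended setting

def one_hash(_=None):
    hashlib.pbkdf2_hmac("sha256", b"password", b"saltsalt", ITERS)

def single_thread_latency(samples=5):
    """Elapsed time per hash on one thread: (min, avg, max)."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        one_hash()
        times.append(time.perf_counter() - t0)
    return min(times), sum(times) / len(times), max(times)

def full_load_throughput(n_hashes=32):
    """Hashes per second with all logical CPUs kept busy."""
    workers = os.cpu_count() or 1
    t0 = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(workers) as ex:
        list(ex.map(one_hash, range(n_hashes)))
    return n_hashes / (time.perf_counter() - t0)
```

On an otherwise idle machine, the ratio of full-load throughput to 1/(single-thread time) shows how much the per-hash latency degrades under concurrency.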
Maybe it makes sense to start by removing only sha256crypt (

Also, with libxcrypt's MD5, SHA-256, and SHA-512 implementations having just been replaced with permissively licensed ones, perhaps @zackw can use this opportunity to re-run the benchmarks on the same system and post the results here. This will help ensure there's no significant performance regression (and ideally a speedup) from that change, with the other benchmarked hash types (those that don't use these primitives) serving as a control group.
It looks like the permissively-licensed SHA-256 and -512 may be slightly faster (needing more iterations for the same target time) and MD5 might be slightly slower. But I don't know whether any of the changes are statistically significant -- that would require me to do more complicated testing than I have time for today (or in the near future).
Thanks, @zackw. Those changes don't appear to be statistically significant - there's also a similar change for

Also, this reminds me: we're probably wasting time on context zeroization on every iteration of sha*crypt and SunMD5, because it's done in the primitives'
That's doing aliasing-unsafe pointer tricks. I want all such code
exterminated from the repository *even if* that hurts performance.
OK, it's your call. I understand that's an aliasing violation, as I acknowledge in the comment. It's prevented from turning into a problem now by the translation-unit boundary, which might not work that way forever. To avoid the violation without incurring the performance hit, we'd have to modify the context API to accept a
Yeah, I'd think harder about preserving this kind of optimization in
the code used by bcrypt or yescrypt, but not anything depending on
MD[45] or even SHA1.