Entropy estimation of user-entered passwords: Not a best practice (a small rant) #2061
Comments
I don't see any advantage in what you are proposing. This entropy figure is nothing but a rough indicator of how good the password is and is in no way exact. So there is really no difference between 15 and 17 bits of "entropy" here. There is a difference between 15 and 120 bits, but only in the sense that one is "much higher" than the other. We introduced the colour coding to convey this fact. If you ask me, we could actually remove the numbers and only go with the colours. Your statement about the exact entropy being a function of our generation process is also not correct. When I have [a-zA-Z0-9] as my generation alphabet and I generate the password "abcdeFGH", then its entropy would be 8 * -log2(1/64) according to your statement, although it really is just 8 * -log2(1/8). Moreover, the fact that it's a simple sequence of characters, and therefore predictable no matter its (hypothetical) entropy, is probably a lot more important, and that's exactly what zxcvbn takes into account. All in all, entropy on its own is pretty useless for measuring password quality, because it only takes into account character probabilities and NOT the generation process or any prior knowledge associated with it. "zxcvbn" itself is the best example. Measured by entropy, it scores okay, but since it's the name of a well-known password strength estimator, it will never make a good password, even if it had 1000 bits of entropy.
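To make the "abcdeFGH" arithmetic concrete, here is a small Python sketch (illustrative names, not KeePassXC code) that compares the entropy of a uniform generator with the naive "plug-in" entropy computed from the string's own character frequencies. Note that [a-zA-Z0-9] actually has 62 symbols, so the generator figure is 8 * log2(62), about 47.6 bits; the 1/64 above is a round approximation.

```python
import math
from collections import Counter


def source_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of the generator: `length` uniform, independent draws."""
    return length * math.log2(alphabet_size)


def plugin_entropy_bits(password: str) -> float:
    """Naive entropy of the string itself: its own character frequencies
    treated as probabilities (Shannon entropy per character) times its length."""
    n = len(password)
    counts = Counter(password)
    per_char = sum((c / n) * math.log2(n / c) for c in counts.values())
    return n * per_char


ALPHABET_SIZE = 26 + 26 + 10  # [a-zA-Z0-9]

for pw in ["abcdeFGH", "aaaaaaaa", "q7Rz2KpX"]:
    print(pw, round(source_entropy_bits(len(pw), ALPHABET_SIZE), 1),
          round(plugin_entropy_bits(pw), 1))
# abcdeFGH 47.6 24.0
# aaaaaaaa 47.6 0.0
# q7Rz2KpX 47.6 24.0
```

"abcdeFGH" and the random-looking "q7Rz2KpX" get the same 24 bits here: character frequencies alone cannot see that one of them is a simple sequence, which is the kind of pattern a tool like zxcvbn looks for.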
There are two ways of looking at password quality. The first: I generate my password using some particular method, and I trust that this method produces a particular entropy that could be called "generated entropy". I trust the math, the algorithm and its implementation, and am not afraid that it could produce a password that somebody considers crackable. The second: I do not really care how I generated my password. I will attempt to crack it using some common methods and estimate its entropy based on these cracking attempts. This could be called "cracking entropy", and as I understand it, this is basically what zxcvbn does. For human-generated (or edited) passwords it only makes sense to display "cracking entropy", and there is no disagreement about that here. However, I agree with OP that this estimate should be displayed with more caution than the figure for randomly generated passwords. For example, the password "onetwothreefourfive" gets a poor (32-bit) quality estimate (which already looks too high), but the same phrase in my native language gets an excellent (>100-bit) estimate, which is ridiculous. Maybe zxcvbn will one day add my language to its dictionaries and this will become less ridiculous. Still, I think that no manually chosen or edited password should get only an "excellent" message with no other warnings. For automatically generated passwords (like those from "Tools -> Password Generator") both estimates could be used, and the fundamental question is:
Currently KeePassXC supports only basic password generation methods (passwords from standard character sets and passphrases based on English words), for which zxcvbn gives decent "cracking entropy" estimates, so this problem is not very important right now (as already mentioned). However, if you wanted to start supporting more esoteric password generation methods, using things like custom dictionaries, this would again become problematic, and you would probably want to start using "generated entropy" for more realistic password quality estimates. For an automatically generated password, the "cracking entropy" estimate is still useful and can be displayed alongside the "generated entropy" estimate, but I would agree with OP that displaying "generated entropy" as the main figure makes more sense. One could consider adding some additional fields for each password to support a better password quality overview (one can store this in notes already):
Hello @phoerious, what have I misunderstood about entropy?
That was my main point. Yes, the generation source has an entropy of 8 * -log2(1/64). However, the final password has only 8 different characters which appear with a probability of 1/8 each. So even if your source is very rich, you can still generate a weak password that by some chance only uses a very predictable subset of characters. "abcdeFGH" is one such weak password and so are "aaaaaaaa" and "password". The chances of generating them are low, but if you get unlucky and do generate one of them, it is easy to crack. In the end, there is no distinction between a bad password chosen wilfully and a bad password chosen by astronomically unlucky chance. That's why we measure the strength of the generated password and not its generation source.
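How unlucky is "astronomically unlucky"? A rough sketch, assuming the generator draws 8 symbols uniformly from the 62-symbol [a-zA-Z0-9] alphabet and assuming a hypothetical blocklist of one million distinct weak 8-character strings (the blocklist size is invented purely for illustration):

```python
import math
from fractions import Fraction


def chance_of_weak(length: int, alphabet_size: int, weak_count: int) -> Fraction:
    """Probability that a uniformly generated password of `length` symbols is
    one of `weak_count` distinct weak strings of that same length."""
    total = alphabet_size ** length
    if not 0 <= weak_count <= total:
        raise ValueError("weak_count must be between 0 and alphabet_size ** length")
    return Fraction(weak_count, total)


# Hypothetical blocklist of one million distinct weak 8-character strings.
p = chance_of_weak(8, 62, 1_000_000)
print(f"{float(p):.1e}")            # 4.6e-09
print(round(math.log2(1 / p), 1))   # 27.7, i.e. -log2(p) in bits
```

With `weak_count=62` the same function gives the chance of a single repeated character such as "aaaaaaaa": 62 / 62**8, or 1 in 62**7. The odds are tiny, but, as argued above, a strength meter looking only at the string cannot tell such a lucky draw apart from a deliberate choice.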
Then I think this is a problem of terminology, or of my understanding of entropy? (I'm only just starting to grasp the concept of a cracking entropy. As for your example, I currently think that the probability that a certain symbol combination is used as a password could be used to calculate cracking entropy.) When I hear entropy, I think of it as generation source entropy: a measure of the expected number of attempts an attacker blindly brute-forcing your passcode would need to guess it. I now understand that my previous points mainly came from not interpreting the entropy in KeePassXC as cracking entropy, BUT KeePassXC uses the generation source entropy for passphrases, even if the passphrase might be a common nursery rhyme. For me, this inconsistency adds to the confusion as to why "entropy" should mean something else for passwords.
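The link between entropy and "expected number of attempts" is exact only for a uniform source: with N equally likely candidates tried one by one in a fixed order, guess k succeeds with probability 1/N, so the average is (N + 1) / 2, roughly 2^(H-1) for H = log2(N) bits. A minimal sketch under that uniformity assumption:

```python
import math
from fractions import Fraction


def expected_guesses_uniform(num_candidates: int) -> Fraction:
    """Average number of guesses to hit a secret that is one of
    `num_candidates` equally likely values, trying each candidate once.
    Guess k succeeds with probability 1/N, so the mean is (N + 1) / 2."""
    return Fraction(num_candidates + 1, 2)


n = 62 ** 8  # 8 symbols from [a-zA-Z0-9]
print(round(math.log2(n), 1))                       # 47.6 bits
print(f"{float(expected_guesses_uniform(n)):.2e}")  # 1.09e+14 guesses
```

For a skewed source (for example, people choosing nursery rhymes far more often than random word strings), this formula no longer follows from the entropy figure alone.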
There is no such thing as "cracking entropy". Your understanding of entropy is largely correct, but not quite applicable, because pure entropy is actually a pretty bad measure of password quality. Our password generation source has a specific entropy, yes, but the password it generates carries no information about this source. On the other hand, you can enter any password of your own into the password field or edit a generated password to your liking, and we wouldn't have any information about the entropy of that source either. All we see is the password, and we can only make assumptions about what random source (if any) may have generated it. That assumed source is the one whose entropy we calculate. Of course, the concept of a bad password has nothing to do with entropy per se (except that low entropy causes bad passwords), but we use such information to "correct" the measurement. In the end, we do not try to give a mathematical description of your generation source (which is entirely irrelevant), but try to find a lower bound for how strong the actual password is. What we measure is a lot more conservative and more in line with (although not quite the same as) min-entropy than with the classic Shannon entropy.
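The difference between the two measures shows up clearly on a skewed distribution. Shannon entropy averages over all outcomes, while min-entropy, -log2 of the largest probability, describes what an attacker's single best guess is up against. A sketch with an invented source (not a model of real users) that outputs one popular password half the time:

```python
import math


def shannon_bits(groups):
    """Shannon entropy of a distribution given as (probability, multiplicity)
    pairs, i.e. `multiplicity` outcomes that each have `probability`."""
    return sum(m * p * math.log2(1 / p) for p, m in groups)


def min_entropy_bits(groups):
    """Min-entropy: -log2 of the single most likely outcome's probability."""
    return math.log2(1 / max(p for p, _ in groups))


# Hypothetical skewed "source": half the time it outputs the same popular
# password, otherwise one of 2**20 other strings, uniformly.
skewed = [(0.5, 1), (2.0 ** -21, 2 ** 20)]
print(shannon_bits(skewed))      # 11.0
print(min_entropy_bits(skewed))  # 1.0
```

An 11-bit Shannon figure would suggest a moderately hard target, yet the attacker's first guess succeeds half the time, which is why a conservative, min-entropy-like measure is the safer one to report. For a uniform source the two measures coincide.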
Now I'm even more confused as to why you don't want to give it another name or make these clarifications visible to the user.
It is effectively still entropy, just not the plain Shannon entropy of an imaginary "perfect" source. We assume a different source instead, one that works more like a language model, thus we take into account a dictionary and character dependencies to find a better estimate of the actual information content than a naive calculation under the assumption of stochastic independence could (an assumption that almost never holds in the real world BTW). Call it corrected entropy if you will, but in the end the name really doesn't matter.
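A toy illustration of "assuming a source that works more like a language model": split the password into dictionary words and leftover characters, charge each piece as if drawn from its own pool, and keep the cheapest split. This is not zxcvbn's algorithm (zxcvbn uses frequency-ranked dictionaries and also handles patterns such as keyboard walks, dates and l33t substitutions); the lowercase-only 26-symbol pool and the 10,000-word dictionary are assumptions for illustration.

```python
from __future__ import annotations

import math


def naive_bits(password: str, charset_size: int = 26) -> float:
    """Independence assumption: every character is a fresh uniform draw."""
    return len(password) * math.log2(charset_size)


def dictionary_bits(password: str, dictionary: set[str], charset_size: int = 26) -> float:
    """Toy dictionary-aware estimate.

    Splits the password into dictionary words and leftover characters,
    charging log2(len(dictionary)) bits per word and log2(charset_size) bits
    per leftover character, and returns the cheapest split (dynamic programming).
    """
    word_cost = math.log2(len(dictionary))
    char_cost = math.log2(charset_size)
    n = len(password)
    best = [0.0] * (n + 1)  # best[i]: cheapest cost of password[:i]
    for i in range(1, n + 1):
        # Option 1: the i-th character is a lone, uniformly drawn character.
        best[i] = best[i - 1] + char_cost
        # Option 2: password[j:i] is a whole dictionary word.
        for j in range(i):
            if password[j:i] in dictionary:
                best[i] = min(best[i], best[j] + word_cost)
    return best[n]


# Hypothetical 10,000-word dictionary that happens to contain the English words.
vocab = {"one", "two", "three", "four", "five"} | {f"filler{i}" for i in range(9995)}

print(round(naive_bits("onetwothreefourfive"), 1))                 # 89.3
print(round(dictionary_bits("onetwothreefourfive", vocab), 1))     # 66.4
# Words missing from the dictionary get no discount at all:
print(round(dictionary_bits("unodostrescuatrocinco", vocab), 1))   # 98.7
```

The last line mirrors the earlier complaint about "onetwothreefourfive" in another language: an estimator can only discount patterns it knows about, so unknown words fall back to the naive per-character figure.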
OK, going to let this issue rest for good then.
Expected Behavior
When KeePassXC generates a password or passphrase, it should always provide the EXACT entropy, which it knows from the method of generation. When a user enters or edits a password or passphrase, IF KeePassXC provides an entropy estimate, it should WARN the user that this is only an estimate and is not a reliable means of determining password entropy (and possibly that, in general, generated passwords are a better choice than user-selected ones.)
Current Behavior
When the user generates a passphrase, the correct entropy is given for the diceware passphrase generation process. (But if the user then edits the passphrase, the entropy estimate is unchanged, a clear bug: #867)
When the user generates a password, using the built-in password generator, the entropy is computed using a password cracker. This is not necessary or desirable: KeePassXC knows the actual entropy, which is a function of the generation process it just performed. It should supply that number instead, in the same way it does for passphrases.
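When every symbol or word is drawn uniformly and independently, the "exact" figure the issue asks for is simply length × log2(pool size). A sketch under that assumption (function names hypothetical; it ignores generator options such as requiring at least one character from every class, which shrink the set of possible outputs and so lower the true figure somewhat):

```python
from __future__ import annotations

import math


def charset_password_bits(length: int, charset_size: int) -> float:
    """Entropy of `length` symbols, each drawn uniformly and independently
    from a pool of `charset_size` symbols: length * log2(charset_size)."""
    return length * math.log2(charset_size)


def passphrase_bits(word_count: int, wordlist: list[str]) -> float:
    """Entropy of `word_count` words drawn uniformly and independently from
    `wordlist`. Duplicate entries would make some words more likely than
    others, so this sketch refuses them instead of overstating the result."""
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("wordlist contains duplicate entries")
    return word_count * math.log2(len(wordlist))


placeholder_words = [f"word{i}" for i in range(7776)]  # stand-in for a 7776-word list
print(round(charset_password_bits(20, 62), 1))          # 20 symbols from [a-zA-Z0-9]: 119.1
print(round(passphrase_bits(7, placeholder_words), 1))  # 7 words: 90.5
```

The 7776-word size matches classic Diceware lists; the word list itself is a placeholder. Either number is only valid as long as the output is used unmodified.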
When the user manually enters or edits a password, the entropy is computed using a password cracker. This is the best possible approach under the circumstances, but it should be highlighted in red, with a tooltip explaining that entropy estimation is art rather than science, is only an estimate, and will give excessively high estimates for execrably bad passwords in many cases.
Example: If I enter "Glenn Willen, May 27 19xx, Gemini", which is my full name, obfuscated birthdate, and star sign, the cracker estimates that this string has 129.63 bits of entropy. In reality, of course, it has very little; supposing very generously that there are 1000 equally-obvious public facts I might use in this way, and 100 equally-obvious ways to format each, and 3 ways to arrange 3 of them, I still don't even make it up to 20 bits in reality. This is also the reason that the cracker gives unrealistically-high estimates when people manually paste diceware passphrases into the 'password' box -- it doesn't understand the space it's estimating over.
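The back-of-the-envelope bound above, read as a plain product of the three counts (that reading is an interpretation of the example), works out as follows:

```python
import math

facts = 1000        # equally obvious public facts (from the example above)
formats = 100       # equally obvious ways to format each
arrangements = 3    # ways to arrange them, as stated in the example

print(math.log2(facts * formats * arrangements))       # about 18.2 bits
# Even counting all 3! = 6 orderings of three items stays under 20 bits:
print(math.log2(facts * formats * math.factorial(3)))  # about 19.2 bits
```

Either way, the result is more than 100 bits below the 129.63 bits reported by the estimator.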
Why is this example important? I think that it's obvious to many people on this bugtracker what the problem with my example is. But I think it is equally not obvious at all to most typical non-expert users. If a piece of security software is telling them, as they add characters to their password, that it's getting more secure, they may believe the software. I claim this is a significant potential danger and a footgun to naive users who don't know why the software does what it does.
Possible Solution
I alluded to one above: When generating passwords, estimate the entropy correctly, using the entropy that was put in by the generator. When the user enters or edits a password, provide an estimate if you wish, but somehow clearly highlight (such as with red text and a tooltip) that it is a bad idea to rely on it.
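A minimal sketch of the proposed policy, assuming the dialog remembers the last generated password and its exact entropy. `StrengthDisplay`, `choose_display` and the tooltip texts are hypothetical, and `estimate_bits` stands for whatever the estimator (zxcvbn in KeePassXC) reports. Comparing the current text against the generated one also covers the case in #867, where an edited passphrase keeps showing the generator's figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StrengthDisplay:
    bits: float
    is_estimate: bool  # True -> render in red with a warning tooltip
    tooltip: str


def choose_display(current: str, generated: Optional[str],
                   generated_bits: Optional[float],
                   estimate_bits: float) -> StrengthDisplay:
    """Hypothetical policy: show the generator's exact figure only while the
    password is still exactly what the generator produced; otherwise fall
    back to the estimator's number and flag it as an estimate."""
    if generated is not None and generated_bits is not None and current == generated:
        return StrengthDisplay(generated_bits, False,
                               "Exact entropy of the generation process.")
    return StrengthDisplay(estimate_bits, True,
                           "Estimate only: may be far too high for passwords "
                           "built from personal facts or unknown words.")


# Generated and untouched: the exact figure is shown.
print(choose_display("q7Rz2KpX", "q7Rz2KpX", 47.6, 30.0).is_estimate)   # False
# One character added by hand: fall back to the flagged estimate.
print(choose_display("q7Rz2KpX!", "q7Rz2KpX", 47.6, 30.0).is_estimate)  # True
```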
Steps to Reproduce (for bugs)
Try generating and entering passwords and passphrases in the password generation dialog. Observe the entropy estimator widget.
Context
I am a software engineer in the Bitcoin space. I would not describe myself as a computer security expert (except possibly by osmosis from the people I work with), but I aspire to "security researcher". I am in the process of writing up some best practices for personal computer security for Bitcoin users, which is why I finally got around to investigating KeePassXC as a KeePassX replacement.
Thank you for reading my rant! Obviously I am intending it as a jumping-off point to start discussion, only. It's your app. :-)
Debug Info
KeePassXC - Version 2.3.3
Revision: 0a155d8
Libraries:
Operating system: macOS Sierra (10.12)
CPU architecture: x86_64
Kernel: darwin 16.7.0
Enabled extensions: