List not filtered properly #29
Comments
Guess this one slipped by me. Do you have a specific example? It's possible these are legitimately being used as passwords, but that's very unlikely.
I don't have the file anymore, but you can search for angle brackets "<" and ">".
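For anyone who wants to reproduce that search, here is a minimal sketch in Python. The regular expression and the function name `find_tag_like` are my own assumptions about what "looks like an HTML tag" means; they are not something the list maintainers use, and the sample entries are made up.

```python
import re

# Assumption: an entry "looks like an HTML tag" if it contains
# <name>, <name attrs>, <name/> or </name>. This is a heuristic only.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")

def find_tag_like(lines):
    """Return (line_number, entry) pairs for entries containing a tag-like pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        if TAG_RE.search(entry):
            hits.append((number, entry))
    return hits

# Hypothetical sample entries, not taken from the real list.
sample = ["password1", "<br>", "hunter2", "</div>", "a<b", "<a href='x'>"]
flagged = find_tag_like(sample)
```

To scan a real file, pass an open file object instead of `sample`, since iterating over it yields one line at a time. An entry such as `a<b` is deliberately not flagged, because a lone bracket is plausible in a real password.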
This is tricky.
For Release 2.0, I erred on the side of inclusivity. There are lines that look a lot like code, specifically HTML tags. The same is true for some email addresses. In many cases, these lines appeared in over 15 files in analysis, suggesting they are in fact passwords. This logic is not definitive, however. All of the source files on the list were already published, so this information is already available to the internet. With this in mind, I opted to include these lines. Most questionable lines do not appear until the list is already quite large. This issue will remain open and we'll meditate upon it.
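The "appeared in over 15 files" reasoning can be sketched roughly as follows. This is only an illustration of the heuristic as described in the comment above, not the actual analysis code; the names `file_counts` and `appears_in_many_files` and the in-memory representation of the source files are assumptions.

```python
from collections import Counter

def file_counts(files):
    """Count, for each entry, how many source files contain it at least once."""
    counts = Counter()
    for entries in files:
        # set() so that repeats inside one file count only once
        counts.update(set(entries))
    return counts

def appears_in_many_files(entry, counts, threshold=15):
    """True if the entry was seen in more than `threshold` distinct files."""
    return counts[entry] > threshold

# Hypothetical data: 16 files that share "<b>" and one file with a rare entry.
files = [["<b>", "abc", "<b>"] for _ in range(16)] + [["rare<entry>"]]
counts = file_counts(files)
```

With this data, `appears_in_many_files("<b>", counts)` is true and `appears_in_many_files("rare<entry>", counts)` is false. As noted, passing the threshold only suggests that a line is a password; it does not prove it.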
Troy Hunt's take on the problem.
While it is highly likely that these aren't passwords, the very idea that they are not is based on the assumption that we have a good handle on what passwords are. This assumption, for the most part, is true. However, INTENTIONALLY making passwords that don't look like passwords isn't without merit. I once worked at a company where we had reason to believe that keyloggers were installed on our systems. I had no idea what to do with this information, but it really bothered me. To cope with this, I came up with an idea to use the on-screen keyboard to create a password that looked like a URL. Certainly, I can't be the only one to come up with the idea of making a password that contains some sort of camouflage. It is still most definitely more likely that these are simple "upstream parsing" issues, but including them has such a small impact on list performance that I say they are worth keeping.
The 258Million list has not been filtered properly. It contains a lot of HTML tags like and .