De-duplicate items #2
Good project though! Would love to see a list of WPA-formatted passwords that come just from router/wifi sources, not user passwords.
Duplication: this is me getting caught by the classic invisible newline difference between Windows and Linux.
WPA-formatted sources: I have found wordlists that include "WPA" in the title, but that isn't much of a guarantee that they come exclusively from router/wifi sources. It is also possible (and equally not possible, as I am asserting this with zero evidence) that the trends for common passwords do not change dramatically whether they are used for a router or for an email address. It seems just as likely to me that people see it as a generic "password" rather than "the Wifi password." I'll see if I can find some sources with more background, but I have doubts.
EDIT
Easy fix for the dupes that worked for me was issuing
@WiseNerd So if you already fixed it, why not make a PR?
PR from me shortly for de-dupe. Great work.
@iancnorden You're gonna beat me to the punch!
Now it's a race! I had not realized the size; git clone is still chugging away!
@blobgo Well, my MacBook's limited DDR2 memory would be neutered by sanitizing that entire thing, so I fixed a small part, mostly out of curiosity. But I was hoping to save somebody some time nonetheless :)
De-dupes still running.
Initial de-dupes (up to ~30 Million Non-Spec and WPA) are done. Looks like I can't do the big ones in parallel - probably done by tomorrow. Or so I thought; they didn't come out right. @WiseNerd I was using
which I assumed started at the top and worked its way down, but then for one of the files it popped "password" out of the 2nd slot. No way.
only works if two lines are next to one another, unfortunately. I might just have to compile again from sources - unless @iancnorden's experience comes up with a solid de-duping
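To illustrate the point above about a tool that only catches duplicates when two lines are next to one another, here is a minimal Python sketch. It is not the command used in this thread; the function names `dedupe_adjacent` and `dedupe_global` and the sample entries are made up for illustration.

```python
from itertools import groupby


def dedupe_adjacent(items):
    """Collapse runs of identical neighbours only (non-adjacent repeats survive)."""
    return [key for key, _ in groupby(items)]


def dedupe_global(items):
    """Keep the first occurrence of each item, preserving the original order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical sample: "password" appears at the top and again further down.
sample = ["password", "12345678", "password", "password", "qwertyui"]

print(dedupe_adjacent(sample))  # ['password', '12345678', 'password', 'qwertyui']
print(dedupe_global(sample))    # ['password', '12345678', 'qwertyui']
```

The adjacent-only pass leaves the later "password" in place because a different entry sits between the two copies. Sorting first would make all copies adjacent, but it would also throw away the probability ordering of the list.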
Chewing on the folder with Top2Bill*: 164/958 completed, started around 1400 Eastern. If curious, thanks to https://github.com/ltdenard ... and this will have to continue overnight at this rate.
Can all unique combinations be put into a new file, or do you just want the duplicates removed?
For Rev 1.1 we aim to just remove the duplicates while otherwise preserving order. Rev 2.0 will have the newlines weeded out at the source, so this problem will not carry over.
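As a rough sketch of what "remove the duplicates while otherwise preserving order" could look like, the Python below streams a wordlist, strips trailing carriage returns and newlines so that Windows and Unix copies of an entry compare equal, and writes only the first occurrence of each entry. This is an assumption about one possible approach, not the script actually used for Rev 1.1; `dedupe_file` is a hypothetical name.

```python
def dedupe_file(src_path, dst_path):
    """Copy src to dst, keeping only the first occurrence of each line.

    Trailing CR/LF bytes are stripped before comparison, so "password\r\n"
    and "password\n" count as the same entry. Returns the number of lines dropped.
    """
    seen = set()
    dropped = 0
    # Binary mode avoids any guesswork about the text encoding of the wordlist.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            entry = raw.rstrip(b"\r\n")
            if entry in seen:
                dropped += 1
                continue
            seen.add(entry)
            dst.write(entry + b"\n")
    return dropped
```

Calling `dedupe_file("in.txt", "out.txt")` (both file names are placeholders) leaves the input untouched and writes the cleaned list with Unix line endings. The `seen` set holds every unique entry in memory, so the largest files may need a machine with plenty of RAM or a different, disk-backed approach.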
De-Duped Rev 1.1 is live now, but does not contain the largest files. Rev 1.2 will, in torrents with compression. Closing this in light of the release of 1.1 and the impending release of 1.2.
Looks like there could be quite a few dupes in here. For instance, "password" is at 1 and 19: https://github.com/berzerk0/Probable-Wordlists/blob/master/Real-Passwords/WPA-Length/Top76-probable-WPA.txt