Wordlists don't contain Non-ASCII Characters #9

Closed
berzerk0 opened this issue Apr 29, 2017 · 3 comments

@berzerk0
Owner

berzerk0 commented Apr 29, 2017

Americans aren't the only ones with passwords - why not have special wordlists that include non-ASCII Characters?

I'm glad you asked.

As my knowledge level increases, so does my ability to sort out lines. I have two methodologies that I will put to use for Rev 2.0:

1. Grep out passwords containing characters from different alphabets

If there is an alphabet published in Unicode on Wikipedia, I plan to grep for it.

  • The Ukrainian alphabet is different from the Russian, which is different from the Belarusian, which is different from the Common Cyrillic, which is different from the Serbian, which is different from...
  • This means we could have NATIONALLY targeted lists based on predominant languages
  • This isn't only true for Cyrillic-based alphabets. Dano-Norwegian is a different alphabet than Swedish, English... etc.
  • At the very least, lists could be split by language family
  • My sources still bias towards English, so the ASCII-only lists may simply dwarf the others, but they should still be available.
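The "grep by alphabet" idea could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the tooling used for the lists: the function names are hypothetical, and the two alphabets are the standard lowercase Russian and Ukrainian letter sets typed in by hand. A line is assigned to an alphabet when it contains at least one non-ASCII letter and all of its non-ASCII letters belong to that alphabet, so a line can land in more than one bucket.

```python
# Hypothetical alphabet tables (lowercase letters only), entered by hand.
RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
UKRAINIAN = set("абвгґдеєжзиіїйклмнопрстуфхцчшщьюя")


def non_ascii_letters(line):
    """Return the set of lowercase non-ASCII letters found in a line."""
    return {ch.lower() for ch in line if ch.isalpha() and ord(ch) > 127}


def matches_alphabet(line, alphabet):
    """True if the line has non-ASCII letters and all of them are in `alphabet`."""
    found = non_ascii_letters(line)
    return bool(found) and found <= alphabet


def split_by_alphabet(lines, alphabets):
    """Sort lines into one bucket per alphabet; a line may match several."""
    buckets = {name: [] for name in alphabets}
    for line in lines:
        for name, alphabet in alphabets.items():
            if matches_alphabet(line, alphabet):
                buckets[name].append(line)
    return buckets


if __name__ == "__main__":
    sample = ["password1", "привет123", "їжак", "съезд"]
    result = split_by_alphabet(sample, {"russian": RUSSIAN, "ukrainian": UKRAINIAN})
    for name, matched in result.items():
        print(name, matched)
```

Because the Cyrillic alphabets overlap heavily, many lines will match several national lists at once; only lines containing a distinguishing letter (such as "ї" or "ъ") end up in a single bucket.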

2. Make Sub-set lists based on source name.

  • I have many sources with "Rus", "ru", and "Russian" in the title. These lists are presumably from Russian sources - so perhaps they should be amalgamated among themselves.
  • Some sources are obviously geared towards WPA, etc.
  • Caveat: Since my methodology is based on approximating accuracy using the number of files a given line appears in, these groups made of sub-set sources are likely to be precise, but inaccurate. An analogy would be me throwing darts. I might be landing them within a circle of less than 1", but the target is about 4ft over to the left.
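A minimal sketch of the sub-set approach, assuming each source is available as a name plus its lines. The function names and the sample file names are hypothetical. It selects sources whose names contain a tag, then counts how many of the selected files each line appears in, which is the popularity measure described in the caveat above.

```python
from collections import Counter


def select_sources(source_names, tags):
    """Return the source names that contain any tag (case-insensitive substring)."""
    lowered = [tag.lower() for tag in tags]
    return [name for name in source_names
            if any(tag in name.lower() for tag in lowered)]


def count_file_appearances(sources):
    """Count, for each line, the number of sources it appears in.

    `sources` maps a source name to an iterable of lines. Duplicates within
    one source count only once.
    """
    counts = Counter()
    for lines in sources.values():
        counts.update(set(lines))
    return counts


if __name__ == "__main__":
    # Made-up sources, purely for illustration.
    all_sources = {
        "rus_leak.txt": ["qwerty", "пароль", "qwerty"],
        "Russian_top.txt": ["пароль", "123456"],
        "wpa_english.txt": ["password"],
    }
    chosen = select_sources(all_sources, ["rus", "russian"])
    subset = {name: all_sources[name] for name in chosen}
    print(count_file_appearances(subset).most_common())
```

A short tag such as "ru" is left out of the example on purpose: as a plain substring it would also match unrelated names like "brute", so it would need a stricter match (for instance on a delimiter-separated token).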

In actuality, I'm awful at darts.

I welcome any suggestions - except on my darts game. I mean suggestions about the wordlists.

@berzerk0 berzerk0 self-assigned this Apr 29, 2017
@iancnorden

Hey again,

Not sure if this has had much thought or updates, but I believe unicode.org maintains the 'official' character lists that can be rendered or used from other alphabets... such as Punycode to Unicode.
Good example:
https://unicode-table.com/en/#cyrillic

I believe these are sourced from: https://github.com/unicode-table/unicode-table-data which may have good per-language or per-character-set data to base an initial push on.

@berzerk0
Owner Author

berzerk0 commented Jun 7, 2017

Great find! I still plan on implementing this.

As a status update on this and Rev 2 generally, I have found plenty of sources and need to do a bit of sifting before repeating the process. I'd say Mid-July is a generous estimate for Rev 2 - meaning it may be sooner than that.

@berzerk0
Owner Author

"Mid July" haha.

The lists now contain non-ASCII characters.


2 participants