Wordlists don't contain Non-ASCII Characters #9

berzerk0 · 2017-04-29T22:25:45Z

Americans aren't the only ones with passwords - why not have special wordlists that include non-ASCII Characters?

I'm glad you asked.

As my knowledge level increases, so does my ability to sort out lines. I have two methodologies that I will put to use for Rev 2.0:

1. Grep out passwords containing characters from different alphabets

If an alphabet is published in Unicode on Wikipedia, I plan to grep for it.

The Ukrainian alphabet is different from the Russian, which is different from the Belarusian, which is different from the Common Cyrillic, which is different from the Serbian, which is different from...
This means we could have NATIONALLY targeted lists based on predominant languages, or at the very least lists by language family.
This isn't only true for Cyrillic-based alphabets. Dano-Norwegian is a different alphabet than Swedish, English... etc.
My sources still bias towards English, so the ASCII-only lists may simply dwarf the others, but they should still be available.
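
A minimal sketch of the per-alphabet filtering idea, assuming Python rather than grep. The alphabet strings are typed from the published Russian and Ukrainian alphabets; the function names (`non_ascii_letters`, `matches_alphabet`, `filter_by_alphabet`) are hypothetical and not part of the project.

```python
import unicodedata

# Assumption: each alphabet is given as a lowercase letter string.
# NFC normalization keeps letters such as "й" and "ї" as single code points.
RUSSIAN = set(unicodedata.normalize("NFC", "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))
UKRAINIAN = set(unicodedata.normalize("NFC", "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"))


def non_ascii_letters(line):
    """Return the set of lowercase non-ASCII letters found in a line."""
    text = unicodedata.normalize("NFC", line)
    return {ch.lower() for ch in text if ord(ch) > 127 and ch.isalpha()}


def matches_alphabet(line, alphabet):
    """True if the line has non-ASCII letters and all of them are in the alphabet."""
    letters = non_ascii_letters(line)
    return bool(letters) and letters <= alphabet


def filter_by_alphabet(lines, alphabet):
    """Keep only the lines whose non-ASCII letters fit the given alphabet."""
    return [line for line in lines if matches_alphabet(line, alphabet)]


sample = ["password", "пароль123", "їжак2017", "привет", "ёлка"]
russian_lines = filter_by_alphabet(sample, RUSSIAN)
ukrainian_lines = filter_by_alphabet(sample, UKRAINIAN)
```

Lines that use only letters shared by both alphabets, such as "пароль123", land in both lists. Under this approach, a language-family list would be the fallback when an alphabet alone cannot tell two languages apart.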

2. Make Sub-set lists based on source name.

I have many sources with "Rus", "ru", and "Russian" in the title. These lists are presumably from Russian sources, so perhaps they should be amalgamated into a list of their own.
Some sources are obviously geared towards WPA, etc.
Caveat: Since my methodology is based on approximating accuracy using the number of files a given line appears in, these groups made of sub-set sources are likely to be precise, but inaccurate. An analogy would be me throwing darts. I might be landing them within a circle of less than 1", but the target is about 4ft over to the left.
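
A rough sketch of the sub-set idea, assuming sources are held as a mapping from file name to lines. The token match on "rus", "ru", and "russian" and the helper names (`is_russian_source`, `count_file_appearances`, `subset_ranking`) are hypothetical illustrations, not the project's actual tooling.

```python
import re
from collections import Counter

# Assumption: a source counts as Russian if one of these tokens appears in its name.
RUSSIAN_TOKENS = {"rus", "ru", "russian"}


def is_russian_source(name):
    """Split the file name on non-alphanumerics and look for a Russian token."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return any(token in RUSSIAN_TOKENS for token in tokens)


def count_file_appearances(sources):
    """Count, for each line, how many files it appears in (not how many times)."""
    counts = Counter()
    for lines in sources.values():
        counts.update(set(lines))  # set() so repeats within one file count once
    return counts


def subset_ranking(sources, predicate):
    """Rank lines by file count, using only the sources whose name passes predicate."""
    subset = {name: lines for name, lines in sources.items() if predicate(name)}
    counts = count_file_appearances(subset)
    return sorted(counts, key=lambda line: (-counts[line], line))


sources = {
    "rus_top.txt": ["пароль", "qwerty", "qwerty"],
    "forum.ru.txt": ["пароль", "123456"],
    "rockyou.txt": ["password", "qwerty"],
    "trusted.txt": ["пароль"],
}
ranked = subset_ranking(sources, is_russian_source)
```

Matching whole tokens avoids picking up names like "trusted.txt" that merely contain "rus". With only two matching files in this toy data, the file counts carry little information, which is the precision-versus-accuracy caveat above in miniature.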

In actuality, I'm awful at darts.

I welcome any suggestions - except on my darts game. I mean suggestions about the wordlists.

iancnorden · 2017-06-07T15:01:23Z

Hey again,

Not sure if this has had much thought or updates, but I believe unicode.com maintains the 'official' character lists that can be rendered or used from other alphabets... such as Punycode to Unicode.
Good example:
https://unicode-table.com/en/#cyrillic

I believe these are sourced from: https://github.com/unicode-table/unicode-table-data which may have good per-language or per-character-set data to base an initial push on.

berzerk0 · 2017-06-07T17:41:04Z

Great find! I still plan on implementing this.

As a status update on this and Rev 2 generally, I have found plenty of sources and need to do a bit of sifting before repeating the process. I'd say Mid-July is a generous estimate for Rev 2 - meaning it may be sooner than that.

berzerk0 · 2018-02-20T23:32:25Z

"Mid July" haha.

The lists now contain non-ASCII characters.

berzerk0 added this to the Acquiring and Processing Sources milestone Apr 29, 2017

berzerk0 self-assigned this Apr 29, 2017

berzerk0 closed this as completed Feb 20, 2018
