Official repo for the "Reducing Bias in Modeling Real-world Password Strength via Deep Learning and Dynamic Dictionaries" by Dario Pasquini, Marco Cianfriglia, Giuseppe Ateniese and Massimo Bernaschi presented at USENIX Security 2021
Disclaimer: This code is aimed at reproducing the results reported in our paper as well as support security analysis in the academic context.
All the code needed to run the ADaMs Attack is contained in AdamAttack directory. To be able to run the attack, the following prerequisites must been satisfied:
- gcc/g++
- Python 3.x
- TensorFlow 2.x
- Matplotlib
- CUDA (if TensorFlow-GPU)
- CppFlow
We tested the code only on ubuntu >= 18.x.
To compile the binary:
- Download Tensorflow C API and decompress it.
- Clone CppFlow:
git clone https://github.com/serizba/cppflow.git
- Modify the AdamAttack/Makefile and set the two variables:
- TENSOFLOW_HOME = {path to Tensorflow C API}
- CPPFLOW_HOME = {path to CppFlow}
-
Add the path to the TensorFlow's libraries to: LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path to Tensorflow C API}
-
Compile:
cd AdamAttack; make
This produces a directory bin containing a binary file AdamAttack that you can run to launch the attack.
The binary accepts the following input parameters:
- -r an hashcat rules-set e.g., generated.rule.
- -w the dictionary/worlist for the attack
- --hashes-file the file containing the attacked set of password. Referred as X in the paper. (
⚠️ It must contain plaintext passwords, no password hashes) - -a the attack mode. Accepted values are 0=standard, 9=adams. The default value is adams.
- --max-guesses-pow the exponent of the power of 10 that defines the maximum number of guesses.
- --config-dir the path of the directory containing the trained model, the rules file and the budget file. It works only with adam attack-mode.
- --model-path the pathname of the trained-model. It works only with adam attack-mode.
- --budget the attack budget. It works only with adam attack-mode.
For instance:
cd AdamAttack/
./bin/AdamAttack -a 9 -w WORDLIST.txt --config-dir MODELs/PasswordPro_BIG/ --hashes RockYou.txt
Pre-trained models in Keras format, along with rules files and default parameters, are available:
- PasswordPro_SMALL 44MB based on InsidePro-PasswordsPro.rule (150.000.000 c/sec)
- generated_SMALL 130MB based on generated.rule (560.000.000 c/sec)
"c/sec" stands for compatibility scores per second on a NVIDIA V100.
Each pre-trained model comes with its rule-set and the best hyper-parameters we found in our analysis. You can load the pre-trained model/rules-set/hyper-parameters by using the flag: --config-dir followed by the downloaded directory.
Inside the directory ./AdamAttack/Test you can run a scripted test and check if it is everything working. Here, some instructions:
- (After the code for the attack has been compiled)
- cd into /AdamAttack/Test
mkdir MODELs; cd Models
Then download PasswordPro_SMALL.zip, and unzip it.cd ..
back to /AdamAttack/Test/. Download the file passwords.zip, and unzip it.- add execution permission to the scripts:
chmod u+x run.sh
- Then, lunch the attack by running: ./run.sh. It will take some time.
At the end of the process, a file called adams_vs_standard_test.png should appear in ./AdamAttack/Test. The plot reported in the image should be something like this:
In order to train your model, you have to create a training set and a validation set first.
The directory MakeTrainingset contains the necessary code to produce the training sets.
The code can be compile using the makefile into the directory e.g.,:
cd MakeTrainingset; make
This produces a directory bin containing a binary file makeTrainSet that you will run to produce your training set.
The binary takes as input different mandatory parameters:
- the dictionary used to create the training set. Referred as W in the paper.
- an hashcat rules-set e.g., generated.rule.
- the attacked set of passwords used to create the training set. Called
$X_{A}$ in the paper. - output directory, where to write the training set files.
- the number of threads e.g., 32
For instance:
./bin/makeTrainSet WORDLIST.txt RULES.rule ATTACKED_SET.txt OUTPUTDIR 32
After you have created one training set and at least one validation set (used as early-stopping criteria) with makeTrainSet, you have to craft a configuration file for the training process.
A template for such a configuration file is NeuralNet/CONFs/BIG_bn2.gin. This contains all the hyper parameters and information needed to train a new model.
setup.train_home = "./trainset/"
setup.test_homes = [ "./validation1/", "./validation2/"]
Once created a suitable configuration file, you can start the training using train.py in ./NeuralNet/. This takes as input the path to the configuration file e.g.,:
python train.py CONFs/my_BIG_bn2.gin
The script saves checkpoints and logs for the training process in NeuralNet/MODELs/CHECKPOINTS. These can be visualized using tensorboard. At the end of the training, the script saves the keras model in NeuralNet/MODELs/SAVED_MODELS.
This is the official repository of the paper "Reducing Bias in Modeling Real-world Password Strength via Deep Learning and Dynamic Dictionaries" presented at USENIX Security 2021. If you use this tool for your research activity, please cite our paper
@inproceedings {272236,
author = {Dario Pasquini and Marco Cianfriglia and Giuseppe Ateniese and Massimo Bernaschi},
title = {Reducing Bias in Modeling Real-world Password Strength via Deep Learning and Dynamic Dictionaries},
booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {821--838},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/pasquini},
publisher = {{USENIX} Association},
month = aug,
}
Our software is built on top of hashcat-legacy.