This is the official repository of *Washing The Unwashable : On The (Im)possibility of Fairwashing Detection*, published at the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS), New Orleans, Louisiana, USA, November 28 - December 9, 2022.
- Clone the source code from GitHub:
git clone https://github.com/cleverhans-lab/FRAUD-Detect.git
- Create a virtual environment (use Python 3.8) and install the dependencies:
python3 -m venv FRAUDDetect_env
source FRAUDDetect_env/bin/activate
pip3 install -r requirements.txt
The code contains:
- Training black-box models
- Fairwashing and detecting fairwashing using FRAUD-Detect
- Evading FRAUD-Detect using an informed adversary
The ./FRAUD-Detect_code/models/ directory contains several files for training the black-box models (AdaBoost, DNN, RF, and XGBoost) from scratch. Their architectures and all hyper-parameters are located in the train_models.py script. Running main.sh will train the models for all seeds and generate labels for both the suing and test sets, identical to those used in our paper.
Note: training the black-box models requires TensorFlow.
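For orientation, the snippet below is a minimal sketch of what training a DNN black box on one of the tabular datasets could look like with TensorFlow/Keras. The function name, layer sizes, and training settings are placeholders of ours, not the architectures or hyper-parameters defined in train_models.py.

```python
# Illustrative sketch only -- the real architectures and hyper-parameters live in train_models.py.
import tensorflow as tf

def train_dnn_black_box(X_train, y_train):
    """Train a small binary classifier to act as the black box (placeholder sizes)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=10, batch_size=256, verbose=0)
    return model

# The trained black box then labels the suing and test sets; the interpretable
# (fairwashed) model is fit to those labels.
```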
Alternatively, pretrained versions of all black-box models are provided in the ./FRAUD-Detect/FRAUD-Detect_code/models/pretrained/ directory.
FRAUD-Detect detects fairwashed models by measuring the Kullback–Leibler (KL) divergence over subpopulation-wise confusion matrices of the interpretable model.
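As a rough illustration of this statistic (a minimal sketch of how we read the description above, with names of our choosing rather than the repository's implementation): build the confusion matrix of the interpretable model's predictions against the black-box labels separately for each demographic subgroup, normalize each matrix into a probability distribution, and take the KL divergence between the two.

```python
# Hedged sketch of the detector's statistic; function names are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

def subgroup_confusion_distribution(bb_labels, interp_preds, mask):
    """Confusion matrix of interpretable predictions vs. black-box labels on one
    subgroup, flattened and normalized into a probability distribution."""
    cm = confusion_matrix(bb_labels[mask], interp_preds[mask], labels=[0, 1]).astype(float)
    return (cm / cm.sum()).ravel()

def kl_confusion_matrix(bb_labels, interp_preds, group, eps=1e-12):
    """KL divergence between the two subgroups' confusion-matrix distributions."""
    p = subgroup_confusion_distribution(bb_labels, interp_preds, group == 0) + eps  # avoid log(0)
    q = subgroup_confusion_distribution(bb_labels, interp_preds, group == 1) + eps
    return float(np.sum(p * np.log(p / q)))
```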
To observe the KL_confusion_matrix vs Demographic Parity graph for 10 seeds of a given dataset, interpretable model, and black-box model combination:
cd FRAUD-Detect_code/washing_analysis
python analysis.py --help
Rerun the last command with the required arguments (note that epsilons are optional and are given as space-separated floats). Results will be generated in the ./sample_results/ directory.
The code currently supports:
- 3 datasets (Adult Income, COMPAS, and Bank Marketing)
- 4 black-box models (DNN, AdaBoost, XGBoost, and Random Forest)
- 2 interpretable models (Decision Trees and Logistic Regression)

It generates:
- a plot of the KL divergence as a function of Demographic Parity for each seed, with dashed lines showing the black-box fairness;
- a plot of Demographic Parity and KL divergence as a function of fairwashing strength, with error shading.
We assume an informed adversary who is aware of FRAUD-Detect and wishes to evade it while performing fairwashing. The ./FRAUD-Detect/FRAUD-Detect_code/quantifying_fairwashing/ directory contains code that explores the range of the fairness gap achievable for a fixed value of fidelity and a fixed value of KL divergence by solving the informed adversary's optimization problem. The output is a plot showing the range of the Demographic Parity gap achievable by an informed adversary seeking to evade the fairwashing detector.
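For context, here is a hedged sketch of the quantities that appear in this optimization problem; the function names and the informal problem statement below are ours, not the repository's code.

```python
# Illustrative definitions of the fairness gap and fidelity discussed above.
import numpy as np

def demographic_parity_gap(preds, group):
    """|P(prediction = 1 | group 0) - P(prediction = 1 | group 1)| for binary 0/1 predictions."""
    return float(abs(preds[group == 0].mean() - preds[group == 1].mean()))

def fidelity(interp_preds, bb_labels):
    """Fraction of points on which the interpretable model agrees with the black box."""
    return float(np.mean(interp_preds == bb_labels))

# Informally, the adversary's problem explores, over candidate interpretable models h:
#   min / max   demographic_parity_gap(h(X), group)
#   subject to  fidelity(h(X), bb_labels) >= a fixed fidelity value
#               KL over subpopulation-wise confusion matrices <= a fixed value
```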
If you use our code, please cite the following paper:
@InProceedings{shamsabadi2022FraudDetect,
title = {Washing The Unwashable : On The (Im)possibility of Fairwashing Detection},
author = {Shahin Shamsabadi, Ali and Yaghini, Mohammad and Dullerud, Natalie and Wyllie, Sierra and Aïvodji, Ulrich and Alaagib, Aisha and Gambs, Sébastien and Papernot, Nicolas},
booktitle = {The Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS)},
year = {2022},
address = {New Orleans, Louisiana, USA},
month = {November 28 - December 9}
}