MEDFAIR is a fairness benchmarking suite for medical imaging (paper). We are actively updating this repo and will incorporate more datasets and algorithms in the future. Contributions are warmly welcomed! Check our website for a brief summary of the paper.
😀 MEDFAIR is accepted to ICLR'23 as Spotlight!
A detailed documentation can be found here.
Python >= 3.8+ and Pytorch >=1.10 are required for running the code. Other necessary packages are listed in environment.yml
.
cd MEDFAIR/
conda env create -n fair_benchmark -f environment.yml
conda activate fair_benchmark
Due to the data use agreements, we cannot directly share the download link. Please register and download datasets using the links from the table below:
Dataset | Access |
---|---|
CheXpert | Original data: https://stanfordmlgroup.github.io/competitions/chexpert/ |
Demographic data: https://stanfordaimi.azurewebsites.net/datasets/192ada7c-4d43-466e-b8bb-b81992bb80cf | |
MIMIC-CXR | https://physionet.org/content/mimic-cxr-jpg/2.0.0/ |
PAPILA | https://www.nature.com/articles/s41597-022-01388-1#Sec6 |
HAM10000 | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T |
OCT | https://people.duke.edu/~sf59/RPEDC_Ophth_2013_dataset.htm |
Fitzpatrick17k | https://github.com/mattgroh/fitzpatrick17k |
COVID-CT-MD | https://doi.org/10.6084/m9.figshare.12991592 |
ADNI 1.5T/3T | https://ida.loni.usc.edu/login.jsp?project=ADNI |
See notebooks/HAM10000.ipynb
for an simple example of how to preprocess the data into desired format. You can also find other preprocessing scripts for corresponding datasets.
Basically, it contains 3 steps:
- Preprocess metadata.
- Split to train/val/test set
- Save images into pickle files (optional -- we usually do this for 2D images instead of 3D images, as data IO is not the bottleneck for training 3D images).
After preprocessing, specify the paths of the metadata and pickle files in configs/datasets.json
.
python main.py --experiment [experiment] --experiment_name [experiment_name] --dataset_name [dataset_name] \
--backbone [backbone] --total_epochs [total_epochs] --sensitive_name [sensitive_name] \
--batch_size [batch_size] --lr [lr] --sens_classes [sens_classes] --val_strategy [val_strategy] \
--output_dim [output_dim] --num_classes [num_classes]
For example, for running ERM
in HAM10000
dataset with Sex
as the sensitive attribute:
python main.py --experiment baseline --dataset_name HAM10000 \
--total_epochs 20 --sensitive_name Sex --batch_size 1024 \
--sens_classes 2 --output_dim 1 --num_classes 1
See parse_args.py
for more options.
python sweep/train-sweep/sweep_batch.py --is_slurm True/False
Set the other arguments as needed.
See notebooks/results_analysis.ipynb
for a step by step example.
Please consider citing our paper if you find this repo useful.
@inproceedings{zong2023medfair,
title={MEDFAIR: Benchmarking Fairness for Medical Imaging},
author={Yongshuo Zong and Yongxin Yang and Timothy Hospedales},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023},
}
MEDFAIR adapts implementations from many repos (check here for the original implementation of the algorithms), as well as many other codes. Many thanks!