[paper]
Tested on all of:
- Ubuntu 18.04
- macOS Monterey v12.1, Apple M1 Max chip
- Docker image of python:3.9.2-buster with Python 3.9.2
Run pip install
$ pip install -r requirements.txt
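As a quick post-install sanity check, a minimal sketch like the following can confirm the interpreter and core packages import cleanly (it assumes PyTorch and torchvision end up among the pinned dependencies, since torchvision is used for MNIST below; the exact package list lives in requirements.txt):

```python
# Minimal post-install sanity check.
# Assumes torch and torchvision are pinned in requirements.txt.
import sys

import torch
import torchvision

print(sys.version)               # the tested environments report Python 3.9.2
print(torch.__version__)         # confirms PyTorch imports without errors
print(torchvision.__version__)   # confirms torchvision imports without errors
```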
Install datasets
Creditcard dataset
Download from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud and put the file at dataset/creditcard/creditcard.csv.
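Once the CSV is in place, a minimal sketch like the following can verify it is readable from the expected location (pandas is assumed to be available; the repository's own loading code may differ):

```python
# Minimal sketch: check the Kaggle credit card fraud CSV is in the expected location.
# Assumes pandas is installed; the repository's own data-loading code may differ.
import pandas as pd

df = pd.read_csv("dataset/creditcard/creditcard.csv")
print(df.shape)                    # the Kaggle dataset has 284,807 rows and 31 columns
print(df["Class"].value_counts())  # 'Class' is the fraud label column
```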
Medical datasets (HeartDisease and TcgaBrca)
This repository uses medical datasets for cross-silo FL research through FLamby. If you want to use the data, please carefully review the license stated in FLamby. We tested with FLamby release version 0.0.1.
- TcgaBrca
  - Preprocessed data is stored in the package, in the file flamby/datasets/fed_tcga_brca/brca.csv.
- HeartDisease
  - We need to download the data. Execute this script: ./download_heart_disease_dataset.sh
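After the download script has run, a minimal sketch like the following loads one silo's split (it assumes FLamby v0.0.1's FedHeartDisease dataset class, whose center argument selects the silo; the repository's own data wrappers may differ):

```python
# Minimal sketch: load one hospital's (silo's) split of Fed-Heart-Disease via FLamby.
# Assumes FLamby v0.0.1 exposes FedHeartDisease(center=..., train=...);
# the repository's own data wrappers may look different.
from torch.utils.data import DataLoader
from flamby.datasets.fed_heart_disease import FedHeartDisease

train_loader = DataLoader(
    FedHeartDisease(center=0, train=True),  # center indexes the silo
    batch_size=4,
    shuffle=True,
)
x, y = next(iter(train_loader))
print(x.shape, y.shape)  # one batch of features and labels from silo 0
```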
MNIST dataset
We use the dataset from torchvision: https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html
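torchvision downloads MNIST on first use; a minimal sketch (the root directory here is illustrative, the repository may store the data elsewhere):

```python
# Minimal sketch: fetch MNIST through torchvision.
# The root path "dataset/mnist" is illustrative; the repository may use another location.
from torchvision import datasets, transforms

mnist_train = datasets.MNIST(
    root="dataset/mnist",
    train=True,
    download=True,                    # downloads on first run
    transform=transforms.ToTensor(),  # images as [1, 28, 28] float tensors
)
print(len(mnist_train))  # 60,000 training images
```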
Run simulator mode
In simulator mode, the FL rounds do not use a real communication path; everything runs in memory.
(List of options: src/options.py)
Example.
$ python run_simulation.py --dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 --global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 --local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 --sampling_rate_q=0.5 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0
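To sweep a hyperparameter over several simulator runs, a small driver script can be convenient; a minimal sketch (the flag names are taken from the example command above, the sweep itself is illustrative):

```python
# Minimal sketch: run the simulator several times with different noise multipliers.
# Flag names mirror the example command above; the sweep values are illustrative.
import subprocess

for sigma in [1.0, 5.0, 10.0]:
    subprocess.run(
        [
            "python", "run_simulation.py",
            "--dataset_name=creditcard",
            "--agg_strategy=ULDP-AVG-w",
            "--n_users=1000",
            "--global_learning_rate=10.0",
            "--clipping_bound=1.0",
            "--n_total_round=100",
            "--local_learning_rate=0.01",
            "--local_epoch=30",
            f"--sigma={sigma}",
            "--user_dist=zipf", "--user_alpha=0.5",
            "--silo_dist=zipf", "--silo_alpha=2.0",
        ],
        check=True,  # stop the sweep if a run fails
    )
```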
Example of secure mode (private weighting protocol). It supports only ULDP-AVG-w, ULDP-AVG, ULDP-SGD, and ULDP-SGD-w, is time-consuming, and works only with small datasets.
$ python run_simulation.py --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
Experimental scripts, including all of the hyperparameters used in the ULDP-FL paper, are located in exp/script.
$ exp/script/privacy_utility.sh # compare different algorithms
$ exp/script/optimal_weighting.sh # compare with or without enhanced weighting strategy
$ exp/script/secure_weighting.sh # private weighting protocol (on memory)
$ exp/script/user_level_subsampling.sh # user-level sub-sampling effect
gRPC-based implementation based on FedML.
Note: it currently works only with some algorithms, including ULDP-AVG-w.
Start the server
$ python run_server.py --dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 --global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 --local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=3 --n_silo_per_round=3
Start 3 silos
$ python run_silo.py --silo_id=0 --dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 --global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 --local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=3 --n_silo_per_round=3
$ python run_silo.py --silo_id=1 --dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 --global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 --local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=3 --n_silo_per_round=3
$ python run_silo.py --silo_id=2 --dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 --global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 --local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=3 --n_silo_per_round=3
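The server and the silos run concurrently (separate terminals, or a small launcher). A minimal sketch of such a launcher, using only the commands shown above (the launcher itself is illustrative, not part of the repository):

```python
# Minimal sketch: launch the gRPC server and 3 silos as concurrent processes.
# The commands mirror the ones above; this launcher is illustrative only.
import subprocess

COMMON = (
    "--dataset_name=creditcard --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=1000 "
    "--global_learning_rate=10.0 --clipping_bound=1.0 --n_total_round=100 "
    "--local_learning_rate=0.01 --local_epoch=30 --sigma=5.0 "
    "--user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 "
    "--n_silos=3 --n_silo_per_round=3"
).split()

procs = [subprocess.Popen(["python", "run_server.py", *COMMON])]  # server first
for silo_id in range(3):
    procs.append(subprocess.Popen(["python", "run_silo.py", f"--silo_id={silo_id}", *COMMON]))

for p in procs:
    p.wait()  # block until the FL run finishes
```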
It also works in secure mode (--secure_w) to perform the enhanced weighting protocol in a private manner.
$ python run_server.py --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
$ python run_silo.py --silo_id=0 --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
$ python run_silo.py --silo_id=1 --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
$ python run_silo.py --silo_id=2 --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
$ python run_silo.py --silo_id=3 --dataset_name=heart_disease --verbose=1 --agg_strategy=ULDP-AVG-w --n_users=100 --global_learning_rate=10.0 --clipping_bound=0.1 --n_total_round=3 --local_learning_rate=0.001 --local_epoch=50 --sigma=5.0 --user_dist=zipf --user_alpha=0.5 --silo_dist=zipf --silo_alpha=2.0 --n_silos=4 --n_silo_per_round=4 --secure_w
All of the experimental source code and results are available in exp/puldp_exp.ipynb.