This is the official repository for our CVPR 2023 workshop paper, "Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data" (FedDPMS).
FedDPMS and synthetic data generation. The four parts of the figure depict: (1) finding the latent representation of raw data via a local encoder; (2) creating noisy latent means (by adding Gaussian noise to the means of the latent data representations) and filtering out unusable ones with the help of a local classifier; (3) uploading the usable noisy latent means to the server; (4) a benefiting client utilizing the global decoder to generate synthetic data from the received noisy latent means, expanding its local dataset. A minimal sketch of steps (2) and (4) is given after the code layout below.

This project is developed with Python 3.6 and torch 1.9 (rocm4.2). We use conda to manage the virtual environment.
git clone git@github.com:CityChan/Federated-DPMS.git
cd Federated-DPMS
conda create -n dpms python=3.6
conda activate dpms
pip install torch==1.9.1+rocm4.2 torchvision==0.10.1+rocm4.2 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
- train.py: general setup for training and evaluation
- models.py: model architectures for running experiments
- sampling.py: functions for generating non-IID datasets for federated learning
- util.py: functions for computing accuracy, knowledge distillation and model aggregation
- Localupdate.py: functions for locally updating models with FedAvg, FedProx, Moon, FedMix and FedDPMS
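For intuition, steps (2) and (4) of the pipeline described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (make_noisy_means, synthesize, encoder, classifier, decoder) and simplified logic; it is not the exact code in Localupdate.py (for instance, how the local classifier filters noisy means and how latent vectors are sampled around each received mean differ in the actual implementation).

```python
import torch

# Step (2): compute per-class means of latent representations, perturb them
# with Gaussian noise (std controlled by --std), and keep only the noisy means
# that the local classifier still assigns to the correct class.
def make_noisy_means(encoder, classifier, data, labels, std, num_classes):
    with torch.no_grad():
        z = encoder(data)                                   # latent representations
        means, kept_labels = [], []
        for c in range(num_classes):
            z_c = z[labels == c]
            if z_c.shape[0] == 0:
                continue
            noisy = z_c.mean(dim=0) + std * torch.randn(z_c.shape[1])
            if classifier(noisy.unsqueeze(0)).argmax(dim=1).item() == c:
                means.append(noisy)                         # usable noisy mean
                kept_labels.append(c)
    return means, kept_labels                               # uploaded in step (3)

# Step (4): a benefiting client decodes received noisy latent means with the
# global decoder to generate synthetic samples that expand its local dataset.
def synthesize(decoder, noisy_means, labels, gen_num):
    with torch.no_grad():
        xs, ys = [], []
        for mean, y in zip(noisy_means, labels):
            z = mean.unsqueeze(0).repeat(gen_num, 1)        # gen_num samples per class
            xs.append(decoder(z + 0.1 * torch.randn_like(z)))
            ys.extend([y] * gen_num)
    return torch.cat(xs), torch.tensor(ys)
```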
- --dataset: 'CIFAR10', 'CIFAR100', 'FMNIST'
- --batch_size: 64 by default
- --num_epochs: number of global rounds, 50 by default
- --lr: learning rate, 0.001 by default
- --lr_sh_rate: period of learning rate decay, 10 by default
- --dropout_rate: drop out rate for each layer, 0.2 by default
- --tag: 'centralized', 'federated'
- --num_users: number of clients, 10 by default
- --update_frac: proportion of clients sending updates per round, 1 by default
- --local_ep: local epoch, 5 by default
- --beta: concentration parameter of the Dirichlet distribution used for the non-IID split, 0.5 by default (see the sketch after this list)
- --seed: random seed (for reproducibility of experiments), 0 by default
- --mini: fraction of the dataset's samples to use, 1 by default
- --moon_mu: hyper-parameter mu for the MOON algorithm, 5 by default
- --moon_temp: temperature for the MOON algorithm, 0.5 by default
- --prox_mu: hyper-parameter mu for the FedProx algorithm, 0.001 by default
- --pretrain: number of preliminary rounds, 20 by default
- --gen_num: desired generation number for each class, 50 by default
- --std: standard deviation of the added Gaussian (differential privacy) noise, 4 by default
- --code_len: length of latent vector, 32 by default
- --alg: 'FedAvg', 'FedProx', 'Moon', 'FedVAE', 'DPMS', 'FedMix'
- --vae_mu: hyper-parameter for FedVAE and FedDPMS: 0.05 by default
- --fedmix_lam: lambda for fedmix: 0.05 by default
- --eval_only: only output the testing accuracy during training and the running time
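For reference, the non-IID splits controlled by --beta follow the common Dirichlet partitioning scheme implemented in sampling.py; a generic sketch of the idea, with hypothetical names and numpy only, is given below. Smaller beta produces more heterogeneous client datasets.

```python
import numpy as np

def dirichlet_partition(labels, num_users, beta, seed=0):
    """Split sample indices across clients with per-class proportions drawn
    from Dir(beta); smaller beta gives a more skewed (non-IID) split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idxs = [[] for _ in range(num_users)]
    for c in range(labels.max() + 1):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        proportions = rng.dirichlet(np.repeat(beta, num_users))
        cuts = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for user, part in enumerate(np.split(idx_c, cuts)):
            client_idxs[user].extend(part.tolist())
    return client_idxs
```

For example, with --beta 0.5 and --num_users 10 most clients end up dominated by a few classes, whereas a large beta approaches an IID split.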
We mainly use .sh files to execute multiple experiments in parallel. Experiment results are saved in the checkpoint folder with a unique id. Also, downloading a dataset for the first time takes a while.
Examples:
(1) for training a DPMS model
python3 train.py --dataset 'CIFAR100' --batch_size 64 --lr 0.001 --num_epochs 50 --dropout_rate 0.2 --tag 'federated' --num_users 10 --update_frac 1 --local_ep 5 --beta 0.5 --seed 0 --mini 1 --pretrain 20 --gen_num 50 --std 4 --code_len 128 --alg 'DPMS' --vae_mu 0.05
(2) for testing the trained and saved model
python3 train.py --dataset 'CIFAR100' --batch_size 64 --lr 0.001 --num_epochs 50 --dropout_rate 0.2 --tag 'federated' --num_users 10 --update_frac 1 --local_ep 5 --beta 0.5 --seed 0 --mini 1 --pretrain 20 --gen_num 50 --std 4 --code_len 128 --alg 'DPMS' --vae_mu 0.05 --eval_only
You can explore the different .sh files in the 'scripts' folder for more examples.
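The server-side model aggregation mentioned for util.py above follows the usual FedAvg-style weighted averaging of client models; a generic sketch (with hypothetical names, not the repository's exact function) is:

```python
import copy

def aggregate(state_dicts, weights):
    """FedAvg-style weighted average of client model parameters; weights are
    typically proportional to the clients' local dataset sizes."""
    total = float(sum(weights))
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts)) / total
    return avg
```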
We would appreciate a citation if you use this codebase.
@article{chen2022federated,
title={Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data},
author={Chen, Huancheng and Vikalo, Haris},
journal={arXiv preprint arXiv:2206.00686},
year={2022}
}