Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar
Reference: James Jordon, Jinsung Yoon, Mihaela van der Schaar, "PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees," International Conference on Learning Representations (ICLR), 2019.
Paper Link: https://openreview.net/forum?id=S1zk9iRqF7
Contact: jsyoon0823@gmail.com
This directory contains an implementation of the PATEGAN framework for generating synthetic data.
To run the pipeline for training and evaluation on the PATEGAN framework, run python3 main_pategan_experiment.py.
Note that hyper-parameter tuning is necessary for different datasets.
(1) data_generator.py
- Generate train and test data to evaluate the PATEGAN framework
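The generation scheme used by data_generator.py is not described here. The sketch below only illustrates what a generator driven by data_no, data_dim and noise_rate could look like. It ASSUMES Gaussian features, logistic labels, and that noise_rate scales Gaussian noise on the logit. The train_rate split parameter is hypothetical and not taken from the repository.

```python
import numpy as np


def generate_data(data_no, data_dim, noise_rate, train_rate=0.8, seed=None):
    """Hypothetical toy generator: Gaussian features with logistic labels.

    ASSUMPTION: noise_rate scales Gaussian noise added to the logit, and
    train_rate is a hypothetical split parameter. The repository's
    data_generator.py may use a different scheme.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=(data_no, data_dim))
    w = rng.normal(0.0, 1.0, size=data_dim)  # random ground-truth weights
    logit = x @ w + noise_rate * rng.normal(0.0, 1.0, size=data_no)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # binary labels
    data = np.column_stack([x, y])  # last column holds the label
    idx = rng.permutation(data_no)  # shuffle before splitting
    n_train = int(round(train_rate * data_no))
    return data[idx[:n_train]], data[idx[n_train:]]


train_data, test_data = generate_data(10000, 10, 1.0, seed=0)
print(train_data.shape, test_data.shape)  # (8000, 11) (2000, 11)
```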
(2) utils.py
- Define various supervised models such as logistic regression
- Return AUC and APR as the metrics
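The two metrics can be computed directly from labels and classifier scores. This is a minimal NumPy sketch, not the code in utils.py. It ASSUMES that APR means average precision (the step-wise area under the precision-recall curve). AUC is computed as the probability that a random positive outscores a random negative, with ties counted as one half. The average-precision helper assumes there are no tied scores.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()  # pairwise comparisons
    ties = (pos[:, None] == neg[None, :]).sum()    # ties count as 1/2
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision_at_k[y == 1].mean()


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(f"{roc_auc(y_true, y_score):.4f}")            # 0.7500
print(f"{average_precision(y_true, y_score):.4f}")  # 0.8333
```

The pairwise AUC uses memory proportional to the number of positive-negative pairs, which is fine for small test sets.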
(3) pate_gan.py
- Main PATEGAN framework
- Return the synthetically generated data
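The privacy in PATE comes from a noisy vote among k teacher models, and the student is trained on those noisy labels. The snippet below sketches that aggregation step for binary (real/synthetic) votes. It is not the code in pate_gan.py. It ASSUMES Laplace noise with scale 1/lamda is added to each class count. The function name noisy_teacher_vote is hypothetical.

```python
import numpy as np


def noisy_teacher_vote(teacher_preds, lamda, rng):
    """Aggregate binary teacher votes with Laplace noise (PATE-style sketch).

    teacher_preds: array of shape (k, n) with entries in {0, 1}, one row per teacher.
    ASSUMPTION: noise is Lap(1/lamda) added independently to each class count.
    Returns n aggregated labels in {0, 1}.
    """
    teacher_preds = np.asarray(teacher_preds)
    n1 = teacher_preds.sum(axis=0)          # votes for class 1, per sample
    n0 = teacher_preds.shape[0] - n1        # votes for class 0, per sample
    scale = 1.0 / lamda
    noisy1 = n1 + rng.laplace(0.0, scale, size=n1.shape)
    noisy0 = n0 + rng.laplace(0.0, scale, size=n0.shape)
    return (noisy1 > noisy0).astype(int)    # noisy argmax over the two classes


rng = np.random.default_rng(0)
k, n = 100, 5
votes = np.zeros((k, n), dtype=int)
votes[:90, :3] = 1  # 90 of 100 teachers vote 1 on the first three samples
votes[:10, 3:] = 1  # only 10 of 100 vote 1 on the last two
print(noisy_teacher_vote(votes, lamda=1.0, rng=rng))  # [1 1 1 0 0]
```

With a clear majority the noise rarely flips the label. When the teachers are split, the noise dominates, and that is what hides the contribution of any single data partition.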
(4) main_pategan_experiment.py
- Report the prediction performance of models trained on the original data and on the synthetic data generated by PATEGAN.
Command inputs:
- data_no: number of generated samples
- data_dim: number of data dimensions
- noise_rate: noise ratio on data
- iterations: number of iterations for handling initialization randomness
- n_s: the number of student training iterations
- batch_size: batch size for training the student and the generator
- k: the number of teachers
- epsilon: differential privacy parameter (epsilon)
- delta: differential privacy parameter (delta)
- lamda: PATE noise parameter, controlling the Laplace noise added to the teacher votes
Note that hyper-parameters should be optimized for different datasets.
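epsilon, delta and lamda interact through a privacy accountant that limits how many noisy teacher votes can be released. The sketch below uses the simple data-independent moments bound from PATE. Under this bound, a vote with Laplace noise of scale 1/lamda adds at most 2 * lamda**2 * l * (l + 1) to the l-th log-moment, and the total spent is epsilon = min over l of (alpha(l) + log(1/delta)) / l. ASSUMPTION: the repository's accountant may differ, for example by also using a tighter data-dependent bound. The helper names are hypothetical.

```python
import math


def epsilon_spent(num_queries, lamda, delta, max_order=100):
    """Epsilon after num_queries noisy votes (data-independent PATE bound, sketch)."""
    best = math.inf
    for l in range(1, max_order + 1):
        # Log-moments add up across queries; each is bounded by 2*lamda^2*l*(l+1).
        alpha = num_queries * 2.0 * lamda ** 2 * l * (l + 1)
        best = min(best, (alpha + math.log(1.0 / delta)) / l)
    return best


def max_queries(target_epsilon, lamda, delta):
    """Largest number of noisy votes whose bound stays within target_epsilon."""
    t = 0
    while epsilon_spent(t + 1, lamda, delta) <= target_epsilon:
        t += 1
    return t


# Settings from the example command: epsilon=100, delta=0.0001, lamda=1.0
print(max_queries(100, 1.0, 1e-4))  # 22
```

At these settings the loose bound admits very few votes, which suggests why tighter accounting matters in practice.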
Example command:
$ python3 main_pategan_experiment.py --data_no 10000 --data_dim 10 --noise_rate 1.0 \
  --iterations 50 --n_s 1 --batch_size 64 --k 100 --epsilon 100 --delta 0.0001 \
  --lamda 1.0
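For reference, the command-line inputs above map naturally onto a standard argparse parser. This is a sketch of one way to declare them, with the example command's values as defaults. ASSUMPTION: the types and defaults are inferred from the example, and main_pategan_experiment.py may declare them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser for the documented flags (example values as defaults)."""
    p = argparse.ArgumentParser(description="PATEGAN experiment (sketch)")
    p.add_argument("--data_no", type=int, default=10000, help="number of generated samples")
    p.add_argument("--data_dim", type=int, default=10, help="number of data dimensions")
    p.add_argument("--noise_rate", type=float, default=1.0, help="noise ratio on data")
    p.add_argument("--iterations", type=int, default=50, help="repetitions over random initializations")
    p.add_argument("--n_s", type=int, default=1, help="student training iterations")
    p.add_argument("--batch_size", type=int, default=64, help="batch size for student and generator")
    p.add_argument("--k", type=int, default=100, help="number of teachers")
    p.add_argument("--epsilon", type=float, default=100.0, help="privacy parameter epsilon")
    p.add_argument("--delta", type=float, default=1e-4, help="privacy parameter delta")
    p.add_argument("--lamda", type=float, default=1.0, help="PATE noise parameter")
    return p


args = build_parser().parse_args(["--k", "50", "--epsilon", "1.0"])
print(args.k, args.epsilon, args.batch_size)  # 50 1.0 64
```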
Outputs:
- results: prediction performance (AUC and APR) on the original and the synthetic data
- train_data: original data
- synth_train_data: synthetically generated data
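One common way to read the Original/Synthetic comparison is "train on synthetic, test on real." The same classifier is fit once on train_data and once on synth_train_data, and both are scored on the same real test set. The sketch below assumes that protocol and uses a plain gradient-descent logistic regression with AUC. The noisy copy of the training data is only a placeholder for PATEGAN output, and every helper name is hypothetical.

```python
import numpy as np


def fit_logistic(x, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression (illustrative stand-in)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y                      # gradient of the log-loss w.r.t. the logit
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def auc(y_true, score):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos, neg = score[y_true == 1], score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def compare(train_data, synth_train_data, test_data):
    """Fit on each training set (label in last column), score on the real test set."""
    x_test, y_test = test_data[:, :-1], test_data[:, -1]
    results = {}
    for name, data in [("Original", train_data), ("Synthetic", synth_train_data)]:
        w, b = fit_logistic(data[:, :-1], data[:, -1])
        results[name] = float(auc(y_test, x_test @ w + b))
    return results


rng = np.random.default_rng(0)


def toy(n):
    x = rng.normal(size=(n, 5))
    y = (x[:, 0] + x[:, 1] > 0).astype(float)
    return np.column_stack([x, y])


train_data, test_data = toy(1000), toy(500)
# Placeholder for PATEGAN output: real training rows with Gaussian noise on the features.
synth_train_data = train_data.copy()
synth_train_data[:, :-1] += rng.normal(scale=0.5, size=(1000, 5))
print(compare(train_data, synth_train_data, test_data))
```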