by Maciej Falkiewicz and Naoya Takeishi and Alexandros Kalousis
The arXiv preprint is available at this link.
The project is managed with Data Version Control (DVC). The pipeline is defined in `dvc.yaml` and parameterized by a global config file, `experiments/configs/global.yaml`. There are two types of datasets, synthetic and real-world, and each has its own independent pipeline. Both pipelines are defined in `dvc.yaml`.
```mermaid
flowchart TD
node1["evaluate@{model_id}-{dataset_name}-{simulation_budget}-{model_seed}"]
node2["generate@{dataset_name}"]
node3["preprocess@{dataset_name}"]
node4["train@{model_id}-{dataset_name}-{simulation_budget}-{model_seed}"]
node5["visualize-data@{dataset_name}"]
node6["visualize-evaluation@{model_id}-{dataset_name}-{simulation_budget}-{model_seed}"]
node7["visualize-training@{model_id}-{dataset_name}-{simulation_budget}-{model_seed}"]
node1-->node6
node2-->node3
node3-->node1
node3-->node4
node3-->node5
node4-->node1
node5-->node1
node4-->node7
```
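DVC works out the execution order of these stages itself when reproducing the pipeline. Purely as an illustration of what the graph above encodes, the sketch below copies its edges into a dictionary of `{stage: dependencies}` (the names `DEPENDENCIES` and `execution_order` are hypothetical, not part of the repository) and lets Python's standard `graphlib` produce one valid order.

```python
from graphlib import TopologicalSorter

# Edges copied from the synthetic-datasets flowchart, written as
# {stage: set of stages that must finish before it}. Parameters are omitted.
DEPENDENCIES = {
    "generate": set(),
    "preprocess": {"generate"},
    "train": {"preprocess"},
    "visualize-data": {"preprocess"},
    "evaluate": {"preprocess", "train", "visualize-data"},
    "visualize-evaluation": {"evaluate"},
    "visualize-training": {"train"},
}


def execution_order(graph):
    """Return one valid order (not the only one) in which the stages can run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

The order is not unique (for example, `visualize-training` may run before or after `evaluate`), so only the relative order of dependent stages is guaranteed.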
The possible values of the variables are:
- `{model_id}`: "model1" (GAN), "model2" (WGAN), "model3" (KSGAN)
- `{dataset_name}`: "swissroll", "circles", "rings", "moons", "8gaussians", "pinwheel", "2spirals", "checkerboard"
- `{simulation_budget}`: "512", "1024", "2048", "16384", "65536"
- `{model_seed}`: "0", "1", "2", "3", "4"
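To give a sense of the size of the synthetic pipeline, the sketch below expands a stage template over the values listed above, following the `stage@{model_id}-{dataset_name}-{simulation_budget}-{model_seed}` naming shown in the flowchart. The helper `stage_names` is hypothetical and only illustrates the cross-product; by the same count, each of `train`, `evaluate`, `visualize-evaluation` and `visualize-training` has 600 instances, while `generate`, `preprocess` and `visualize-data` vary only over `{dataset_name}` (8 instances each).

```python
from itertools import product

# Values copied from the list of variables above.
MODEL_IDS = ["model1", "model2", "model3"]  # GAN, WGAN, KSGAN
DATASETS = ["swissroll", "circles", "rings", "moons",
            "8gaussians", "pinwheel", "2spirals", "checkerboard"]
BUDGETS = ["512", "1024", "2048", "16384", "65536"]
SEEDS = ["0", "1", "2", "3", "4"]


def stage_names(template):
    """Expand a fully parameterized stage (e.g. "train") into all its instances."""
    return [f"{template}@" + "-".join(combo)
            for combo in product(MODEL_IDS, DATASETS, BUDGETS, SEEDS)]


train_stages = stage_names("train")
print(len(train_stages))  # 600 = 3 * 8 * 5 * 5
print(train_stages[0])    # train@model1-swissroll-512-0
```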
```mermaid
flowchart TD
node1["evaluate-real@{model_id}-{dataset_name}-{model_seed}"]
node2["preprocess-real@{dataset_name}"]
node3["train-real@{model_id}-{dataset_name}-{model_seed}"]
node4["visualize-training-real@{model_id}-{dataset_name}-{model_seed}"]
node2-->node1
node2-->node3
node3-->node1
node3-->node4
```
The possible values of the variables are:
- `{model_id}`: "model1" (GAN), "model2" (WGAN), "model3" (KSGAN)
- `{dataset_name}`: "mnist", "cifar10"
- `{model_seed}`: "0", "1", "2", "3", "4"
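Because stage names such as `train-real` themselves contain hyphens, a name has to be split at `@` before its parameters are separated. The hypothetical helper below (not part of the repository) sketches this for the real-world stages parameterized by all three variables (`train-real`, `evaluate-real`, `visualize-training-real`); it assumes, as holds for the values listed above, that no parameter value contains a hyphen.

```python
# Hypothetical helper: recover the stage and its parameters from a
# real-world stage name such as "train-real@model3-cifar10-4".
FIELDS = ("model_id", "dataset_name", "model_seed")


def parse_stage(name):
    # Split at "@" first, since stage names like "train-real" contain "-".
    stage, _, params = name.partition("@")
    values = params.split("-")
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected stage name: {name!r}")
    return stage, dict(zip(FIELDS, values))


print(parse_stage("train-real@model3-cifar10-4"))
# ('train-real', {'model_id': 'model3', 'dataset_name': 'cifar10', 'model_seed': '4'})
```

Stages parameterized only by `{dataset_name}`, such as `preprocess-real@mnist`, are rejected with a `ValueError` rather than parsed incorrectly.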
Execution on a Slurm-orchestrated cluster
- To execute the synthetic datasets pipeline, use the `experiments/scripts/pipeline.py` script.
- To execute the real-world datasets pipeline, use the `experiments/scripts/pipeline-real.py` script.
  - The `evaluate-real@{model_id}-{dataset_name}-{model_seed}` stage is not triggered by this script.

Attention: the scripts assume that the `preprocess@{dataset_name}` (`preprocess-real@{dataset_name}`) stages have already been executed!

Please note that this way of execution bypasses DVC and therefore requires committing the changes in order to keep them under version control.

Specify the available CPU and GPU partitions in the `CPU_PARTITIONS` and `GPU_PARTITIONS` environment variables.
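The README does not specify the format of `CPU_PARTITIONS` and `GPU_PARTITIONS`. The sketch below assumes a comma-separated list (for example `CPU_PARTITIONS="cpu-short,cpu-long"`, where the partition names are made up) and shows how such variables could be turned into an `sbatch` call. The helpers `read_partitions` and `sbatch_command`, as well as the wrapped command, are hypothetical; the actual `pipeline.py` and `pipeline-real.py` scripts may do this differently. Only the `sbatch` options used (`--partition`, which accepts a comma-separated list, `--gres` and `--wrap`) are standard Slurm flags.

```python
import os
import shlex


def read_partitions(var):
    """Read partition names from an environment variable.

    ASSUMPTION: the value is a comma-separated list, e.g. "cpu-short,cpu-long".
    """
    raw = os.environ.get(var, "")
    return [p.strip() for p in raw.split(",") if p.strip()]


def sbatch_command(command, use_gpu):
    """Build (but do not submit) a hypothetical sbatch call for one job."""
    var = "GPU_PARTITIONS" if use_gpu else "CPU_PARTITIONS"
    partitions = read_partitions(var)
    if not partitions:
        raise RuntimeError(f"no partitions configured in {var}")
    # Slurm schedules the job on whichever listed partition can start it first.
    args = ["sbatch", f"--partition={','.join(partitions)}"]
    if use_gpu:
        args.append("--gres=gpu:1")  # request one GPU per job
    return args + ["--wrap", command]


os.environ["CPU_PARTITIONS"] = "cpu-short, cpu-long"
print(shlex.join(sbatch_command("python train.py", use_gpu=False)))
# sbatch --partition=cpu-short,cpu-long --wrap 'python train.py'
```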
- Implementations of the synthetic dataset simulators: `src/simulators/`
- Implementations of the generative models:
  - GAN: `src/models/gan.py`
  - Wasserstein GAN: `src/models/wgan.py`
  - Kolmogorov–Smirnov GAN: `src/models/ksgan.py` (the proposed method)
- Scripts used to run the experiments: `experiments/scripts`
- Training utilities: `src/training/utils.py`
- Evaluation utilities: `src/evaluation/utils.py`
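For intuition about the statistic that gives KSGAN its name, here is the classical one-dimensional two-sample Kolmogorov–Smirnov statistic, $\sup_x |F_a(x) - F_b(x)|$, computed from empirical CDFs. This is only the textbook 1-D statistic, not the multivariate distance that `src/models/ksgan.py` optimizes, which is not reproduced here; the function names are illustrative.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of sample points that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Classical two-sample Kolmogorov-Smirnov statistic in one dimension.

    Both empirical CDFs are right-continuous step functions that only jump at
    sample points, so the supremum of |F_a - F_b| is attained at one of the
    pooled sample points and it suffices to check those.
    """
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

The statistic is 0 for identical samples and 1 when the two samples do not overlap at all.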
All the Python dependencies are listed in the `requirements.txt` file.
The DVC cache for the project can be downloaded from here.