The phd lab repository contains routines for training of networks, extraction of latent representations, saturation computation and other experimental and probe training.
Phd-lab is written in python. It uses several third-party moduls which have to be installed in order to run the experiments. The following two sections provide installation instructions.
The file requirements.txt
can be used to install all requirements
using pip
into new, virtual environment (called phd-lab-env
):
python3 -m venv phd-lab-env
source phd-lab-env/bin/activate
pip3 install -r requirements.txt
Remarks:
- it seems not possible to install
delve
with Python 3.5.2 (the version installed at the IKW) - at our institute, torch complains that the NVIDIA driver is too old (found version 10010). However, there seems to be no way to upgrade this with pip. In this situation you may resort to the conda installation.
- if not needed anymore, the virtual environment can be deleted by typing
rm -R phd-lab-env/
.
When using conda
, you can use the file environment.yml
to set up a
new conda environment caled phd-lab
, containing all required packages:
conda env create -f environment.yml
conda activate phd-lab
Remarks:
- if no longer needed, the environment can be removed by typing
conda remove --name phd-lab --all
- at the institute of cognitive science (IKW), the currently installed
nvidia driver (460.27.04) allows at best CUDA toolkit version CUDA
11.2.0 GA. Use the file
environment-ikw.yml
instead ofenvironment.yml
for an adapted environment. - To check if torch can use your CUDA version, you can run the following command:
python -c "import torch; print(torch.cuda.is_available())"
some (but not all) versions of torch also support:
python -c "import torch; print(torch._C._cuda_isDriverSufficient())"
Models are configures using json-Files. The json files are collected in the ./configs folder.
{
"model": ["resnet18", "vgg13", "myNetwork"],
"epoch": [30],
"batch_size": [128],
"dataset": ["Cifar10", "ImageNet"],
"resolution": [32, 224],
"optimizer": ["adam", "radam"],
"metrics": ["Accuracy", "Top5Accuracy", "MCC"],
"logs_dir": "./logs/",
"device": "cuda:0",
"conv_method": ["channelwise"],
"delta": [0.99],
"data_parallel": false,
"downsampling": null
}
Note that some elements are written as lists and some are not.
A config can desribe an arbitrary number of experiments, where the number
of experiments is the number of possible value combinations. The only
exception from this rule are the metrics, which are allways provided as a list
and are used all during every experiment.
In the above example, we train 3 models on 2 datasets using 2 optimizers.
This result in 3x2x2=12 total experiments.
It is not necessary to set all these parameters everytime. If a parameter is not
set a default value will be injected.
You can inspect the default value of all configuration keys in phd_lab.experiments.utils.config.DEFAULT_CONFIG
.
Logging is done in a folder structure. The root folder of the logs is specified
in logs_dir
of the config file.
The system has the follow save structure
+-- logs
| +-- MyModel
| | +-- MyDataset1_64 //dataset name followed by input resolution
| | | +-- MyRun //id of this specific run
| | | | +-- probe_performance.csv //if you compute probe performances this file is added containing accuracies per layer, you may add a prefix to this file
| | | | +-- projected_results.csv //if you projected the networks
| | | | +-- computational_info.json //train_model.py will compute some meta info on FLOPS per inference step and save it as json
| | | | +-- MyModel-MyDataset1-r64-bs128-e30_config.json //lets repeat this specific run
| | | | +-- MyModel-MyDataset1-r64-bs128-e30.csv //saturation and metrics
| | | | +-- MyModel-MyDataset1-r64-bs128-e30.pt //model, lr-scheduler and optimizer states
| | | | +-- MyModel-MyDataset1-r64-bs128-e30lsat_epoch0.png //plots of saturation and intrinsic dimensionality
| | | | +-- MyModel-MyDataset1-r64-bs128-e30lsat_epoch1.png
| | | | +-- MyModel-MyDataset1-r64-bs128-e30lsat_epoch2.png
| | | | +-- .
| | | | +-- .
| | | | +-- .
| +-- VGG16
| | +-- Cifar10_32
. . . . .
. . . . .
. . . . .
The only exception from this logging structure are the latent
representation, which will be dumped in the folder
latentent_datasets
in the top level of this repository. The reason
for this is the size of the latent representation on the hard
drive. You likely want to keep your light-weight csv-results in the
logs, but may want to remove extracted latent representations on a
regular basis to free up space. (They can be
reextracted from the saved model quite easily,
so it's not even a time loss realy)
Execution of experiments is fairly straight forward. You can easily
write scripts if you want to deviate from the out-of-the-box
configurations (more on that later). In the phd_lab
folder you
will find scripts handling different kinds of model training and
analysis.
There are 4 overall scripts that will conduct a training if called. It is worth noting that each script is calling the same Main-Functionn object in just a slightly different configuration. They therefore share the same command line arguments and basic execution logic. The scripts are:
train_model.py
train models and compute saturation along the way, adds also a json with information about FLOPs required per image and traininginfer_with_altering_delta.py
same as train.py, after trainingg concluded the model es evaluated on the test set, while changing the delta-value of all PCA layers.extract_latent_representations.py
extract the latent representation of the train and test set after training has concluded.compute_receptive_field.py
compute the receptive field after training. Non-sequential models must implementednoskip: bool
argument in their consstructor in order for this to work properly.probes_meta_execution_script.py
basicallyextract_latent_representations.py
andtrain_probes.py
combined into one file for easier handling.
All of these scripts have the same arguments:
--config
path to the config.json--device
compute device for the modelcuda:n
for the nth gpu,cpu
for cpu--run-id
the id of the run, may be any string, all experiments of this config will be saved in a subfolder with this id. This is useful if you want to repeat experiments multiple times.
Additionally extract_latent_representations.py
has an additional argument:
--downsampling
target height and width of the downsampled feature map. Default value is 4. Adaptive Average Pooling is used for downsampling. In case ofprobes_meta_execution_script.py
this argument is called-d
instead and may be used multiple times to train probed multiple times on various resolutions.--prefix
if set, the content of this argument will be added as a prefix infront of the filename, separated by underscore. For examplefoo_probe_performance.csv
.
All metrics and the model itself are checkpointed after each epoch and the previous weights are overwritten. The system will automatically resume training at the end of the last finished epoch. If one or more trainings were completed, these trainings are skipped. Please note that post-training actions like the extractions of latent representations will still be executed. Furthermore runs are identified by their run-id. Runs under different run-ids generally do not recognize each other, even if they are based on the same configuration.
Latent representations for an experiment (a specific model and dataset)
can be obtained by the script extract_latent_representations.py
.
python extract_latent_representations.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun --downsample 4
The script expects the usual parameters --config
, --device
,
and --run-id
, and the following additional value:
--downsample
:
This script will feed the full dataset through the model and store
the observed activation patterns for each layer. The data are
stored in the directory latent_datasets/[experiment]/
and
the files are called [train|eval]-[layername].p
+-- latent_datasets/
| +-- ResNet18_XXS_Cifar10_32/
| | +-- eval-layer1-0-conv1.p
| | +-- eval-layer1-0-conv2.p
| | +-- ...
| | +-- model_pointer.txt
| | +-- train-layer1-0-conv1.p
| | +-- train-layer1-0-conv2.p
| | +-- ...
. .
. .
. .
The .p
are pickle files containing numpy arrays with the latent
representations.
The file model_pointer.txt
contains the path to the log files.
Another operation that is possible with this repository is training probe classifiers on receptive fields. Probe Classifiers are LogisticRegression models. They are trained on the output of a neural network layer using the original labels. The performance relative to the model performance yields an intermediate solution quality for the trained model. After extracting the latent representations you can train the probe classifiers on the latent representation by calling
python train_probes.py --config ./configs/myconfig.json --prefix "SomePrefix" -mp 4
The script train_probes.py
can take the following arguments:
--config
the config the original experiments were conducted on-f
the root folder of the latent representation storage is by default./latent_representation
-mp
the number of processes spawned by this script. By default the number of processes equal to the number of cores on your cpu. Note that the parallelization is done over the number of layers, therefore more processes than layers will not yield any performance benefits.
The performance of the probe classifiers in stored in the log
directory under the name probe_performances.csv
.
The system uses joblist chaching and will recognize whether a logistic regression has allready been fitted on a particular latent representation and skip training if it has, making crash recovery less painful.
All experiments are strictly tied to the run-id and their configuration. This means that two trained models are considered equal if they are trained using the same configuration parameters and run-id, regardless of the called script. There for you could for instance run:
python train_model.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun
followed by
python compute_receptive_field.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun
the latter script call will recognize the previously trained models and just skip to computing the receptive field and add the additional results to the logs.
You can combine the steps of train a model,
extract pixelwise latent representation and
train probes
using the script probe_meta_execution_script.py
:
python probe_meta_execution_script.py --config ../configs/your_config.json -mp ${NumberOfCoresYouWantToUse} -d pixelwise --device cuda:0 --run-id ${YourRunID}
--config
-d pixelwise
:
You may want to add optimizer, models and datasets to this experimental setup. Basically there is a package for each of these ingredientes:
phd_lab.datasets
phd_lab.models
phd_lab.optimizers
phd_lab.metrics
You can add datasets, model, metrics and optimizers by importing the respective factories in the __init__
file of the respective
packages.
The interfaces for the respective factories are defines as protocols in phd_lab.experiments.domain
or you can
simply orient yourself on the existing once in the package.
If you want to use entirely different registries for datasets, models, optimizers and metrics you can change registry
by setting different values for:
phd_lab.experiments.utils.config.MODEL_REGISTRY
phd_lab.experiments.utils.config.DATASET_REGISTRY
phd_lab.experiments.utils.config.OPTIMIZER_REGISTRY
phd_lab.experiments.utils.config.METRICS_REGISTRY
These registries do not need to be Module or Package-Types, they merely need to have a __dict__
that maps string keys
to the respective factories.
The name in the config file must allways match a factory in order to be a valid configuration.