This repository contains the implementation of the paper "Graph-level representations using ensemble-based readout functions" by Jakub Binkowski, Albert Sawczyn, Denis Janiak, Piotr Bielak and Tomasz Kajdanowicz.
The paper is available on arXiv or in the proceedings of ICCS 2023.
@InProceedings{10.1007/978-3-031-35995-8_28,
author="Binkowski, Jakub and Sawczyn, Albert and Janiak, Denis and Bielak, Piotr and Kajdanowicz, Tomasz",
editor="Miky{\v{s}}ka, Ji{\v{r}}{\'i} and de Mulatier, Cl{\'e}lia and Paszynski, Maciej and Krzhizhanovskaya, Valeria V. and Dongarra, Jack J. and Sloot, Peter M.A.",
title="Graph-Level Representations Using Ensemble-Based Readout Functions",
booktitle="Computational Science -- ICCS 2023",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="393--405",
isbn="978-3-031-35995-8"
}
- Experiments are defined in the dvc.yaml file and can easily be run with the dvc repro command (but not in parallel, so it may take a long time to finish)
- Repeated experiments on several seeds are run with the script train_gnn_with_reps.py
- Each experiment runs with a corresponding configuration file experiments/config/ensemble_readouts/hparams_<dataset>.yaml, which contains:
  - directories of the source dataset and further experiment output
  - hyperparameters, including model architecture specification
The repository relies on Python 3.10.
- There are 4 files with requirements declared:
  - requirements-cpu.txt - packages only for CPU environments
  - requirements-gpu.txt - packages only for GPU environments
  - requirements.txt - packages common to GPU and CPU environments
  - requirements-dev.txt - linting tool to support code quality maintenance (optional)
- Install requirements for CPU environments:
    pip install -r requirements-cpu.txt -r requirements.txt -r requirements-dev.txt
- Install requirements for GPU environments:
    pip install -r requirements-gpu.txt -r requirements.txt -r requirements-dev.txt
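The choice between the two install commands above can be scripted. The following is only a minimal sketch: the helper name requirements_flavor is hypothetical, and it assumes that a working nvidia-smi is a reasonable proxy for a GPU environment. Neither is part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints "gpu" if
# nvidia-smi is on PATH and runs successfully, otherwise prints "cpu".
# Assumption: a working nvidia-smi indicates a usable GPU environment.
requirements_flavor() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo gpu
    else
        echo cpu
    fi
}

flavor="$(requirements_flavor)"
# Print the install command rather than running it, so it can be reviewed first.
echo "pip install -r requirements-${flavor}.txt -r requirements.txt -r requirements-dev.txt"
```

The command is printed instead of executed so that the detected flavor can be checked before anything is installed.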
To reproduce the experiments with DVC, use the commands below. Because a single experiment consumes relatively few resources, you may prefer the parallel run described in the next section.
$ dvc pull data/datasets/{ENZYMES,MUTAG,REDDIT-MULTI-12K,ZINC}.dvc
$ dvc repro
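The braces in the dvc pull command are expanded by the shell (bash, in this sketch) into four separate .dvc targets before dvc itself is invoked. One way to preview that expansion without DVC installed is shown below; the variable name targets is only illustrative.

```shell
#!/usr/bin/env bash
# Collect the paths produced by brace expansion into an array.
targets=(data/datasets/{ENZYMES,MUTAG,REDDIT-MULTI-12K,ZINC}.dvc)

# Print one target per line, in the order they are passed to dvc pull.
printf '%s\n' "${targets[@]}"
```

Note that brace expansion is a bash/zsh feature; in a strictly POSIX shell the four targets would need to be listed explicitly.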
- A single experiment is relatively lightweight (it uses about 10% of an NVIDIA TITAN RTX)
- To speed up running multiple experiments, there is a script that helps exploit the available resources
- To run experiments:
  - Ensure you have conda with an environment called ensemble-readouts, containing all dependencies installed
  - Select the dataset for which you want to run experiments, and look at the script run_training.sh
  - Run the script with parameters depending on available resources, e.g.,
    CUDA_VISIBLE_DEVICES=0 experiments/shell_runner/run_training.sh \
        --config-path experiments/config/deterministic_readouts/hparams_enzymes.yaml \
        --config-list experiments/config/config_names.yaml \
        --config-list-names ensemble_readouts \
        --num-jobs 4 \
        --accelerator gpu \
        --devices 1
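Before launching the runner script, it may help to confirm that the expected conda environment exists. The following is a minimal sketch: the helper conda_env_exists is hypothetical (not part of the repository) and assumes that conda env list prints the environment name in the first column.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the repository).
# Succeeds only if a conda environment with exactly the given name is listed.
# Assumption: `conda env list` prints the environment name in the first column.
conda_env_exists() {
    conda env list 2>/dev/null | awk '{print $1}' | grep -qxF -- "$1"
}

if conda_env_exists ensemble-readouts; then
    echo "conda environment 'ensemble-readouts' found"
else
    echo "conda environment 'ensemble-readouts' is missing" >&2
fi
```

If conda is not installed at all, the helper simply reports the environment as missing.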