This repo is the official implementation of Graph Inductive Biases in Transformers without Message Passing (Ma et al., ICML 2023) [PMLR] [arXiv]
The implementation is based on GraphGPS (Rampasek et al., 2022).
Note: there is a typo in one of the equations in the paper.
conda create -n grit python=3.9
conda activate grit
# change the CUDA/device version as needed
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113 --trusted-host download.pytorch.org
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric==2.2.0 -f https://data.pyg.org/whl/torch-1.12.1+cu113.html --trusted-host data.pyg.org
# RDKit is required for OGB-LSC PCQM4Mv2 and datasets derived from it.
## conda install openbabel fsspec rdkit -c conda-forge
pip install rdkit
pip install torchmetrics==0.9.1
pip install ogb
pip install tensorboardX
pip install yacs
pip install opt_einsum
pip install graphgym
pip install pytorch-lightning # required by graphgym
pip install setuptools==59.5.0
# newer setuptools versions cause distutils conflicts with this version of pytorch
# ---- experiment management tools --------
# pip install wandb # wandb is used in GraphGPS but not in GRIT (ours); verify it works before enabling it
# pip install mlflow
### mlflow server --backend-store-uri mlruns --port 5000
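Optionally, a quick sanity check from Python (a minimal sketch; the expected versions match the install commands above, so adjust them if you changed the CUDA build):
# verify the installed versions and that CUDA is visible
import torch
import torch_geometric
print(torch.__version__)             # e.g. 1.12.1+cu113
print(torch_geometric.__version__)   # e.g. 2.2.0
print(torch.cuda.is_available())     # should print True on a GPU machine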
# Run
python main.py --cfg configs/GRIT/zinc-GRIT.yaml wandb.use False accelerator "cuda:0" optim.max_epoch 2000 seed 41 dataset.dir 'xx/xx/data'
# replace 'cuda:0' with the device to use
# replace 'xx/xx/data' with your data directory (default: './datasets')
# replace 'configs/GRIT/zinc-GRIT.yaml' with any experiments to run
- Configurations are available under ./configs/GRIT/xxxxx.yaml
- Scripts to execute are available under ./scripts/xxx.sh; each script runs 4 trials of an experiment in parallel on GPU 0,1,2,3 (a rough equivalent is sketched below).
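For reference, a rough Python sketch of what such a script does; the config path, seeds, and number of GPUs below are placeholders rather than the exact values used by the provided scripts:
# launch 4 trials in parallel, one trial per GPU
import subprocess

procs = []
for gpu, seed in enumerate([41, 42, 43, 44]):
    cmd = ["python", "main.py", "--cfg", "configs/GRIT/zinc-GRIT.yaml",
           "wandb.use", "False", "accelerator", f"cuda:{gpu}", "seed", str(seed)]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()  # block until all trials finish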
Our code is based on GraphGym, which relies heavily on module registration: modules are referred to by their registered names and combined through the configuration. This makes the code hard to trace starting from main.py, so we provide hints on the overall code architecture below. You can write your own customized modules and register them to build new models under this framework; a minimal registration example follows.
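As an illustration, here is a minimal sketch of registering a custom backbone layer through PyG's GraphGym registry; the layer name and its contents are hypothetical, and GRIT's actual layers live under [layer]:
import torch.nn as nn
from torch_geometric.graphgym.register import register_layer

@register_layer('my_custom_layer')  # this registered name is what the YAML config refers to
class MyCustomLayer(nn.Module):
    # GraphGym-style layers take the whole batch object and return it
    def __init__(self, dim_in, dim_out, **kwargs):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)

    def forward(self, batch):
        batch.x = self.linear(batch.x)  # update node features in place
        return batch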
The overall architecture of the code ([x] indicates that 'x' is a folder in the code):
- model
    - utils
    - [act] (activation functions; called by other modules)
    - [pooling] (global pooling functions; called by the output head for graph-level tasks)
    - [network] (the macro model architecture: stem -> backbone -> output head)
    - [encoder] (feature/PE encoders, i.e. the stem; bridge the inputs to the backbone)
    - [layer] (backbone layers)
    - [head] (task-dependent output heads)
- training pipeline
    - data
        - [loader] (data loaders)
        - [transform] (pre-computed transforms: PE and other preprocessing)
    - [train] (training pipeline: logging, visualization, early stopping, checkpointing, etc.)
    - [optimizer] (optimizers and LR schedulers)
    - [loss] (loss functions)
    - [config] (the default configurations)
Storing all RRWP values for large graphs can be memory-intensive, since torch_geometric loads the entire dataset into memory by default. Alternatively, you can customize the PyG dataset class or compute RRWP on the fly inside the dataloader, as sketched below. Owing to the simplicity of the RRWP computation, doing it on the fly only marginally slows down training when multiple dataloader workers are used (for graphs with fewer than 500 nodes). An example config can be found in cifar10-GRIT-RRWP.yaml (lines 5 and 14).
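For illustration, a minimal sketch of computing RRWP on the fly as a per-graph transform using dense matrix powers; the attribute names and the walk_length value are placeholders rather than the repo's exact API (see the [transform] folder for the actual implementation):
import torch
from torch_geometric.utils import to_dense_adj

def add_rrwp(data, walk_length=8):
    # random-walk matrix M = D^{-1} A, built densely (fine for graphs with < ~500 nodes)
    adj = to_dense_adj(data.edge_index, max_num_nodes=data.num_nodes)[0]
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    rw = adj / deg

    # stack I, M, M^2, ..., M^(K-1) into pair-wise features of shape [N, N, K]
    mats = [torch.eye(data.num_nodes)]
    for _ in range(walk_length - 1):
        mats.append(mats[-1] @ rw)
    rrwp = torch.stack(mats, dim=-1)

    data.rrwp_node = rrwp.diagonal(dim1=0, dim2=1).t()  # diagonal entries as node-level PE, [N, K]
    data.rrwp_pair = rrwp                               # dense pair-wise PE; sparsify as needed
    return data

Passing such a function as the transform argument of a PyG dataset makes it run per graph inside the dataloader workers.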
If you find this work useful, please consider citing:
@inproceedings{ma2023GraphInductiveBiases,
title = {Graph {Inductive} {Biases} in {Transformers} without {Message} {Passing}},
booktitle = {Proc. {Int}. {Conf}. {Mach}. {Learn}.},
author = {Ma, Liheng and Lin, Chen and Lim, Derek and Romero-Soriano, Adriana and Dokania, Puneet K. and Coates, Mark and Torr, Philip H. S. and Lim, Ser-Nam},
year = {2023},
}