diff --git a/README.md b/README.md
index af307a8..46d08d6 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,225 @@
-# SpreadGNN
-SpreadGNN: Serverless Multi-Task Learning Framework for Graph Neural Networks
+# SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks
+
+This repository is the official implementation of SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks.
+
+## 1. Introduction
+
+
+Graph Neural Networks (GNNs) are the first-choice methods for graph machine learning problems thanks to their ability to learn state-of-the-art representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns, regulation restrictions, and commercial competition. Federated Learning is the de-facto standard for collaborative training of machine learning models over many distributed edge devices without the need for centralization. Nevertheless, training graph neural networks in a federated setting is vaguely defined and brings statistical and systems challenges. This work proposes SpreadGNN, a novel multi-task federated training framework capable of operating in the presence of partial labels and absence of a central server for the first time in the literature. SpreadGNN extends federated multi-task learning to realistic serverless settings for GNNs, and utilizes a novel optimization algorithm with a convergence guarantee, Decentralized Periodic Averaging SGD (DPA-SGD), to solve decentralized multi-task learning problems. We empirically demonstrate the efficacy of our framework on a variety of non-I.I.D. distributed graph-level molecular property prediction datasets with partial labels. Our results show that SpreadGNN outperforms GNN models trained over a central server-dependent federated learning system, even in constrained topologies.
+
+
+## 2. Installation
+
+
+```bash
+conda create -n spreadgnn python=3.7
+conda activate spreadgnn
+conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
+conda install -c anaconda mpi4py grpcio
+conda install scikit-learn numpy h5py setproctitle networkx
+pip install -r requirements.txt
+cd FedML; git submodule init; git submodule update; cd ../;
+pip install -r FedML/requirements.txt
+```
+
+
+## 3. Data Preparation
+For each dataset you want to try, run the `.sh` file located in that dataset's folder.
+For more datasets, visit http://moleculenet.ai/
+
+
+## 4. Experiments
+
+
+### Distributed/Federated Molecule Property Classification experiments
+```
+sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider "./../../../data/sider/" 0
+
+# run in the background
+nohup sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider "./../../../data/sider/" 0 > ./fedavg-graphsage.log 2>&1 &
+```
+
+### Distributed/Federated Molecule Property Regression experiments
+```
+sh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv "./../../../data/freesolv/" 0
+
+# run in the background
+nohup sh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv "./../../../data/freesolv/" 0 > ./fedavg-graphsage.log 2>&1 &
+```
+
+#### Arguments for Distributed/Federated Training
+This is the ordered list of positional arguments used in distributed/federated experiments. Note that there are additional parameters for this setting.
+```
+CLIENT_NUM=$1 -> Number of clients in dist/fed setting
+WORKER_NUM=$2 -> Number of workers
+SERVER_NUM=$3 -> Number of servers
+GPU_NUM_PER_SERVER=$4 -> Number of GPUs per server
+MODEL=$5 -> Model name
+DISTRIBUTION=$6 -> Dataset distribution: homo for IID splitting, hetero for non-IID splitting
+ROUND=$7 -> Number of Distributed/Federated Learning Rounds
+EPOCH=$8 -> Number of epochs to train clients' local models
+BATCH_SIZE=$9 -> Batch size
+LR=${10} -> Learning rate
+SAGE_DIM=${11} -> Dimensionality of GraphSAGE embedding
+NODE_DIM=${12} -> Dimensionality of node embeddings
+SAGE_DR=${13} -> Dropout rate applied between GraphSAGE layers
+READ_DIM=${14} -> Dimensionality of readout embedding
+GRAPH_DIM=${15} -> Dimensionality of graph embedding
+DATASET=${16} -> Dataset name (Please check data folder to see all available datasets)
+DATA_DIR=${17} -> Dataset directory
+CI=${18}
+```
+
+### Distributed/Federated Molecule Property Classification with FedGMTL
+```
+sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider "./../../../data/sider/" 0
+
+# run in the background
+nohup sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider "./../../../data/sider/" 0 > ./fedavg-graphsage.log 2>&1 &
+```
+
+### FedGMTL Classification experiments
+
+```
+sh run_fedgmtl.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 sider "./../../../data/sider/" 1 0
+```
+
+### FedGMTL Regression experiments
+
+```
+sh run_fedgmtl_reg.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 qm8 "./../../../data/qm8/" 1 0
+```
+
+#### Arguments for FedGMTL
+This is the ordered list of positional arguments used in FedGMTL experiments. Note that there are additional parameters for this setting.
+```
+CLIENT_NUM=$1 -> Number of clients in dist/fed setting
+WORKER_NUM=$2 -> Number of workers
+SERVER_NUM=$3 -> Number of servers
+GPU_NUM_PER_SERVER=$4 -> Number of GPUs per server
+MODEL=$5 -> Model name
+DISTRIBUTION=$6 -> Dataset distribution: homo for IID splitting, hetero for non-IID splitting
+PARTITION_ALPHA=$7 -> Alpha parameter for Dirichlet distribution
+ROUND=$8 -> Number of Distributed/Federated Learning Rounds
+EPOCH=$9 -> Number of epochs to train clients' local models
+BATCH_SIZE=${10} -> Batch size
+LR=${11} -> Learning rate
+TASK_W=${12} -> Task-relationship regularizer weight
+TASK_W_DECAY=${13} -> Decay for task-relationship regularizer
+WD=${14} -> Weight decay coefficient
+HIDDEN_DIM=${15} -> Dimensionality of GNN hidden layer
+NODE_DIM=${16} -> Dimensionality of node embeddings
+DR=${17} -> Dropout rate applied between GraphSAGE layers
+READ_DIM=${18} -> Dimensionality of readout embedding
+GRAPH_DIM=${19} -> Dimensionality of graph embedding
+MASK_TYPE=${20} -> Mask scenario (0,1,2)
+DATASET=${21} -> Dataset name
+DATA_DIR=${22} -> Dataset directory
+CI=${23}
+```
+
+### SpreadGNN Classification experiments
+
+```
+sh run_spreadgnn.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 sider "./../../../data/sider/" 1 0
+```
+
+### SpreadGNN Regression experiments
+
+```
+sh run_spreadgnn_reg.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 qm8 "./../../../data/qm8/" 1 0
+```
+
+#### Arguments for SpreadGNN
+This is the ordered list of positional arguments used in SpreadGNN experiments. Note that there are additional parameters for this setting.
+```
+CLIENT_NUM=$1 -> Number of clients in dist/fed setting
+WORKER_NUM=$2 -> Number of workers
+SERVER_NUM=$3 -> Number of servers
+GPU_NUM_PER_SERVER=$4 -> Number of GPUs per server
+MODEL=$5 -> Model name
+DISTRIBUTION=$6 -> Dataset distribution: homo for IID splitting, hetero for non-IID splitting
+PARTITION_ALPHA=$7 -> Alpha parameter for Dirichlet distribution
+ROUND=$8 -> Number of Distributed/Federated Learning Rounds
+EPOCH=$9 -> Number of epochs to train clients' local models
+BATCH_SIZE=${10} -> Batch size
+LR=${11} -> Learning rate
+TASK_W=${12} -> Task-relationship regularizer weight
+TASK_W_DECAY=${13} -> Decay for task-relationship regularizer
+WD=${14} -> Weight decay coefficient
+HIDDEN_DIM=${15} -> Dimensionality of GNN hidden layer
+NODE_DIM=${16} -> Dimensionality of node embeddings
+DR=${17} -> Dropout rate applied between GraphSAGE layers
+READ_DIM=${18} -> Dimensionality of readout embedding
+GRAPH_DIM=${19} -> Dimensionality of graph embedding
+MASK_TYPE=${20} -> Mask scenario (0,1,2)
+DATASET=${21} -> Dataset name
+DATA_DIR=${22} -> Dataset directory
+PERIOD=${23} -> Communication period for parameter exchange
+CI=${24}
+```
+
+
+## 5. Code Structure of SpreadGNN
+
+- `FedML`: a soft repository link generated using `git submodule add https://github.com/FedML-AI/FedML`.
+
+- `data`: provides data-downloading scripts and stores the downloaded datasets.
+
+
+- `data_preprocessing`: data loaders.
+
+- `model`: advanced molecular ML models.
+
+- `trainer`: please define your own `trainer.py` by inheriting the base class in `FedML/fedml-core/trainer/fedavg_trainer.py`.
+Some tasks can share the same trainer.
+
+- `experiments/distributed`:
+1. `experiments` is the entry point for training. It contains experiments on different platforms.
+2. Every experiment integrates four building blocks: `FedML` (federated optimizers), `data_preprocessing`, `model`, and `trainer`.
+
+
+## 6. Update FedML Submodule
+```
+cd FedML
+git checkout master && git pull
+cd ..
+git add FedML
+git commit -m "updating submodule FedML to latest"
+git push
+```
+
+
+
+## 7. Citation
+Please cite our FedML paper if it helps your research.
+You can describe us in your paper like this: "We develop our experiments based on FedML".
+```
+@misc{he2020fedml,
+  title={FedML: A Research Library and Benchmark for Federated Machine Learning},
+  author={Chaoyang He and Songze Li and Jinhyun So and Xiao Zeng and Mi Zhang and Hongyi Wang and Xiaoyang Wang and Praneeth Vepakomma and Abhishek Singh and Hang Qiu and Xinghua Zhu and Jianzong Wang and Li Shen and Peilin Zhao and Yan Kang and Yang Liu and Ramesh Raskar and Qiang Yang and Murali Annavaram and Salman Avestimehr},
+  year={2020},
+  eprint={2007.13518},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+
+@misc{he2021fedgraphnn,
+  title={FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks},
+  author={Chaoyang He and Keshav Balasubramanian and Emir Ceyani and Yu Rong and Peilin Zhao and Junzhou Huang and Murali Annavaram and Salman Avestimehr},
+  year={2021},
+  eprint={2104.07145},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+
+@misc{he2021spreadgnn,
+  title={SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks},
+  author={Chaoyang He and Emir Ceyani and Keshav Balasubramanian and Murali Annavaram and Salman Avestimehr},
+  year={2021},
+  eprint={2106.02743},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+
+```