This is the official repo of gemGAT: Cross-tissue Graph Attention Networks for Semi-supervised Gene Expression Prediction. gemGAT aims to enhance gene expression prediction across different tissues using advanced graph neural networks.
The model is trained on an NVIDIA GeForce RTX 3090. The Python dependencies are listed below; note that you may upgrade these packages to fit your data and experimental settings.
- pytorch: 1.13.0
- dgl-cuda11.6: 0.9.1
- numpy: 1.23.4
- pandas: 1.4.2
gemGAT requires the following data to train the model:
- Gene expression in the source tissue.
- Gene-gene networks (e.g., co-expression networks) in both the source and the target tissue.
A sample dataset can be found here to illustrate the data format expected by the program. It contains four files corresponding to the tissue Brain Amygdala, processed from the ADNI dataset:
- expr_in_Brain-Amygdalaadni.csv: This CSV file stores gene expression data in the source tissue. The first row and the first column contain subject IDs and gene IDs, respectively. Each element is the expression of a specific gene for a specific subject.
- expr_out_Brain-Amygdalaadni.csv: This CSV file stores gene expression data in the target tissue for training. The first row contains the same set of subject IDs, and the first column contains the (usually larger) set of gene IDs.
- graph_in_Brain-Amygdalaadni.csv: This CSV file stores the gene-gene network in the source tissue. The first row and the first column are both IDs for the same set of genes in the same order. A gene-gene network is a binary matrix indicating interactions between genes; it can be any known gene-gene network or one constructed with existing tools, such as a co-expression network built by WGCNA. We constructed our gene-gene co-expression networks for both the source and target tissues via WGCNA using the gene expression data (e.g., expr_in_Brain-Amygdalaadni.csv and expr_out_Brain-Amygdalaadni.csv).
- graph_out_Brain-Amygdalaadni.csv: This CSV file stores the gene-gene network in the target tissue. The first row and the first column are both IDs for the same set of genes in the same order. Note that the genes of the source tissue are a subset of those of the target tissue: genes in the source tissue are ordered before genes that appear only in the target tissue.
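As a quick sanity check before training, the alignment constraints above can be verified with pandas. The helper below is a sketch (not part of the repo), assuming the layout described: expression files have gene IDs as rows and subject IDs as columns, and graph files are square binary matrices in the same gene order as the matching expression file.

```python
import pandas as pd

def check_dataset(expr_in, expr_out, graph_in, graph_out):
    """Sanity-check gene/subject alignment across the four tables.

    expr_*  : DataFrames with gene IDs as the index, subject IDs as columns.
    graph_* : square binary DataFrames whose index and columns list the same
              gene IDs, in the same order as the matching expression file.
    """
    src_genes = list(expr_in.index)
    tgt_genes = list(expr_out.index)
    # Graph files must be square, rows and columns in the same gene order.
    assert list(graph_in.index) == list(graph_in.columns) == src_genes
    assert list(graph_out.index) == list(graph_out.columns) == tgt_genes
    # Source genes come first in the target ordering.
    assert tgt_genes[:len(src_genes)] == src_genes
    # Both expression files describe the same subjects.
    assert list(expr_in.columns) == list(expr_out.columns)
    # Networks are binary adjacency matrices.
    assert graph_in.isin([0, 1]).all().all()
    assert graph_out.isin([0, 1]).all().all()
    return True
```

To check files on disk, load each one first with pd.read_csv(path, index_col=0) and pass the resulting DataFrames to the helper.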
All datasets used in our paper are processed in the same way as described above, from the GTEx v8 dataset and the ADNI dataset. The full processed dataset used to train the model is available upon request.
Create your own dataset or follow the steps above to download the sample dataset. Put the datasets in the data folder, then run the following command to train the model:
python train.py --train True --epoch 1000 --nhidatt 1024 --nheads 8 --lr 0.001 --data Brain-Amygdalaadni
- train: True for training, False for inference
- epoch: number of epochs for training the model
- nhidatt: hidden dimension of the attention layers
- nheads: number of attention heads
- lr: learning rate
- data: your dataset name
You can change the name of your data, but make sure your dataset files follow the naming format expr_in_<your data name>.csv, expr_out_<your data name>.csv, graph_in_<your data name>.csv, and graph_out_<your data name>.csv. You can also customize the training hyperparameters. To change other model parameters, feel free to modify them in model.py.
The training process saves the model as <your data name>.pt in the folder; this is the checkpoint used for inference.
Once the trained model is saved as <your data name>.pt, you can run inference by simply setting --train to False:
python train.py --train False --nhidatt 1024 --nheads 8 --data Brain-Amygdalaadni
Note that the program will automatically use the testing set. Inference produces a file <your data name>_inference.csv, with each row as a subject, each column as a gene, and each element a predicted gene expression.
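The predictions can then be loaded back with pandas for downstream analysis. A minimal sketch (the ranking helper below is hypothetical, not part of the repo; it assumes the output layout just described, subjects as rows and genes as columns):

```python
import pandas as pd

def summarize_predictions(path, k=5):
    """Load an inference output CSV (rows: subjects, columns: genes) and
    return the k genes with the highest mean predicted expression."""
    pred = pd.read_csv(path, index_col=0)
    return pred.mean(axis=0).sort_values(ascending=False).head(k)
```

For the sample dataset, the input path would be Brain-Amygdalaadni_inference.csv.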