DeepVul is a multi-task transformer model that predicts gene essentiality and drug response from gene expression data. It uses a shared feature extractor to learn representations that can then be fine-tuned for the individual downstream tasks: gene essentiality prediction and drug response prediction.
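To make the layout concrete, here is a minimal sketch of such a shared-extractor/multi-head design in PyTorch. The class names, the single-token encoding of the expression profile, and the head structure are illustrative assumptions, not the actual DeepVul implementation; the hyperparameter names mirror the flags documented below.

```python
# Illustrative multi-task layout: one shared transformer extractor, two task
# heads. Hypothetical sketch; not taken from the DeepVul source.
import torch.nn as nn

class SharedExtractor(nn.Module):
    """Shared transformer encoder over gene-expression features."""
    def __init__(self, n_genes, hidden_state=500, nhead=2, num_layers=2,
                 dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(n_genes, hidden_state)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_state, nhead=nhead,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                  # x: (batch, n_genes)
        h = self.embed(x).unsqueeze(1)     # treat the profile as one token
        return self.encoder(h).squeeze(1)  # (batch, hidden_state)

class TaskHead(nn.Module):
    """Task-specific head, e.g. essentiality scores or drug-response values."""
    def __init__(self, hidden_state, n_outputs):
        super().__init__()
        self.out = nn.Linear(hidden_state, n_outputs)

    def forward(self, h):
        return self.out(h)
```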
To set up the environment, use the provided `condaenv.yml` file with conda. First, ensure you have conda installed, then run the following commands:

```bash
conda env create --file condaenv.yml
conda activate condaenv
```
To run the DeepVul model, you will need to download the following datasets and copy them into the `data` directory with the names shown below:

- Gene Expression: `OmicsExpressionProteinCodingGenesTPMLogp1.csv`
- Gene Essentiality: `CRISPRGeneEffect.csv`
- Drug Response: `primary-screen-replicate-collapsed-logfold-change.csv`
- Sanger Essentiality Data: `gene_effect.csv`
- Somatic Mutation Data: `CCLE_Oncomap3_Assays_2012-04-09.csv`
The model reads these files from the `data` directory, so the file names must match exactly.
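Before launching a run, it can help to verify that everything is in place. The following sketch assumes the repository root as the working directory and uses pandas, which may or may not match how DeepVul itself loads the data:

```python
# Sanity-check that the expected datasets are present, then peek at one.
from pathlib import Path
import pandas as pd

DATA_DIR = Path("data")
EXPECTED = [
    "OmicsExpressionProteinCodingGenesTPMLogp1.csv",
    "CRISPRGeneEffect.csv",
    "primary-screen-replicate-collapsed-logfold-change.csv",
    "gene_effect.csv",
    "CCLE_Oncomap3_Assays_2012-04-09.csv",
]

missing = [name for name in EXPECTED if not (DATA_DIR / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing datasets in {DATA_DIR}/: {missing}")

# Example: load the expression matrix (cell lines x genes).
expression = pd.read_csv(DATA_DIR / EXPECTED[0], index_col=0)
print(expression.shape)
```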
When running the DeepVul model, you can specify various hyperparameters to control its behavior. The available flags and their defaults are listed below:

- `--pretrain_batch_size`: Batch size for pre-training data loading (default: 20)
- `--finetuning_batch_size`: Batch size for fine-tuning data loading (default: 20)
- `--hidden_state`: Hidden state size for the model (default: 500)
- `--pre_train_epochs`: Number of epochs for pre-training (default: 20)
- `--fine_tune_epochs`: Number of epochs for fine-tuning (default: 20)
- `--opt`: Optimizer type (default: "Adam")
- `--lr`: Learning rate for the optimizer (default: 0.0001)
- `--dropout`: Dropout rate (default: 0.1)
- `--nhead`: Number of attention heads in the multi-head attention layers (default: 2)
- `--num_layers`: Number of layers in the model (default: 2)
- `--dim_feedforward`: Dimension of the feedforward network (default: 2048)
- `--fine_tuning_mode`: Mode for fine-tuning (default: "freeze-shared"; options: "freeze-shared", "initial-shared")
- `--run_mode`: Run mode (options: "pre-train", "fine-tune", "both")
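For orientation, these flags correspond to a standard argparse setup along the following lines; this is a sketch mirroring the documented options, not the actual argument handling in `run_deepvul.py`:

```python
# Hypothetical argparse declaration matching the flags documented above.
import argparse

parser = argparse.ArgumentParser(description="DeepVul training options")
parser.add_argument("--pretrain_batch_size", type=int, default=20)
parser.add_argument("--finetuning_batch_size", type=int, default=20)
parser.add_argument("--hidden_state", type=int, default=500)
parser.add_argument("--pre_train_epochs", type=int, default=20)
parser.add_argument("--fine_tune_epochs", type=int, default=20)
parser.add_argument("--opt", type=str, default="Adam")
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--dropout", type=float, default=0.1)
parser.add_argument("--nhead", type=int, default=2)
parser.add_argument("--num_layers", type=int, default=2)
parser.add_argument("--dim_feedforward", type=int, default=2048)
parser.add_argument("--fine_tuning_mode", type=str, default="freeze-shared",
                    choices=["freeze-shared", "initial-shared"])
parser.add_argument("--run_mode", type=str,
                    choices=["pre-train", "fine-tune", "both"])
args = parser.parse_args()
```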
First, change your current directory to `src`:

```bash
cd src
```
To run the pre-training process, use the following command:

```bash
python run_deepvul.py --pretrain_batch_size 20 --hidden_state 1000 --pre_train_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --run_mode pre-train
```

To run the fine-tuning process, use the following command:

```bash
python run_deepvul.py --finetuning_batch_size 20 --hidden_state 1000 --fine_tune_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --fine_tuning_mode "freeze-shared" --run_mode fine-tune
```
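Judging by the option names, the two fine-tuning modes likely differ in how the pre-trained shared extractor is treated: "freeze-shared" would keep it fixed and train only the task-specific head, while "initial-shared" would use the pre-trained weights merely as initialization and keep them trainable. A sketch of that reading, reusing the hypothetical `SharedExtractor` from above:

```python
# Hypothetical illustration of the two fine-tuning modes; inferred from the
# flag names, not from the DeepVul source.
def configure_fine_tuning(shared_extractor, mode="freeze-shared"):
    if mode == "freeze-shared":
        # Keep the pre-trained shared extractor fixed; only task heads train.
        for param in shared_extractor.parameters():
            param.requires_grad = False
    elif mode == "initial-shared":
        # Pre-trained weights serve only as initialization; the shared
        # extractor continues to receive gradient updates.
        for param in shared_extractor.parameters():
            param.requires_grad = True
    else:
        raise ValueError(f"Unknown fine-tuning mode: {mode}")
    return shared_extractor
```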
To run both pre-training and fine-tuning sequentially, use the following command:

```bash
python run_deepvul.py --pretrain_batch_size 20 --finetuning_batch_size 20 --hidden_state 1000 --pre_train_epochs 20 --fine_tune_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --fine_tuning_mode "freeze-shared" --run_mode both
```
For more details on the model and its implementation, please refer to the source code and associated documentation. If you encounter any issues or have questions, please open an issue or contact the maintainers.
If you use DeepVul in your work, please cite:

```bibtex
@article{Jararweh2024.10.17.618944,
  author = {Jararweh, Ala and Arredondo, David and Macaulay, Oladimeji and Dicome, Mikaela and Tafoya, Luis and Hu, Yue and Virupakshappa, Kushal and Boland, Genevieve and Flaherty, Keith and Sahu, Avinash},
  title = {DeepVul: A Multi-Task Transformer Model for Joint Prediction of Gene Essentiality and Drug Response},
  elocation-id = {2024.10.17.618944},
  year = {2024},
  doi = {10.1101/2024.10.17.618944},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/10/21/2024.10.17.618944},
  eprint = {https://www.biorxiv.org/content/early/2024/10/21/2024.10.17.618944.full.pdf},
  journal = {bioRxiv}
}
```