DeepVul

DeepVul is a multi-task transformer model that predicts gene essentiality and drug response from gene expression data. A shared feature extractor learns representations that are then fine-tuned for the downstream tasks of gene essentiality prediction and drug response prediction.
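
The sketch below is a minimal illustration of that design, assuming PyTorch; the class name, layer layout, and input handling are hypothetical and simply mirror the transformer hyperparameters documented later (--hidden_state, --nhead, --num_layers, --dim_feedforward), not the repository's actual code.

import torch
import torch.nn as nn

class DeepVulSketch(nn.Module):
    # Illustrative only: a shared transformer feature extractor with
    # task-specific heads for gene essentiality and drug response.
    def __init__(self, n_genes, n_drugs, hidden_state=500, nhead=2,
                 num_layers=2, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.input_proj = nn.Linear(n_genes, hidden_state)  # expression -> hidden space
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_state, nhead=nhead,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.essentiality_head = nn.Linear(hidden_state, n_genes)  # per-gene effect scores
        self.drug_head = nn.Linear(hidden_state, n_drugs)          # per-drug response

    def forward(self, expr):
        # expr: (batch, n_genes) log1p(TPM) expression profile
        h = self.input_proj(expr).unsqueeze(1)  # add a length-1 sequence dimension
        h = self.shared(h).squeeze(1)           # (batch, hidden_state)
        return self.essentiality_head(h), self.drug_head(h)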

Installation

To set up the environment, use the provided condaenv.yml file with conda. First, ensure you have conda installed, then run the following commands:

conda env create --file condaenv.yml
conda activate condaenv

Datasets

To run the DeepVul model, you will need to download the following datasets and copy them into the data directory (with the names shown below):

  1. Gene Expression: OmicsExpressionProteinCodingGenesTPMLogp1
  2. Gene Essentiality: CRISPRGeneEffect.csv
  3. Drug Response: primary-screen-replicate-collapsed-logfold-change.csv
  4. Sanger Essentiality Data: gene_effect.csv
  5. Somatic Mutation Data: CCLE_Oncomap3_Assays_2012-04-09.csv

Once these files are in the data directory, the model will be able to locate them at run time.
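
As an optional sanity check (this snippet is illustrative, not part of the repository), a few lines of Python can verify that all expected files are present before training:

import os

DATA_DIR = "data"
REQUIRED = [
    "OmicsExpressionProteinCodingGenesTPMLogp1",               # gene expression
    "CRISPRGeneEffect.csv",                                    # gene essentiality
    "primary-screen-replicate-collapsed-logfold-change.csv",   # drug response
    "gene_effect.csv",                                         # Sanger essentiality
    "CCLE_Oncomap3_Assays_2012-04-09.csv",                     # somatic mutations
]

missing = [f for f in REQUIRED if not os.path.exists(os.path.join(DATA_DIR, f))]
if missing:
    raise FileNotFoundError(f"Missing files in {DATA_DIR}: {missing}")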

Hyperparameter Usage and Possible Values

When running the DeepVul model, you can specify various hyperparameters to control its behavior. Below is a list of the hyperparameters along with their possible values:

  • --pretrain_batch_size: Batch size for pre-training data loading (default: 20)
  • --finetuning_batch_size: Batch size for fine-tuning data loading (default: 20)
  • --hidden_state: Hidden state size for the model (default: 500)
  • --pre_train_epochs: Number of epochs for pre-training (default: 20)
  • --fine_tune_epochs: Number of epochs for fine-tuning (default: 20)
  • --opt: Optimizer type (default: "Adam")
  • --lr: Learning rate for the optimizer (default: 0.0001)
  • --dropout: Dropout rate (default: 0.1)
  • --nhead: Number of heads in the multihead attention models (default: 2)
  • --num_layers: Number of layers in the model (default: 2)
  • --dim_feedforward: Dimension of the feedforward network (default: 2048)
  • --fine_tuning_mode: Mode for fine-tuning (default: "freeze-shared", options: ["freeze-shared", "initial-shared"]); see the sketch after this list
  • --run_mode: Run mode (options: "pre-train", "fine-tune", "both")
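
The two fine-tuning modes presumably differ in how the pre-trained shared extractor is treated: "freeze-shared" keeps it fixed and trains only the task heads, while "initial-shared" uses the pre-trained weights as a starting point and continues updating them. A hedged sketch under that assumption (variable names hypothetical; the repository's implementation may differ):

import torch

# Assumes `model` is a pre-trained instance of the sketch in the
# introduction and `fine_tuning_mode` comes from the CLI flag above.
if fine_tuning_mode == "freeze-shared":
    # Keep the pre-trained shared extractor fixed; train only the heads.
    for p in model.shared.parameters():
        p.requires_grad = False
    for p in model.input_proj.parameters():
        p.requires_grad = False
# With "initial-shared", the pre-trained weights are only an
# initialization, so every parameter keeps training (nothing to freeze).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)  # matches the --opt/--lr defaults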

Running the Model

First, change your current directory to src:

cd src

Pre-training

To run the pre-training process, use the following command:

python run_deepvul.py --pretrain_batch_size 20 --hidden_state 1000 --pre_train_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --run_mode pre-train

Fine-tuning

To run the fine-tuning process, use the following command:

python run_deepvul.py --finetuning_batch_size 20 --hidden_state 1000 --fine_tune_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --fine_tuning_mode "freeze-shared" --run_mode fine-tune

Running Both Pre-training and Fine-tuning

To run both pre-training and fine-tuning sequentially, use the following command:

python run_deepvul.py --pretrain_batch_size 20 --finetuning_batch_size 20 --hidden_state 1000 --pre_train_epochs 20 --fine_tune_epochs 20 --opt "Adam" --lr 0.0005 --dropout 0.2 --nhead 4 --num_layers 2 --dim_feedforward 1024 --fine_tuning_mode "freeze-shared" --run_mode both

Additional Information

For more details on the model and its implementation, please refer to the source code and associated documentation. If you encounter any issues or have questions, please open an issue or contact the maintainers.

Citation

@article{Jararweh2024.10.17.618944,
	author = {Jararweh, Ala and Arredondo, David and Macaulay, Oladimeji and Dicome, Mikaela and Tafoya, Luis and Hu, Yue and Virupakshappa, Kushal and Boland, Genevieve and Flaherty, Keith and Sahu, Avinash},
	title = {DeepVul: A Multi-Task Transformer Model for Joint Prediction of Gene Essentiality and Drug Response},
	elocation-id = {2024.10.17.618944},
	year = {2024},
	doi = {10.1101/2024.10.17.618944},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/10/21/2024.10.17.618944},
	eprint = {https://www.biorxiv.org/content/early/2024/10/21/2024.10.17.618944.full.pdf},
	journal = {bioRxiv}
}
