G2D-Diff: A genotype-to-drug diffusion model for generation of tailored anti-cancer small molecules

Information

Official repository of G2D-Diff: A genotype-to-drug diffusion model for generation of tailored anti-cancer small molecules.

The diffusion source code was adapted from https://github.com/lucidrains/denoising-diffusion-pytorch.
All software dependencies are listed in "requirement.txt".

Contact Info:
hjnam@gist.ac.kr
hyunhokim@gm.gist.ac.kr

Abstract

We present Genotype-to-Drug Diffusion (G2D-Diff), a generative artificial intelligence (AI) approach for creating small molecule-based drug structures tailored to specific cancer genotypes. G2D-Diff demonstrates exceptional performance in generating diverse, drug-like compounds that meet desired efficacy conditions for a given genotype. The model outperforms existing methods in diversity, feasibility, and condition fitness. G2D-Diff learns directly from drug response data distributions, ensuring reliable candidate generation without separate predictors. Its attention mechanism provides insights into potential cancer targets and pathways, enhancing interpretability. In triple-negative breast cancer case studies, G2D-Diff generated plausible hit-like candidates by focusing on relevant pathways. By combining realistic hit-like molecule generation with relevant pathway suggestions for specific genotypes, G2D-Diff represents a significant advance in AI-guided, personalized drug discovery. This approach has the potential to accelerate drug development for challenging cancers by streamlining hit identification and lead optimization.

Environment setting (Ubuntu and Anaconda)

Create virtual environment

conda create -n g2d_diff python=3.8.10

Activate environment

conda activate g2d_diff

Install required packages

pip install -r requirement.txt --extra-index-url https://download.pytorch.org/whl/cu113

The installation typically takes around 10 minutes, but the time may vary depending on the environment.
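After installation, it can help to confirm that the environment sees PyTorch and a CUDA device before opening the notebooks. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it assumes only that PyTorch is among the packages installed from "requirement.txt" (as the cu113 index URL above suggests).

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version and whether PyTorch can see a CUDA device.

    Hypothetical helper for illustration; it is not shipped with G2D-Diff.
    """
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the check still runs when PyTorch is missing.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False, the notebooks may still run on CPU, but the timings quoted in this README are unlikely to apply.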

Generation tutorial

GenerationTutorial.ipynb

Generation with the trained condition encoder and diffusion model.
Generation takes about 15 minutes for a single genotype input (e.g., a cell line), but the time may vary depending on the environment.

Check the comments in the notebook for further information about the source code
(e.g., where checkpoints are saved; you may need to create the save directory first).

Reproducing the models

Use the following Jupyter notebooks after registering the environment as a kernel.

python -m ipykernel install --user --name g2d_diff

For all .py files and Jupyter notebooks used to reproduce the models, check the comments for further information
(e.g., where checkpoints are saved; you may need to create the save directory first).
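Creating the save directory up front avoids a failure at the first checkpoint write. The sketch below shows one way to do it; the helper name ensure_dir and the path "checkpoints/g2d_diff" are placeholders chosen for illustration, so substitute whatever path the comments in the script or notebook specify.

```python
from pathlib import Path


def ensure_dir(path):
    """Create a directory (and any missing parents) if it does not exist yet.

    Hypothetical helper for illustration; the path used is up to the caller.
    """
    directory = Path(path)
    # exist_ok=True makes the call safe to repeat across runs.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


if __name__ == "__main__":
    # Placeholder path: replace with the directory named in the code comments.
    print(ensure_dir("checkpoints/g2d_diff"))
```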

For training G2D-Diff

Single GPU

accelerate launch --num_processes=1 --gpu_ids=0 distributed_G2D_Diff.py

Multiple GPUs (check the available GPU IDs first)

accelerate launch --num_processes=2 --gpu_ids=0,1 distributed_G2D_Diff.py
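The two commands above follow the same pattern: one process per GPU, with the GPU IDs given as a comma-separated list. As a rough sketch, the snippet below lists the GPU indices that PyTorch can see and assembles the matching command string. The helper names visible_gpu_ids and build_launch_command are hypothetical, and the one-process-per-GPU rule is an assumption inferred from the two examples rather than a documented requirement.

```python
def visible_gpu_ids():
    """Return the GPU indices PyTorch can see, or an empty list without PyTorch."""
    try:
        import torch
    except ImportError:
        return []
    return list(range(torch.cuda.device_count()))


def build_launch_command(gpu_ids, script="distributed_G2D_Diff.py"):
    """Build an `accelerate launch` command with one process per listed GPU.

    Hypothetical helper: it only formats the string and does not run anything.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU ID is required")
    ids = ",".join(str(i) for i in gpu_ids)
    return (
        f"accelerate launch --num_processes={len(gpu_ids)} "
        f"--gpu_ids={ids} {script}"
    )


if __name__ == "__main__":
    ids = visible_gpu_ids()
    print("Visible GPU IDs:", ids)
    if ids:
        print(build_launch_command(ids))
```

Printing the command instead of executing it leaves room to drop GPUs that are already in use before launching.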

For training condition encoder

ConditionEncoderPretraining.ipynb

For training G2D-Pred

G2DPredTraining.ipynb
