TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design

Accepted in TMLR (Transaction on Machine Learning Research) and spotlighted in NeurIPS GenBio Workshop 2023 [arxiv].

Official Github for TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design by Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason Smith, Artem Cherkasov, Woo Youn Kim and Martin Ester.

We frame structure-based drug design as a Reinforcement Learning task, where the goal is to search the wider chemical space for molecules with desirable properties as opposed to fitting a training data distribution. We propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, using binding affinity, drug-likeliness and synthesizability measures as our reward.

Empirically, our method outperforms state-of-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while improving the generation time by multiple orders of magnitude. TACOGFN achieves −8.82 in median docking score and 52.63% in Novel Hit Rate.

If you have any problems or need help with the code, please add an issue or contact tonyzshen@gmail.com.

Setup

Before running any scripts, please download the necessary package:

https://figshare.com/s/2738ce20d82463227113

This package includes:

trained model weights
pre-computed pharmacophores in lmdb
saved pocket-graphs in lmdb
misc files (data splitting, avg vina score, pocket centroid, generated molecules)

Please also setup up Conda Environment and install neccessary dependencies.

conda env create -f environment.yml
conda activate tacogfn
cd src/molvoxel
pip install -e .

Training TacoGFN

If you wish to re-train the model we provide the HPS for 3 model variants presented in our paper:

TacoGFN: hps/crossdocked_mo_256.json
TacoGFN (ZINCDock-15M): hps/zinc_mo_256.json
TacoGFN no pocket conditioning: hps/zinc_mo_256_noph.json

Note: TacoGFN_ranked is the same model as TacoGFN - we just generate 500 instead of 100 molecules at inference time, and rank by predicted docking score. The inference script takes care of that.

python3 src/tacogfn/tasks/pharmaco_frag.py --hps_path "$HPS_PATH"

Generating molecules and computing metrics

If you just wish to generate molecules and evaluate them, we also provide trained models files. The following scripts re-generates molecules and computes metrics on them (Docking needs to be computed seperatly).

bash scripts/generate_and_evaluate.sh

Note if you have re-trained a model, you can specify your model path to generate and evaluate the performance. You can set $NUM_PER_POCKET to 100 for normal runs. If you'd like to run TacoGFN_ranked, please change $NUM_PER_POCKET to 500.

python3 src/tasks/generate_molecules.py \
        --model_path "$MODEL_PATH" \
        --num_per_pocket $NUM_PER_POCKET \
        --comment "${COMMENT}"

python3 src/tasks/evaluate_molecules.py \
    --molecules_path "misc/generated_molecules/1.0_1.0_${NUM_PER_POCKET}_${COMMENT}.json"

Aggregating and displaying metrics

To display the metrics, we provide the generated molecules from our model and baseline models in misc/evaluations. The following scripts computes the metrics used in Table 1 and Table 2:

bash scripts/see_all_results.sh

Note if you've generated molecules from a trained model, please compute docking scores using QVina 2.1 first. Then you could call the following:

python3 src/tasks/aggergate_evals.py --eval_path "$EVAL_FILE"

Citations

@article{
        shen2024tacogfn,
        title={Taco{GFN}: Target-conditioned {GF}lowNet for Structure-based Drug Design},
        author={Tony Shen and Seonghwan Seo and Grayson Lee and Mohit Pandey and Jason R Smith and Artem Cherkasov and Woo Youn Kim and Martin Ester},
        journal={Transactions on Machine Learning Research},
        issn={2835-8856},
        year={2024},
        url={https://openreview.net/forum?id=N8cPv95zOU},
}

This project modifies GFlowNet library for graph and molecular data.

Name		Name	Last commit message	Last commit date
Latest commit History 172 Commits
hps		hps
images		images
scripts		scripts
src		src
.gitignore		.gitignore
environment.yml		environment.yml
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design

Setup

Training TacoGFN

Generating molecules and computing metrics

Aggregating and displaying metrics

Citations

About

Releases

Packages

Contributors 2

Languages

tsa87/TacoGFN-SBDD

Folders and files

Latest commit

History

Repository files navigation

TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design

Setup

Training TacoGFN

Generating molecules and computing metrics

Aggregating and displaying metrics

Citations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages