This repository is the official implementation of DrugHIVE, a deep hierarchical variational autoencoder developed for structure-based drug design. The method is described in the accompanying JCIM paper.
The code has been tested in the following environment:
Software | Version |
---|---|
Python | 3.9.16 |
CUDA | 11.6 |
OpenBabel | 3.1.1 |
PyTorch | 1.12.1 |
PyTorch Lightning | 2.0.0 |
RDKit | 2021.09.5 |
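
To compare a local environment against the tested versions above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the distribution names (`torch`, `pytorch-lightning`, `rdkit`, `openbabel`) are assumptions, since conda and pip can register packages under different names, and a package reported as "not found" may still be importable.

```python
from importlib import metadata

# Versions copied from the table above. The distribution names are assumptions:
# conda and pip may register the same package under different names.
TESTED = {
    "torch": "1.12.1",
    "pytorch-lightning": "2.0.0",
    "rdkit": "2021.09.5",
    "openbabel": "3.1.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested, lookup=installed_version):
    """Map each package name to a (tested, installed, matches) tuple."""
    report = {}
    for name, want in tested.items():
        have = lookup(name)
        # Drop local build tags such as "+cu116" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (want, have, matches)
    return report


for name, (want, have, ok) in compare_versions(TESTED).items():
    status = "ok" if ok else "differs"
    print(f"{name:20s} tested={want:12s} installed={have or 'not found':12s} {status}")
```

A mismatch does not necessarily mean the code will fail; the table only records the environment in which it was tested.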
Install the dependencies listed in requirements.txt:
conda create -n drughive -c conda-forge -c pytorch -c nvidia -c rdkit --file requirements.txt
git clone https://github.com/jssweller/DrugHIVE
Pre-trained model weights can be downloaded from Zenodo:
wget -P model_checkpoints/ https://zenodo.org/records/12668687/files/drughive_model_ch9.ckpt
To sample from DrugHIVE, first adjust the parameters in the generate.yml example configuration file. Then, run the following command:
python generate_molecules.py config/generate.yml
To sample from the prior, set zbetas: 1.0 in the configuration file.
To sample from the posterior, set zbetas: 0.0 in the configuration file.
To sample between the prior and posterior, set the values of zbetas between 0.0 and 1.0.
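
When comparing several zbetas settings, it can be convenient to derive configuration variants programmatically rather than editing the file by hand. The sketch below operates on a plain dictionary and makes no assumption about where zbetas sits in the file: it sets every occurrence of the key. The `example` dictionary is a hypothetical stand-in (including the `generate` and `n_samples` names), not the actual layout of config/generate.yml, and whether zbetas holds one number or several is not specified here.

```python
import copy


def set_key_everywhere(config, key, value):
    """Return a copy of a nested dict/list config with every `key` set to `value`."""
    config = copy.deepcopy(config)

    def _walk(node):
        if isinstance(node, dict):
            for k in node:
                if k == key:
                    node[k] = value
                else:
                    _walk(node[k])
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(config)
    return config


# Hypothetical stand-in for the parsed contents of config/generate.yml.
# The real file has its own keys and layout.
example = {"generate": {"zbetas": 1.0, "n_samples": 10}}

for beta in (0.0, 0.5, 1.0):  # posterior, in between, prior
    variant = set_key_everywhere(example, "zbetas", beta)
    print(beta, variant)
```

With PyYAML installed, `yaml.safe_load` and `yaml.safe_dump` can read the real configuration into such a dictionary and write each variant to its own file, which can then be passed to generate_molecules.py.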
To generate molecules with substructure modification, first adjust the parameters in the generate_spatial.yml example configuration file. Then, run the following command:
python generate_molecules.py config/generate_spatial.yml
Before running the optimization process, the QuickVina 2 docking tool must be installed:
- Download (or compile) the QuickVina 2 docking tool from https://qvina.github.io
- Place qvina2.1 in DrugHIVE/ or in a directory listed in your PATH variable (e.g., /usr/bin/)
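
Before starting an optimization run, it may be worth confirming that the binary can actually be found. The following is a minimal sketch, assuming it is run from the DrugHIVE/ directory; `find_qvina` is a hypothetical helper, not a function provided by the repository, and how DrugHIVE itself locates the binary may differ.

```python
import os
import shutil


def find_qvina(repo_dir=".", name="qvina2.1"):
    """Return the path to the QuickVina 2 binary, or None if it cannot be found."""
    local = os.path.join(repo_dir, name)
    # First look for an executable file in the repository directory.
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return os.path.abspath(local)
    # Otherwise search the directories listed in PATH.
    return shutil.which(name)


path = find_qvina()
print(path if path else "qvina2.1 not found in the current directory or on PATH")
```

If the file exists but is not reported, check that it has execute permission (for example with `chmod +x qvina2.1`).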
To optimize molecules with DrugHIVE, first adjust the parameters in the generate_optimize.yml example configuration file. Then, run the following command:
python generate_optimize.py config/generate_optimize.yml
Download and extract the PDBbind refined dataset from http://www.pdbbind.org.cn/
Download ZINC molecules from https://zinc20.docking.org/ in SDF or MOL2 format. Place them in a single directory.
To process the PDBbind dataset, run:
python process_pdbbind_data.py <path/to/PDBbind/directory>
To process the ZINC dataset, run:
python process_zinc_data.py <path/to/ZINC/directory> -o data/zinc_data/zinc_data.h5 -ext <file_extension>
Here, <file_extension> can be either sdf or mol2.
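
If it is unclear which format a download directory contains, the extension can be inferred from the files themselves. The sketch below counts .sdf and .mol2 files and assembles the command shown above. Both helper names are hypothetical, and compressed files (for example .sdf.gz) are deliberately not counted, since this README does not say whether they are supported.

```python
from collections import Counter
from pathlib import Path


def detect_extension(zinc_dir):
    """Return 'sdf' or 'mol2', whichever is more common in zinc_dir, or None."""
    counts = Counter(
        p.suffix.lower().lstrip(".")
        for p in Path(zinc_dir).iterdir()
        if p.is_file()
    )
    found = {ext: counts[ext] for ext in ("sdf", "mol2") if counts[ext] > 0}
    return max(found, key=found.get) if found else None


def zinc_command(zinc_dir, out_path="data/zinc_data/zinc_data.h5"):
    """Build the argument list for process_zinc_data.py as shown above."""
    ext = detect_extension(zinc_dir)
    if ext is None:
        raise ValueError(f"no .sdf or .mol2 files found in {zinc_dir}")
    return ["python", "process_zinc_data.py", str(zinc_dir), "-o", out_path, "-ext", ext]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, or joined with spaces and run in a shell.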
First, adjust the training parameters in the config/train.yml example configuration file. Make sure to set data_path_pdb and data_path_zinc to the locations of your datasets. Then, run the following command:
python train.py config/train.yml
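
A common failure mode is starting a training run with dataset paths that are unset or mistyped. The check below is a minimal sketch that takes an already parsed configuration as a dictionary. It assumes data_path_pdb and data_path_zinc are top-level keys, which may not match the real layout of config/train.yml; `missing_data_paths` and the placeholder paths are hypothetical.

```python
import os


def missing_data_paths(config, keys=("data_path_pdb", "data_path_zinc")):
    """Return a list of problems found for the dataset path keys in a config dict."""
    problems = []
    for key in keys:
        value = config.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not os.path.exists(value):
            problems.append(f"{key} points to a missing path: {value}")
    return problems


# Hypothetical parsed config; replace with the contents of config/train.yml.
example = {"data_path_pdb": "path/to/pdbbind", "data_path_zinc": ""}
for problem in missing_data_paths(example):
    print(problem)
```

An empty list means both paths are set and exist on disk; it does not verify that the files have been processed into the format train.py expects.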