This repository is a fork of the original repository, created to update the code base for the latest modules. HTMD recently underwent a major restructuring, and a new module named MoleculeKit now contains classes such as SmallMol and Molecule, so this repo has been updated to use these newer, stable modules. In addition, SmallMol was still in beta in HTMD==1.13.9.
The newer MoleculeKit has some inconsistencies in nomenclature and functions relative to HTMD, so I have created a file named htmd2moleculekit.py that houses some small functions that are missing from MoleculeKit.
Note: the list of functions in this file is not exhaustive; contributions that improve backward compatibility or add features are welcome.
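As an illustration of the kind of backward-compatibility helper that could live next to htmd2moleculekit.py, the sketch below resolves a class from the first importable location. It does not show the actual contents of htmd2moleculekit.py; the helper name resolve_class and the module paths in SMALLMOL_CANDIDATES are assumptions and should be checked against the installed versions.

```python
import importlib


def resolve_class(candidates):
    """Return the first attribute that can be imported from (module_path, name) pairs."""
    errors = []
    for module_path, name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append("{}.{}: {}".format(module_path, name, exc))
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Assumed import locations (newer MoleculeKit first, older HTMD second).
SMALLMOL_CANDIDATES = [
    ("moleculekit.smallmol.smallmol", "SmallMol"),
    ("htmd.smallmol.smallmol", "SmallMol"),
]

# Usage, once one of the libraries is installed:
# SmallMol = resolve_class(SMALLMOL_CANDIDATES)
```

Keeping the lookup in one place means the rest of the code can import SmallMol from a single module regardless of which library provides it.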
- pytorch==0.3.1: Model training
- keras==2.2.2: Data loaders (To remove)
- RDKit==2019.03.4: Molecular Properties
- moleculekit==0.3.1: Molecule Manipulation
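Because the pins above are old, it can help to confirm what is actually installed before training. The following is a minimal sketch, assuming each package exposes a __version__ attribute and is importable under the module names torch, keras, rdkit and moleculekit; the function name check_versions is hypothetical.

```python
import importlib

# Import names (assumed) mapped to the versions pinned above.
PINNED = {
    "torch": "0.3.1",
    "keras": "2.2.2",
    "rdkit": "2019.03.4",
    "moleculekit": "0.3.1",
}


def check_versions(pins):
    """Map each module name to (expected, installed, matches)."""
    report = {}
    for module_name, expected in pins.items():
        try:
            module = importlib.import_module(module_name)
            installed = getattr(module, "__version__", None)
        except ImportError:
            # Module is not installed or cannot be imported.
            installed = None
        report[module_name] = (expected, installed, installed == expected)
    return report


# Usage:
# for name, (expected, installed, ok) in check_versions(PINNED).items():
#     print(name, "expected", expected, "found", installed, "OK" if ok else "MISMATCH")
```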
Training requires a smi file. We used a subset of the Zinc15 dataset containing only the drug-like molecules. The same cleaned dataset can be retrieved with the getDataset.sh script, which downloads the smi file required for training.
sh getDataset.sh
The traindataset folder will then contain the zinc15_druglike_clean_canonical_max60.smi file, which is required for the training step (see next section).
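A quick sanity check of the downloaded file can catch a truncated download early. The sketch below assumes one SMILES string per line as the first whitespace-separated field, and assumes that "max60" in the file name refers to a maximum SMILES length of 60 characters; the function name summarize_smi is hypothetical.

```python
def summarize_smi(path, max_len=60):
    """Return (number of SMILES entries, number longer than max_len)."""
    total = 0
    too_long = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            total += 1
            # The first field is taken to be the SMILES string (assumption).
            if len(fields[0]) > max_len:
                too_long += 1
    return total, too_long


# Usage:
# total, too_long = summarize_smi(
#     "traindataset/zinc15_druglike_clean_canonical_max60.smi")
# print(total, "molecules,", too_long, "longer than 60 characters")
```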
The generation stage requires the model files. You can either use the ones produced during the training step or download the ones we have already generated by using the following script:
sh getWeights.sh
The modelweights folder will then contain the three models:
- decoder-210000.pkl
- encoder-210000.pkl
- vae-210000.pkl
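To confirm that the download completed, one could check for the three files listed above. This is a minimal sketch using only the file and folder names given here; the function name missing_weights is hypothetical.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_WEIGHTS = (
    "decoder-210000.pkl",
    "encoder-210000.pkl",
    "vae-210000.pkl",
)


def missing_weights(folder="modelweights", names=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# Usage:
# missing = missing_weights()
# if missing:
#     print("Missing model files:", ", ".join(missing))
```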
Note that training runs on a GPU and it will take several days to complete.
First, construct a set of training molecules:
$ python prepare_data.py -i "./path/to/my/smiles.smi" -o "./path/to/my/smiles.npy"
Second, train a model:
$ python train.py -i "./path/to/my/smiles.npy" -o "./path/to/models"
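The two steps above can also be chained from Python so that training only starts if data preparation succeeds. This is a sketch that relies only on the script names and the -i/-o flags shown above; the helper names build_commands and run_pipeline are hypothetical, and the scripts are assumed to be in the current working directory.

```python
import subprocess
import sys


def build_commands(smiles_file, array_file, model_dir):
    """Return the two command lines: data preparation, then training."""
    return [
        [sys.executable, "prepare_data.py", "-i", smiles_file, "-o", array_file],
        [sys.executable, "train.py", "-i", array_file, "-o", model_dir],
    ]


def run_pipeline(smiles_file, array_file, model_dir):
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands(smiles_file, array_file, model_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


# Usage:
# run_pipeline("./path/to/my/smiles.smi",
#              "./path/to/my/smiles.npy",
#              "./path/to/models")
```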
Web-based compound generation is available at https://playmolecule.org/LigDream/.
For an example of local novel compound generation, please follow the notebook generate.ipynb.
Code is released under the GNU AFFERO GENERAL PUBLIC LICENSE.
If you use content from this repository, please consider citing the following work:
@article{skalic2019shape,
title={Shape-Based Generative Modeling for de-novo Drug Design},
author={Skalic, Miha and Jim{\'e}nez Luna, Jos{\'e} and Sabbadin, Davide and De Fabritiis, Gianni},
journal={Journal of chemical information and modeling},
doi = {10.1021/acs.jcim.8b00706},
publisher={ACS Publications}
}