LigDream: Shape-Based Compound Generation

This repository is a fork of the original LigDream repository, created to update the code base for current versions of its dependencies. HTMD was recently restructured: classes such as SmallMol and Molecule now live in a separate module, MoleculeKit. This fork has been updated to use these newer, stable modules.

In addition, SmallMol was still in beta in HTMD==1.13.9.

The newer MoleculeKit differs from HTMD in some names and functions, so I have added a file, htmd2moleculekit.py, which contains a few small functions that are missing from MoleculeKit.

Note: The list of functions in this file is not exhaustive. Contributions that improve backward compatibility or add features are welcome.
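As a rough illustration of the kind of compatibility handling described above, the sketch below tries the newer import location first and falls back to the older one. It is not taken from htmd2moleculekit.py. The helper name first_available and the two module paths in SMALLMOL_MODULES are assumptions about where SmallMol lives in each package, so verify them against your installed versions.

```python
import importlib

# Assumed module paths, newest first; adjust to match your installation.
SMALLMOL_MODULES = ("moleculekit.smallmol.smallmol", "htmd.smallmol.smallmol")


def first_available(attribute, module_names):
    """Return `attribute` from the first importable module that defines it.

    Returns None when no candidate module can be imported or none of
    them defines the attribute.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this package is not installed, try the next one
        if hasattr(module, attribute):
            return getattr(module, attribute)
    return None
```

Calling first_available("SmallMol", SMALLMOL_MODULES) would then return the class from whichever package is installed, or None if neither is.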

Requirements

pytorch==0.3.1 : Model training
keras==2.2.2 : Data loaders (To remove)
RDKit==2019.03.4 : Molecular Properties
moleculekit==0.3.1 : Molecule Manipulation
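A quick way to compare an environment against these pins is sketched below. It is a minimal example, not part of the repository. The distribution names in PINNED are assumptions (PyTorch is published as "torch", and a conda-installed RDKit may not expose metadata under "rdkit"), and check_requirements is a hypothetical helper.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Older interpreters: requires the importlib_metadata backport.
    from importlib_metadata import version, PackageNotFoundError

# Pinned versions from the list above; keys are assumed distribution names.
PINNED = {
    "torch": "0.3.1",
    "keras": "2.2.2",
    "rdkit": "2019.03.4",
    "moleculekit": "0.3.1",
}


def check_requirements(pinned=PINNED, lookup=version):
    """Map each package to (expected, found, matches).

    `found` is None when no installed distribution has that name.
    """
    report = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        report[name] = (expected, found, found == expected)
    return report
```

Printing the result of check_requirements() shows at a glance which packages are missing or at a different version.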

Before starting

Training requires a .smi file. We used a subset of the ZINC15 dataset containing only drug-like compounds. The same cleaned dataset can be retrieved with the getDataset.sh script, which downloads the .smi file required for training (see the next section).

  sh getDataset.sh

The traindataset folder will then contain zinc15_druglike_clean_canonical_max60.smi, the file required for the training step (see the next section).
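To sanity-check the downloaded file before starting a long training run, something like the following sketch can help. It assumes the common .smi layout of one SMILES string per line as the first whitespace-separated field, with no header row. That layout has not been checked against this dataset, and summarize_smi is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def summarize_smi(path):
    """Return (number of entries, length of the longest SMILES string).

    Assumes one SMILES per line in the first whitespace-separated field.
    """
    count = 0
    longest = 0
    with Path(path).open() as handle:
        for line in handle:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            count += 1
            longest = max(longest, len(fields[0]))
    return count, longest
```

For example, summarize_smi("traindataset/zinc15_druglike_clean_canonical_max60.smi") reports how many molecules were downloaded.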

The generation stage requires the model files. You can use the ones produced by the training step, or download the ones we have already generated with the following script:

  sh getWeights.sh

The modelweights folder will then contain three model files:

decoder-210000.pkl
encoder-210000.pkl
vae-210000.pkl
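Before opening the generation notebook, it can be worth confirming that all three files are present. The sketch below is a minimal check. The function name missing_weights is hypothetical, and the default folder is the modelweights directory named above.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_WEIGHTS = ("decoder-210000.pkl", "encoder-210000.pkl", "vae-210000.pkl")


def missing_weights(folder="modelweights", names=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]
```

An empty list from missing_weights() means all three files were found.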

Training

Note that training runs on a GPU and takes several days to complete.

First construct a set of training molecules:

$ python prepare_data.py -i "./path/to/my/smiles.smi" -o "./path/to/my/smiles.npy"

Second, train the model:

$ python train.py -i "./path/to/my/smiles.npy" -o "./path/to/models"
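The two steps above can also be chained from Python, for example to stop if data preparation fails. The sketch below is one possible wrapper. It uses only the -i and -o flags shown in the commands above, and the function names build_commands and run_pipeline are hypothetical. The script paths are relative, so it is meant to be run from the repository root.

```python
import subprocess
import sys


def build_commands(smi_path, npy_path, model_dir):
    """Build the prepare and train commands, in execution order."""
    python = sys.executable  # same interpreter as the caller, instead of "python"
    return [
        [python, "prepare_data.py", "-i", smi_path, "-o", npy_path],
        [python, "train.py", "-i", npy_path, "-o", model_dir],
    ]


def run_pipeline(smi_path, npy_path, model_dir, runner=subprocess.run):
    """Run both steps in order; check=True aborts if a step exits non-zero."""
    for command in build_commands(smi_path, npy_path, model_dir):
        runner(command, check=True)
```

A call such as run_pipeline("./path/to/my/smiles.smi", "./path/to/my/smiles.npy", "./path/to/models") mirrors the two shell commands.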

Generation

Web-based compound generation is available at https://playmolecule.org/LigDream/.

For an example of generating novel compounds locally, follow the generate.ipynb notebook.

License

The code is released under the GNU Affero General Public License.

Citing

If you use content from this repository, please consider citing the following work:

@article{skalic2019shape,
  title={Shape-Based Generative Modeling for de-novo Drug Design},
  author={Skalic, Miha and Jim{\'e}nez Luna, Jos{\'e} and Sabbadin, Davide and De Fabritiis, Gianni},
  journal={Journal of chemical information and modeling},
  doi = {10.1021/acs.jcim.8b00706},
  publisher={ACS Publications}
}
