
*De novo* drug design attempts to model the typical design-synthesize-test cycle of drug discovery *in silico* [@doi:10.1002/wcms.49; @doi:10.1021/acs.jmedchem.5b01849].
It explores an estimated 10<sup>60</sup> synthesizable organic molecules with drug-like properties without explicit enumeration [@doi:10.1002/wcms.1104].
To score molecules after generation or during optimization, physics-based simulation could be used [@tag:Sumita2018], but machine learning models based on techniques discussed earlier may be preferable [@tag:Gomezb2016_automatic], as they are much more computationally expedient.
Computational efficiency is particularly important during optimization, as the "scoring function" may need to be called thousands of times.
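
As a rough illustration of such a learned scoring function, the sketch below (assuming RDKit and scikit-learn are available) trains a random forest surrogate on Morgan fingerprints with a placeholder property as the label; the resulting `score` function is cheap enough to call thousands of times inside an optimizer. The molecules, labels, and model choice are illustrative and not drawn from any of the cited works.

```python
# Minimal sketch of a machine-learning "scoring function": a surrogate model that
# predicts a property from a molecular fingerprint far faster than a physics-based
# simulation. The training data and target property here are placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
from sklearn.ensemble import RandomForestRegressor

def fingerprint(smiles, n_bits=2048):
    """Morgan (ECFP-like) bit-vector fingerprint, or None for invalid SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits))

# Toy training set: in practice labels would come from experiments or simulation.
train_smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
X = np.array([fingerprint(s) for s in train_smiles])
y = np.array([Descriptors.MolLogP(Chem.MolFromSmiles(s)) for s in train_smiles])  # stand-in "score"

surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)

def score(smiles):
    """Cheap surrogate score, callable thousands of times inside an optimizer."""
    fp = fingerprint(smiles)
    return -np.inf if fp is None else surrogate.predict(fp.reshape(1, -1))[0]

print(score("CCOC(=O)c1ccccc1"))
```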

To "design" and "synthesize", traditional *de novo* design software relied on classical optimizers such as genetic algorithms.
These approaches can lead to overfit, "weird" molecules that are difficult to synthesize in the lab.
A popular approach that may help ensure synthesizability is to use rule-based virtual chemical reactions to generate molecular structures [@doi:10.1021/acs.jmedchem.5b01849].
Deep learning models that generate realistic, synthesizable molecules have been proposed as an alternative.
In contrast to the classical, symbolic approaches, generative models learned from data would not depend on laboriously encoded expert knowledge.

In the past few years a large number of techniques for the generative modeling and optimization of molecules with deep learning have been explored, including recurrent neural networks, variational autoencoders, generative adversarial networks, and reinforcement learning; for a review, see Elton et al. [@tag:Elton_molecular_design_review] or Vamathevan et al. [@tag:Vamathevan2019].

Building off the large amount of work that has already gone into text generation,[@arxiv:1308.0850] many generative neural networks for drug design represent chemicals with the simplified molecular-input line-entry system (SMILES), a standard string-based representation with characters that represent atoms, bonds, and rings [@tag:Segler2017_drug_design].
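The short sketch below (using RDKit, with hand-picked example strings) illustrates how SMILES characters encode atoms, bonds, and rings, and how a parser distinguishes valid strings from the malformed ones a naive generator might emit.

```python
# Small illustration of the SMILES representation: characters encode atoms, bonds,
# and ring closures, and RDKit can check whether a string is chemically valid.
from rdkit import Chem

examples = {
    "CCO": "ethanol (C-C-O chain, implicit hydrogens)",
    "c1ccccc1": "benzene (aromatic ring closed by the '1' labels)",
    "CC(=O)Oc1ccccc1C(=O)O": "aspirin ('=' marks double bonds, parentheses mark branches)",
    "C1CC(": "syntactically broken string a naive generator might emit",
}

for smiles, description in examples.items():
    mol = Chem.MolFromSmiles(smiles)  # returns None if the string is not valid
    status = "valid" if mol is not None else "invalid"
    canonical = Chem.MolToSmiles(mol) if mol is not None else "-"
    print(f"{smiles:24s} {status:8s} canonical: {canonical:24s} # {description}")
```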

The first successful demonstration of a deep learning-based approach for molecular optimization occurred in 2016 with the development of a SMILES-to-SMILES autoencoder capable of learning a continuous latent feature space for molecules [@tag:Gomezb2016_automatic].
In this learned continuous space it is possible to interpolate between molecular structures in a manner that is not possible with discrete (e.g. bit vector or string) features or in symbolic, molecular graph space.
Even more interesting is that one can perform gradient-based or Bayesian optimization of molecules within this latent space.
The strategy of constructing simple, continuous features before applying supervised learning techniques is reminiscent of autoencoders trained on high-dimensional EHR data [@tag:BeaulieuJones2016_ehr_encode].
A drawback of the SMILES-to-SMILES autoencoder is that not all SMILES strings produced by the autoencoder's decoder correspond to valid chemical structures.
The Grammar Variational Autoencoder, which takes the SMILES grammar into account and is guaranteed to produce syntactically valid SMILES, helps alleviate this issue to some extent [@arxiv:1703.01925].
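
The following sketch illustrates the general idea of gradient-based optimization in a learned latent space; the encoder, decoder, and property predictor are untrained PyTorch stand-ins with arbitrary dimensions, not the architecture of the cited autoencoder.

```python
# Minimal sketch of gradient-based optimization in a learned latent space, in the
# spirit of the SMILES autoencoder approach. The networks below are untrained
# placeholders; a real system would first train them on a large SMILES corpus.
import torch
import torch.nn as nn

LATENT_DIM = 56

encoder = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, 2048))
property_predictor = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

# Start from the latent code of a seed molecule (random features stand in here
# for a fingerprint or one-hot SMILES encoding).
seed_features = torch.rand(1, 2048)
z = encoder(seed_features).detach().requires_grad_(True)

optimizer = torch.optim.Adam([z], lr=1e-2)
for step in range(100):
    optimizer.zero_grad()
    predicted_property = property_predictor(z)
    loss = -predicted_property.sum()   # gradient *ascent* on the predicted property
    loss.backward()
    optimizer.step()

# Decode the optimized latent point; a real model would feed this to a SMILES
# decoder and check the output for chemical validity.
optimized_representation = decoder(z)
print(optimized_representation.shape)
```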

Another approach to *de novo* design is to train character-based RNNs on large collections of molecules, for example, ChEMBL [@doi:10.1093/nar/gkr777], to first obtain a generic generative model for drug-like compounds [@tag:Segler2017_drug_design].
These generative models successfully learn the grammar of compound representations, with 94% [@tag:Olivecrona2017_drug_design] or nearly 98% [@tag:Segler2017_drug_design] of generated SMILES corresponding to valid molecular structures.
The initial RNN is then fine-tuned to generate molecules that are likely to be active against a specific target by either continuing training on a small set of positive examples [@tag:Segler2017_drug_design] or adopting reinforcement learning strategies [@tag:Olivecrona2017_drug_design; @arxiv:1611.02796].
Both the fine-tuning and reinforcement learning approaches can rediscover known, held-out active molecules.
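
To make the character-based generation step concrete, the sketch below defines a minimal SMILES character-level LSTM in PyTorch and samples from it one character at a time; the vocabulary and architecture are illustrative, and the untrained network produces gibberish until trained on a corpus such as ChEMBL and, optionally, fine-tuned as described above.

```python
# Minimal sketch of a character-level RNN generator for SMILES strings.
import torch
import torch.nn as nn

VOCAB = list("^$CNOSFclnos()=#123456789[]+-@H")  # '^' start, '$' end; illustrative alphabet
char_to_idx = {c: i for i, c in enumerate(VOCAB)}

class SmilesRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        hidden, state = self.lstm(self.embed(tokens), state)
        return self.out(hidden), state

def sample(model, max_len=100):
    """Generate one SMILES string character by character."""
    token = torch.tensor([[char_to_idx["^"]]])
    state, chars = None, []
    for _ in range(max_len):
        logits, state = model(token, state)
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        idx = torch.multinomial(probs, num_samples=1)
        char = VOCAB[idx.item()]
        if char == "$":          # end-of-sequence token
            break
        chars.append(char)
        token = idx
    return "".join(chars)

model = SmilesRNN(len(VOCAB))
print(sample(model))  # untrained output; training or fine-tuning would precede real use
```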

Reinforcement learning approaches in which operations are performed directly on the molecular graph bypass the need to learn the details of SMILES syntax, allowing the model to focus purely on chemistry.
Additionally, they seem to require less training data and generate more valid molecules, since they are constrained by design to graph operations that satisfy chemical valence rules [@tag:Elton_molecular_design_review].
A reinforcement learning agent developed by Zhou et al. demonstrated superior molecular optimization performance on certain easy-to-compute metrics when compared with other deep learning-based approaches such as the Junction Tree VAE, Objective Reinforced Generative Adversarial Network, and Graph Convolutional Policy Network [@doi:10.1038/s41598-019-47148-x].
As another example, Zhavoronkov et al. used generative tensorial reinforcement learning to discover potent inhibitors of discoidin domain receptor 1 (DDR1) [@tag:Zhavoronkov2019_drugs].
Their work is unique in that six lead candidates discovered using their approach were synthesized and tested in the lab, with 4/6 achieving some degree of binding to DDR1 [@tag:Zhavoronkov2019_drugs].
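
The sketch below illustrates the flavor of such graph-level actions: a hypothetical `try_add_atom` action edits an RDKit molecular graph and is accepted only if sanitization (which enforces valence rules) succeeds. It is a toy environment, not the action space of any of the cited agents.

```python
# Toy sketch of a graph-editing action space: actions modify the molecular graph
# directly, and RDKit sanitization rejects edits that violate valence rules.
import random
from rdkit import Chem

def try_add_atom(smiles, element="C"):
    """Attempt one graph action: attach a new atom to a random existing atom."""
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    attach_to = random.randrange(mol.GetNumAtoms())
    new_idx = mol.AddAtom(Chem.Atom(element))
    mol.AddBond(attach_to, new_idx, Chem.BondType.SINGLE)
    try:
        Chem.SanitizeMol(mol)          # raises if the edit breaks valence rules
    except Exception:
        return None                    # invalid action: reject and keep the old state
    return Chem.MolToSmiles(mol)

state = "CCO"
for _ in range(5):
    proposal = try_add_atom(state, element=random.choice(["C", "N", "O", "F"]))
    if proposal is not None:           # an RL agent would also weigh a reward here
        state = proposal
print(state)
```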

In concluding this section, it is worth noting that classical genetic algorithms have been shown to compete with some of the most advanced deep learning methods for molecular optimization [@doi:10.1246/cl.180665; @doi:10.1039/C8SC05372C].
Such genetic algorithms use hard-coded rules based on possible chemical reactions to generate molecular structures [@doi:10.1021/acs.jmedchem.5b01849].
Still, there are many avenues for improving current deep learning systems, and the future of the field looks bright.
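
As a concrete, if greatly simplified, illustration of this baseline, the sketch below runs a toy genetic algorithm in which a hypothetical atom-substitution mutation stands in for the reaction-rule operators used in the cited works, and RDKit's QED drug-likeness score serves as the fitness function.

```python
# Toy sketch of a genetic algorithm for molecular optimization. Real systems use
# curated reaction rules or graph crossover as variation operators; a simple
# atom-substitution mutation stands in here, with QED as the fitness function.
import random
from rdkit import Chem
from rdkit.Chem import QED

def mutate(smiles):
    """Swap one heavy atom's element; return None if the result is invalid."""
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    idx = random.randrange(mol.GetNumAtoms())
    mol.GetAtomWithIdx(idx).SetAtomicNum(random.choice([6, 7, 8, 9, 16]))  # C, N, O, F, S
    try:
        Chem.SanitizeMol(mol)          # reject chemically invalid offspring
    except Exception:
        return None
    return Chem.MolToSmiles(mol)

population = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]
for generation in range(20):
    # Variation: mutate random parents, keeping only chemically valid offspring.
    offspring = [m for m in (mutate(random.choice(population)) for _ in range(20)) if m]
    # Selection: keep the fittest individuals by QED drug-likeness.
    population = sorted(set(population + offspring),
                        key=lambda s: QED.qed(Chem.MolFromSmiles(s)),
                        reverse=True)[:10]

best = population[0]
print(best, QED.qed(Chem.MolFromSmiles(best)))
```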