Accurate and efficient target prediction using a potency-sensitive influence-relevance voter #229

Open · swamidass opened this issue Jan 30, 2017 · 14 comments

swamidass commented Jan 30, 2017

https://jcheminf.springeropen.com/articles/10.1186/s13321-015-0110-6

Edit:
https://doi.org/10.1186/s13321-015-0110-6

Background
A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows.
Results
Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database.
Conclusions
We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/

agitter added the treat label on Feb 8, 2017

agitter commented Feb 8, 2017

@swamidass I am reading through the papers you posted and have a couple quick questions. I also edited the original post.

Are the IrvPred web server and source code at http://chemdb.ics.uci.edu/cgibin/tools/IrvPredWeb.py the software referenced in this paper?

In this comparison with SVM and Random Forest, the features come from fingerprint similarity. This makes sense because it is most similar to the IRV approach. Have you directly compared standard classifiers (e.g. Random Forest) trained with fingerprint similarity features versus using the fingerprint bit vector directly as the features? I haven't yet surveyed everything you posted and am trying to prioritize my reading.
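
For readers outside cheminformatics, here is a minimal sketch of the two feature constructions being contrasted; the fingerprints, labels, and reference actives below are made up purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: X is an (n_molecules, n_bits) binary fingerprint matrix,
# y holds active/inactive labels, and X_ref are fingerprints of known actives.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 1024))
y = rng.integers(0, 2, size=200)
X_ref = X[y == 1][:50]

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between rows of two binary matrices."""
    inter = a @ b.T
    union = a.sum(1)[:, None] + b.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1)

# Option 1: raw fingerprint bits as features (FP -> classifier).
clf_bits = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Option 2: similarity-to-known-actives as features, which is closer in spirit
# to the kernel/neighbor view used by Tanimoto-kernel SVMs and the IRV.
clf_sim = RandomForestClassifier(n_estimators=100, random_state=0).fit(tanimoto(X, X_ref), y)
```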


swamidass commented Feb 9, 2017

I believe that is the software, though that link is to Baldi's site, so I cannot be 100% sure.

In this study, we did not compare to RF or fingerprint bit vectors as a direct input to a neural network (I'll abbreviate this as FP -> NN).

  1. RF was excluded because the most commonly used RF software is not free to academics. Moreover, in this domain, RF usually does not outperform SVMs (they are about equivalent). So, if we can consistently outperform SVMs (which we can), we can fairly infer that we would also outperform RF.

  2. FP -> NN was not included because it usually performs poorly compared to SVMs and to Tanimoto similarity on circular fingerprints. This is considered "common knowledge" in the field, so reviewers just do not ask for it. This also makes all the deep learning papers that show improvement over FP -> NN without even trying Tanimoto similarity (I'm not going to name names) unconvincing; it is really just a strawman method. At a minimum, the real comparison should be to SVMs using Tanimoto similarity (or MinMax similarity) as the kernel function (see the sketch after this list).

  3. There is some unsubstantiated belief in the field that Naive Bayes classifiers can work, so we always include them. However, they usually perform poorly.
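
As a concrete illustration of that baseline, here is a minimal sketch of an SVM that uses Tanimoto (Jaccard) similarity as a precomputed kernel in scikit-learn; the binary fingerprints and labels are made up:

```python
import numpy as np
from sklearn.svm import SVC

def tanimoto_kernel(A, B):
    """Tanimoto (Jaccard) similarity between rows of two binary fingerprint matrices."""
    inter = A @ B.T
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1)

# Made-up binary fingerprints and activity labels.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(300, 1024))
y_train = rng.integers(0, 2, size=300)
X_test = rng.integers(0, 2, size=(50, 1024))

# SVM with the Tanimoto similarity supplied as a precomputed Gram matrix.
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(tanimoto_kernel(X_train, X_train), y_train)
scores = svm.decision_function(tanimoto_kernel(X_test, X_train))
```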

In this study, I think there are a couple of key things to focus on:

  1. The IRV is a very low-parameter approach (around 10 weights, thanks to extensive weight sharing) that works extremely well. At this point, I do not think anything else has consistently outperformed it. (See the sketch after this list.)

  2. It also enables injection of additional data into the model. The inclusion of potency data significantly improves results, and other methods have no way of incorporating this information. This is one of the big advantages of Deep Learning approaches (even though this particular method is not super deep).

  3. The IRV is also interpretable in some key ways. This is particularly important to emphasize: with the right structure, Deep Learning can be interpretable.
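
For readers unfamiliar with the IRV, here is a loose numpy sketch of the voting idea described above (relevance from neighbor similarity, a vote from the neighbor's label, and, for the potency-sensitive variant, a potency term), not the exact parameterization from the paper; all weights and data below are made up:

```python
import numpy as np

def irv_forward(sim, labels, potency, params):
    """Loose sketch of a (potency-sensitive) influence-relevance voter forward
    pass for ONE query molecule; not the exact parameterization from the paper.

    sim, labels, potency : arrays over the k most similar training neighbors
                           (similarity to the query, class in {-1, +1}, and a
                           made-up potency value; potency is the PS-IRV twist).
    params               : a handful of weights SHARED across all k neighbors,
                           which is why the whole model has so few parameters.
    """
    w_sim, w_pot, w_out, b_rel, b_out = params
    relevance = np.tanh(w_sim * sim + b_rel)        # how much each neighbor matters
    vote = labels * (1.0 + w_pot * potency)         # what it votes for, scaled by potency
    influence = w_out * relevance * vote            # per-neighbor influence
    return 1.0 / (1.0 + np.exp(-(influence.sum() + b_out)))  # P(active)

# Example call with k = 5 made-up neighbors and made-up shared weights.
p_active = irv_forward(np.array([0.9, 0.8, 0.7, 0.6, 0.5]),   # similarities
                       np.array([+1, +1, -1, +1, -1]),        # neighbor labels
                       np.array([7.2, 6.5, 0.0, 6.8, 0.0]),   # e.g. pIC50, 0 if inactive
                       (2.0, 0.1, 1.0, -1.0, 0.0))
```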


agitter commented Feb 9, 2017

Thanks. I did appreciate the advantages from reading this paper and #228. I'm still trying to figure out where it fits in #174 because I think our working definition of "deep learning" may be "high-parameter neural networks". I'll have to see how we've discussed low-parameter NNs in other sections, the options being that we consider them along with high-parameter NNs or treat them as a competing method.

Even if it is common knowledge in the field, I would still like to find a reference that directly compares (Tanimoto similarity on fingerprints -> SVM) or (Tanimoto similarity on fingerprints -> random forest) with (fingerprints -> random forest) or (fingerprints -> neural networks) on the same data. That will help readers from outside the cheminformatics field, who I expect will be most of our readers.

@swamidass (Contributor Author)

I think that is a legitimate concern. IRVs are low-parameter, so they are not quite in the standard group of Deep Learning methods. On that basis, it is possible we may want to exclude them from the review, or at least point out that they do not exactly follow the current pattern.

Deep Learning, however, is more than just "high-parameter". I think a better way to define Deep Learning is: "a collection of new techniques for building neural networks, including higher-parameter models, recursive and convolutional networks, improved architectures, and improved training strategies."

The IRV, by this definition, is a class of Deep Learning. Although it is not high-parameter, it (1) uses more hidden layers than normal (there are three hidden layers, plus a kernel layer, between the input and output), and (2) uses extensive weight replication to substantially reduce the number of weights. Of course, it will have limitations compared to the new methods. Honestly, I expect that it will eventually be outclassed by better versions of (for example) #53. We are just not there yet.


swamidass commented Feb 10, 2017

About references showing that NNs on fingerprints directly don't work so well: that is a tall request.

There is just so much unpublished experience of people trying this approach (albeit with older regularization techniques) that it is mentioned offhand in cheminformatics all the time. Given the bias against publishing negative results, that will be a hard reference to find.

Now, it is entirely possible that with more advanced regularization FP->NNs can work on par with SVMs, RFs, and Tanimoto similarity. That has to be established, however, before FP->NNs are a convincing baseline against which to benchmark improvement over state-of-the-art methods. I think that is really the key point. While it is hard to produce a reference showing that FP->NNs are poor, there is really no body of literature demonstrating that they reliably produce results comparable to RFs, SVMs, and Tanimoto similarity. This alone is enough to discourage using FP->NNs as a baseline method of comparison.
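
For concreteness, a minimal PyTorch sketch of what such an FP -> NN baseline with the newer regularization tricks (ReLU, batch normalization, dropout) might look like; layer sizes are illustrative and not taken from any of the papers discussed here:

```python
import torch
from torch import nn

# Raw fingerprint bits in, one activity logit out, with modern regularization.
fp_net = nn.Sequential(
    nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(128, 1),
)
# Training would pair this with nn.BCEWithLogitsLoss() and
# torch.optim.Adam(fp_net.parameters()), fed minibatches of fingerprint bit vectors.
```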


swamidass commented Feb 13, 2017

@cgreene, as our circus master, can you please comment on the definition that you want to use for deep learning?

@agitter offers: "high-parameter neural networks"

I think this is more accurate: "a collection of new techniques for building neural networks, including higher-parameter models, recursive and convolutional networks, improved architectures, and improved training strategies."

I think this is important to clarify because high-parameter networks have been around for a long time. They just never worked well, so people avoided them. It is only with new DL techniques (e.g. dropout, ResNets, ReLUs, batch normalization) that they started to work. This is a pretty fundamental cross-cutting issue to resolve. Can you please weigh in, @cgreene?

@swamidass (Contributor Author)

@agitter asks for some benchmark papers.

These are some important papers...

https://www.ncbi.nlm.nih.gov/pubmed/15961479
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3225883/

This competition on HIV data is pretty important and shows SVMs (from my team) outperforming everything else:

http://www.agnostic.inf.ethz.ch/index.php

Though we did follow-up work that demonstrated the IRV really does better:

https://www.ncbi.nlm.nih.gov/pubmed/20378557

I think the list of competitors in that competition is helpful. You'll see just about every algorithm there is, and MinMax-kernel SVMs (or MinMax-similarity IRVs) outperform everything.


agitter commented Feb 13, 2017

@swamidass I'm interested in @cgreene's feedback on this as well, but I should say that my definition probably is in line with what you posed above. More thoughts soon.


agitter commented Feb 14, 2017

I still don't have time to articulate my complete thoughts, but my questions about how to classify the IRV may not matter much in the end. I plan to include it in this section and am trying to think about where it fits in the new narrative. I have an outline in mind and will write it up as soon as I can for feedback.


cgreene commented Feb 17, 2017

My thoughts are much in line with @swamidass's. I just filed #243 to touch on improvements to the introduction that more clearly define what we mean by transform. The current definitions that we've been using sit right near some of the sections that were touched.

If you expand the text from that section of the PR, you can see on lines 48-58 of the revised version the definitions that we have been using. We have been relatively permissive, saying that multi-layer NNs used to construct features, at some stage, count. We also - for what it's worth - note that by this definition such models have existed in the literature for more than 50 years.

@swamidass : I'd be thrilled if you want to refine this via a PR on the intro to highlight a more restrictive perspective on deep learning. It will require us to start making harder calls as to what qualifies.


cgreene commented Feb 17, 2017

Side note: I just got back from a trip to UCI where I chatted with Pierre. I should have asked in person, but I'm just catching up on this after my return! (With regard to "I believe that is the software, though that link is to Baldi's site, so I cannot be 100% sure.")

@swamidass (Contributor Author)

Hope you got to talk to him about this =). He is one of the early leaders in the field that not so many people know about. Anyhow, I can take a crack at the intro.


cgreene commented Feb 17, 2017

Yea - we chatted a lot about the science but not about this review (missed opportunity - doh!). Do you think you could get him onto GitHub for this? It would be great to get his perspective + feedback!


swamidass commented Feb 17, 2017 via email
