Add RTX prediction notebook #18

Merged · 4 commits merged into greenelab:master on Jun 24, 2018
Conversation

@jaclyn-taroni (Collaborator) commented on Jun 21, 2018

In this PR, I am adding a notebook that performs a supervised ML analysis using the RTX data. I attempt to predict response labels (nonresponder and two types of responders: tolerant and nontolerant) from the baseline samples' expression data using multinomial LASSO. Unfortunately, there are only 36 baseline samples with response labels. With this sample size, I use leave-one-out cross-validation (LOOCV) and do not have a hold-out set. I am comparing three data entities, if you will (a rough sketch of the modelling setup follows the list below):

  • Expression data -- gene-level measurements filtered to include only the genes that are in the recount2 model
  • recount2 LVs -- the RTX data projected into the recount2 PLIER model latent space
  • RTX PLIER LVs -- the RTX data in the latent space of its own dataset-specific PLIER model
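
For concreteness, here is a minimal sketch of that setup using glmnet: a multinomial LASSO fit with cv.glmnet, where setting nfolds to the number of samples gives LOOCV. The object names and the synthetic data are placeholders for illustration, not the notebook's actual variables.

```{r}
library(glmnet)

# Synthetic stand-ins for the notebook's objects (illustration only):
# 36 baseline samples x 200 features, three response classes
set.seed(123)
expr.mat <- matrix(rnorm(36 * 200), nrow = 36)
response.labels <- factor(sample(c("nonresponder", "tolerant", "nontolerant"),
                                 size = 36, replace = TRUE))

# Multinomial LASSO; nfolds equal to the number of samples -> LOOCV
cv.fit <- cv.glmnet(x = expr.mat,
                    y = response.labels,
                    family = "multinomial",
                    type.measure = "class",
                    nfolds = nrow(expr.mat),
                    grouped = FALSE)  # single-sample folds require ungrouped CV

# Misclassification error at the CV-selected lambda
min(cv.fit$cvm)
```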

My hypothesis was that the multiPLIER LVs (recount2) would outperform the expression data in this prediction task. The results in this notebook refute that hypothesis. A few notes & observations, although I have no intention of further exploring this avenue given the limitations:

  • The range of the recount2 LV values is smaller / "flatter" than that of the other entities (this can be seen at the bottom of the notebook). It's possible that scaling the features would provide a fairer comparison between the entities (see the sketch after this list). In general, there's probably room for improvement in training/tuning, etc.
  • My original hypothesis w.r.t. the multiPLIER LVs was that this approach would generalize better than models trained on gene-level expression. I can't test that with this dataset + sample size, so I note it here as a future direction.
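
As a rough sketch of the kind of feature scaling meant above: z-scoring each feature so the three entities are on a comparable scale before fitting. Whether this actually changes the comparison is untested here, and the object name is again a placeholder.

```{r}
# Placeholder matrix standing in for any one of the three data entities (illustration only)
set.seed(123)
expr.mat <- matrix(rnorm(36 * 200), nrow = 36)

# Drop zero-variance features first (scale() would turn them into NaN),
# then z-score each feature (column) to zero mean and unit variance
keep <- apply(expr.mat, 2, sd) > 0
scaled.mat <- scale(expr.mat[, keep, drop = FALSE], center = TRUE, scale = TRUE)
```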

Notebook html: 25-predict_response.nb.zip

@jaclyn-taroni changed the title from "[WIP] Add RTX prediction notebook" to "Add RTX prediction notebook" on Jun 22, 2018
@huqiwen0313 left a comment

Looks good to me. Some minor comments.

```{r}
y = baseline.covariate.df$mainclass,
type.measure = "class",
family = "multinomial",
nfolds = nrow(baseline.exprs)) # LOOCV
```

Is it worth tuning the lasso parameters to achieve the best performance using caret::train?

@jaclyn-taroni (Collaborator, Author) replied:

Totally agree that this would be my next move if I were going to work on this a bit more!
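
For reference, a rough sketch of what that tuning could look like via caret's glmnet interface with leave-one-out resampling; the tuning grid and object names are illustrative assumptions, not something the notebook currently does.

```{r}
library(caret)
library(glmnet)

# Synthetic stand-ins, for illustration only
set.seed(123)
expr.mat <- matrix(rnorm(36 * 200), nrow = 36)
colnames(expr.mat) <- paste0("feature", seq_len(ncol(expr.mat)))
response.labels <- factor(sample(c("nonresponder", "tolerant", "nontolerant"),
                                 size = 36, replace = TRUE))

# Tune the elastic net mixing parameter (alpha) and penalty (lambda) with LOOCV;
# caret selects the multinomial family automatically for a 3-level factor outcome
tune.fit <- caret::train(x = expr.mat,
                         y = response.labels,
                         method = "glmnet",
                         trControl = caret::trainControl(method = "LOOCV"),
                         tuneGrid = expand.grid(alpha = c(0.5, 1),
                                                lambda = 10^seq(-3, 0, length.out = 20)))
tune.fit$bestTune
```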

```{r}
ggplot2::ggsave(file.path(plot.dir, "total_accuracy_CI.pdf"),
plot = ggplot2::last_plot())
```

The RTX LVs have a prediction accuracy equal to 1... seems like overfitting?

@jaclyn-taroni (Collaborator, Author) replied:

Yes, absolutely


```{r}
summary(as.vector(rtx.baseline.b))
```

I agree, the LV values may affect prediction. Do you think scaling them to the same level would make the accuracies look more similar?

@jaclyn-taroni (Collaborator, Author) replied:

It's possible, but I think there is not much to be done with this sample size (especially given the overfitting, as you point out above!)

@jaclyn-taroni (Collaborator, Author) commented:

Thanks for the comments, @huqiwen0313. I agree with them and am glad they will be recorded here! I am going to merge this the way it is because I will not investigate this particular avenue further until I have a larger or more appropriate dataset.

@jaclyn-taroni merged commit 2e5cd97 into greenelab:master on Jun 24, 2018