Add RTX prediction notebook #18
Conversation
Looks good to me. Some minor comments.
```{r}
          y = baseline.covariate.df$mainclass,
          type.measure = "class",
          family = "multinomial",
          nfolds = nrow(baseline.exprs))  # LOOCV
```
Is it worth tuning the lasso parameter to achieve the best performance using `caret::train`?
Totally agree. This would be my next move if I were going to work on this a bit more!
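For the record, the tuning suggested above could look something like the sketch below. This is a hypothetical illustration, not code from the notebook: the object names `baseline.exprs` and `baseline.covariate.df$mainclass` are assumed to match the objects used in the LOOCV snippet, and the lambda grid is an arbitrary choice.

```{r}
# Hypothetical sketch: tuning the lasso penalty with caret::train
# (method = "glmnet"), keeping LOOCV as the resampling scheme.
library(caret)

# alpha = 1 restricts the elastic net to pure lasso;
# the lambda grid here is an assumption, not a recommendation
lambda.grid <- expand.grid(alpha = 1,
                           lambda = 10 ^ seq(-4, 0, length.out = 25))

tuned <- caret::train(x = baseline.exprs,
                      y = baseline.covariate.df$mainclass,
                      method = "glmnet",
                      tuneGrid = lambda.grid,
                      trControl = caret::trainControl(method = "LOOCV"))

tuned$bestTune  # lambda with the best leave-one-out accuracy
```

With only 36 samples this adds little cost, since LOOCV is already fitting n models per candidate lambda.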
```{r}
ggplot2::ggsave(file.path(plot.dir, "total_accuracy_CI.pdf"),
                plot = ggplot2::last_plot())
```
The RTX LVs have a prediction accuracy equal to 1... seems like overfitting?
Yes, absolutely
```{r}
summary(as.vector(rtx.baseline.b))
```
I agree, the LV values may affect prediction. Do you think scaling them to the same level would make the accuracies look more similar?
It's possible, but I think there is not much to be done with this sample size (especially given the overfitting as you point out above!)
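If anyone revisits this, the scaling idea above could be sketched as below. This is a hypothetical illustration: it assumes `rtx.baseline.b` is the LV-by-sample matrix summarized in the snippet above, and simply z-scores each LV so that all LVs are on a comparable scale before refitting.

```{r}
# Hypothetical sketch: z-score each LV (mean 0, sd 1) across samples so
# that LVs on different scales contribute comparably to the penalized fit.
# scale() operates column-wise, so transpose to put samples in rows and
# LVs in columns first.
scaled.b <- scale(t(rtx.baseline.b))

# after scaling, the distribution should be centered at zero
summary(as.vector(scaled.b))
```

Note that `glmnet` standardizes predictors internally by default (`standardize = TRUE`), so explicit scaling would mainly change interpretability rather than the fit itself.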
Thanks for the comments @huqiwen0313. I agree with them and am glad they will be recorded here! I am going to merge this the way it is because I will not investigate this particular avenue further until I have a larger or more appropriate dataset.
In this PR, I am adding a notebook that performs a supervised ML analysis using the RTX data. I attempt to predict response labels (`nonresponder` and two types of responders: `tolerant` and `nontolerant`) from the baseline samples' expression data using multinomial LASSO. Unfortunately, there are only 36 baseline samples with response labels. With this sample size, I use leave-one-out cross-validation (LOOCV) and do not have a hold-out set. I am comparing three data entities, if you will:

My hypothesis was that the multiPLIER LVs (recount2) would outperform the expression data in this prediction task. The results in this notebook refute that hypothesis. A few notes & observations, although I have no intention of further exploring this avenue given the limitations:
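The core of the approach described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the object names `baseline.exprs` (samples-by-features expression matrix) and `baseline.covariate.df$mainclass` (the three response labels) are assumptions carried over from the review comments above.

```{r}
# Minimal sketch: multinomial lasso with leave-one-out cross-validation.
library(glmnet)

# with 36 labeled baseline samples, setting nfolds to the number of rows
# makes cv.glmnet perform LOOCV
cv.fit <- glmnet::cv.glmnet(x = as.matrix(baseline.exprs),
                            y = baseline.covariate.df$mainclass,
                            family = "multinomial",
                            type.measure = "class",
                            nfolds = nrow(baseline.exprs))  # LOOCV

# class predictions at the lambda minimizing misclassification error;
# note these are resubstitution predictions (no hold-out set exists)
predicted <- predict(cv.fit, newx = as.matrix(baseline.exprs),
                     s = "lambda.min", type = "class")
```

Because there is no hold-out set, the LOOCV misclassification curve in `cv.fit` is the only out-of-sample estimate available, and even it is optimistic at this sample size.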
Notebook html: 25-predict_response.nb.zip