Determine the optimal cutpoint for continuous variables #41
Well-formatted issue! I have to admit that while working with biotechnologists I came across the idea of examining the expression of one gene in HIGH and LOW groups :) That's so true. I didn't know about the maxstat method, so I always used the median as a [...] The plot of maxstat looks very simple, but it is very useful. Maybe we could somehow extend ggsurvplot to provide cutoffs for the user when he/she only specifies the names of the continuous variables? PS. I haven't forgotten the vignette I was going to post here :P, but I just got invited to Bioc2016 and am planning my journey and visa.
No emergency for the vignette :-)!
library("survminer")
data(myeloma)
head(myeloma)
         molecular_group chr1q21_status treatment event  time   CCND1 CRIM1 DEPDC1    IRF4   TP53   WHSC1
GSM50986      Cyclin D-1       3 copies       TT2     0 69.24  9908.4 420.9  523.5 16156.5   10.0   261.9
GSM50988      Cyclin D-2       2 copies       TT2     0 66.43 16698.8  52.0   21.1 16946.2 1056.9   363.8
GSM50989           MMSET       2 copies       TT2     0 66.50   294.5 617.9  192.9  8903.9 1762.8 10042.9
GSM50990           MMSET       3 copies       TT2     1 42.67   241.9  11.9  184.7 11894.7  946.8  4931.0
GSM50991             MAF           <NA>       TT2     0 65.00   472.6  38.8  212.0  7563.1  361.4   165.0
GSM50992    Hyperdiploid       2 copies       TT2     0 65.20   664.1  16.9  341.6 16023.4 2096.3   569.2
# 1. Determine the optimal cutpoint of variables
res.cut <- surv_cutpoint(myeloma, time = "time", event = "event",
                         variables = c("DEPDC1", "WHSC1", "CRIM1"))
summary(res.cut)
       cutpoint statistic
DEPDC1    279.8  4.275452
WHSC1    3205.6  3.361330
CRIM1      82.3  1.968317
# 2. Plot cutpoint for DEPDC1
# palette = "npg" (nature publishing group), see ?ggpubr::ggpar
plot(res.cut, "DEPDC1", palette = "npg")
# 3. Categorize variables
res.cat <- surv_categorize(res.cut)
head(res.cat)
          time event DEPDC1 WHSC1 CRIM1
GSM50986 69.24     0   high   low  high
GSM50988 66.43     0    low   low   low
GSM50989 66.50     0    low  high  high
GSM50990 42.67     1    low  high   low
GSM50991 65.00     0    low   low   low
GSM50992 65.20     0   high   low   low
# 4. Fit survival curves and visualize
library("survival")
fit <- survfit(Surv(time, event) ~ DEPDC1, data = res.cat)
ggsurvplot(fit, risk.table = TRUE, conf.int = TRUE)
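The categorization step is simple enough to reproduce by hand, which can help when checking results. The sketch below re-applies the cutpoints from `summary(res.cut)` to the six samples printed by `head(myeloma)`. The helper `categorize` and the rule that a value equal to the cutpoint goes to "low" are assumptions for illustration, not survminer's source code. The resulting labels agree with the `head(res.cat)` output above.

```r
# Expression values of the first six samples, copied from head(myeloma) above
expr <- data.frame(
  DEPDC1 = c(523.5, 21.1, 192.9, 184.7, 212.0, 341.6),
  WHSC1  = c(261.9, 363.8, 10042.9, 4931.0, 165.0, 569.2),
  CRIM1  = c(420.9, 52.0, 617.9, 11.9, 38.8, 16.9),
  row.names = c("GSM50986", "GSM50988", "GSM50989",
                "GSM50990", "GSM50991", "GSM50992")
)

# Optimal cutpoints, copied from summary(res.cut) above
cutpoints <- c(DEPDC1 = 279.8, WHSC1 = 3205.6, CRIM1 = 82.3)

# Hypothetical helper: values above the cutpoint become "high", the rest "low"
# (assumption: a value exactly equal to the cutpoint is labelled "low")
categorize <- function(x, cut) {
  factor(ifelse(x > cut, "high", "low"), levels = c("low", "high"))
}

# Apply the rule column by column
res.cat.sketch <- expr
for (v in names(cutpoints)) {
  res.cat.sketch[[v]] <- categorize(expr[[v]], cutpoints[[v]])
}
print(res.cat.sketch)
```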
Wonderful and very helpful :)! Good job!
On 2016-10-22 20:20 GMT+02:00, Alboukadel KASSAMBARA <notifications@github.com> wrote:
Thanks :-)!
@kassambara Great work! I'm wondering whether this also works when I want to find multiple cutpoints for a single variable, i.e. two cutpoints that split it into low/medium/high groups?
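The workflow above returns a single cutpoint per variable. One naive way to explore a low/medium/high split is a brute-force search over pairs of cutpoints, scoring each pair with the 3-group log-rank chi-square from `survival::survdiff`. This is a hypothetical helper, not a survminer or maxstat feature; the quantile grid, the 10% minimum group size, and the simulated data are all assumptions. The chi-square is not adjusted for the many splits tried, so it should not be read as a valid P value.

```r
library(survival)

# Hypothetical helper (not a survminer function): search pairs of cutpoints
# c1 < c2 that split x into low / medium / high and keep the pair with the
# largest 3-group log-rank chi-square. Candidates come from a quantile grid.
two_cutpoint_search <- function(time, event, x, minprop = 0.1,
                                probs = seq(0.1, 0.9, by = 0.05)) {
  grid <- unique(quantile(x, probs = probs, names = FALSE))
  n <- length(x)
  best <- list(cutpoints = c(NA_real_, NA_real_), chisq = -Inf)
  for (i in seq_along(grid)) {
    for (j in seq_along(grid)) {
      if (j <= i) next  # only consider c1 < c2
      grp <- cut(x, breaks = c(-Inf, grid[i], grid[j], Inf),
                 labels = c("low", "medium", "high"))
      # Skip splits that leave a group with fewer than minprop * n samples
      if (any(table(grp) < minprop * n)) next
      chisq <- survdiff(Surv(time, event) ~ grp)$chisq
      if (chisq > best$chisq) {
        best <- list(cutpoints = grid[c(i, j)], chisq = chisq)
      }
    }
  }
  best
}

# Simulated example (assumption): the hazard increases in three steps
set.seed(1)
n <- 150
x <- runif(n, 0, 100)
rate <- ifelse(x < 30, 0.01, ifelse(x < 70, 0.03, 0.08))
true_time <- rexp(n, rate)
cens_time <- runif(n, 0, 100)
time  <- pmin(true_time, cens_time)
event <- as.integer(true_time <= cens_time)

res3 <- two_cutpoint_search(time, event, x)
print(res3$cutpoints)
```

A finer grid or an exhaustive search over all observed values is possible but grows quadratically in cost and makes overfitting worse.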
Hi @MarcinKosinski,
As you know, in the field of cancer genomics, biologists are sometimes interested in whether the expression profile of their favorite gene, say FGFR3, is associated with patient prognosis. To answer this question, one can use univariate Cox analysis. Sometimes they also want to split patients into FGFR3-high and FGFR3-low groups. One can arbitrarily split patients into 2, 3, or 4 groups using the quartiles (q1, q2, q3).
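The two conventional approaches mentioned here, a univariate Cox model on the continuous expression and an arbitrary quantile-based split, look roughly like this with the survival package. The data are simulated and the variable name `FGFR3` is only a stand-in; log2-transforming expression before the Cox fit is a common choice, not something this issue prescribes.

```r
library(survival)

# Simulated data (hypothetical): expression of one gene plus a survival outcome
set.seed(123)
n <- 200
FGFR3 <- rlnorm(n, meanlog = 6, sdlog = 1)
# Simulation assumption: higher expression means a higher hazard
true_time <- rexp(n, rate = 0.02 * (FGFR3 / median(FGFR3))^0.5)
cens_time <- runif(n, 0, 100)
dat <- data.frame(time  = pmin(true_time, cens_time),
                  event = as.integer(true_time <= cens_time),
                  FGFR3 = FGFR3)

# Univariate Cox model on the continuous (log2-transformed) expression
cox_fit <- coxph(Surv(time, event) ~ log2(FGFR3), data = dat)
print(summary(cox_fit)$coefficients)

# Two groups: split at the median (q2)
dat$FGFR3_2grp <- cut(dat$FGFR3,
                      breaks = c(-Inf, median(dat$FGFR3), Inf),
                      labels = c("low", "high"))

# Four groups: split at the quartiles q1, q2, q3
q <- quantile(dat$FGFR3, probs = c(0.25, 0.5, 0.75))
dat$FGFR3_4grp <- cut(dat$FGFR3, breaks = c(-Inf, q, Inf),
                      labels = c("Q1", "Q2", "Q3", "Q4"))

print(table(dat$FGFR3_2grp))
print(table(dat$FGFR3_4grp))

# Kaplan-Meier curves for the quartile groups
km_fit <- survfit(Surv(time, event) ~ FGFR3_4grp, data = dat)
```

The groups produced this way are balanced in size by construction, but the cut locations ignore the outcome entirely, which is the limitation that a data-driven cutpoint tries to address.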
An interesting alternative is provided by maxstat (maximally selected rank statistics), which has several advantages. First, there is no need to transform the time-dependent end point. Second, the test calculates an exact cutpoint, which can be estimated using several methods and approximations, and the discriminative power of the split is also evaluated with a P value (type I error) [1].
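The core idea of a maximally selected statistic for a survival end point can be sketched with base R and the survival package: try each observed value as a cutpoint, compute the two-group log-rank statistic, and keep the value where it is largest. This is a simplified sketch, not the maxstat implementation. The helper `max_logrank_cutpoint`, the 10% minimum group size, and the simulated data are assumptions, and `survdiff`'s variance estimate may differ slightly from maxstat's standardized statistic. It also omits any multiple-testing correction: the naive log-rank P value at the selected cutpoint is too optimistic, which is what maxstat's P value approximations (e.g. the `pmethod` argument of `maxstat::maxstat.test`) are meant to handle.

```r
library(survival)

# Hypothetical helper: scan candidate cutpoints of x and return the one with
# the largest absolute two-group log-rank statistic.
# minprop keeps at least that fraction of samples in each group (assumption;
# maxstat's exact candidate set and variance estimate may differ).
max_logrank_cutpoint <- function(time, event, x, minprop = 0.1) {
  xs <- sort(unique(x))
  n  <- length(x)
  stats <- rep(NA_real_, length(xs))
  for (i in seq_along(xs)) {
    n_high <- sum(x > xs[i])
    if (n_high < minprop * n || (n - n_high) < minprop * n) next
    grp <- factor(ifelse(x > xs[i], "high", "low"))
    sd_fit <- survdiff(Surv(time, event) ~ grp)
    # With two groups (1 df), sqrt(chisq) is the absolute standardized statistic
    stats[i] <- sqrt(sd_fit$chisq)
  }
  best <- which.max(stats)  # which.max ignores the NA (skipped) candidates
  list(cutpoint = xs[best], statistic = stats[best],
       candidates = xs, statistics = stats)
}

# Simulated example (assumption): the hazard jumps for x > 300
set.seed(42)
n <- 150
x <- runif(n, 0, 1000)
rate <- ifelse(x > 300, 0.05, 0.01)
true_time <- rexp(n, rate)
cens_time <- runif(n, 0, 150)
time  <- pmin(true_time, cens_time)
event <- as.integer(true_time <= cens_time)

res <- max_logrank_cutpoint(time, event, x)
print(c(cutpoint = res$cutpoint, statistic = res$statistic))

# Plot the statistic against the candidate cutpoints
plot(res$candidates, res$statistics, type = "l",
     xlab = "Cutpoint", ylab = "Standardized log-rank statistic")
abline(v = res$cutpoint, lty = 2)
```

The final plot (statistic versus candidate cutpoint, with the selected value marked) is the kind of simple diagnostic view of maxstat discussed in this thread.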
See some examples (made with maxstat) in this PDF: http://genomicscape.com/temp/survival_TIE3ByWMo9_plot.pdf
What do you think about adding elegant maxstat visualization support to survminer?
(I have base R scripts for that.)
The philosophy of the function would be as follows:
Articles using maxstat: