post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

kassambara · 2016-12-15T15:31:47Z

In this paper - the molecular classification of multiple myeloma, the authors classify multiple myeloma patients into 7 groups based on gene expression profiling. The survival curves of these 7 groups are shown in the figure here-after (Panel A). In panel B, groups with similar survival curves have ben merged into high and low risk groups.

In a such situation, one would like to perform a pairewise comparison between groups in order to decide the groups to be merged. Groups might be merged, if the long-rank test fails to be significant.

The function survdiff()[in survival package] returns a global p-value whether to reject or not the null hypothesis. It returns a pooled p-value even when we have a strata including multiple groups (more than 2). With this, you know that a difference exists between groups, but you don't know where. You can't know until you test each combination.

The survival package does not provide any function to perform pairwise comparison between groups.

An implementation of an easy-to-use function - pairewise_survdiff() or pairewise_logrank() - might be useful for multiple log-rank tests between pairs of groups . This function should return an adjusted p-value for multiple testing correction.

The text was updated successfully, but these errors were encountered:

MarcinKosinski · 2017-01-08T11:12:42Z

What if we have

p-value for post hoc for: group A vs group B - result insignificant
p-value for post hoc for: group B vs group C - result insignificant
p-value for post hoc for: group A vs group C - result significant

which groups will you merge then :) ?

kassambara · 2017-01-08T11:56:14Z

A very good question!!

I don't really have a definitive answer. In a such difficult case, an additional external information about the groups might be required in order to help the analyst to decide which groups to merge.

I think that the primary goal of the pairwise comparison should be only descriptive, investigating where the difference is, when the global test is significant.

If the analyst wants to merge some groups, then the adjusted p-values might help in some cases:

For example,

if:
A vs B, adj.pval = 0.1
B vs C, adj.pval = 0.8
A vs C, adj.pval <= 0.05

Then:
it's tempting to merge B & C, leading to a classification with 2 groups (Low vs High risk).

MarcinKosinski · 2017-01-08T12:01:53Z

Sounds reasonable :)

kassambara · 2017-01-28T11:58:35Z

New function pairwise_survdiff() added.

Survival curves with global p-value

library(survival)
library(survminer)
# Survival curves with global p-value
data(myeloma)
fit <- survfit(Surv(time, event) ~ molecular_group, data = myeloma)
ggsurvplot(fit, legend.labs = levels(myeloma$molecular_group),
           pval = TRUE)

Pairwise survdiff

# Pairwise survdiff
res <- pairwise_survdiff(Surv(time, event) ~ molecular_group,
     data = myeloma)
res

	Pairwise comparisons using Log-Rank test 

data:  myeloma and molecular_group 

                 Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF   MMSET
Cyclin D-2       0.723      -          -            -                -     -    
Hyperdiploid     0.328      0.103      -            -                -     -    
Low bone disease 0.644      0.447      0.723        -                -     -    
MAF              0.943      0.723      0.103        0.523            -     -    
MMSET            0.103      0.038      0.527        0.485            0.038 -    
Proliferation    0.723      0.988      0.103        0.485            0.644 0.062

P value adjustment method: BH

Symbolic number coding

# Symbolic number coding
symnum(res$p.value, cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 0.1, 1),
   symbols = c("****", "***", "**", "*", "+", " "),
   abbr.colnames = FALSE, na = "")

                 Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF MMSET
Cyclin D-2                                                                    
Hyperdiploid                                                                  
Low bone disease                                                              
MAF                                                                           
MMSET                       *                                        *        
Proliferation                                                            +    
attr(,"legend")
[1] 0 ‘****’ 1e-04 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1 \t    ## NA: ‘’

MarcinKosinski · 2017-01-28T15:01:52Z

Outstanding!

I am working locally on extending:
#17
but can't work on this today and probably will smash that tomorrow

kassambara · 2017-01-28T15:07:27Z

Ok, that would be great!

Have a nice weekend:-)!

kassambara · 2017-01-28T15:08:20Z

I think that we can close this issue

moripe · 2017-06-05T15:30:49Z

Hi! First I´m sorry for my bad English
I´m using the function pairwise_survdiff because I want to compare survival curves.
When I have installed the package from repository Cran R project and from Github too, in both cases I get the following error when I load the library in RStudio:

Error in get(method, envir = home) :
lazy-load database 'C:/Users/drimay_Monica/Documents/R/win-library/3.3/survminer/R/survminer.rdb' is corrupt
In addition: Warning messages:
1: In .registerS3method(fin[i, 1], fin[i, 2], fin[i, 3], fin[i, 4], :
restarting interrupted promise evaluation
2: In get(method, envir = home) :
restarting interrupted promise evaluation
3: In get(method, envir = home) : internal error -3 in R_decompress1
Error: package or namespace load failed for ‘survminer’

Although I got that mistake, I run the function and I get the following error:

Error in paste(deparse(substitute(data)), "and", .collapse(group_var, :
could not find function ".collapse"

Could you help me please?

Thanks!

kassambara · 2017-06-05T15:45:30Z

Hi,

This error is not reproducible on my computer.

Suggestion:

Re-start your R session
Re-install the developmental version of survminer from github
Load survminer and Try pairwise_survdiff() again.

If it still fails, then send to me the output of devtools::session_info().

Thanks

moripe · 2017-06-05T15:54:45Z

I re-start my R session and Re-install the developmental version of survminer from github but I get the same error

the output of devtools::session_info().

devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, mingw32
ui RStudio (1.0.136)
language (EN)
collate Spanish_Spain.1252
tz Europe/Paris
date 2017-06-05

Packages --------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.3.2)
base * 3.3.2 2016-10-31 local
broom 0.4.2 2017-02-13 CRAN (R 3.3.3)
cmprsk 2.2-7 2014-06-17 CRAN (R 3.3.3)
colorspace 1.3-2 2016-12-14 CRAN (R 3.3.2)
curl 2.3 2016-11-24 CRAN (R 3.3.2)
data.table 1.10.0 2016-12-03 CRAN (R 3.3.2)
datasets * 3.3.2 2016-10-31 local
DBI 0.5-1 2016-09-10 CRAN (R 3.3.2)
devtools * 1.13.1 2017-05-13 CRAN (R 3.3.3)
digest 0.6.10 2016-08-02 CRAN (R 3.3.2)
dplyr 0.5.0 2016-06-24 CRAN (R 3.3.2)
foreign 0.8-67 2016-09-13 CRAN (R 3.3.2)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.3.3)
ggpubr * 0.1.3 2017-06-05 Github (kassambara/ggpubr@62ae7d2)
git2r 0.18.0 2017-01-01 CRAN (R 3.3.3)
graphics * 3.3.2 2016-10-31 local
grDevices * 3.3.2 2016-10-31 local
grid 3.3.2 2016-10-31 local
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.2)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.2)
httr 1.2.1 2016-07-03 CRAN (R 3.3.3)
km.ci 0.5-2 2009-08-30 CRAN (R 3.3.3)
KMsurv 0.1-5 2012-12-03 CRAN (R 3.3.2)
knitr 1.15.1 2016-11-22 CRAN (R 3.3.2)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.2)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.2)
magrittr * 1.5 2014-11-22 CRAN (R 3.3.2)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.2)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.2)
methods * 3.3.2 2016-10-31 local
mnormt 1.5-5 2016-10-15 CRAN (R 3.3.2)
munsell 0.4.3 2016-02-13 CRAN (R 3.3.2)
nlme 3.1-128 2016-05-10 CRAN (R 3.3.2)
parallel 3.3.2 2016-10-31 local
plyr 1.8.4 2016-06-08 CRAN (R 3.3.2)
psych 1.6.12 2017-01-08 CRAN (R 3.3.2)
purrr 0.2.2.2 2017-05-11 CRAN (R 3.3.3)
R6 2.2.0 2016-10-05 CRAN (R 3.3.2)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.2)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.2)
rlang 0.1.1 2017-05-18 CRAN (R 3.3.3)
scales 0.4.1 2016-11-09 CRAN (R 3.3.2)
splines 3.3.2 2016-10-31 local
stats * 3.3.2 2016-10-31 local
stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
survival 2.40-1 2016-10-30 CRAN (R 3.3.2)
survminer * 0.3.1.999 2017-06-05 Github (8511dee)
survMisc 0.5.4 2016-11-23 CRAN (R 3.3.3)
tibble 1.3.3 2017-05-28 CRAN (R 3.3.3)
tidyr 0.6.3 2017-05-15 CRAN (R 3.3.3)
tools 3.3.2 2016-10-31 local
utils * 3.3.2 2016-10-31 local
withr 1.0.2 2016-06-20 CRAN (R 3.3.3)
xtable 1.8-2 2016-02-05 CRAN (R 3.3.2)
zoo 1.7-14 2016-12-16 CRAN (R 3.3.2)

Thanks!

kassambara · 2017-06-05T16:42:49Z

This is strange!

This kind of issues has been reported many times on stackoverflow (see this, this or this forum) and in all the cases, re-starting R fixes the issue.

Have you try to remove the old survminer before re-installation?

If the answer is no, then try this:

Re-start R
Remove the old survminer packages with remove.packages("survminer").
Re-install again

And, please let-me know if it works.

moripe · 2017-06-06T07:10:45Z

Hi!
Everything you say, I did it yesterday. Today I have done again but I get the same error.
I have tried on another pc but the same error still happens.
thank you anyway!!!!

kassambara · 2017-06-06T08:27:27Z

I have no idea what might cause this issue.

moripe · 2017-06-06T08:35:16Z

I think the problem is the function ".collapse", which you use in:
DNAME <- paste(deparse(substitute(data)), "and", collapse(group_var, sep = " + " ))
The issue is: Error in paste(deparse(substitute(data)), "and", .collapse(group_var, :
could not find function ".collapse"
I´m finding about .collapse but I don´t find anything.
I need to use the function pairwise_survdiff so I have to compare the survival curves and I don´t know other function that I can use for it :(

kassambara · 2017-06-06T08:40:49Z

As the error you obtained is not reproducible on my computer,

please, let me know if the following code works on your system?

library("survminer")
library(survival)
data(mgus)
res.cox <- coxph(Surv(futime, death) ~ mspike + log(mspike) + I(mspike^2) +
    age + I(log(age)^2) + I(sqrt(age)), data = mgus)
ggcoxfunctional(res.cox,  data = mgus, point.col = "blue", point.alpha = 0.5)

moripe · 2017-06-06T08:42:22Z

Yes, the code has worked perfectly

kassambara · 2017-06-06T08:52:43Z

.collapse() function pasted in the pairwise_survdiff.R file

Please, re-install survminer from github and let me know, if pairwise_survdiff() works. Thanks

moripe · 2017-06-06T09:20:30Z

I re-install survminer from github and I get the same issue with my dataset and with dataset "mymeloma" (http://www.sthda.com/english/wiki/survminer-0-3-0). I've tried using with the two dataset the function pairwise_survdiff() so maybe it's my data problem.

I run the function .collapse (.collapse pasted in pairwise_survdiff.R #97) and I get other issue
Error in survdiff.fit(y, groups, strata.keep, rho) :
There is only 1 group
so I use the dataset "mymeloma" and yes that works!!!!!!!

There is definitely a problem in my data!!!!! I don´t understand because the error say there is only 1 group if I have 7 factor different but I will found about it
Thanks a lot of and sorry for the inconvenience.

moripe · 2017-06-06T11:31:08Z

I finally got it!!!!! I was able to use the function pairwise_survdiff()

The ultim problem I have solved it. I tell you where my mistake was. In my initial dataset, my predictor variable has 9 levels but I´m working with 7 levels of a subdataset because the other two levels haven´t got values. So I have defined the 7 levels that have value and pairwise_survdiff() works!!!!!
I´m very vey happy
Thanks again, I wanted you to know that I already got it :)

I hope you understand what I've explained

kassambara · 2017-06-06T11:46:29Z

I'm happy that it works. Thank you for your feedback, I appreciate it and it will help me to improve pairwise_survdiff() to handle such situation. This means that I should include the function droplevels() to drop unused levels.

kassambara · 2017-06-06T11:57:27Z

droplevels() added now, thanks

ccapizzano · 2017-06-06T18:43:29Z

I first wanted to say you prepared and delivered a fantastic post on the use of log-rank tests to identify differences in survivorship over time between groups.

My quick question builds off of @MarcinKosinski in regards to merging data to provide more generalized groups and thus interpretations of the data. I agree it is logical to merge factor levels that are not considered to be statistically different. However, how does that change when you have three or more factor levels. The Zhan et al. (2006) article you refer to, for instance, appears to have visually analyzed the data with the KM estimator of survival and created two risk groups. As such, they could theoretically divide levels into whatever groups they deem necessary, such as placing MF with the low risk group.

Any critiques or suggestions to merging three or more factor levels like in the Zhan article?

Thank you

raph06 · 2017-10-31T15:29:15Z

Hello and thank you for this interesting thread,

I find it very nice to take interest in subgroups, I tried the pairewise_survdiff() function for one of my dataset.

I do not manage to understand how my global pvalue can be <0.05 and none of the pairewise_survdiff() pvalues are below this same threshold.

Also I've been struggling to code the pairewise_survdiff() into a loop to iterate on every covariates.
It seems that the "data" argument prevents me from using the syntax data$covariates or data[,n].
I get the following:

Error in [.data.frame(data, , group_var, drop = FALSE) :
undefined columns selected

Thank you for your answers.

Raphaël

kassambara · 2017-10-31T21:02:04Z

Hi,

You're writing in closed issue.

Please, kindly open a new issue and provide a reproducible example with a demo data set so that we can work on it, thanks

audyavar · 2018-11-15T04:41:28Z

I am trying to run pairwise_survdiff() in a loop where I use
res <- pairwise_survdiff(Surv(x$"PFI.Time"/365,x$"PFI") ~ x[,k],data=x)
It says "Error in [.data.frame(data, , group_var, drop = FALSE) :
undefined columns selected". I tried it multiple ways with get(k) as well, but still same error.

MarcinKosinski · 2018-11-15T18:31:49Z

In my opinion is cased because there is x[, k] in the formula specification. Its not the way you use R formulas. Error isn't survminer related but the formula specification related. One need to paste names as strings and convert them to formula with as.formula to perform something like that in a loop over variables. Cheers czw., 15 lis 2018, 05:41: Akshata Udyavar <notifications@github.com> napisał(a):

…

I am trying to run pairwise_survdiff() in a loop where I use res <- pairwise_survdiff(Surv(x$"PFI.Time"/365,x$"PFI") ~ x[,k],data=x) It says "Error in [.data.frame(data, , group_var, drop = FALSE) : undefined columns selected". I tried it multiple ways with get(k) as well, but still same error. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#97 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGdazp8THSQhjbnhekzJW0h5emOLJO6hks5uvPB5gaJpZM4LOOUt> .

rizmalik · 2018-12-29T04:47:41Z

Hello
Could anyone tell me whether it is possible to compare survival curves (proportion surviving) at a given time point (eg 1 year, 2 years) using pairwise_survfiff() ?
Thank you

SoumyaRo · 2019-01-22T20:57:48Z

@kassambara is there a way to create adjusted survival curves using ggadjustedcurve command for a variable which has more than two levels from cox proportional hazard analysis.

SoumyaRo · 2019-01-22T20:58:51Z

I am trying to create that for a variable which has five levels and it is giving me some weird error message. @kassambara

MarcinKosinski · 2019-01-22T21:21:55Z

@SoumyaRo can you start separate issue? wt., 22 sty 2019, 21:58: SoumyaRo <notifications@github.com> napisał(a):

…

I am trying to create that for a variable which has five levels and it is giving me some weird error message. @kassambara <https://github.com/kassambara> — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#97 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGdaznh-nu1lL6GqukjZr3x3fM96Yt27ks5vF3uLgaJpZM4LOOUt> .

calzzone · 2019-03-01T19:24:45Z

Hi! Do you know of a way to perform pairwise comparisons on interval-censored curves? So far I used the Bonferroni method (as suggested by Graphpad people) which has been criticized as too conservative.

teixeiradatt · 2020-12-02T02:00:58Z

@kassambara
Thank you so much for your script! It help a lot!
Please, how can I add the result of p-value pairwise in the ggsurvplot?

kassambara added the enhancement label Dec 15, 2016

kassambara added a commit that referenced this issue Jan 28, 2017

New function pairwise_survdiff added #97

3b095fe

kassambara added a commit that referenced this issue Jan 28, 2017

pkdown updated #97

acf54c4

kassambara closed this as completed Jan 28, 2017

kassambara added a commit that referenced this issue Jun 6, 2017

.collapse pasted in pairwise_survdiff.R #97

9b99fdb

kassambara added a commit that referenced this issue Jun 6, 2017

droplevels added #97

f82ff99

raph06 mentioned this issue Nov 2, 2017

Pvalue and syntax issue in the pairewise_survdiff function #260

Open

qalid7 mentioned this issue Jan 17, 2018

Merging survival curves on a KM fit model #282

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

kassambara commented Dec 15, 2016

MarcinKosinski commented Jan 8, 2017

kassambara commented Jan 8, 2017

MarcinKosinski commented Jan 8, 2017

kassambara commented Jan 28, 2017

MarcinKosinski commented Jan 28, 2017

kassambara commented Jan 28, 2017

kassambara commented Jan 28, 2017

moripe commented Jun 5, 2017 •

edited

Loading

kassambara commented Jun 5, 2017

moripe commented Jun 5, 2017

kassambara commented Jun 5, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017 •

edited

Loading

moripe commented Jun 6, 2017 •

edited

Loading

kassambara commented Jun 6, 2017

kassambara commented Jun 6, 2017

ccapizzano commented Jun 6, 2017 •

edited

Loading

raph06 commented Oct 31, 2017

kassambara commented Oct 31, 2017

audyavar commented Nov 15, 2018

MarcinKosinski commented Nov 15, 2018 via email

rizmalik commented Dec 29, 2018

SoumyaRo commented Jan 22, 2019

SoumyaRo commented Jan 22, 2019

MarcinKosinski commented Jan 22, 2019 via email

calzzone commented Mar 1, 2019

teixeiradatt commented Dec 2, 2020

post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

Comments

kassambara commented Dec 15, 2016

MarcinKosinski commented Jan 8, 2017

kassambara commented Jan 8, 2017

MarcinKosinski commented Jan 8, 2017

kassambara commented Jan 28, 2017

MarcinKosinski commented Jan 28, 2017

kassambara commented Jan 28, 2017

kassambara commented Jan 28, 2017

moripe commented Jun 5, 2017 • edited Loading

kassambara commented Jun 5, 2017

moripe commented Jun 5, 2017

kassambara commented Jun 5, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017

kassambara commented Jun 6, 2017

moripe commented Jun 6, 2017 • edited Loading

moripe commented Jun 6, 2017 • edited Loading

kassambara commented Jun 6, 2017

kassambara commented Jun 6, 2017

ccapizzano commented Jun 6, 2017 • edited Loading

raph06 commented Oct 31, 2017

kassambara commented Oct 31, 2017

audyavar commented Nov 15, 2018

MarcinKosinski commented Nov 15, 2018 via email

rizmalik commented Dec 29, 2018

SoumyaRo commented Jan 22, 2019

SoumyaRo commented Jan 22, 2019

MarcinKosinski commented Jan 22, 2019 via email

calzzone commented Mar 1, 2019

teixeiradatt commented Dec 2, 2020

moripe commented Jun 5, 2017 •

edited

Loading

moripe commented Jun 6, 2017 •

edited

Loading

moripe commented Jun 6, 2017 •

edited

Loading

ccapizzano commented Jun 6, 2017 •

edited

Loading