Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences between plot.cox.zph and ggcoxzph #534

Closed
Generalized opened this issue Jun 12, 2021 · 4 comments
Closed

Differences between plot.cox.zph and ggcoxzph #534

Generalized opened this issue Jun 12, 2021 · 4 comments

Comments

@Generalized
Copy link

Generalized commented Jun 12, 2021

Dear @kassambara, @pbiecek and the rest of the SurvMiner Team.

Today I found an issue while plotting the results of the cox.zph using the function from the survival and survminer package. I get different results, attached below. Which one is correct?

I am preparing for writing a paper and I cannot risk. That's why I report the inconsistency. Both survival and survminer packages are very popular and I believe many people base their scientific work on them.

data <- structure(list(time = c(1, 7, 8, 41, 80, 85, 11, 61, 8, 2, 98, 
5, 4, 42, 23, 44, 5, 29, 13, 4, 7, 19, 9, 6, 5, 13, 6, 1, 30, 
38, 57, 87, 8, 22, 7, 16, 58, 6, 18, 55, 93, 15, 21, 28, 81, 
34, 10, 79, 2, 66, 55, 10, 6, 62, 58, 2, 42, 22, 5, 54, 57, 23, 
42, 10, 9, 27, 9, 5, 5, 40, 6, 11, 37, 101, 25, 108, 76, 38, 
13, 9, 41, 17, 11, 99, 27, 15, 2, 71, 7, 75, 36, 12, 74, 18, 
74, 16, 44, 107, 12, 41, 9, 18, 3, 11, 100, 42, 20, 15, 65, 11, 
53, 58, 20, 6, 12, 81, 53, 44, 5, 2, 26, 57, 80, 50, 22, 79, 
41, 12, 15, 42, 25, 64, 24, 83, 66, 18, 59, 14, 8, 20, 20, 12, 
30, 40, 11, 42, 42, 17, 44, 7, 2, 62), status = c(1L, 1L, 1L, 
1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L), RNR = structure(c(2L, 2L, 1L, 2L, 1L, 1L, 
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 
2L, 1L), .Label = c("A", "B"), class = "factor"), Gender = c("Male", 
"Male", "Male", "Male", "Female", "Male", "Male", "Female", "Female", 
"Male", "Female", "Male", "Male", "Female", "Male", "Female", 
"Male", "Female", "Male", "Male", "Female", "Male", "Male", "Female", 
"Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male", 
"Male", "Female", "Male", "Male", "Male", "Male", "Male", "Female", 
"Female", "Male", "Male", "Male", "Male", "Female", "Male", "Female", 
"Female", "Female", "Male", "Male", "Female", "Male", "Female", 
"Male", "Male", "Male", "Male", "Female", "Male", "Male", "Male", 
"Female", "Male", "Male", "Male", "Male", "Male", "Male", "Female", 
"Male", "Male", "Female", "Female", "Female", "Male", "Male", 
"Male", "Male", "Female", "Female", "Male", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Male", 
"Male", "Male", "Male", "Male", "Male", "Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Male", "Male", "Female", 
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Female", 
"Male", "Male", "Female", "Female", "Male", "Female", "Male", 
"Male", "Male", "Male", "Female", "Male", "Female", "Female", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Female", "Male", "Male", "Female", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Female", "Female"
), AgeBin = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("<65", 
">=65"), class = "factor")), row.names = c(NA, -152L), class = "data.frame")

The fit:

fit   <- coxph(Surv(time, status) ~ RNR + Gender + AgeBin, data = x)
summary(fit)

Call:
coxph(formula = Surv(time, status) ~ RNR + Gender + AgeBin, data = x)

  n= 152, number of events= 108 

             coef exp(coef) se(coef)     z Pr(>|z|)    
RNRB       1.1317    3.1008   0.2082 5.434  5.5e-08 ***
GenderMale 0.3592    1.4322   0.2243 1.602    0.109    
AgeBin>=65 0.3820    1.4652   0.2023 1.888    0.059 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

           exp(coef) exp(-coef) lower .95 upper .95
RNRB           3.101     0.3225    2.0617     4.664
GenderMale     1.432     0.6982    0.9228     2.223
AgeBin>=65     1.465     0.6825    0.9856     2.178

Concordance= 0.688  (se = 0.025 )
Likelihood ratio test= 43.03  on 3 df,   p=2e-09
Wald test            = 43.83  on 3 df,   p=2e-09
Score (logrank) test = 48.64  on 3 df,   p=2e-10

The test of PH:

cox.zph(fit)

       chisq df    p
RNR    1.215  1 0.27
Gender 2.099  1 0.15
AgeBin 0.367  1 0.54
GLOBAL 4.921  3 0.18

Let's print it:

layout(1:3)
par(mar=c(2,4,1,1))
invisible(sapply(1:3, function(i) plot(cox.zph(fit)[i])))

obraz

As we can notice - most of the points lie outside the dashed range.

Now, let's plot it using the survminer package:

obraz

And now all the points lie inside the intervals. The intervals look identical and span the same (visually) range.
So one routine has to be wrong. Or both are correct, but different setting/option is in use. Or maybe I am missing something trivial.

This is the survival repository: https://github.com/therneau/survival/tree/master/R

I reported this issue also on the survival's github.

pbiecek added a commit to pbiecek/survminer that referenced this issue Jun 12, 2021
@pbiecek
Copy link
Contributor

pbiecek commented Jun 12, 2021

Hi,
thank you for pointing this out,
there was an issue in the scaling factor for sd, it will be fixed with #535
but you can try this temporary fix: https://github.com/pbiecek/survminer

Please note that despite changes in the plot, the p-values are the same and it is much safer to do the inference based on p-values,
also note that the confident interval is for the trend and the tested hypothesis is that the effect does not depend on time, thus you would like to see if a horizontal line fits in the confidence intervals. The location of points (almost) does not matter

@Generalized
Copy link
Author

Generalized commented Jun 13, 2021

@pbiecek Wow, thank you for a so quick reaction!
Actually, I did not worry about the intervals itself. Honestly, I always turn them off :) I do not need them, focusing on the patterns and the p's from the cox.zph().

But this difference got my attention, because I started worrying that something I don't understand is happening under the hood. And since my work will be likely validated by the stat. reviewer, I wouldn't be able to explain this. Now I can close also the issue on in the survival's page.

BTW, will this affect also the shape of the intervals? Because here all 3 are identical. And in the plot.cox.zph they all differ. Maybe you use the same SD for all variables?

EDIT: yes, it will. It works:
obraz


A side note.
If you don't mind, I would like to suggest something by this occasion.
Many people, including me, rely on the SurvMiner and other famous packages. People use the results of calculations in their scientific work, when they publish papers. Sure, we all know "R comes without any warranty", but we all have to trust it on some level - otherwise what would be the reason to use R or Python, if we are constantly under a serious risk?

Errors happen everywhere (including commercial packages), it's not unusual. But, when we catch it, it would be good to inform people about that, so they can update their papers. The "NEWS" file is "hidden" to many people (I observed many don't even know it exists). So maybe you could add a section to the web page (maybe there? https://rpkgs.datanovia.com/survminer/index.html) with "hot changes" or "breaking changes"?

You know, when it's about simple issues, like wrongly aligned label, or legend disappearing, it can be annoying, but not "breaking". But when something essential is corrected, like the Gehan-Breslow p-value, that was fixed recently in SurvMiner, people would be happy to be informed somehow there's a need to re-run their calculations.

This issue isn't any serious IMO, but I'm talking about the possible future ones.

Anyway, thank you again!

@MarcinKosinski
Copy link
Contributor

Hey @kassambara any way this can be merged?

@kassambara
Copy link
Owner

fixed by #535

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants