-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expand survival data geoms with geom_table_risk etc. #20
Comments
Thanks I will keep it open to remind me to fix it in the next update. |
Thank you very much. Well, I don't use Shiny for anything, I rather needed it for a statistical report, where, indeed, I use the median + its CI. I add it easily with just 2 lines to the graph. What I love the most in your approach is that:
geom_do_this + geom_do_that + geom_do_even_more. That's the way it should go rather than a_monstrous_function(with_dozens_of_parameters_not_easily_extendable)
With your approach I can do the following easily: or even more: I think you could easily make a separate (that's important!) competing to the SurvMiner package for the ggplot2-based survival plotting, where each "unit functionality" is split into a corresponding geom_ or stat_ function (to expose all the important settings and parameters for advanced statisticians). Exactly they way you do. It's really weird, it's been missing for so long time! I'd love to see it. People, who do serious statistics (not just eye-catching presentations, with different fonts and colors), care more of the syntax-consistency and functionality. Let me propose a few "geoms_" you might want to consider adding in the future:
It's very important to make it a separate package. Why?
I hope I encouraged you to consider starting this journey! You already did well with the basic geom_km. You saved my life, now I don't have to wrap the survfit with ggplot2 calls, having yours - ready to work :) PS: Please consider adding your examples into the official ggplot2 extensions library! |
Thanks for the valuable feedback. Initially this was written by @sachsmc and he gracefully accepted to be co-author and to include his code as part of ggquickeda. I do believe geom_km and friends is the most ggplot2 like amongst other solutions. I rewrote the ribbon code as at the time there was no stepped ribbon and also wrote it in a way that ggplotly would work. My sole purpose was to enable flexible km curves with facets grouping linetype etc. in my ggquickeda package . |
This is a great news! I hope the 2021 will be the year of birth of your new awesome package! Let me show you what I did (quite quickly!) using your functions and a couple of additional tidyr-related coding (but no rocket-science here!) MST stands for the median survival time. I could easily extend it by p-values and other stats, if space permits, but I will provide it in a separate table anyway. Just wanted to illustrate it. Sure, I could make something similar with the SurvMiner, but it would be limited only to 2 dimensions. With the flexibility of the ggplot2 faceting mechanism (enabled for the KM via your functions) I don't have to care of how many levels are needed by my client (now or in the future). I just sit, code and report. @smouksassi and @sachsmc, please accept my deepest gratitude for your work! PS: this is fake data, nothing secret is revealed here. You may find useful a small piece of code, that translates the output of survfit into median survival times with grouping variables:
with the following output (fit, and the processed fit): PS: I will have to look later on the time conversion that SurvMiner can do for me in case I need other time units.
|
@smouksassi
Curves split by appropriate factors (faceted):
The data:
|
thanks this is awesome I hope I will have time to get around to this soon ! the above showcase why we need this code and package ! of course PR welcome |
@smouksassi Hi! I don't know, if you finally decided to create a separate package for the survival analysis, but (to encourage you) I thought you might want to see, how flexible are the tools you made :) Let me show you an example - the final plot. Sure, there's nothing fancy about it - just ordinary KM. BUT, this can be faceted easily (to any level) without hacks, all components can be easily "commented out", no "million-parameters-big-function controlling everything". It's more in the tidyverse spirit. Of course, I had to write some code to achieve that (e.g. the median survival time, the table with number of events, combined via patchwork package), but it was much easier with your "atomic" (I mean - simple, single-responsibility) functions km_XX, allowing me to get the basic plot and enhance it as I want, rather than hacking the survminer. That's why I loved your approach and why it would be great to have it separated - in the future :) By the way, there's a smart way to exclude a KM band from a certain curve - like an "average" curve, which is just for informative purposes, but not for inference. I tried to achieve that by providing a separate filtered data, but it didn't work. I was wondering, if the km_bands could take an argument to decide to which curve the CI should be added. But, anyway, this can be achieved by assigning a "transparent color" to a certain "fill". |
I have written a function that can produce the desired output with control on what to facet on and how to conduct logrank pvalue (grouping) library(survival)
library(tidyverse)
library(ggquickeda)
library(ggrepel)
library(ggh4x)
ggkmandtable <- function(data = lung_long, # long format filter to Endpoint of choice
time = "time" , # long format filter to Endpoint of choice
status = "DV",
endpoint ="Endpoint",
groupvar1 = "Endpoint", #separate fit by endpoint and by expname
groupvar2 ="expname", # and up to two additional grouping
exposure_metrics = c("age","ph.karno"),
# exposures/covariates will be stacked into expname exptile
# i
exposure_metric_split = c("median","tertile","quartile","none"),
exposure_metric_soc_value = -99,
exposure_metric_plac_value = 0,
color_fill = "exptile",
linetype = "exptile",
xlab = "Time of follow_up",
ylab ="Overall survival probability",
nrisk_table_plot = TRUE,
nrisk_table_variables = c("n.risk"),#n.risk pct.risk n.event cum.n.event n.censor
nrisk_table_breaktimeby = NULL,
nrisk_table_textsize = 4,
nrisk_position_scaler = 0.2,
nrisk_position_dodge = 0.2,
nrisk_offset = 0,
nrisk_filterout0 = FALSE,
km_logrank_pvalue = FALSE,
km_trans ="identity" ,#"identity","event","cumhaz","cloglog")
km_ticks = TRUE,
km_band = TRUE,
km_conf_int = 0.95,
km_conf_type = "log", #c("none" , "plain", "log" ,"log-log","logit"),
km_conf_lower = "usual", #c("peto" , "modified", "usual"),
km_median = "none", #medianci,median none
km_yaxis_position = c("left"),#"right
facet_formula = NULL,
facet_ncol = NULL,
facet_strip_position = c("top","top","top","top")
) {
timevar <- time
statusvar <- status
endpointinputvar <- endpoint
groupvar1inputvar <- groupvar1
groupvar2inputvar <- groupvar2
colorinputvar <- if (color_fill !="none") color_fill else NULL
fillinputvar <- if (color_fill !="none") color_fill else NULL
linetypeinputvar <- if (linetype !="none") linetype else NULL
survformula <- paste( "Surv","(",timevar,",",statusvar,")",sep="")
facet_formula <- if (is.null(facet_formula) ) stats::as.formula( paste("~",groupvar1inputvar,"+",groupvar2inputvar)) else
stats::as.formula(facet_formula)
exposure_metric_split <- match.arg(exposure_metric_split)
data <- data %>%
mutate(none = "none") # needed when no metric are chosen
data.long <- data %>%
gather(expname,expvalue,!!!exposure_metrics) %>%
group_by(expname,!!endpoint)
if(exposure_metric_split=="none") {
data.long <- data.long %>%
mutate(exptile = case_when(
#expvalue == exposure_metric_soc_value ~ "SOC",
#expvalue == exposure_metric_plac_value ~ "Placebo",
#expvalue > exposure_metric_plac_value ~ "all"))
expvalue == exposure_metric_soc_value ~ NA,
expvalue == exposure_metric_plac_value ~ NA,
expvalue > exposure_metric_plac_value ~ expvalue))
}
if(exposure_metric_split=="quartile") {
data.long <- data.long %>%
mutate(
Q25 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 0.25, na.rm=TRUE),
Q50 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 0.50, na.rm=TRUE),
Q75 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 0.70, na.rm=TRUE)) %>%
mutate(exptile = case_when(
expvalue == exposure_metric_soc_value ~ "SOC",
expvalue == exposure_metric_plac_value ~ "Placebo",
expvalue > exposure_metric_plac_value &
expvalue <= Q25 ~ "Q1",
expvalue > Q25 & expvalue <= Q50 ~ "Q2",
expvalue > Q50 & expvalue <= Q75 ~ "Q3",
expvalue > Q75 ~ "Q4"))
}
if(exposure_metric_split=="tertile") {
data.long <- data.long %>%
mutate(
Q33 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 1/3, na.rm=TRUE),
Q66 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 2/3, na.rm=TRUE)) %>%
mutate(exptile = case_when(
expvalue == exposure_metric_soc_value ~ "SOC",
expvalue == exposure_metric_plac_value ~" Placebo",
expvalue > exposure_metric_plac_value &
expvalue <= Q33 ~ "T1",
expvalue > Q33 & expvalue <= Q66 ~ "T2",
expvalue > Q66 ~ "T3"))
}
if(exposure_metric_split=="median") {
data.long <- data.long %>%
mutate(
Q50 = quantile(expvalue[!expvalue %in% c(exposure_metric_soc_value,
exposure_metric_plac_value)], 0.5, na.rm=TRUE)) %>%
mutate(exptile = case_when(
expvalue == exposure_metric_soc_value ~"SOC",
expvalue == exposure_metric_plac_value ~"Placebo",
expvalue > 0 & expvalue <= Q50 ~ "M1",
expvalue > Q50 ~ "M2"))
}
data.long$exptile2 <- data.long$exptile
#we generate a curve by the combination of all these inputs removing duplicates and none
listvars <- unique(c(endpointinputvar,colorinputvar,fillinputvar,linetypeinputvar,groupvar1,groupvar2))
listvars <- listvars[!is.element(listvars,c("none",".")) ]
listvars <- listvars[!duplicated(listvars) ]
if ( length(listvars) ==0 ){
f <- as.formula(paste(survformula, "1", sep = " ~ "))
}
if ( length(listvars) >0 ){
f <- as.formula(paste(survformula, paste(listvars, collapse = " + "), sep = " ~ "))
}
surv_object <- eval(bquote( survfit( .(f) , data.long) ))
if (is.null(nrisk_table_breaktimeby) ||
nrisk_table_breaktimeby == '' ||
is.na(nrisk_table_breaktimeby)){
ggsurv <- survminer::ggsurvplot(surv_object,
data.long,risk.table = TRUE,
ggtheme = theme_bw())
} else {
ggsurv <- survminer::ggsurvplot(surv_object,
data.long,risk.table = TRUE,
break.time.by = nrisk_table_breaktimeby,
ggtheme = theme_bw())
}
if(km_logrank_pvalue){ #log rank does not group by color_fill, linetype exptile
loopvariables <- unique(c(endpointinputvar,"expname",groupvar1,groupvar2))
loopvariables <- loopvariables[!loopvariables%in% "exptile"]
listvars2 <- listvars[!listvars%in% loopvariables]
if ( length(listvars2) ==0 ){
f2 <- as.formula(paste(survformula, "1", sep = " ~ "))
}
if ( length(listvars2) >0 ){
f2 <- as.formula(paste(survformula, paste(listvars2, collapse = " + "), sep = " ~ "))
}
survfit_by_endpoint <- list()
logrank_test_by_endpoint <- list()
data.long <- unite(data.long,"loopvariable", !!!loopvariables, remove = FALSE)
for (i in unique(data.long[,"loopvariable"]) %>%
pull() %>%
as.character() ) {
survregdata<- data.long %>%
filter(.data[["loopvariable"]] ==i)
survfit_by_endpoint_fit <- eval(bquote( survfit( .(f2) , survregdata) ))
survfit_by_endpoint[[i]] <- survfit_by_endpoint_fit
logrank_test_by_endpoint_fit <- survminer::surv_pvalue(survfit_by_endpoint_fit, method = "1",
data=survregdata)
logrank_test_by_endpoint_fit[,"loopvariable"] <- i
logrank_test_by_endpoint[[i]] <- logrank_test_by_endpoint_fit
}
logrank_test_by_endpoint <- enframe(logrank_test_by_endpoint) %>%
unnest(cols = c(value))
logrank_test_by_endpoint <- logrank_test_by_endpoint %>%
separate(loopvariable, into = loopvariables,
sep="_",extra = "merge"
)
#print(listvars2)
#print(f2)
#print(logrank_test_by_endpoint)
}
#print(listvars)
#print(f)
risktabledata <- ggsurv$table$data
#print(risktabledata)
if(!is.null(surv_object$strata)){
variables <- survminer:::.get_variables(risktabledata$strata, surv_object, data.long)
for(variable in variables) {
risktabledata[[variable]] <- survminer:::.get_variable_value(variable,risktabledata$strata, surv_object, data.long)
}
}
if(nrisk_filterout0){
risktabledata <- risktabledata %>%
filter(n.risk > 0)
}
if(!is.null(nrisk_table_variables) && (length(as.vector(nrisk_table_variables)) > 0) &&
all(nrisk_table_variables != "")){
risktabledatag<- gather(risktabledata,key,value, !!!nrisk_table_variables , factor_key = TRUE)
risktabledatag$keynumeric<- - nrisk_position_scaler* as.numeric(as.factor(risktabledatag$key)) + nrisk_offset
}
if(is.null(nrisk_table_variables) || all(nrisk_table_variables == "") ) {
risktabledatag<- gather(risktabledata,key,value, n.risk, factor_key = TRUE)
risktabledatag$keynumeric<- - nrisk_position_scaler* as.numeric(as.factor(risktabledatag$key)) + nrisk_offset
}
if (km_median!="none"){
if(!is.null(surv_object$strata) || is.matrix(surv_object$surv)) {
.table <- as.data.frame(summary(surv_object)$table)
} else {
.table <- t(as.data.frame(summary(surv_object)$table))
rownames(.table) <- "(all)"
}
surv_median <- as.vector(.table[,"median"])
dfmedian <- data.frame(x1 = surv_median,
x2 = surv_median,
x1lower = as.vector(.table[,"0.95LCL"]),
x1upper = as.vector(.table[,"0.95UCL"]),
y1 = rep(0, length(surv_median)),
y2 = rep(0.5, length(surv_median)),
strata = survminer:::.clean_strata(rownames(.table)))
if(!is.null(surv_object$strata)){
variables <- survminer:::.get_variables(dfmedian$strata, surv_object, data.long)
for(variable in variables) {
dfmedian[[variable]] <- survminer:::.get_variable_value(variable, dfmedian$strata, surv_object, data.long)
}
}
}
plotkm0 <- ggplot(data.long,aes_string(time = time,status = status,
color = colorinputvar, fill = fillinputvar,
linetype = linetypeinputvar))+
geom_line(stat = "km",trans = km_trans)
if(km_band){
plotkm00 <- plotkm0 +
geom_ribbon(stat="kmband",alpha=0.2,color="transparent",
conf.int = km_conf_int,
conf.type = km_conf_type,
conf.lower = km_conf_lower,
error = "greenwood",
trans = km_trans)
}
if(!km_band){
plotkm00 <- plotkm0
}
if(km_ticks){
plotkm000 <- plotkm00 +
geom_kmticks(trans = km_trans)
}
if(!km_ticks){
plotkm000 <- plotkm00
}
if(nrisk_table_plot) {
plotkm1 <- plotkm000 +
geom_text(data=risktabledatag,
aes(x=time,label=value,y=keynumeric,time=NULL,status=NULL ),
show.legend = FALSE,
size= nrisk_table_textsize,
position = ggstance::position_dodgev(height =nrisk_position_dodge)
)+
geom_hline(yintercept = -nrisk_position_scaler *unique(c(1,(as.numeric(as.factor(
risktabledatag$key))+1 )) ) + (abs(nrisk_position_dodge)/2 ) + nrisk_offset
)
}
if(!nrisk_table_plot) {
plotkm1 <- plotkm0
}
if(km_median=="table"){
plotkm1m <- plotkm1 +
geom_text(data=dfmedian %>% mutate(none="none"),
aes(x=0, y=(max(as.numeric(as.factor(get(!!color_fill))))+1)*0.09,
label="Med. Surv. Time:"),
hjust = 0, size=3,
show.legend = FALSE,
color="gray30",inherit.aes = FALSE) +
geom_text(data=dfmedian %>% mutate(none="none",
"{timevar}" := NA,
"{statusvar}" := NA),
aes(x=0, y=0.09*as.numeric(as.factor(get(!!color_fill))),
label=paste0(get(!!color_fill), ":")),
hjust = 0, size=3,
show.legend = FALSE,inherit.aes = TRUE) +
geom_text(data=dfmedian %>% mutate(none="none",
"{timevar}" := NA,
"{statusvar}" := NA),
aes(x=60, y=0.09*as.numeric(as.factor(get(!!color_fill))),
label=ifelse(is.na(x1), "-",as.character(x1))),
hjust = 0, size=3,
show.legend = FALSE,inherit.aes = TRUE)
}
if(km_median=="medianci"){
plotkm1m <- plotkm1 +
geom_label_repel(data = dfmedian, aes(x= x1 , y= y2 ,label =sprintf("%#.3g (%#.3g, %#.3g)",x1,x1lower,x1upper),
status=NULL,time=NULL),show.legend = FALSE,
label.size = NA, direction="both",fill="white",
segment.color="black",nudge_y = -0.1,segment.size = 0.5,
alpha = 0.5,label.padding=.1, force = 5,
na.rm=TRUE,
seed = 1234) +
geom_label_repel(data = dfmedian, aes(x= x1 , y= y2 ,label =sprintf("%#.3g (%#.3g, %#.3g)",x1,x1lower,x1upper),
status=NULL,time=NULL),show.legend = FALSE,
label.size = NA,direction="both",
nudge_y = -0.1,segment.size = 0.5,
arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
alpha = 1,label.padding=.1, force = 5,
na.rm=TRUE,
fill = NA,
seed = 1234)
}
if(km_median=="median"){
plotkm1m <- plotkm1 +
geom_label_repel(data = dfmedian, aes(x= x1 , y= y2 ,label = sprintf("%#.3g",x1), status=NULL,time=NULL),show.legend = FALSE,
label.size = NA, direction="both",fill="white",
segment.color="black",nudge_y = -0.1,segment.size = 0.5,
alpha = 0.5,label.padding=.1, force = 5,
na.rm=TRUE,
seed = 1234) +
geom_label_repel(data = dfmedian, aes(x= x1 , y= y2 ,label =sprintf("%#.3g",x1),
status=NULL,time=NULL),show.legend = FALSE,
label.size = NA,direction="both",
nudge_y = -0.1,segment.size = 0.5,
arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
alpha = 1,label.padding=.1, force = 5,
na.rm=TRUE,
fill = NA,
seed = 1234)
}
if(km_median=="none"){
plotkm1m <- plotkm1
}
plotkm2 <- plotkm1m +
facet_nested_wrap(facet_formula,ncol= facet_ncol ,
strip = strip_split(position=facet_strip_position))+
scale_y_continuous(position = km_yaxis_position,
breaks =c(unique(risktabledatag$keynumeric),
c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1) ),
labels= c(as.vector(unique(risktabledatag$key)),
c("0","10","0.2","0.3","0.4","0.5","0.6","0.7","0.8","0.9","1") ),
expand = expansion(mult=c(0.01,0.01),
add =c(0, 0)))+
scale_x_continuous( breaks =c(unique(risktabledatag$time))) +
theme_bw()+
theme(legend.position = "top",strip.placement = "outside",
axis.title.y = element_blank())+
labs(color="",fill="",linetype="",
x = xlab,y = ylab)
if(km_logrank_pvalue){
plotkm3 <- plotkm2 +
geom_text(data=logrank_test_by_endpoint,
aes(x=-Inf,y=Inf,label=pval.txt),vjust=1,hjust=0,
inherit.aes = FALSE)
plotkm <- plotkm3
}
if(!km_logrank_pvalue) {
plotkm <- plotkm2
}
plotkm
}
Example library(tidyverse)
library(patchwork)
library(ggquickeda)
library(patchwork)
library(ggh4x)
library(survival)
library(survminer)
source("ggkmandtable.R")
# see comments above to dput survival_overall_data
current_survival_data <- survival_overall_data %>%
filter(sex != "Overall") %>%
droplevels()
ggkmandtable (data = current_survival_data,
time= "time",
status ="DV",
exposure_metrics =c("none"),
exposure_metric_split = "none",
color_fill = "sex",
linetype = "TRT",
groupvar1 = "RNR",
groupvar2 = "N2O",
xlab = "Time of follow_up",
ylab ="Overall survival probability",
nrisk_table_variables = c("n.censor"),
km_median = "table",
km_band = FALSE,
nrisk_table_breaktimeby = NULL,
facet_formula ="~TRT+RNR+N2O",
facet_ncol = 6,
facet_strip_position = c("right","top","top") ) |
Hello,
I'm trying to achieve faceted KM plots. Your package makes it awesome easy with geom_XX syntax. Much easier than in other packages. It works for the geom_km (with the issue that it doesn't start the Y axis from 0 initially) and geom_band, yet it fails at geom_ticks for censored data. Let me show an example.
The data:
Head:
Plotting:
Warnings:
and the output from the SurvMiner for a comparison:
EDIT:
OK, I fixed the issue on my own. The missing function comes from the package scales
When I loaded it, it worked.
Maybe you should pre-load it in your code? People may forget doing that.
The text was updated successfully, but these errors were encountered: