diff --git a/.gitignore b/.gitignore index 85c0afc..98fec23 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ .DS_Store .Rproj.user +.Rhistory diff --git a/survival-analysis/README.Rmd b/survival-analysis/README.Rmd index e8b399d..12584f7 100644 --- a/survival-analysis/README.Rmd +++ b/survival-analysis/README.Rmd @@ -1,14 +1,16 @@ - +--- +always_allow_html: true +--- ```{r, load_libs, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'} library(prodlim) library(survival) library(flexsurv) +library(casebase) library(rpart) library(pec) library(ipred) library(randomForestSRC) -library(ranger) library(mboost) library(tidymodels) library(cli) @@ -105,6 +107,7 @@ fitw <- dist = "weibull") times <- summary(fitw, ovarian[1:2,], type = "mean") %>% + map(~ setNames(.x, c("est", "lcl", "ucl"))) %>% bind_rows() %>% select(.pred_time = est) %>% as_tibble() @@ -195,6 +198,42 @@ summary(flxs_mod, test_pred, type = "survival", t = .times, tidy = TRUE) The `flexsurvspline()` function has the same interface. +## `casebase::`[`fitSmoothHazard`](https://rdrr.io/cran/casebase/man/fitSmoothHazard.html) + +The main fitting function is `fitSmoothHazard`: + +```{r, cb_fit} +library(casebase) +cb_mod <- + fitSmoothHazard(status ~ time + (X1 + X2) ^ 2, + data = train_dat) +``` + +Note that `time` is modeled explicitly and with flexibility: + +```{r, cb_splines} +library(splines) +cb_sp_mod <- + fitSmoothHazard(status ~ bs(time) + (X1 + X2) ^ 2, + data = train_dat) +``` + +And we can also do penalized estimation: + +```{r, cb_glment} +cb_glmnet_mod <- + fitSmoothHazard(status ~ time + (X1 + X2) ^ 2, + data = train_dat, family = "glmnet") +``` + +Predictions for the cumulative incidence or survival probabilities are obtained using `absoluteRisk`: + +```{r, cb_pred} +absoluteRisk(cb_mod, newdata = test_pred, time = .times, type = "CI") +absoluteRisk(cb_mod, newdata = test_pred, time = .times, type = "survival") +``` + +(Note that the argument `type` is only available in the development version.) ## Cox PH Models @@ -389,34 +428,6 @@ although these values do not bracket the observed times: summary(train_dat$time) ``` -### `ranger:::`[`ranger`](https://rdrr.io/cran/ranger/man/ranger.html) - -```{r, ranger_fit} -library(ranger) -ranger_mod <- ranger(Surv(time, status) ~ X1 + X2, data = train_dat, num.trees = 1000) -``` - -The `predict` function generates an object with class `"ranger.prediction"`. - -```{r, ranger_pred} -ranger_pred = predict(ranger_mod, test_pred) -str(ranger_pred) -``` - -The `unique.death.times` slot has the unique death times: - -```{r, } -all(sort(unique(train_dat$time)) == ranger_pred$unique.death.times) -``` - -The `survival` slot contains a matrix of predicted survival curves, with a row for each test case and a column for each death time. The `chf` slot is the same for the cumulative hazard function: - -```{r, ranger_surv} -ranger_pred$survival[, 1:5] -ranger_pred$chf[, 1:5] -``` - - # Model Prediction Scorecard In most cases, the open circle means that this type of prediction is facilitated _indirectly_. For example, using "tricks" related to `survfit()`, methods in the `pec` package, or other means. @@ -431,6 +442,7 @@ pred_types <- ~Package, ~Function, ~`pred event time`, ~`survival probs`, ~`linear pred`, "`survival`", "`survreg`", yep, nope, yep, "`flexsurv`", "`flexsurvreg`", yep, yep, yep, + "`casebase`", "`fitSmoothHazard`", nope, yep, yep, "`survival`", "`coxph`", yep, yep, yep, "`glmnet`", "`glmnet`", kinda, kinda, yep, "`mboost`", "`*boost`", kinda, kinda, yep, @@ -438,8 +450,7 @@ pred_types <- "`ipred`", "`bagging`", yep, kinda, nope, "`party`", "`ctree`", yep, kinda, nope, "`party`", "`cforest`", yep, kinda, nope, - "`randomForestSRC`", "`rfsrc`", yep, kinda, nope, - "`ranger`", "`ranger`", kinda, yep, nope + "`randomForestSRC`", "`rfsrc`", yep, kinda, nope ) pred_types %>% diff --git a/survival-analysis/README.html b/survival-analysis/README.html new file mode 100644 index 0000000..d1e6787 --- /dev/null +++ b/survival-analysis/README.html @@ -0,0 +1,1247 @@ + + + + +
+ + + + + + + + +There is significant interest in creating infrastructure for modeling censored data using the tidymodels packages. This document is a summary and discussion of what needs to be done and a survey of many of the modeling approaches existing in R.
+We’d like to get feedback, edits, and/or contributions. Please add issues or a pull request in case for those. API suggestions are very welcome.
+What needs to be developed for tidymodels? parsnip
already contains some simple wrappers for parametric survival models but, to really be useful, there is a fair amount of infrastructure needed.
This can be broken down into a few parts:
+model fit wrapper for parsnip
new predictions types
performance metrics for censored data.
The first item is fairly straight-forward. Since parsnip
wraps model functions, this should be the easy part.
There is some work for the second item. Many applications of censored data are less focused on predicting the actual event time; the focus is often on predicting the probability of the event (e.g. death) at specific times. For this, a new prediction type is needed. Instead of using type = "prob"
, a different values could be used (e.g. type = "hazard"
or type = "survival"
). However, these predictions are dynamic since they are made at specific time points. For this reason, parsnip
and other packages will require a specific argument for the time of interest (I nominate a common argument name of .time
).
Since this is a big difference, it may be helpful to define a new parsnip
model mode, perhaps called “risk prediction” or “censored regression” so that we can delineate the problem. For example, there is an rpart
engine for decision_tree()
that can be used for classification or regression. Without another mode, there would be conflicts or the need to make alternate function names (e.g. survival_decision_tree()
, which is less attractive).
The third item, performance metrics, will also require some minor modifications for dynamic predictions. A standard argument will be required for survival metrics that tell the function what column has the time of interest.
+What about Surv()
? This function is a mainstay of survival analysis in S and R. It encapsulates the event time(s) and the censoring indicator. Most functions standardize on this format.
The main issue is that, within data frames, it might be difficult to work with. For example:
+library(survival)
+library(tidymodels)
+
+data(lung) # usually use `Surv(time, status)` for these data
+
+surv_obj <- Surv(lung$time, lung$status)
+is.vector(surv_obj)
+[1] FALSE
+is.matrix(surv_obj)
+[1] TRUE
+lung$surv <- surv_obj
+head(lung)
+ inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss surv
+1 3 306 2 74 1 1 90 100 1175 NA 306
+2 3 455 2 68 1 0 90 90 1225 15 455
+3 3 1010 1 56 1 0 90 90 NA 15 1010+
+4 5 210 2 57 1 1 90 60 1150 11 210
+5 1 883 2 60 1 0 100 90 NA 0 883
+6 12 1022 1 74 1 1 50 80 513 0 1022+
+lung_tbl <- as_tibble(lung)
+lung_tbl$surv <- surv_obj
+lung_tbl %>% slice(1:6)
+# A tibble: 6 x 11
+ inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss surv[,"time"]
+ <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+1 3 306 2 74 1 1 90 100 1175 NA 306
+2 3 455 2 68 1 0 90 90 1225 15 455
+3 3 1010 1 56 1 0 90 90 NA 15 1010
+4 5 210 2 57 1 1 90 60 1150 11 210
+5 1 883 2 60 1 0 100 90 NA 0 883
+6 12 1022 1 74 1 1 50 80 513 0 1022
+Typically, Surv
objects are not added as columns but are used when the model formula is specified. For tidymodels, there may be some added complexity since special, model-specific formulas might be needed to use workflows. This is very similar to special formulas for generalized additive models and mixed models.
It would probably be better to leave the columns alone and use Surv()
in the model formula. However, for recipes, the censoring column might get treated as a predictor or outcome so a special role might be helpful to make sure that the data are properly handled during model fitting.
The cleanest place to start this work is to create yardstick
metric functions since these are model-agnostic. The concordance statistic uses a static, event time prediction and would be very straight-forward to wrap the function from the survival package.
For dynamic metrics, such as the prediction error curve or survival ROC metrics, the problem is a little more complex since an interface entry point is needed for the time points of interest.
+metric_set()
will need to have a new class
attribute. For example, depending on the type of prediction needed for a metric, this attribute is either class_metric
, prob_metric
, or numeric_metric
. We’ll need new classes for static and dynamic metrics.
Once an official name for the new mode is determined, engines can be added for models like surv_reg()
, decision_tree()
, rand_forest()
, and others. Some details are below for possible models. To begin, we should be relatively selective for picking new models/engines.
These is a fair amount of complexity in the prediction modules. The type and format of return results from predict()
methods can be very different and un-tidy. This alone raises the complexity of the problem.
For some models, there will have to be different sub-modules for probability predictions that depend on the statistical distribution of the data. For example, survreg
can use a few different distributions and translating the estimated parameters to these probabilities can be complicated. Some of this, and other details, is already handled by the pec
package (as will be seen below). Also mlr3
has implemented these too so, at the least, good test cases will be available.
As previously mentioned, predict(x, type = "survival")
will have a mandatory .time
argument for dynamic predictions.
Also, the convention in tidymodels is, when predicting, to produce a tibble with as many rows as new_data
and standardized column names. For static predictions, .pred_time
is reasonable and .pred_survival
and .pred_hazard
might be used for the two respective probabilities.
If dynamic predictions are made, the result will be nested tibbles. For example, if there were two samples being predicted at three time points, the structure should look like:
+pred_vals
+# A tibble: 2 x 2
+ .pred_time .pred
+ <dbl> <list<tibble>>
+1 206. [3 × 2]
+2 167. [3 × 2]
+pred_vals$.pred[[1]]
+# A tibble: 3 x 2
+ .time .pred_hazard
+ <dbl> <dbl>
+1 100 0.00419
+2 200 0.00651
+3 500 0.0116
+tidyr::unnest(pred_vals, cols = c(.pred))
+# A tibble: 6 x 3
+ .pred_time .time .pred_hazard
+ <dbl> <dbl> <dbl>
+1 206. 100 0.00419
+2 206. 200 0.00651
+3 206. 500 0.0116
+4 167. 100 0.00588
+5 167. 200 0.00914
+6 167. 500 0.0164
+First, let’s simulate some data:
+library(prodlim)
+set.seed(43500)
+train_dat <- SimSurv(200)
+set.seed(2721)
+test_dat <- SimSurv(2)
+test_dat
+ eventtime censtime time event X1 X2 status
+1 10.840 13.53 10.840 1 1 -0.51 1
+2 0.963 2.28 0.963 1 1 1.18 1
+test_pred <- test_dat[, 5:6]
+
+# We will predict at these time points
+.times <- c(1, 5, 10)
+survival:::
survreg
The model fitting function:
+library(survival)
+sr_mod <- survreg(Surv(time, status) ~ (X1 + X2)^2, data = train_dat)
+The predict()
function can generate predictions for the event time and/or percentiles of the survival function via
predict(sr_mod, test_pred)
+ 1 2
+8.04 3.19
+predict(sr_mod, test_pred, type = 'quantile', p = c(.1, .5, .9))
+ [,1] [,2] [,3]
+[1,] 2.54 6.66 12.31
+[2,] 1.01 2.65 4.89
+Note that the latter is in terms of the time units and are not probabilities of survival by a specified time. Since these are parametric models, probablity values can be derived indirectly.
+flexsurv::
flexsurvreg
The main parametric fitting function is
+library(flexsurv)
+flxs_mod <-
+ flexsurvreg(Surv(time, status) ~ (X1 + X2) ^ 2,
+ data = train_dat,
+ dist = "weibull")
+Prediction is done using the summary()
method and the results come back as a data frame with tidy = TRUE
:
# Event time prediction
+summary(flxs_mod, test_pred, type = "mean", tidy = TRUE)
+ est lcl ucl X1 X2
+1 7.12 6.01 8.25 1 -0.51
+2 2.83 2.25 3.57 1 1.18
+# Probabilities
+summary(flxs_mod, test_pred, type = "survival", t = .times, tidy = TRUE)
+ time est lcl ucl X1 X2
+1 1 0.983100 9.69e-01 0.9917 1 -0.51
+2 5 0.673176 5.88e-01 0.7535 1 -0.51
+3 10 0.215802 1.16e-01 0.3195 1 -0.51
+4 1 0.901743 8.33e-01 0.9501 1 1.18
+5 5 0.090590 2.41e-02 0.2115 1 1.18
+6 10 0.000091 7.30e-08 0.0036 1 1.18
+# Same for type = "hazard"
+The flexsurvspline()
function has the same interface.
casebase::
fitSmoothHazard
The main fitting function is fitSmoothHazard
:
library(casebase)
+cb_mod <-
+ fitSmoothHazard(status ~ time + (X1 + X2) ^ 2,
+ data = train_dat)
+Note that time
is modeled explicitly and with flexibility:
library(splines)
+cb_sp_mod <-
+ fitSmoothHazard(status ~ bs(time) + (X1 + X2) ^ 2,
+ data = train_dat)
+And we can also do penalized estimation:
+cb_glmnet_mod <-
+ fitSmoothHazard(status ~ time + (X1 + X2) ^ 2,
+ data = train_dat, family = "glmnet")
+Predictions for the cumulative incidence or survival probabilities are obtained using absoluteRisk
:
absoluteRisk(cb_mod, newdata = test_pred, time = .times, type = "CI")
+ time
+ 0 0.0000 0.000
+ 1 0.0352 0.218
+ 5 0.2761 0.891
+ 10 0.7797 1.000
+absoluteRisk(cb_mod, newdata = test_pred, time = .times, type = "survival")
+ time
+ 0 1.000 1.00e+00
+ 1 0.965 7.82e-01
+ 5 0.724 1.09e-01
+ 10 0.220 3.08e-05
+(Note that the argument type
is only available in the development version.)
survival:::
coxph
cph_mod <- coxph(Surv(time, status) ~ (X1 + X2)^2, data = train_dat, x = TRUE)
+Both restricted mean survival time (RMST) and survival function probabilities at specific times can be predicted with summary.survit()
:
cph_surv <- summary(survfit(cph_mod, newdata = test_pred), times = .times)
+
+# Event time prediction (RMST)
+cph_surv$table
+ records n.max n.start events *rmean *se(rmean) median 0.95LCL 0.95UCL
+1 200 200 200 107 7.05 0.2759 7.29 6.21 8.43
+2 200 200 200 107 2.75 0.0483 2.25 1.85 3.81
+# Survival probabilities
+cph_surv
+Call: survfit(formula = cph_mod, newdata = test_pred)
+
+ time n.risk n.event survival1 survival2
+ 1 191 8 0.974 8.46e-01
+ 5 96 52 0.718 1.18e-01
+ 10 15 41 0.133 2.29e-06
+Survial probabilities can also be predicted using the pec
package;
predictSurvProb(cph_mod, newdata = test_pred, times = .times)
+ [,1] [,2] [,3]
+[1,] 0.974 0.718 1.33e-01
+[2,] 0.846 0.118 2.29e-06
+See the document “cox_notes_extra” for more notes about this model.
+glmnet
The model fitting input for the predictors is a matrix and the response data should be a Surv()
object. We would have to make a new column for the interaction effect.
library(glmnet)
+glmn_fit <- glmnet(x = as.matrix(train_dat[, c("X1", "X2")]),
+ y = Surv(train_dat$time, train_dat$event),
+ family = "cox")
+glmnet
produces predictions across a variety of penalty values. The relative-risk can be obtained using type = "response"
:
predict(glmn_fit, as.matrix(test_pred), type = "response")
+ s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16
+[1,] 1 0.963 0.929 0.90 0.873 0.85 0.829 0.81 0.837 0.892 0.945 0.997 1.05 1.09 1.14 1.18 1.22
+[2,] 1 1.092 1.184 1.28 1.367 1.46 1.542 1.63 1.824 2.120 2.436 2.768 3.11 3.47 3.84 4.21 4.59
+ s17 s18 s19 s20 s21 s22 s23 s24 s25 s26 s27 s28 s29 s30 s31 s32 s33 s34 s35
+[1,] 1.26 1.30 1.34 1.37 1.40 1.43 1.46 1.48 1.50 1.53 1.55 1.57 1.58 1.6 1.61 1.63 1.64 1.65 1.66
+[2,] 4.96 5.34 5.71 6.07 6.42 6.76 7.09 7.41 7.71 8.00 8.27 8.53 8.77 9.0 9.22 9.42 9.61 9.78 9.94
+ s36 s37 s38 s39 s40 s41 s42 s43
+[1,] 1.67 1.68 1.69 1.7 1.7 1.71 1.72 1.72
+[2,] 10.09 10.23 10.36 10.5 10.6 10.70 10.79 10.88
+mboost
modelsThe mboost
package has a suite of functions for fitting boosted models whose base learners are either trees, GLM’s, or GAM’s. Each can take an argument of family = CoxPH()
and each can make predictions on the linear predictor.
For example, for a GLM model:
+library(mboost)
+glmb_mod <-
+ glmboost(Surv(time, status) ~ X1 + X2,
+ data = train_dat,
+ family = CoxPH())
+predict(glmb_mod, test_pred)
+ [,1]
+1 0.0399
+2 1.6059
+rpart:::
rpart
library(rpart)
+rp_mod <- rpart(Surv(time, status) ~ X1 + X2, data = train_dat)
+The basic invocation of predict
generates the predicted event time:
predict(rp_mod, test_pred)
+ 1 2
+1.15 3.12
+The pec
package can be used to fit an augmented rpart
model and, from this, we can get the survivor probabilities:
pec_rpart_mod <- pecRpart(Surv(time, status) ~ X1 + X2, data = train_dat)
+predictSurvProb(pec_rpart_mod, newdata = test_pred, times = .times)
+ [,1] [,2] [,3]
+[1,] 0.964 0.739 0.0612
+[2,] 0.938 0.309 NA
+ipred:::
bagging
library(ipred)
+bag_mod <- bagging(Surv(time, status) ~ X1 + X2, data = train_dat)
+When generating predictions, survfit
objects is returned for each data point being predicted. To get survival functions including median survival:
bag_preds <- predict(bag_mod, test_pred)
+bag_preds
+[[1]]
+Call: survfit(formula = Surv(agglsample[[j]], aggcens[[j]]) ~ 1)
+
+ n events median 0.95LCL 0.95UCL
+1112.00 679.00 7.58 7.29 7.91
+
+[[2]]
+Call: survfit(formula = Surv(agglsample[[j]], aggcens[[j]]) ~ 1)
+
+ n events median 0.95LCL 0.95UCL
+ 676.00 552.00 2.43 2.43 3.32
+# Use summary.survfit() to get survival function (pec would also work)
+# For the first sample
+summary(bag_preds[[1]], times = .times)
+Call: survfit(formula = Surv(agglsample[[j]], aggcens[[j]]) ~ 1)
+
+ time n.risk n.event survival std.err lower 95% CI upper 95% CI
+ 1 1083 27 0.9757 0.00462 0.9667 0.9848
+ 5 570 232 0.7369 0.01428 0.7094 0.7654
+ 10 16 404 0.0515 0.01177 0.0329 0.0806
+# Median survival
+library(purrr)
+map_dbl(bag_preds, ~ quantile(.x, probs = .5)$quantile)
+[1] 7.58 2.43
+party:::
ctree
library(party)
+ctree_mod <- ctree(Surv(time, status) ~ X1 + X2, data = train_dat)
+The basic invocation of predict
generates the predicted event time.
predict(ctree_mod, test_pred)
+[1] 7.58 3.32
+pec
has a wrapper for this model. Since party
uses S4 methods, this just puts the model inside of a wrapper S3 class:
pec_ctree_mod <- pecCtree(Surv(time, status) ~ X1 + X2, data = train_dat)
+all.equal(ctree_mod, pec_ctree_mod$ctree)
+[1] TRUE
+predictSurvProb(pec_ctree_mod, newdata = test_pred, times = .times)
+ [,1] [,2] [,3]
+[1,] 0.964 0.739 0.0612
+[2,] 0.947 0.302 0.3022
+party:::
cforest
library(party)
+
+set.seed(342)
+cforest_mod <-
+ cforest(Surv(time, status) ~ X1 + X2,
+ data = train_dat,
+ controls = cforest_unbiased(ntree = 100, mtry = 1))
+The basic invocation of predict
generates the predicted event time:
predict(cforest_mod, newdata = test_pred)
+ 1 2
+7.28 4.65
+Another S3 wrapper is available:
+set.seed(342)
+pec_cforest_mod <-
+ pecCforest(Surv(time, status) ~ X1 + X2,
+ data = train_dat,
+ controls = cforest_unbiased(ntree = 100, mtry = 1))
+
+all.equal(cforest_mod, pec_cforest_mod$forest)
+[1] TRUE
+predictSurvProb(pec_cforest_mod, newdata = test_pred, times = .times)
+ [,1] [,2] [,3]
+[1,] 0.963 0.673 0.0803
+[2,] 0.947 0.451 0.0472
+randomForestSRC:::
rfsrc
library(randomForestSRC)
+rfsrce_mod <- rfsrc(Surv(time, status) ~ X1 + X2, data = train_dat, ntree = 1000)
+The predict
function generates an object with classes "rfsrc"
, "predict"
, and "surv"
. It is unclear what is being predicted:
predict(rfsrce_mod, test_pred)$predicted
+[1] 56.6 143.8
+The survival
slot appears to be survival probabilities:
rfsrce_pred <- predict(rfsrce_mod, test_pred)
+round(rfsrce_pred$survival, 2)
+ [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
+[1,] 1.00 0.98 0.98 0.98 0.98 0.97 0.97 0.97 0.95 0.93 0.93 0.93 0.93 0.93 0.93 0.92 0.92
+[2,] 0.97 0.97 0.97 0.97 0.96 0.96 0.96 0.93 0.93 0.93 0.92 0.90 0.86 0.84 0.84 0.84 0.83
+ [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
+[1,] 0.92 0.92 0.92 0.92 0.92 0.92 0.92 0.92 0.92 0.92 0.92 0.90 0.9 0.90 0.88
+[2,] 0.80 0.77 0.75 0.75 0.74 0.72 0.71 0.69 0.68 0.65 0.62 0.62 0.6 0.57 0.57
+ [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47]
+[1,] 0.88 0.88 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.84 0.84 0.81 0.81 0.81 0.78
+[2,] 0.56 0.52 0.52 0.51 0.50 0.48 0.48 0.48 0.48 0.48 0.47 0.47 0.45 0.43 0.43
+ [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62]
+[1,] 0.78 0.78 0.78 0.74 0.74 0.74 0.74 0.74 0.74 0.74 0.74 0.73 0.73 0.73 0.73
+[2,] 0.42 0.40 0.38 0.38 0.38 0.36 0.32 0.31 0.29 0.29 0.29 0.29 0.28 0.28 0.26
+ [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77]
+[1,] 0.71 0.71 0.69 0.69 0.69 0.68 0.68 0.68 0.67 0.63 0.63 0.63 0.63 0.60 0.60
+[2,] 0.26 0.23 0.23 0.21 0.17 0.16 0.16 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.08
+ [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90] [,91] [,92]
+[1,] 0.60 0.57 0.56 0.53 0.50 0.50 0.50 0.48 0.43 0.42 0.39 0.39 0.34 0.33 0.30
+[2,] 0.06 0.06 0.05 0.05 0.05 0.03 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
+ [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100] [,101] [,102] [,103] [,104] [,105] [,106]
+[1,] 0.30 0.25 0.25 0.23 0.23 0.17 0.12 0.12 0.05 0.05 0.05 0.04 0.03 0.03
+[2,] 0.02 0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
+ [,107]
+[1,] 0.03
+[2,] 0.01
+The columns appear to correspond to the “time of interest”:
+rfsrce_pred$time.interest
+ [1] 0.435 0.537 0.603 0.635 0.776 0.822 0.900 0.926 1.080 1.091 1.202 1.307 1.344
+ [14] 1.419 1.481 1.482 1.750 1.756 1.768 1.852 1.923 1.955 2.012 2.014 2.028 2.087
+ [27] 2.098 2.231 2.254 2.313 2.356 2.359 2.406 2.427 2.743 2.837 2.872 2.984 2.987
+ [40] 3.153 3.260 3.284 3.325 3.385 3.510 3.810 4.111 4.336 4.416 4.434 4.467 4.521
+ [53] 4.601 4.654 4.740 4.806 4.857 4.881 4.889 4.938 5.061 5.090 5.099 5.118 5.221
+ [66] 5.408 5.601 5.758 5.793 5.931 5.973 6.056 6.097 6.211 6.244 6.342 6.378 6.546
+ [79] 6.722 6.871 7.277 7.291 7.401 7.492 7.580 7.914 7.924 8.024 8.193 8.350 8.428
+ [92] 8.451 8.484 8.542 8.794 9.077 9.399 9.435 9.775 9.778 9.789 10.526 12.087 12.745
+[105] 15.407 17.477 18.077
+although these values do not bracket the observed times:
+summary(train_dat$time)
+ Min. 1st Qu. Median Mean 3rd Qu. Max.
+ 0.44 2.83 4.89 5.55 7.38 20.19
+In most cases, the open circle means that this type of prediction is facilitated indirectly. For example, using “tricks” related to survfit()
, methods in the pec
package, or other means.
+Package + | ++Function + | ++pred event time + | ++survival probs + | ++linear pred + | +
---|---|---|---|---|
+survival
+ |
+
+survreg
+ |
++✓ + | ++x + | ++✓ + | +
+flexsurv
+ |
+
+flexsurvreg
+ |
++✓ + | ++✓ + | ++✓ + | +
+casebase
+ |
+
+fitSmoothHazard
+ |
++x + | ++✓ + | ++✓ + | +
+survival
+ |
+
+coxph
+ |
++✓ + | ++✓ + | ++✓ + | +
+glmnet
+ |
+
+glmnet
+ |
++◯ + | ++◯ + | ++✓ + | +
+mboost
+ |
+
+*boost
+ |
++◯ + | ++◯ + | ++✓ + | +
+rpart
+ |
+
+rpart
+ |
++✓ + | ++◯ + | ++x + | +
+ipred
+ |
+
+bagging
+ |
++✓ + | ++◯ + | ++x + | +
+party
+ |
+
+ctree
+ |
++✓ + | ++◯ + | ++x + | +
+party
+ |
+
+cforest
+ |
++✓ + | ++◯ + | ++x + | +
+randomForestSRC
+ |
+
+rfsrc
+ |
++✓ + | ++◯ + | ++x + | +