Implementing Predict Types #362

bblodfon · 2024-01-31T19:45:27Z

bblodfon
Jan 31, 2024
Maintainer

Predict Types

In mlr3, learners have predict_type and predict_types fields (see also book chapter).

learn = lrn("surv.coxph")
learn$predict_type
#> [1] "distr"
learn$predict_types
#> [1] "distr" "crank" "lp"

In mlr3proba, predict_type is ignored. All learners return as many predict types as possible. The four possibilities are lp, crank, response, and distr:

lp - Linear Predictor. Also known as the ‘link’ or sometimes the ‘prognostic index’ of a model. Many linear survival models can be represented as a Generalized Linear Model in the form $E[Y] = \beta X$, where $\beta$ are the model coefficients and $X$ are the features. The linear predictor is given by $\hat{\beta} X$, where $\hat{\beta}$ are the fitted model coefficients.
crank - Continuous Ranking. Many survival models make a ‘relative risk’ prediction. These predictions are individual numerical values that assess the comparative risk of the event occurring among observations within a sample. For example, a relative risk of $2$ for patient $i$ and $1$ for patient $j$ implies that patient $i$ is more likely to die. The exact value of the prediction does not matter, only the ranking.
response - Survival Time. Very few learners can make a prediction of the survival time as a single number (i.e. not a distribution).
distr - Survival Distribution. Ideally, a survival learner returns a survival distribution. This is the probability distribution of the random variable $T$, the time until the event occurs. Such distributions are implemented using distr6.

Examples

set.seed(5)
task = tgen("simsurv")$generate(5)

# lp
lrn("surv.coxph")$train(task)$predict(task)$lp
#> [1] -1.2133139  0.1874231  2.6211704 -0.8031067 -0.7921728

# crank
lrn("surv.rpart")$train(task)$predict(task)$crank
#> [1] 3.037492 3.039867 2.811582 2.952089 2.994166

# response
lrn("surv.svm")$train(task)$predict(task)$response
#> [1] 5.534422 2.458467 1.219538 4.765263 4.889596

# distr
lrn("surv.kaplan")$train(task)$predict(task)$distr
#> Matdist(5x5)

What to Return

We always try to map and return only the predict types that are supported natively by a survival learner. We have written an internal function, .surv_return, to perform this mapping at the end of the .predict() method of every wrapped learner (see examples from mlr3extralearners here).

Note that the crank return type is not a ‘usual’ return type in any package and was created for mlr3proba. It is the default predict type for any survival learner (see further details below).

Implementing lp

Implementing the linear predictor in survival models is straightforward, as most learners return it explicitly. For example, the CoxPH model:

fit <- survival::coxph(survival::Surv(eventtime, status) ~ ., task$data())
predict(fit, type = "lp")
#> [1] -1.2133139  0.1874231  2.6211704 -0.8031067 -0.7921728

Different learners use different names to refer to the linear predictor. Below are some examples:

lp
link
lin.pred
risk
linear

risk should be checked carefully against the relevant documentation, as it could refer to lp, exp(lp) or a different risk altogether.
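As a concrete check of this caveat, the sketch below compares the "lp" and "risk" prediction types of survival::coxph, where "risk" is exp(lp). It uses the lung dataset shipped with the survival package purely for illustration. Other packages may define risk differently, so the same comparison should be repeated against their own documentation.

```r
library(survival)

# Fit a Cox model on the lung data that ships with the survival package
fit = coxph(Surv(time, status) ~ age + sex, data = lung)

lp = predict(fit, type = "lp")     # linear predictor
risk = predict(fit, type = "risk") # risk score, exp(lp) for coxph

# For coxph the two differ only by the exponential transform
all.equal(unname(risk), unname(exp(lp)))
#> [1] TRUE
```

Since exp() is monotone, both vectors rank the observations identically, but only lp should be passed as the lp argument of .surv_return().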

Implementing response

Example learners that return the (expected/estimated) survival time are surv.xgboost.aft and surv.aorsf. Usually, some knowledge of the field is required to return the survival time, as it may not be explicitly stated in the documentation. Alternatively, it can be composed from the survival distribution; see responsecompositor.
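To illustrate how a survival time can be composed from a survival distribution, here is a minimal base R sketch. The time points and survival probabilities are made-up values for a single observation, and the two summaries shown (restricted mean survival time and median) are only one reasonable choice, not necessarily the exact computation performed by responsecompositor.

```r
# Hypothetical survival curve for one observation (illustrative values only)
times = c(1.45, 1.61, 3.65, 4.65)
surv = c(0.82, 0.63, 0.45, 0.28)

# Restricted mean survival time up to max(times): area under the
# right-continuous step function, with S(t) = 1 before the first time point
rmst = sum(diff(c(0, times)) * c(1, surv[-length(surv)]))

# Median survival time: first time point at which S(t) drops to 0.5 or below
median_time = times[which(surv <= 0.5)[1]]

rmst
#> [1] 3.3164
median_time
#> [1] 3.65
```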

Implementing distr

Implementing distr is more complex, as it depends on the individual learner. In most cases, the prediction is a survival matrix with the (test) observations as rows and the time points (usually those of the train set) as columns. Passing this matrix and its time points via the times and surv arguments of .surv_return() suffices to do the mapping in this case. It's interesting to see the internal representation of the distr prediction using distr6:

learn = lrn("surv.ranger")$train(task)
set.seed(1)
p = learn$predict(task)

distr = p$distr # 5 observations x 4 time points
#> Matdist(5x4)

# Extract the first observation
adistr = distr[1]
#> WeightDisc(4)

# View the underlying cdf of the discrete distribution (values monotonically increase)
distr6::gprm(adistr, "cdf")
#> 1.44988110100337 1.60851588430415 3.64873527482865 4.65365417082794
#>        0.1796301        0.3666297        0.5519274        0.7163932

# Predict the survival of the chosen observation for t = 1,2,3:
adistr$survival(c(1, 2, 3))
#> [1] 1.0000000 0.6333703 0.6333703
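
The survival values above can be reproduced by hand from the printed cdf. The base R sketch below assumes a right-continuous step function, which is consistent with the output shown but is not a statement about distr6 internals. The helper name surv_at is hypothetical.

```r
# Time points and cdf values copied from the output above
times = c(1.44988110100337, 1.60851588430415, 3.64873527482865, 4.65365417082794)
cdf = c(0.1796301, 0.3666297, 0.5519274, 0.7163932)

# Hypothetical helper: S(t) = 1 - F(t), with F(t) = 0 before the first time point
surv_at = function(t) {
  idx = findInterval(t, times) # number of time points <= t
  1 - c(0, cdf)[idx + 1]
}

surv_at(c(1, 2, 3))
#> [1] 1.0000000 0.6333703 0.6333703
```

t = 2 and t = 3 both fall between the second and third time points, which is why they share the same survival probability.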

Implementing crank

The crank return type was created because many packages return some form of relative risk, or continuous ranking (‘crank’), without making clear whether this is a well-defined value (e.g. the linear predictor) or some arbitrary value without meaning that is used solely as a rank. The crank return type captures all these possibilities by being defined simply as some continuous rank used to represent a relative risk. A crank is always returned in mlr3proba, and there are four possible ways to derive it. The list below is in order of decreasing priority:

If a learner explicitly predicts a continuous rank, then crank is this prediction. Examples are the surv.svm and surv.rpart learners.
If a learner returns an lp, and no other ranking, then crank = lp.
If a learner returns a response, and no lp, then crank = response.
If a learner returns a distr and no response, then crank is calculated as the sum of the cumulative hazard function (expected mortality) derived from the predicted survival function (distr).

If a user prefers a different method to estimate the crank, they can use the crankcompositor.
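
The last rule in the list above (crank from distr) can be sketched in base R as follows. The survival matrix is made up, and the code only illustrates the "sum of the cumulative hazard" idea using H(t) = -log(S(t)); it is not a copy of the mlr3proba implementation.

```r
# Hypothetical survival matrix: 2 observations (rows) x 3 time points (columns)
surv = matrix(
  c(0.9, 0.7, 0.5,
    0.8, 0.5, 0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("1", "2", "3"))
)

# Cumulative hazard at each time point: H(t) = -log(S(t))
# (S(t) = 0 would give Inf; this toy matrix avoids zeros)
cumhaz = -log(surv)

# Expected mortality: sum of the cumulative hazard over the time points
crank = rowSums(cumhaz)

# The second observation has lower survival throughout, so it ranks as higher risk
crank[2] > crank[1]
#> [1] TRUE
```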

An example where a survival model returns only a linear predictor (and crank == lp):

.predict = function(task) {
  newdata = task$data(cols = task$feature_names)
  lp = predict(self$model, type = "lp", newdata = newdata)
  .surv_return(lp = lp) # internally crank = lp
}
