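Predict Types
In mlr3, learners have predict_type and predict_types fields (see also book chapter). In mlr3proba, predict_type is ignored. All learners return as many predict types as possible. The four possibilities are lp, crank, response, and distr: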
lp - Linear Predictor. Also known as the ‘link’ or sometimes ‘prognostic index’ of a model. Many linear survival models can be represented as a Generalized Linear Model in the form $g(E[Y]) = \beta X$, where $g$ is a link function, $\beta$ are the model coefficients and $X$ are the features. The linear predictor is given by $\hat{\beta} X$, where $\hat{\beta}$ are the fitted model coefficients.
crank - Continuous Ranking. Many survival models make a ‘relative risk’ prediction. These predictions are individual numerical values that express the comparative risk of the event occurring among observations within a sample. For example, a relative risk of $2$ for patient $i$ and $1$ for patient $j$ implies that patient $i$ is at higher risk of the event than patient $j$. The exact value of the prediction does not matter, only the ranking.
response - Survival Time. Very few learners can make a prediction of the survival time as a single number (i.e. not a distribution).
distr - Survival Distribution. Ideally, a survival learner will return a survival distribution. This is the probability distribution of the random variable $T$, the time until the event occurs. Such distributions are implemented using distr6.
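To illustrate the point made for crank above, that only the ordering of the values matters, here is a minimal base R sketch. The risk values and patient names are made up for illustration and are not output from any learner. It checks that a strictly increasing transformation such as exp() changes the values but not the ranking.

```r
# Hypothetical relative-risk predictions for four patients (illustrative values only)
crank = c(patient_i = 2, patient_j = 1, patient_k = -0.5, patient_l = 3.2)

# Rank the patients from lowest to highest predicted risk
rank_raw = rank(crank)

# A strictly increasing transformation changes the values but not the order
rank_exp = rank(exp(crank))

identical(rank_raw, rank_exp)
#> [1] TRUE
```

Any measure that depends only on the ordering of the predictions would therefore give the same result for crank and exp(crank).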
What to Return
We always try to map and return only the predict types that are supported natively by a survival learner. The internal function .surv_return performs this mapping at the end of the .predict() method of every wrapped learner (see examples from mlr3extralearners here).
Note that the crank return type is not a ‘usual’ return type in any package and was created for mlr3proba. It is the default predict type for any survival learner (see further details below).
Implementing lp
Implementing the linear predictor in survival models is straightforward, as most learners return it explicitly. For example, a CoxPH model exposes it directly through its prediction method.
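As a minimal sketch of what "returned explicitly" can look like, the following fits a Cox model with the survival package and requests the linear predictor from predict(). The choice of the lung data and of the covariates age and sex is an assumption made purely for illustration. This is the upstream call, not the wrapped mlr3 learner.

```r
library(survival)

# Fit a Cox proportional hazards model on the `lung` data shipped with survival
# (dataset and covariates are chosen only for illustration)
fit = coxph(Surv(time, status) ~ age + sex, data = lung)

# Ask the fitted model for the linear predictor of the first five observations
lp = predict(fit, newdata = lung[1:5, ], type = "lp")
lp
```

Inside a learner's .predict() method, a vector like this is what would be passed on as the lp prediction.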
Different learners use different names to refer to the linear predictor. Below are some examples:
lp
link
lin.pred
risk
linear
risk should be checked carefully against the relevant documentation, as it could refer to lp, exp(lp) or a different risk altogether.
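The warning about risk can be checked concretely for one package. The sketch below, again assuming the survival package and an illustrative model on the lung data, compares the "risk" and "lp" prediction types of a Cox model. For this particular model the documented risk score is exp(lp); other packages may define risk differently, so the same comparison should be repeated for each one.

```r
library(survival)

# Illustrative Cox model; dataset and covariates are assumptions for the example
fit = coxph(Surv(time, status) ~ age + sex, data = lung)

lp   = predict(fit, type = "lp")    # linear predictor
risk = predict(fit, type = "risk")  # risk score as defined by survival

# For coxph the risk score is exp(lp), so it would have to be
# log-transformed before being returned as lp
all.equal(risk, exp(lp))
```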
Implementing response
Example learners which return the (expected/estimated) survival time are surv.xgboost.aft and surv.aorsf. Some knowledge of the field is usually required to return the survival time, as it may not be explicitly stated in the documentation. Alternatively, it can be composed from the survival distribution; see responsecompositor.
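To make "composed from the survival distribution" concrete, here is a base R sketch of two common summaries of a survival curve: the median survival time and the restricted mean survival time (the area under the curve up to the last time point). The curve values are hypothetical. This is not the responsecompositor implementation, and which methods it offers should be checked in its documentation.

```r
# Hypothetical survival curve for one observation: S(t) at increasing time points
times = c(1, 2, 3)
surv  = c(0.8, 0.5, 0.2)

# Median survival time: first time point at which S(t) drops to 0.5 or below
median_time = times[which(surv <= 0.5)[1]]

# Restricted mean survival time up to the last time point: area under the
# step function S(t), taking S(t) = 1 before the first time point
rmst = sum(diff(c(0, times)) * c(1, head(surv, -1)))

c(median = median_time, rmst = rmst)
#> median   rmst 
#>    2.0    2.3 
```

Either number could serve as a single-value response; they generally differ, which is one reason domain knowledge is needed when choosing.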
Implementing distr
Implementing distr is more complex, as it depends on the individual learner. In most cases, the prediction is a survival matrix whose rows are the (test) observations and whose columns are the time points (usually those of the train set). In that case, passing the times and surv arguments to .surv_return() suffices to do the mapping. It's interesting to see the internal representation of the distr prediction using distr6:
library(mlr3proba)
library(mlr3extralearners)
# `task` is assumed to be a small survival task (5 observations) created beforehand
learn = lrn("surv.ranger")$train(task)
set.seed(1)
p = learn$predict(task)
distr = p$distr
# 5 observations x 4 time points
distr
#> Matdist(5x4)
# Extract the first observation
adistr = distr[1]
adistr
#> WeightDisc(4)
# View the underlying cdf of the discrete distribution (values monotonically increase)
distr6::gprm(adistr, "cdf")
#> 1.44988110100337 1.60851588430415 3.64873527482865 4.65365417082794
#>        0.1796301        0.3666297        0.5519274        0.7163932
# Predict the survival of the chosen observation for t = 1,2,3:
adistr$survival(c(1,2,3))
#> [1] 1.0000000 0.6333703 0.6333703
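The survival values printed above can be reproduced by hand from the cdf, which helps when checking a new learner's survival matrix. The sketch below uses only base R and the numbers shown in the output above. The helper surv_at is hypothetical and is not a distr6 or mlr3proba function. It treats the survival curve as a right-continuous step function that equals 1 before the first time point.

```r
# Time points and cdf values copied from the output above
times = c(1.44988110100337, 1.60851588430415, 3.64873527482865, 4.65365417082794)
cdf   = c(0.1796301, 0.3666297, 0.5519274, 0.7163932)

# Survival is the complement of the cdf at each time point
surv = 1 - cdf

# Hypothetical helper: evaluate the step function S(t) at arbitrary times
surv_at = function(t, times, surv) {
  idx = findInterval(t, times)  # number of time points <= t
  c(1, surv)[idx + 1]           # S(t) = 1 before the first time point
}

surv_at(c(1, 2, 3), times, surv)
#> [1] 1.0000000 0.6333703 0.6333703
```

Both t = 2 and t = 3 fall between the second and third time points, which is why they share the same survival probability.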
Implementing crank
The crank return type was created because many packages return some form of relative risk, or continuous ranking (‘crank’), without making clear whether this is a well-defined value (e.g. the linear predictor) or some arbitrary value without meaning that is used solely as a rank. The crank return type captures all these possibilities by being defined simply as some continuous rank used to represent a relative risk. A crank is always returned in mlr3proba, and there are four possible ways to derive it, listed below in order of decreasing priority:
If a learner explicitly predicts a continuous rank, then crank is this prediction. An example of this can be seen in the surv.svm and the surv.rpart learner.
If a learner returns an lp, and no other ranking, then crank = lp.
If a learner returns a response, and no lp, then crank = response.
If a learner returns a distr and no response, then crank is calculated as the sum of the cumulative hazard function (expected mortality) derived from the predicted survival function (distr).
If a user prefers a different method to estimate the crank, they can use the crankcompositor.
An example where a survival model returns only a linear predictor (and crank == lp):
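The priority order described above, including the case where only a linear predictor is available, can be sketched in base R as follows. derive_crank is a hypothetical helper written for this illustration, not mlr3proba code. It follows the list literally; in particular, any sign convention applied to response, and the flooring of survival probabilities to avoid log(0), are assumptions that should be checked against the package source.

```r
# Hypothetical helper (not part of mlr3proba) mirroring the priority list above
derive_crank = function(crank = NULL, lp = NULL, response = NULL, surv = NULL) {
  if (!is.null(crank)) return(crank)        # 1. explicit continuous ranking
  if (!is.null(lp)) return(lp)              # 2. linear predictor
  if (!is.null(response)) return(response)  # 3. survival time, as stated in the list
  if (!is.null(surv)) {
    # 4. expected mortality: sum over time points of H(t) = -log(S(t));
    #    S(t) is floored at a small value (an assumption) to avoid log(0)
    return(rowSums(-log(pmax(surv, 1e-8))))
  }
  stop("at least one prediction is required")
}

# Only a linear predictor is available, so crank equals lp
lp = c(0.3, -1.2, 0.7)
identical(derive_crank(lp = lp), lp)
#> [1] TRUE

# Only a survival matrix is available (rows: observations, columns: time points)
surv = rbind(c(0.9, 0.6, 0.3), c(0.8, 0.4, 0.1))
derive_crank(surv = surv)
```

In the second call, the observation with the lower survival curve receives the larger crank, matching the interpretation of crank as a relative risk.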