From Hannah in the linked discussion (originally posted by @hfrick in #225 (review)):

Okay, so here are my thoughts on what plumbing post-processing for survival analysis would need.
Basic assumptions
We might tailor both predictions of survival time and predictions of survival probability.
If we tailor survival probability, we will do so at a specific single time point (hello eval_time our old friend).
I have not yet validated either assumption. Max, do you have a sense of whether they are valid?
For the implementation, this implies:
How/where do we specify the predictions?
How/where do we specify eval_time?
Specifying the predictions
tailor() could take the survival time predictions, .pred_time, via the estimate argument -- the documentation already suggests that.
tailor() could take survival probability in the tibble form of .pred (containing .eval_time and .pred_survival) via the probabilities argument.
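As a rough sketch (not a working interface), the two options might look like this; the "censored regression" type value and the string form of the column arguments are assumptions on my part:

```r
library(tailor)

# Option 1: post-process predicted survival times via `estimate`
tailor_time <- tailor(
  type = "censored regression",  # assumed type label for survival models
  estimate = ".pred_time"        # survival time predictions
)

# Option 2: post-process survival probabilities via `probabilities`,
# supplied as the nested .pred tibble containing .eval_time and .pred_survival
tailor_prob <- tailor(
  type = "censored regression",
  probabilities = ".pred"
)
```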
Specifying eval_time
[tailor] tailor() would drop time (the documentation describing it as "the predicted event time" contradicts the documentation for estimate, which lists .pred_time) and instead take eval_time. The alternative would be to derive it from .pred within a given post-processing operation and default to the first value if we need a single one. Either way, time would disappear. (See the sketch below.)
[workflows] We can currently pass eval_time to the predict() and augment() methods for workflows, but since we don't need it any earlier, there is no eval_time argument for the fit() method for workflows or for the specification via workflow().
If the post-processing operation doesn't need estimation, having an eval_time argument only on predict.workflow() should be fine.
If it does need estimation, it could come from the tailor() specification.
[tune] The tuning functions have an eval_time argument which is required for dynamic and integrated survival metrics. If we need a single eval time to optimize for, we use the first one.
If a user includes a tailor in the workflow to be tuned, it would be nice to not make them specify eval_time twice: when making the tailor() and when calling the tuning function.
We could pass it through workflows (which would then gain an eval_time argument) or provide a function to update the tailor. I lean towards updating the tailor.
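To make the plumbing above concrete, here is a minimal sketch of how the pieces could fit together. The eval_time argument to tailor() and the update_tailor() helper at the end are hypothetical (they don't exist); supplying eval_time to predict() on a fitted workflow and to the tune functions reflects the current behaviour described above, though the exact calls are illustrative rather than tested:

```r
library(tidymodels)
library(censored)   # parsnip engines for censored regression
library(survival)   # Surv() and the lung data

surv_times <- c(100, 300, 500)

# Hypothetical: eval_time recorded when the tailor is created, replacing the
# current `time` argument (neither `eval_time` nor the survival `type` exist
# in tailor() today).
tlr <- tailor(
  type = "censored regression",
  probabilities = ".pred",
  eval_time = surv_times
)

# Put the outcome into a Surv column, the usual tidymodels pattern
lung_surv <- dplyr::mutate(lung, surv = Surv(time, status), .keep = "unused")

wflow <- workflow() %>%
  add_formula(surv ~ age + sex) %>%
  add_model(proportional_hazards()) %>%
  add_tailor(tlr)

wflow_fit <- fit(wflow, data = lung_surv)

# Current behaviour: eval_time is supplied at prediction time, which is
# enough when the post-processing operation needs no estimation.
predict(wflow_fit, new_data = lung_surv, type = "survival", eval_time = surv_times)

# Current behaviour: the tune functions take eval_time for dynamic and
# integrated survival metrics. Ideally the eval_time stored in the tailor
# would be reused here rather than specified a second time.
res <- fit_resamples(
  wflow,
  resamples = vfold_cv(lung_surv, v = 5),
  metrics   = metric_set(brier_survival),
  eval_time = surv_times
)

# Hypothetical alternative to threading eval_time through workflows: a helper
# that updates the tailor already stored in the workflow before tuning.
# wflow <- update_tailor(wflow, eval_time = surv_times)
```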
More thoughts from Hannah in a follow-up comment on the same PR:
After chatting with Max:
Max agrees with the basic assumptions except for the single eval time point: he thinks we might want to calibrate/post-process at multiple time points. I agree we might want to do that in general, I'm just not sure whether that would be specified/done in one calibration operation (if it requires multiple calibration models to be fitted). Either way, it doesn't change where we need eval time values, just how many, and how many is a decision we can make later on.
In light of Simon's and my thoughts on specifying the information for the data split needed to fit a workflow whose post-processor requires fitting on a separate dataset, I've considered adding eval_time there (in add_tailor()), but I think specifying eval_time in tailor() directly is still the right move: it's needed for fitting the post-processor, so it can't live only in the workflow.
Max and I agree that we should have an idea of what infrastructure we'd need for post-processing for survival but not include any placeholder arguments at this point. Hence: "We can remove the time argument in tailors" (tailor#16).
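For comparison, the two candidate homes for eval_time touched on above would look roughly like the following; both arguments are hypothetical at this point, per the decision not to add placeholders yet:

```r
# (a) On the tailor itself -- preferred, since the values are needed when
#     fitting the post-processor and so can't live only in the workflow.
tlr <- tailor(type = "censored regression", eval_time = c(100, 300, 500))

# (b) On add_tailor(), next to the information about the data split used to
#     fit the post-processor.
wflow <- workflow() %>%
  add_tailor(tlr, eval_time = c(100, 300, 500))
```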