
p_max parameter for C-Index #383

Closed
vlegoff opened this issue May 14, 2024 · 3 comments

@vlegoff

vlegoff commented May 14, 2024

Hello mlr3proba team,

In Uno's article about the C-Index, he mentions truncating the C-Index with a prespecified τ:

where τ is a prespecified time point such that pr(D > τ) > 0

with the following justification:

the tail part of the estimated survival function of T is rather unstable
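For reference, the truncated estimator from Uno et al. (2011) can be written roughly as below. This is a transcription from memory, so it should be checked against the paper: X_i = min(T_i, D_i) is the observed time, Δ_i the event indicator, η̂_i the predicted risk score (higher means an earlier expected event), and Ĝ the Kaplan-Meier estimate of the censoring survival function G(t) = pr(D > t). Ties in η̂ are ignored here. The condition pr(D > τ) > 0 is what keeps the weights Ĝ(X_i)^{-2} finite for X_i < τ.

```latex
\hat{C}_\tau =
\frac{\sum_{i}\sum_{j} \Delta_i \,\hat{G}(X_i)^{-2}\, I(X_i < X_j,\; X_i < \tau)\, I(\hat{\eta}_i > \hat{\eta}_j)}
     {\sum_{i}\sum_{j} \Delta_i \,\hat{G}(X_i)^{-2}\, I(X_i < X_j,\; X_i < \tau)}
```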

This is already possible in the current implementation of the C-index through the cutoff parameter, but when working with multiple datasets (e.g. in a benchmark), it would be useful to truncate based on a maximum censoring proportion, p_max, as is done for the Graf score.
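A minimal sketch of how a p_max could be turned into a dataset-specific cutoff. It assumes (hypothetically, not necessarily matching mlr3proba's definition) that p_max is the fraction of the sample that has left the risk set: t_max is then the first observed time at which more than p_max of the observations are no longer at risk. The function name t_max_from_p_max is illustrative only.

```python
import numpy as np


def t_max_from_p_max(times, p_max):
    """Hypothetical helper: map a proportion p_max in (0, 1) to a time cutoff.

    Returns the first observed time at which more than p_max of the sample
    is no longer at risk (i.e. has an observed time strictly before it).
    """
    if not 0.0 < p_max < 1.0:
        raise ValueError("p_max must lie in (0, 1)")
    sorted_times = np.sort(np.asarray(times, dtype=float))
    n = sorted_times.size
    unique_times = np.unique(sorted_times)
    # Number still at risk at each unique time: observations with time >= t.
    n_risk = n - np.searchsorted(sorted_times, unique_times, side="left")
    removed = 1.0 - n_risk / n
    exceeded = np.flatnonzero(removed > p_max)
    if exceeded.size == 0:
        # p_max is never exceeded: fall back to the largest observed time.
        return float(unique_times[-1])
    return float(unique_times[exceeded[0]])


# Ten distinct times: truncate once more than 75% have left the risk set.
print(t_max_from_p_max([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], p_max=0.75))  # 9.0
```

Because the same p_max maps to a different t_max on each dataset, a single value could be reused across all tasks in a benchmark.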

Reference
Uno, H., Cai, T., Pencina, M. J., D'Agostino, R. B., & Wei, L. J. (2011). On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30(10), 1105–1117. https://doi.org/10.1002/sim.4154

@bblodfon
Collaborator

Thanks, I will have a look at the PR.

@bblodfon
Collaborator

bblodfon commented May 15, 2024

@vlegoff I refined the PR and merged it into the main branch; let me know if anything goes badly wrong! The cutoff argument is now called t_max.

> I think it would be interesting to use all data (train and test) to estimate the censoring distribution used for weighting, in the same way as in the Graf score & with the same justification.

We never use both train and test data to estimate G(t) (not even in the Graf score). We do, however, use all of the training data first, before applying the t_max cutoff. See these lines where the estimation happens; afterwards, t_max is passed to the C function, which essentially filters the observations.
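The pipeline described here (estimate G(t) on all training observations, then apply t_max only when scoring the test set) can be sketched in Python as follows. This is a simplified illustration, not the mlr3proba implementation: km_censoring, censoring_surv_at and uno_c are hypothetical names, Ĝ is evaluated at X_i rather than just before it (implementations differ on this), and pairs tied on the risk score count as discordant.

```python
import numpy as np


def km_censoring(train_time, train_status):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(D > t).

    Censorings (status == 0) are treated as the 'events'. Returns the sorted
    unique training times and G evaluated at each of them.
    """
    t = np.asarray(train_time, dtype=float)
    censored = np.asarray(train_status, dtype=int) == 0
    grid = np.unique(t)
    surv = np.empty(grid.size)
    s = 1.0
    for k, u in enumerate(grid):
        at_risk = np.sum(t >= u)
        n_cens = np.sum((t == u) & censored)
        s *= 1.0 - n_cens / at_risk
        surv[k] = s
    return grid, surv


def censoring_surv_at(x, grid, surv):
    """Right-continuous step function G(x); equals 1 before the first grid time."""
    idx = np.searchsorted(grid, x, side="right") - 1
    return np.where(idx >= 0, surv[np.clip(idx, 0, None)], 1.0)


def uno_c(test_time, test_status, risk, grid, surv, t_max):
    """IPCW C-index on the test set, counting only pairs whose earlier time is < t_max."""
    x = np.asarray(test_time, dtype=float)
    d = np.asarray(test_status, dtype=int)
    r = np.asarray(risk, dtype=float)
    g = censoring_surv_at(x, grid, surv)
    usable = (d == 1) & (x < t_max) & (g > 0)
    num = den = 0.0
    for i in np.flatnonzero(usable):
        weight = 1.0 / g[i] ** 2
        comparable = x[i] < x  # observations j that outlived i
        den += weight * comparable.sum()
        num += weight * (comparable & (r[i] > r)).sum()
    return num / den if den > 0 else float("nan")


# G is fitted once on all training observations; t_max only filters test pairs.
grid, surv = km_censoring([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1])
test_time, test_status, risk = [1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 1, 2]
print(round(uno_c(test_time, test_status, risk, grid, surv, t_max=10.0), 4))  # 0.7967
print(round(uno_c(test_time, test_status, risk, grid, surv, t_max=2.5), 4))   # 1.0
```

Keeping the censoring estimate independent of t_max means changing the cutoff only changes which test pairs are compared, not the weights themselves.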

@vlegoff
Author

vlegoff commented May 16, 2024

> We never use both train and test data for G(t) (not even in graf)

Yes, I misread the docs for the Graf score; thanks for pointing it out!
