-
-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for survival analysis #543
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
I want to say this is one of the best PRs ive ever seen in open source. Between the well formatted and readable code, tests, and docs I dont even know what more to ask for. Thanks so much for creating this |
@ipa thanks a lot for this fantastic PR! As I mentioned in the formulae PR it's not needed to add a new transformation in there (unless we need to treat involved variables in a very special way). You can just add a Line 78 in 236021f
that namespace is made available when formulas are parsed: Line 147 in 236021f
I'm adding more comments as I review the code :) |
@@ -0,0 +1,1587 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,1587 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #1. lam = 1/np.exp(az.summary(idata)["mean"]["Intercept"])
az.summary()
is being called multiple times. We could call it once and store the result in a variable such as idata_summary
Reply via ReviewNB
@@ -0,0 +1,1587 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #6. S0 = sp.stats.expon.sf
Could you add a little about the sf
function? From SciPy docs it says it's the survival function.
Reply via ReviewNB
@@ -0,0 +1,1587 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #2. "Surv(time,status) ~ 1 + B(trt_str, success='Test') + T(celltype_str, ref='Squamous')",
I really like you're using B()
and T()
in the model formula. Again, one sentence saying what they do would be great. I could help here if you want since I was the author of these shortcuts.
Reply via ReviewNB
@@ -0,0 +1,1587 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic chart. Looks very nice! Do you think we could think about a way to make it work with plot_cap? https://github.com//pull/517
That implies some modifications to the predict function. I'm not 100% sure if this is possible or not. But it's worth considering it since it would make things much easier.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check this out and see if it's possible.
I've been reading a little about the documentation of the > library(survival)
> x <- Surv(veteran$time, veteran$status)
> attributes(x)
$dim
[1] 137 2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "time" "status"
$type
[1] "right"
$class
[1] "Surv" As implemented in this PR, using the |
@tomicapretto thanks for the great review and comments. I was not aware of the I will also work on the example notebook and add some extra information. |
@ipa feel free to ping me if you want help with something or have a question :) |
@tomicapretto An easy but not really nice way to implement this is to check for After thinking about this a bit more I think it would be cleaner if we implement it similar to
I tried to do this via |
Just want to say how excited I am to see such rapid progress on this -- it will be a great addition, and will simplify a lot of work for folks (like me!) running survival analyses. If I can do anything besides cheer from the sidelines, just let me know. :-) |
Happy to help test and or document. Very much appreciate all the work going into this. :-) |
So, just reviewing the various related issues, it looks like we need a way to let formulae know whether and how the data is left censored, right censored, or both. Is that the main barrier to implementation? |
@donaldbraman this is already implemented and it's possible to work with survival models in Bambi since #697 :) You can have a look at this test bambi/tests/test_built_models.py Lines 1002 to 1049 in 169564f
|
Amazing -- you just made my day! And thank you again for all your work on this.
|
I'm closing this PR because it's already implemented (see this previous comment #543 (comment)) |
This PR adds support for survival analysis using the Exponential distribution. It adds support for the exponential family and for censored data.
There is a notebook with examples using R's veteran dataset.
Issue: #541
Depends on: bambinos/formulae/issues/79