New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Adding support for survival analysis #543

Closed

ipa wants to merge 8 commits into bambinos:main from ipa:survival

ipa commented Jul 6, 2022

This PR adds support for survival analysis using the Exponential distribution. It adds support for the exponential family and for censored data.

There is a notebook with examples using R's veteran dataset.

Issue: #541
Depends on: bambinos/formulae/issues/79

ipa and others added 7 commits

July 1, 2022 15:57


          initial model with exponential survival model

e8fb3a7


          tests for exponential model with comparison with R

992cb97


          fixed exponential lccdf for censored cases

3be4331


          fixed test with intercept

ef58176


          reformatting the code

3c1e5d5


          notebook with examples for survival analysis

bc61dc5


          Merge branch 'bambinos:main' into survival

540fe88

review-notebook-app bot commented Jul 6, 2022

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Collaborator

canyon289 commented Jul 7, 2022

I want to say this is one of the best PRs ive ever seen in open source. Between the well formatted and readable code, tests, and docs I dont even know what more to ask for. Thanks so much for creating this

Collaborator

tomicapretto commented Jul 8, 2022

@ipa thanks a lot for this fantastic PR!

As I mentioned in the formulae PR it's not needed to add a new transformation in there (unless we need to treat involved variables in a very special way). You can just add a surv function into the extra_namespace dictionary that's defined here

bambi/bambi/utils.py

Line 78 in 236021f

extra_namespace = {"c": c}

that namespace is made available when formulas are parsed:

bambi/bambi/models.py

Line 147 in 236021f

self._design = design_matrices(formula, data, na_action, 1, extra_namespace)

I'm adding more comments as I review the code :)

tomicapretto reviewed

View reviewed changes

docs/notebooks/survival_model.ipynb

    
            @@ -0,0 +1,1587 @@
          
              {

Collaborator

tomicapretto Jul 8, 2022

It would be good to add something telling how Surv works in the model formula.

Reply via ReviewNB

docs/notebooks/survival_model.ipynb

    
            @@ -0,0 +1,1587 @@
          
              {

Collaborator

tomicapretto Jul 8, 2022

Line #1.    lam = 1/np.exp(az.summary(idata)["mean"]["Intercept"])

az.summary() is being called multiple times. We could call it once and store the result in a variable such as idata_summary

Reply via ReviewNB

docs/notebooks/survival_model.ipynb

    
            @@ -0,0 +1,1587 @@
          
              {

Collaborator

tomicapretto Jul 8, 2022

Line #6.    S0 = sp.stats.expon.sf

Could you add a little about the sf function? From SciPy docs it says it's the survival function.

Reply via ReviewNB

docs/notebooks/survival_model.ipynb

    
            @@ -0,0 +1,1587 @@
          
              {

Collaborator

tomicapretto Jul 8, 2022

Line #2.        "Surv(time,status) ~ 1 + B(trt_str, success='Test') + T(celltype_str, ref='Squamous')",

I really like you're using B() and T() in the model formula. Again, one sentence saying what they do would be great. I could help here if you want since I was the author of these shortcuts.

Reply via ReviewNB

docs/notebooks/survival_model.ipynb

    
            @@ -0,0 +1,1587 @@
          
              {

Collaborator

tomicapretto Jul 8, 2022

This is a fantastic chart. Looks very nice! Do you think we could think about a way to make it work with plot_cap? https://github.com//pull/517

That implies some modifications to the predict function. I'm not 100% sure if this is possible or not. But it's worth considering it since it would make things much easier.

Reply via ReviewNB

Author

ipa Jul 9, 2022

I'll check this out and see if it's possible.

Collaborator

tomicapretto commented Jul 8, 2022

I've been reading a little about the documentation of the Surv function in the survival R package. By default, it results in right-censored data.

> library(survival)
> x <- Surv(veteran$time, veteran$status)
> attributes(x)
$dim
[1] 137   2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "time"   "status"


$type
[1] "right"

$class
[1] "Surv"

As implemented in this PR, using the exponential family will always try to result in a survival model, but this shouldn't be the case. It should result in a survival model only when Surv is in the LHS of the model formula (I prefer lowercase surv or even other alternatives such as censored(y, status)). This requires more thinking of course.

Author

ipa commented Jul 9, 2022

@tomicapretto thanks for the great review and comments.

I was not aware of the extra_namespace but it makes perfect sense to add it there. I'll change that and also implement that a survival model is only results when Surv is in the LHS. We can use censored as you suggested to keep a more general terminology.

I will also work on the example notebook and add some extra information.

Collaborator

tomicapretto commented Jul 10, 2022

@ipa feel free to ping me if you want help with something or have a question :)

tomicapretto mentioned this pull request

Design how we're going to extend Bambi #544

Open

Author

ipa commented Jul 11, 2022

@tomicapretto An easy but not really nice way to implement this is to check for if "censored" in self.term.name: while building the response or something similar.

After thinking about this a bit more I think it would be cleaner if we implement it similar to Proportion in the formulae package (introducing a Censored type). This would implement it again in the formulae package though. However, we could then use something like this while building the response:

if isinstance(self.term.data, Censored):
    __build_censored_response_distribution(...)

I tried to do this via extra_namespace but then the formulae package throws errors because the type is neither numeric nor any of the other supported types like Proportion, CategoricalBox etc.

donaldbraman commented Aug 21, 2022

Just want to say how excited I am to see such rapid progress on this -- it will be a great addition, and will simplify a lot of work for folks (like me!) running survival analyses. If I can do anything besides cheer from the sidelines, just let me know. :-)


          added more info to notebook, survival model only when censored respon…

9014c9c

…se variable

don-jil commented Sep 20, 2022

Happy to help test and or document. Very much appreciate all the work going into this. :-)

tomicapretto mentioned this pull request

censored() proposal #577

Closed

donaldbraman commented Sep 22, 2023

So, just reviewing the various related issues, it looks like we need a way to let formulae know whether and how the data is left censored, right censored, or both. Is that the main barrier to implementation?

Collaborator

tomicapretto commented Sep 22, 2023

@donaldbraman this is already implemented and it's possible to work with survival models in Bambi since #697 :)
I think I commented this in some issue but somehow it missed this.

You can have a look at this test

bambi/tests/test_built_models.py

Lines 1002 to 1049 in 169564f

    
           def test_censored_response(): 
        
               data = bmb.load_data("kidney") 
        
               data["status"] = np.where(data["censored"] == 0, "none", "right") 
        
               # Model 1, with intercept 
        
               priors = { 
        
                   "Intercept": bmb.Prior("Normal", mu=0, sigma=1), 
        
                   "sex": bmb.Prior("Normal", mu=0, sigma=2), 
        
                   "age": bmb.Prior("Normal", mu=0, sigma=1), 
        
                   "alpha": bmb.Prior("Gamma", alpha=3, beta=5), 
        
               } 
        
               model = bmb.Model( 
        
                   "censored(time, status) ~ 1 + sex + age", data, family="weibull", link="log", priors=priors 
        
               ) 
        
               idata = model.fit(tune=100, draws=100, random_seed=121195) 
        
               model.predict(idata, kind="pps") 
        
               model.predict(idata, data=data, kind="pps") 
        
               # Model 2, without intercept 
        
               priors = { 
        
                   "sex": bmb.Prior("Normal", mu=0, sigma=2), 
        
                   "age": bmb.Prior("Normal", mu=0, sigma=1), 
        
                   "alpha": bmb.Prior("Gamma", alpha=3, beta=5), 
        
               } 
        
               model = bmb.Model( 
        
                   "censored(time, status) ~ 0 + sex + age", data, family="weibull", link="log", priors=priors 
        
               ) 
        
               idata = model.fit(tune=100, draws=100, random_seed=121195) 
        
               model.predict(idata, kind="pps") 
        
               model.predict(idata, data=data, kind="pps") 
        
               # Model 3, with group-specific effects 
        
               priors = { 
        
                   "alpha": bmb.Prior("Gamma", alpha=3, beta=5), 
        
                   "sex": bmb.Prior("Normal", mu=0, sigma=1), 
        
                   "age": bmb.Prior("Normal", mu=0, sigma=1), 
        
                   "1|patient": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("InverseGamma", alpha=5, beta=10)), 
        
               } 
        
               model = bmb.Model( 
        
                   "censored(time, status) ~ 1 + sex + age + (1|patient)", 
        
                   data, 
        
                   family="weibull", 
        
                   link="log", 
        
                   priors=priors, 
        
               ) 
        
               idata = model.fit(tune=100, draws=100, random_seed=121195) 
        
               model.predict(idata, kind="pps") 
        
               model.predict(idata, data=data, kind="pps")

donaldbraman commented Sep 22, 2023

Amazing -- you just made my day! And thank you again for all your work on this.

@donaldbraman this is already implemented and it's possible to work with survival models in Bambi since #697 :) I think I commented this in some issue but somehow it missed this.

You can have a look at this test

bambi/tests/test_built_models.py

Lines 1002 to 1049 in 169564f

def test_censored_response():

data = bmb.load_data("kidney")

data["status"] = np.where(data["censored"] == 0, "none", "right")

# Model 1, with intercept

priors = {

"Intercept": bmb.Prior("Normal", mu=0, sigma=1),

"sex": bmb.Prior("Normal", mu=0, sigma=2),

"age": bmb.Prior("Normal", mu=0, sigma=1),

"alpha": bmb.Prior("Gamma", alpha=3, beta=5),

}

model = bmb.Model(

"censored(time, status) ~ 1 + sex + age", data, family="weibull", link="log", priors=priors

)

idata = model.fit(tune=100, draws=100, random_seed=121195)

model.predict(idata, kind="pps")

model.predict(idata, data=data, kind="pps")

# Model 2, without intercept

priors = {

"sex": bmb.Prior("Normal", mu=0, sigma=2),

"age": bmb.Prior("Normal", mu=0, sigma=1),

"alpha": bmb.Prior("Gamma", alpha=3, beta=5),

}

model = bmb.Model(

"censored(time, status) ~ 0 + sex + age", data, family="weibull", link="log", priors=priors

)

idata = model.fit(tune=100, draws=100, random_seed=121195)

model.predict(idata, kind="pps")

model.predict(idata, data=data, kind="pps")

# Model 3, with group-specific effects

priors = {

"alpha": bmb.Prior("Gamma", alpha=3, beta=5),

"sex": bmb.Prior("Normal", mu=0, sigma=1),

"age": bmb.Prior("Normal", mu=0, sigma=1),

"1|patient": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("InverseGamma", alpha=5, beta=10)),

}

model = bmb.Model(

"censored(time, status) ~ 1 + sex + age + (1|patient)",

data,

family="weibull",

link="log",

priors=priors,

)

idata = model.fit(tune=100, draws=100, random_seed=121195)

model.predict(idata, kind="pps")

model.predict(idata, data=data, kind="pps")

Collaborator

tomicapretto commented Sep 26, 2023

I'm closing this PR because it's already implemented (see this previous comment #543 (comment))

tomicapretto closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet