Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coxph_forestplot #838

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

coxph_forestplot #838

wants to merge 21 commits into from

Conversation

aGuyLearning
Copy link
Collaborator

This pull request introduces a new plotting function for generating Cox Proportional Hazard model forest plots and includes the necessary tests.

image

image

@aGuyLearning aGuyLearning linked an issue Dec 6, 2024 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@eroell eroell self-requested a review December 6, 2024 17:20
@github-actions github-actions bot added the chore label Dec 11, 2024
@eroell
Copy link
Collaborator

eroell commented Dec 11, 2024

Hm checking into the failed docs build

@eroell
Copy link
Collaborator

eroell commented Dec 11, 2024

How did you make the option to have this dashed vertical bar before? That seemed quite nice!

ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Show resolved Hide resolved
plot.axvline(1, color="gray", zorder=1)
lower_diff = data[auc_col] - data["coef lower 95%"]
upper_diff = data["coef upper 95%"] - data[auc_col]
plot.errorbar(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, seems that lifelines only draws the 95% errorbars and confidence intervals, did you also find no way to change this @aGuyLearning...?
If not, its something larger, and for now until someone requesting this, can keep like that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our coxph wrapper, we don't let user set the confidence intervalls, so this is the default. can be checked after fitting in the .confidence_intervals_ dataframe of the coxphfitter object.

ehrapy/plot/_survival_analysis.py Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Show resolved Hide resolved
Co-authored-by: Eljas Roellin <65244425+eroell@users.noreply.github.com>
@eroell eroell added enhancement New feature or request and removed chore labels Dec 11, 2024
Co-authored-by: Eljas Roellin <65244425+eroell@users.noreply.github.com>
@github-actions github-actions bot added the chore label Dec 11, 2024
@eroell
Copy link
Collaborator

eroell commented Dec 11, 2024

And at the end flexing this feature in the survival notebook would be great.. :)

@eroell eroell closed this Dec 11, 2024
@eroell eroell reopened this Dec 11, 2024
@eroell
Copy link
Collaborator

eroell commented Dec 11, 2024

sorry, misclicked!

ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Show resolved Hide resolved


def coxph_forestplot(
coxph: CoxPHFitter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we actually redesign the interface and have this function take an AnnData object instead? We'd have to store the coxph results in the AnnData object. This would be much more in line with the rest of the ehrapy experience.

Also ping @eroell

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. Thanks. We should do this indeed - storing the results used for the plot in adata.uns I suppose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also need a key_added argument then or something to ensure that we don't always overwrite results. But yeah this behavior must be consistent across all of the surv analysis methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kmf plot takes a list of KaplanMeierFitter Objects, so i went with that. in the survival analysis plots, ols is the only one taking an anndata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be consistent about it and I think we should consider modifying all of them to take AnnData. It's what the rest of the API does

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kmf plot takes a list of KaplanMeierFitter Objects, so i went with that. in the survival analysis plots, ols is the only one taking an anndata.

Yes, noticed where the idea here came from. Having .tl writing to AnnData, and .pl to read from it should be our go to. It has not been done like that everywhere yet unfortunately, and we strive for this consistency now :)

Copy link
Collaborator

@eroell eroell Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way to go about this is to store the required table for plotting of the lifeline's fitter object in adata.uns[key_added] upon the .tl call;

.uns seems to me the only slot suitable for this table for the univariate functions such as kaplan_meier.

For cox_ph here, as the coefficients are on a per-variable level, I think adding them into the .var column would be kinda clean; basically adding into .var the columns {key_added}_{"coef"}, {key_added}_{"coef lower 95%"}, {key_added}_{"coef upper 95%"}, .... There could also be a boolean {key_added}_{cox_ph_variable} column, which the plotting function uses to pick only the variables used in the cox ph model

ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
ehrapy/plot/_survival_analysis.py Outdated Show resolved Hide resolved
@@ -29,6 +29,12 @@ def rng():
return np.random.default_rng(seed=42)


@pytest.fixture
def mimic_2():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can your test not work with

@pytest.fixture
def mimic_2_10():
    mimic_2_10 = ep.dt.mimic_2()[:10]

    return mimic_2_10

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the smaller set we get the follwoing error:

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthe

so i took the whole mimic_2 set

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense. I think it's okay if we use the smaller subset and filter the warning here https://github.com/theislab/ehrapy/blob/main/pyproject.toml#L132

@eroell
Copy link
Collaborator

eroell commented Dec 11, 2024

Medcat no more a doc build dependency; asked about state of Python 3.12 CogStack/MedCAT#510

@aGuyLearning
Copy link
Collaborator Author

Waiting for #840 Update of Survivial functions. As we will be using an andata object from there on out

@Zethson
Copy link
Member

Zethson commented Dec 13, 2024

@aGuyLearning it's also okay with me if you merge this one first and then update it again with your follow up PR on making everything AnnData centric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Visualization of CoxPH models
3 participants