Question about Support for Survival/Time-to-Event Data #1285

Open
lict99 opened this issue Nov 24, 2024 · 5 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments


lict99 commented Nov 24, 2024

I am writing to express my appreciation for the excellent work on the package, which has greatly facilitated causal inference in Python. As a user of the package, I have been able to successfully apply it to various datasets and problems.

However, I was wondering whether DoWhy's capabilities could be extended to support survival or time-to-event data. Currently, the package appears to focus on traditional outcomes such as binary, continuous, or count responses. Time-to-event data is a common outcome type in many fields (e.g., medicine, economics, sociology), and supporting it would greatly enhance the utility of DoWhy.

I understand that adding new features can be a significant undertaking, but I was hoping to get some insight into whether there are any plans to support survival analysis or if you could recommend alternative packages or methods for causal inference with time-to-event data. Any advice or resources you could share would be greatly appreciated.

Thank you again for your hard work on the package.

lict99 added the question label Nov 24, 2024
amit-sharma (Member) commented

Can you provide a motivating example or dataset on which you'd like to run DoWhy?

Supporting new kinds of data is significant work, so we can take this step by step: first, let's understand a popular, high-impact scenario where we can extend DoWhy, and later we can support survival analysis fully.

amit-sharma added the enhancement label Nov 24, 2024

lict99 commented Nov 25, 2024

Survival data typically comprises two key components: time (the duration from the start of an observation period to an event, the end of the study, loss of contact, or withdrawal) and status (indicating whether the event occurred or the observation was censored). I've found several popular survival datasets on Kaggle. Specifically:

  1. The Breast Cancer Survival Dataset clearly separates the patient's status (Patient_Status column) from time (the interval between Date_of_Surgery and Date_of_Last_Visit). Other variables in this dataset can be used as potential predictors.
  2. The Cirrhosis Patient Survival Prediction dataset provides status (Status column) and time (N_Days column), with other variables available for predictive modeling.
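The two components can be derived from the raw Breast Cancer columns. Below is a minimal standard-library sketch, assuming the date columns are ISO-formatted strings and that "Dead" is the Patient_Status value marking an observed event (both are assumptions, not checked against the actual dataset); the rows are made-up toy values. For the Cirrhosis data, N_Days already gives the duration, so only the Status column would need mapping to a boolean.

```python
from datetime import date

def to_time_event(record, event_label="Dead"):
    """Turn one raw record into a (duration_in_days, event_observed) pair.

    Assumes ISO-8601 date strings and that `event_label` (a hypothetical
    default) marks an observed death; any other status counts as censored.
    """
    start = date.fromisoformat(record["Date_of_Surgery"])
    end = date.fromisoformat(record["Date_of_Last_Visit"])
    duration = (end - start).days
    if duration < 0:
        raise ValueError("Date_of_Last_Visit precedes Date_of_Surgery")
    return duration, record["Patient_Status"] == event_label

# Hypothetical toy rows shaped like the Breast Cancer Survival Dataset columns.
rows = [
    {"Date_of_Surgery": "2018-01-15", "Date_of_Last_Visit": "2018-07-15", "Patient_Status": "Alive"},
    {"Date_of_Surgery": "2018-03-01", "Date_of_Last_Visit": "2019-03-01", "Patient_Status": "Dead"},
]
pairs = [to_time_event(r) for r in rows]
print(pairs)  # [(181, False), (365, True)]
```

The first patient is right-censored at day 181 (still alive at the last visit), while the second has an observed event at day 365.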

Additionally, I've found a helpful introduction to survival analysis on the wiki, which provides a solid starting point for understanding this topic.

Thank you for your attention to this matter.😊

lict99 closed this as not planned Nov 25, 2024
lict99 reopened this Nov 25, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Dec 26, 2024
samblechman commented

Adding an additional request here for this functionality. I understand this would be a significant amount of work, but I agree that it would be extremely useful for many applications (e.g., medical).

For example, the outcome of interest is often 30-day mortality after treatment. Patients who died any time after day 30, or who never died, are "right-censored"; to study the effect of treatment or covariates on 30-day mortality, their survival time is imputed as 30 days. However, without a method that accounts for right-censoring, imputing a survival time of 30 days would distort the treatment effect estimate.
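To illustrate why censoring matters for a 30-day outcome, here is a plain-Python sketch of the standard Kaplan-Meier estimator. It is not a DoWhy feature, and the records are hypothetical toy values. Deaths after day 30 are administratively censored at day 30; patients lost to follow-up before day 30 are where a naive proportion goes wrong, because it silently counts them as survivors. This is only a marginal (non-causal) estimate; a treatment effect would additionally need per-arm estimates with confounding adjustment.

```python
from collections import Counter

def censor_at(time, event, horizon=30):
    """Administratively censor one (time, event) observation at `horizon` days."""
    if time > horizon:
        # Alive at the horizon (even if death came later): censored, not an event.
        return horizon, False
    return time, event

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (Kaplan-Meier estimator)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone, dead or censored, leaves the risk set at t
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Censorings at t are still in the risk set (usual convention).
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve

def km_risk_by(times, events, horizon):
    """Estimated probability of the event by `horizon`: 1 - S(horizon)."""
    s = 1.0
    for t, s_t in kaplan_meier(times, events):
        if t <= horizon:
            s = s_t
    return 1.0 - s

# Hypothetical toy records: (days of follow-up, death observed?).
raw = [(5, True), (10, False), (20, True), (25, False), (40, True), (60, False)]
times, events = zip(*(censor_at(t, e) for t, e in raw))

naive_risk = sum(events) / len(events)  # treats early dropouts as 30-day survivors
km_risk = km_risk_by(times, events, horizon=30)
print(f"naive: {naive_risk:.3f}, Kaplan-Meier: {km_risk:.3f}")  # naive: 0.333, Kaplan-Meier: 0.375
```

The two patients lost at days 10 and 25 contribute to the risk set only while they are observed, so Kaplan-Meier reports a higher 30-day risk than the naive 2/6.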

emrekiciman (Member) commented Jan 6, 2025

Hey folks, I wanted to add a link to this discussion of survival analysis on the Discord: https://discord.com/channels/818456847551168542/818456856137170996/1221611463823720588

Notably, Paidamoyo Chapfuwa has published a counterfactual survival analysis notebook that could be integrated into PyWhy and extended with its identification algorithms and/or CATE estimators, etc. She was looking for someone to push the integration forward. This would make a "good first project" for anyone interested in getting more involved.

https://github.com/paidamoyo/counterfactual_survival_analysis

github-actions bot removed the stale label Jan 7, 2025
Development

No branches or pull requests

4 participants