adding to knowledgebase on SCM #537

NathanielF · 2025-10-23T06:10:23Z

Adding a notebook to the knowledge base on structural causal modelling with Bayesian models.

I think it will be useful to highlight what is unique and special about Bayesian causal modelling and to lean into the structural causal modelling perspective. This would distinguish CausalPy more clearly as a Bayesian package with a unique selling point. In the notebook I demonstrate joint causal models of outcome and treatment, and will discuss links and contrasts with the potential outcomes framework.

Along the way I demonstrate the usefulness of spike-and-slab priors for improving identification with a species of joint causal model for both continuous and binary treatments. These could be added as distinct models or just as options for priors in existing models. I'm not entirely sure where to take this modelling and would like to discuss it when the draft is more complete and ready for review.
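For readers new to the idea, here is a minimal sketch of what a spike-and-slab prior does, using only the standard library. It is not the notebook's implementation: the function name, the inclusion probability, and the slab scale are illustrative assumptions. Each coefficient is either exactly zero (the spike) or drawn from a wide normal (the slab).

```python
import random


def sample_spike_and_slab(n_coefs, inclusion_prob, slab_sd, rng):
    """Draw coefficients from a discrete spike-and-slab prior.

    Hypothetical helper for illustration only: with probability
    `inclusion_prob` a coefficient comes from the slab Normal(0, slab_sd),
    otherwise it is set to exactly zero (the spike).
    """
    coefs = []
    for _ in range(n_coefs):
        included = rng.random() < inclusion_prob
        coefs.append(rng.gauss(0.0, slab_sd) if included else 0.0)
    return coefs


rng = random.Random(0)
draws = sample_spike_and_slab(10_000, inclusion_prob=0.2, slab_sd=2.0, rng=rng)

# Most prior mass sits exactly at zero, which is what makes the prior sparsifying.
frac_zero = sum(c == 0.0 for c in draws) / len(draws)
print(f"fraction of coefficients exactly zero: {frac_zero:.2f}")
```

With an inclusion probability of 0.2, roughly 80% of the prior draws are exactly zero, so a covariate only enters the model when the data pull its coefficient away from the spike.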

Questions for discussion:

Structural causal modelling is present in CausalPy via the IV modelling implementation. But I wonder if we want to make the implementation more robust following some of the details in this notebook, e.g. variable selection priors or the explicit modelling of rho.
CATE inference with BART treatment models is shown here.
General thoughts about SCM, its link to potential outcomes, and the way we gloss the relationship between these topics in CausalPy.
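To make the "explicit modelling of rho" question concrete, here is a small simulation sketch. It assumes rho denotes the correlation between the treatment-equation and outcome-equation errors; the data-generating process, coefficients, and function names are illustrative assumptions rather than anything taken from the notebook or from CausalPy's IV class. It shows why that correlation matters: when rho is non-zero a naive regression of outcome on treatment is biased, while an instrument-based ratio recovers the effect.

```python
import math
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n, beta, rho, rng):
    """Joint treatment/outcome model with correlated errors (illustrative)."""
    z, t, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument
        u = rng.gauss(0.0, 1.0)   # treatment error
        # outcome error with correlation rho to the treatment error
        v = rho * u + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        ti = zi + u               # treatment equation
        yi = beta * ti + v        # outcome equation
        z.append(zi)
        t.append(ti)
        y.append(yi)
    return z, t, y


rng = random.Random(1)
z, t, y = simulate(n=20_000, beta=2.0, rho=0.6, rng=rng)

ols_estimate = cov(t, y) / cov(t, t)  # ignores the error correlation
iv_estimate = cov(z, y) / cov(z, t)   # Wald / IV ratio using the instrument
print(f"OLS: {ols_estimate:.2f}  IV: {iv_estimate:.2f}  true effect: 2.00")
```

In this setup the OLS slope is inflated by roughly rho / Var(T), which is why estimating rho jointly, rather than leaving it implicit, is informative about the degree of confounding.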

📚 Documentation preview 📚: https://causalpy--537.org.readthedocs.build/en/537/

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

review-notebook-app · 2025-10-23T06:10:28Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

codecov · 2025-10-23T06:18:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.59%. Comparing base (9a88b01) to head (eb05301).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #537      +/-   ##
==========================================
+ Coverage   95.48%   95.59%   +0.11%     
==========================================
  Files          29       29              
  Lines        2615     2681      +66     
==========================================
+ Hits         2497     2563      +66     
  Misses        118      118
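
The percentages in the table follow from the hits and lines counts. The sketch below reproduces them. Truncating to two decimals (rather than rounding) is inferred from the numbers in this report, not from Codecov documentation, and the helper name is hypothetical.

```python
def coverage_basis_points(hits, lines):
    """Coverage in hundredths of a percent, truncated (assumed behaviour)."""
    return hits * 10_000 // lines


base = coverage_basis_points(hits=2497, lines=2615)  # main
head = coverage_basis_points(hits=2563, lines=2681)  # this PR

print(f"base: {base / 100:.2f}%")
print(f"head: {head / 100:.2f}%")
print(f"change: {(head - base) / 100:+.2f}%")
```

All 66 added lines are hit and the number of misses stays at 118, so the increase comes purely from growing the covered portion of the code base.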

drbenvincent

Will review the knowledge base article now. But can we first check that adding pymc-bart (with a hyphen) to environment.yml is correct, given that the import is import pymc_bart as pmb?

I'm actually thinking that if this is only used in docs, and because the docs are not built from scratch remotely, maybe we don't need this update to environment.yml?
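
A quick way to sanity-check the hyphen versus underscore question is sketched below. It applies the PEP 503 name normalisation used by pip-style package indexes and checks whether a module is importable in the current environment. The helper names are hypothetical, and whether the conda channel behind environment.yml uses the same spelling still needs checking separately.

```python
import importlib.util
import re


def normalize_distribution_name(name):
    """PEP 503 normalisation: runs of '-', '_' or '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def module_available(module_name):
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore; the normalised distribution name uses a hyphen.
print(normalize_distribution_name("pymc_bart"))

# Result depends on whether the package is installed in this environment.
print(module_available("pymc_bart"))
```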

drbenvincent

First-pass review. I like it. Here's a mix of high and low level thoughts at this point...

I'm wondering if we want to use the language of 'model selection' a bit more, especially in the introduction. Would it also be useful and instructive to discuss Savage-Dickey where we have a 'super model' and certain parameters (e.g. being zero) structurally alter the model so that we are effectively exploring different models via the parameter values.

I'd maybe add a legend under the diggity image and take some time to relate that back to the data simulation code. Basically use it as a breather/pause to allow the reader to keep up with the details. [Not sure if we can do a legend, so feel free to ignore the formatting point, but I think some more explanation would help readers.]

We can maybe add the hide-input cell tag on cells that are exclusively for plot generation? That's just a personal preference for docs.
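
As a sketch of how that tagging could be automated: the function below adds a hide-input tag to code cells in a notebook's JSON structure. It assumes the docs build honours the hide-input cell tag (as MyST-NB does), and the "plt." marker used to detect plotting cells is a crude, hypothetical heuristic that would need adapting to the actual notebook.

```python
import json


def tag_plot_cells(notebook, marker="plt."):
    """Add the 'hide-input' tag to code cells whose source contains `marker`.

    `notebook` is a dict in the nbformat v4 layout. The marker heuristic is
    illustrative only; mixed cells would still need manual review.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        if marker in source:
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "hide-input" not in tags:
                tags.append("hide-input")
    return notebook


# Tiny in-memory notebook standing in for the real .ipynb file.
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["plt.plot(x, y)\n"]},
        {"cell_type": "code", "metadata": {}, "source": ["idata = fit(model)\n"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["Uses plt. here\n"]},
    ],
}

tagged = tag_plot_cells(nb)
print(json.dumps([c["metadata"] for c in tagged["cells"]]))
```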

A pedantic point of Python style, but fit_models could be made cleaner with a dictionary comprehension.
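
A minimal sketch of that suggestion. The notebook's actual fit_models is not shown in this thread, so fit_model, the specs mapping, and both function signatures below are hypothetical stand-ins used only to contrast the two styles.

```python
def fit_model(spec, data):
    """Hypothetical stand-in for fitting one model; returns a summary dict."""
    return {"prior": spec, "n_obs": len(data)}


def fit_models_loop(specs, data):
    """Loop-and-assign version."""
    fitted = {}
    for name, spec in specs.items():
        fitted[name] = fit_model(spec, data)
    return fitted


def fit_models(specs, data):
    """Equivalent version using a dictionary comprehension."""
    return {name: fit_model(spec, data) for name, spec in specs.items()}


specs = {"normal": "normal", "spike_and_slab": "spike_and_slab"}
data = [0.1, 0.4, -0.3]

results = fit_models(specs, data)
print(sorted(results))
```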

Can we get references in there for each of the prior types. Academic papers and/or useful blogposts. Maybe a brief description of each for those not familiar.

Not sure "reduced-form" is defined anywhere?

Can you be more specific with "schizophrenic posterior distribution"?

Sorry if this is annoying, but can you add a glossary term (in glossary.rst) for "probabilistic programming"

Something iffy going on with formatting here: "Y_bin ~ T_bin + feature_0 + feature_1 + feature_2 + feature_3 + feature_4 + feature_5 + feature_6 + feature_7 +feature_8",

If there are any book chapters in Bayesian stats books which touch on this topic, it would be good to add them in as references.

Link or reference for the credibility revolution?

I think there's scope for a summary directed at practitioners. Are you advocating for the general practice of using these kinds of priors for sparsifying model structure? If so, are there concrete situations where that can be used, and how could a skeptic be convinced of its utility above and beyond this being a form of basic feature selection when you throw all your variables into the causal soup?

Typos

empircal

NathanielF · 2025-10-28T17:38:10Z

Thanks @drbenvincent, all fair points. Will review each.

NathanielF · 2025-10-28T17:40:51Z

On the BART stuff: yes, if the PR is just this knowledge base article, it may not be needed in environment.yml. But if we wanted to add CATE functionality, it might be a good default for flexible approximation with a Bayesian flavour.

NathanielF · 2025-10-29T20:33:25Z

> I'm wondering if we want to use the language of 'model selection' a bit more, especially in the introduction. Would it also be useful and instructive to discuss Savage-Dickey where we have a 'super model' and certain parameters (e.g. being zero) structurally alter the model so that we are effectively exploring different models via the parameter values.

I've added model comparison assessments via the LOO metric, but the models aren't really distinguished much on the performance scores. I prefer to focus on model comparison as a sensitivity check, so I've stated that here too. I've also added Savage-Dickey plots to the NHEFS example, comparing against the null of the OLS estimate.
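
For readers unfamiliar with the Savage-Dickey density ratio: for a point null nested in the model, the Bayes factor in favour of the null is the posterior density at the null value divided by the prior density at that value. The sketch below uses a conjugate normal model with known noise scale so both densities are available in closed form. The data, prior scale, and function name are illustrative assumptions and are unrelated to the NHEFS example.

```python
import math
from statistics import NormalDist


def savage_dickey_bf01(y, null_value, prior_mean, prior_sd, noise_sd):
    """Bayes factor for H0: beta == null_value in y_i ~ Normal(beta, noise_sd).

    Uses the conjugate update for a Normal(prior_mean, prior_sd) prior on beta.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(y) / noise_sd**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + sum(y) / noise_sd**2) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)

    prior = NormalDist(prior_mean, prior_sd)
    posterior = NormalDist(post_mean, post_sd)
    # Savage-Dickey: ratio of posterior to prior density at the null value.
    return posterior.pdf(null_value) / prior.pdf(null_value)


near_null = [0.1, -0.2, 0.05, 0.0, -0.1]
far_from_null = [2.1, 1.9, 2.2, 1.8, 2.0]

bf_near = savage_dickey_bf01(near_null, 0.0, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0)
bf_far = savage_dickey_bf01(far_from_null, 0.0, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0)
print(f"BF01 near null: {bf_near:.2f}, far from null: {bf_far:.4f}")
```

A ratio above 1 means the data have increased the plausibility of the null value relative to the prior; a ratio below 1 means they have decreased it. With posterior samples instead of a closed form, the posterior density at the null would have to be estimated, for example with a kernel density estimate.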

> I'd maybe add a legend under the diggity image and take some time to relate that back to the data simulation code. Basically use it as a breather/pause to allow the reader to keep up with the details. [Not sure if we can do a legend, so feel free to ignore the formatting point, but I think some more explanation would help readers.]

I've added an explanation.

> We can maybe add the hide-input cell tag on cells that are exclusively for plot generation? That's just a personal preference for docs.

I've hidden most, if not all, of the plotting logic.

> Not sure "reduced-form" is defined anywhere?

> Can you be more specific with "schizophrenic posterior distribution"?

Added clarification on both points.

> Sorry if this is annoying, but can you add a glossary term (in glossary.rst) for "probabilistic programming"

Done.

> Something iffy going on with formatting here: "Y_bin ~ T_bin + feature_0 + feature_1 + feature_2 + feature_3 + feature_4 + feature_5 + feature_6 + feature_7 +feature_8",

Fixed.

Added a bunch of references.

NathanielF · 2025-10-29T20:42:50Z

> I think there's scope for a summary directed at practitioners. Are you advocating for the general practice of using these kinds of priors for sparsifying model structure? If so, are there concrete situations where that can be used, and how could a skeptic be convinced of its utility above and beyond this being a form of basic feature selection when you throw all your variables into the causal soup?

I was primarily thinking of these as a kind of sensitivity analysis tool for cases with lots of covariates, not as a magical salve to replace theory-driven instrument selection. Regarding a summary directed at practitioners: I can add one at the end, before the conclusion, but I'll keep it brief.

NathanielF · 2025-10-30T13:31:38Z

As a point for discussion: after this PR I'd propose adding a few issues around the following.

The possible inclusion of a small utility function or class that adds variable selection priors to CausalPy and can be used across various model classes.
A strategy for incorporating CATE estimation routines. Here I've used a very simple interaction model, but we should have a more general approach.
A potential plan to augment the existing IV model with a more explicit parameterisation of e.g. the covariance structure, plus functionality for CATE and variable selection priors.
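
To illustrate what a "very simple interaction model" for CATE amounts to in the simplest case, here is a standard-library sketch. It assumes a randomised binary treatment and a single binary moderator, in which case a fully interacted linear model is saturated and its CATE estimates coincide with within-group differences in means. The simulated effects and function names are illustrative assumptions, not the notebook's model.

```python
import random
from statistics import fmean


def simulate(n, rng):
    """Rows of (x, t, y) where the treatment effect is 1.0 + 2.0 * x."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0  # binary moderator
        t = 1 if rng.random() < 0.5 else 0  # randomised treatment
        y = 1.0 + 0.5 * x + (1.0 + 2.0 * x) * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def cate_by_group(rows):
    """Difference in mean outcome between treated and control within each x."""
    estimates = {}
    for x in (0, 1):
        treated = [y for xi, ti, y in rows if xi == x and ti == 1]
        control = [y for xi, ti, y in rows if xi == x and ti == 0]
        estimates[x] = fmean(treated) - fmean(control)
    return estimates


rng = random.Random(2)
cate = cate_by_group(simulate(20_000, rng))
print({x: round(effect, 2) for x, effect in cate.items()})
```

A more general approach would replace the group means with a flexible outcome model evaluated under both treatment values, which is where something like BART could come in.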

adding knowledgebase on SCM

a9ccbfb

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tidying spell check

6c620a8

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF added 14 commits October 23, 2025 22:31

improve binary modelling with BART

1d79fee

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

update with CATE RUN

72c8104

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

add visual of probabilistic causal inference

6da45fd

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

update cate and po section

b01c6a3

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

fixing typos

c180054

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

add PO diagram

e359210

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

added empirical example

8999d91

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

fix typo

e682822

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

add fish tank analogy

762137b

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

Merge branch 'main' into cate_example

4807bec

fix typo

4531097

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tidying text

0628b91

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

further tidying

65db0b4

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

added conclusion

9b27f64

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF marked this pull request as ready for review October 28, 2025 11:01

NathanielF requested review from drbenvincent and juanitorduz October 28, 2025 11:01

drbenvincent reviewed Oct 28, 2025

View reviewed changes

More sign posting

34e2fe3

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

update references and write up

8787868

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

add CTA to the end.

d5016b7

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF added 4 commits October 30, 2025 06:50

add bayes factor plot

5cc9366

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

hide table inputs for param comparison

79beb30

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tightening the introduction

d941b77

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tightening

cbaba03

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF added 4 commits October 31, 2025 12:01

tidying

373f011

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

fix typo

22753d3

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

small fix

b406d53

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tightening the conclusion with advice

eb05301

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF requested a review from drbenvincent November 3, 2025 20:30
