Telling Causal from Confounded with Causal Inference

To check whether an observed association between a set of features X and an outcome Y arises because X causes Y, or because both are driven by an unobserved confounder Z, we compare two models in terms of minimum description length (MDL):

  1. The causal model: A Bayesian linear regression model,
  2. The confounded model: A Probabilistic PCA model.

The model with the shorter description length, i.e. the one that compresses the data better, is taken as the more likely explanation. See Section 4 in (Wachinger et al., MedIA) for full details.
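To make the comparison concrete, here is a minimal sketch that scores both explanations of the joint data (X, Y) by log-likelihood. It is not the repository's compare_models, and the helper names (gaussian_loglik, blr_log_evidence, ppca_loglik, compare_sketch) are hypothetical. Its modeling choices are assumptions: in the causal model, X gets a full-covariance Gaussian and Y given X a Bayesian linear regression with a fixed prior precision alpha and noise variance noise_var; the confounded model is probabilistic PCA with dz latent dimensions, evaluated at its closed-form maximum-likelihood solution.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gaussian_loglik(X):
    """Log-likelihood of X under a full-covariance Gaussian with ML mean and covariance."""
    n, d = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # ML covariance (divides by n)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * n * (d * LOG_2PI + logdet + d)


def blr_log_evidence(X, y, alpha=1.0, noise_var=0.1):
    """Log marginal likelihood of y given X for y = X w + eps,
    with prior w ~ N(0, I / alpha) and noise eps ~ N(0, noise_var * I).
    alpha and noise_var are fixed assumptions, not fitted."""
    n = X.shape[0]
    K = noise_var * np.eye(n) + (X @ X.T) / alpha  # marginal covariance of y
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return -0.5 * (quad + logdet + n * LOG_2PI)


def ppca_loglik(D, q=1):
    """Log-likelihood of D under probabilistic PCA with q latent dimensions,
    evaluated at the closed-form ML solution (Tipping and Bishop)."""
    n, d = D.shape
    S = np.cov(D, rowvar=False, bias=True)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues, largest first
    sigma2 = lam[q:].mean()  # ML noise variance: mean of the discarded eigenvalues
    logdet_C = np.log(lam[:q]).sum() + (d - q) * np.log(sigma2)
    # At the ML solution tr(C^-1 S) = d, which gives the trailing "+ d"
    return -0.5 * n * (d * LOG_2PI + logdet_C + d)


def compare_sketch(X, y, dz=1, alpha=1.0, noise_var=0.1):
    """Hypothetical helper: (causal, confounded) log-likelihoods of the joint data (X, y)."""
    ll_causal = gaussian_loglik(X) + blr_log_evidence(X, y, alpha, noise_var)
    ll_confounded = ppca_loglik(np.column_stack([X, y]), q=dz)
    return ll_causal, ll_confounded


# Same data-generating setup as the usage example in this README, with a fixed seed
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta = np.zeros(5)
beta[1:] = rng.uniform(low=-1, high=1, size=4)
y = X @ beta

ll_c, ll_z = compare_sketch(X, y, dz=1)
print(f"causal: {ll_c:.1f}  confounded: {ll_z:.1f}")
```

Because the Gaussian and PPCA terms plug in fitted parameters, this sketch does not charge for model complexity; a proper MDL score adds the cost of encoding the parameters, which is what makes the comparison fair between models of different size.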


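import numpy as np
from compare_models import compare_models

# 100 samples with 5 features each
X = np.random.randn(100, 5)

# Coefficient 0 stays zero, so only features 1-4 influence Y
beta = np.zeros(5)
beta[1:] = np.random.uniform(low=-1, high=1, size=4)
# Y is a noise-free linear function of X, i.e. X causes Y
Y = X @ beta

# DZ: assumed to be the dimensionality of the latent confounder Z in the PPCA model
result = compare_models(X, Y, DZ=1)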

This prints a table listing the log-likelihoods of the causal and confounded models for 10 repetitions, together with the resulting Bayes factor. The consistently higher log-likelihood of the causal model suggests that X is indeed the cause of Y.

iter    ll_causal  ll_confounded   bayes_factor
   1  -479.039789    -822.107921  9.830969e+148
   2  -479.040413    -822.155958  1.030832e+149
   3  -479.041873    -822.185778  1.060485e+149
   4  -479.038676    -822.201896  1.081167e+149
   5  -479.040882    -822.394069  1.307358e+149
   6  -479.040289    -821.986750  8.704735e+148
   7  -479.042047    -822.151386  1.024454e+149
   8  -479.041100    -822.115588  9.893665e+148
   9  -479.041621    -822.332420  1.228286e+149
  10  -479.039873    -821.870923  7.755917e+148
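The bayes_factor column agrees with exp(ll_causal - ll_confounded), so the log-likelihoods are in nats. A short check on the first row of the table above, which also expresses the gap as a code-length difference in bits (reading -log2 of the likelihood as a description length, the conversion divides the nat gap by ln 2):

```python
import math

# Row 1 of the table above (natural-log likelihoods)
ll_causal = -479.039789
ll_confounded = -822.107921

log_bf = ll_causal - ll_confounded   # log Bayes factor in nats
bf = math.exp(log_bf)                # reproduces the bayes_factor column
log10_bf = log_bf / math.log(10)     # orders of magnitude in favor of the causal model
bits_saved = log_bf / math.log(2)    # how much shorter the causal description is, in bits

print(f"BF = {bf:.6e}, log10 BF = {log10_bf:.2f}, gap = {bits_saved:.1f} bits")
```

For larger gaps math.exp overflows (beyond roughly 709 nats), so reporting log10 of the Bayes factor or the gap in bits is the safer choice.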


Our approach to comparing causal and confounded models via minimum description length (MDL) is based on the work of Kaltenpoth and Vreeken: "We are not your real parents: Telling causal from confounded by MDL." In: ICDM, 2019.