Nima Hejazi
Based on the `tmle3mediate` `R` package.
- Examine how the presence of post-treatment mediating variables can complicate a causal analysis, and how direct and indirect effects can be defined to resolve these complications.
- Describe the essential similarities and differences between direct and indirect causal effects, including their definition in terms of stochastic interventions.
- Differentiate the joint interventions required to define direct and indirect effects from the static, dynamic, and stochastic interventions that yield total causal effects.
- Describe the assumptions needed for identification of the natural direct and indirect effects, as well as the limitations of these effect definitions.
- Estimate the natural direct and indirect effects for a binary treatment using the `tmle3mediate` `R` package.
- Differentiate the population intervention direct and indirect effects of stochastic interventions from the natural direct and indirect effects, including differences in the assumptions required for their identification.
- Estimate the population intervention direct effect of a binary treatment using the `tmle3mediate` `R` package.
In applications ranging from biology and epidemiology to economics and psychology, scientific inquiries are often concerned with ascertaining the effect of a treatment on an outcome variable only through particular pathways between the two. In the presence of post-treatment intermediate variables affected by exposure (that is, mediators), path-specific effects allow for such complex, mechanistic relationships to be teased apart. These causal effects are of such wide interest that their definition and identification have been the object of study in statistics for nearly a century; indeed, the earliest examples of modern causal mediation analysis can be traced back to work on path analysis [@wright1934method]. In recent decades, renewed interest has resulted in the formulation of novel direct and indirect effects within both the potential outcomes and nonparametric structural equation modeling frameworks [@robins1986new; @pearl1995causal; @pearl2009causality; @spirtes2000causation; @dawid2000causal]. Generally, the indirect effect (IE) is the portion of the total effect found to work through mediating variables, while the direct effect (DE) encompasses all other components of the total effect, including both the effect of the treatment directly on the outcome and its effect through all paths not explicitly involving the mediators. The mechanistic knowledge conveyed by the direct and indirect effects can be used to improve understanding of both why and how treatments may be efficacious.
Modern approaches to causal inference have allowed for significant advances over the methodology of traditional path analysis, overcoming the restrictions imposed by parametric modeling approaches [@vanderweele2015explanation]. Using distinct frameworks, @robins1992identifiability and @pearl2001direct provided equivalent nonparametric decompositions of the average treatment effect into the natural direct and indirect effects. @vanderweele2015explanation provides a comprehensive overview of classical causal mediation analysis. We provide an alternative perspective, focusing instead on the construction of efficient estimators of these quantities, which have appeared only recently [@tchetgen2012semiparametric; @zheng2012targeted], as well as on more flexible definitions of direct and indirect effects based upon stochastic interventions [@diaz2020causal].
Let us return to our familiar sample of
As in preceding chapters, a structural causal model (SCM) [@pearl2009causality]
helps to formalize the definition of our counterfactual variables:
\begin{align}
  W &= f_W(U_W) \\ \nonumber
  A &= f_A(W, U_A) \\ \nonumber
  Z &= f_Z(W, A, U_Z) \\ \nonumber
  Y &= f_Y(W, A, Z, U_Y).
  (\#eq:npsem-mediate)
\end{align}
This set of equations
constitutes a mechanistic model generating the observed data
By factorizing the likelihood of the data
We have explicitly excluded potential confounders of the mediator-outcome
relationship affected by exposure (i.e., variables affected by
The natural direct and indirect effects arise from a decomposition of the ATE:
\begin{align*}
  \E[Y(1) - Y(0)] =
    &\underbrace{\E[Y(1, Z(0)) - Y(0, Z(0))]}_{\text{NDE}} \\ &+
    \underbrace{\E[Y(1, Z(1)) - Y(1, Z(0))]}_{\text{NIE}}.
\end{align*}
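To see this decomposition at work, the following is a minimal sketch that simulates nested counterfactuals from a structural causal model of the form given earlier. The functional forms, coefficients, and error distributions are hypothetical choices made only for illustration, and the treatment mechanism is omitted because the counterfactuals are generated by setting the treatment directly. Since every counterfactual is available in a simulation, the NDE and NIE can be computed as simple averages, and their sum recovers the ATE.

```r
set.seed(2718)
n <- 1e5

# exogenous errors (hypothetical distributions, chosen for illustration)
U_W <- rnorm(n)
U_Z <- rnorm(n)
U_Y <- rnorm(n)

# structural equations (hypothetical functional forms)
f_W <- function(u_w) u_w
f_Z <- function(w, a, u_z) 0.5 * w + 1.5 * a + u_z
f_Y <- function(w, a, z, u_y) 0.3 * w + 2 * a + 0.7 * z + 0.4 * a * z + u_y

W <- f_W(U_W)

# counterfactual mediators Z(0) and Z(1)
Z0 <- f_Z(W, 0, U_Z)
Z1 <- f_Z(W, 1, U_Z)

# nested counterfactual outcomes Y(a, Z(a'))
Y_0_Z0 <- f_Y(W, 0, Z0, U_Y)
Y_1_Z0 <- f_Y(W, 1, Z0, U_Y)
Y_1_Z1 <- f_Y(W, 1, Z1, U_Y)

ate <- mean(Y_1_Z1 - Y_0_Z0)
nde <- mean(Y_1_Z0 - Y_0_Z0) # treatment changes, mediator held at Z(0)
nie <- mean(Y_1_Z1 - Y_1_Z0) # treatment held at 1, mediator changes
c(ATE = ate, NDE = nde, NIE = nie, NDE_plus_NIE = nde + nie)
```

In observed data, only one of these counterfactuals is ever seen per unit, which is why the identification assumptions matter.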
In particular, the natural indirect effect (NIE) measures the effect of the
treatment
::: {.definition name="Exchangeability"}
::: {.definition name="Treatment Positivity"}
For any
::: {.definition name="Mediator Positivity"}
For any
::: {.definition name="Cross-world Counterfactual Independence"}
For all
While the first three assumptions may be familiar based on their analogs in simpler settings, the cross-world independence requirement is unique to identification of the natural direct and indirect effects. This assumption resolves a challenging complication to the identification of these path-specific effects, which has been termed the "recanting witness" by @avin2005identifiability, who introduce a graphical resolution equivalent to this assumption. This independence of counterfactuals indexed by distinct interventions is, in fact, a serious limitation to the scientific relevance of these effect definitions: it renders the NDE and NIE unidentifiable in randomized trials [@robins2010alternative]. Corresponding scientific claims therefore cannot be falsified through experimentation [@popper1934logic; @dawid2000causal], which directly contradicts a foundational pillar of the scientific method.
While many attempts have been made to weaken this last assumption [@petersen2006estimation; @imai2010identification; @vansteelandt2012imputation; @vansteelandt2012natural], these results either impose stringent modeling assumptions, propose alternative interpretations of the natural effects, or provide a limited degree of additional flexibility by developing conditions that may more easily be satisfied. For example, @petersen2006estimation weaken this assumption by requiring it only for conditional means (rather than distinct counterfactuals) and adopt a view of the natural direct effect as a weighted average of another type of direct effect, the controlled direct effect. The motivated reader may wish to further examine these details independently. We next review estimation of the NDE and NIE, which remain widely used in modern applications of causal mediation analysis.
The NDE is defined as
\begin{align*}
  \psi_{\text{NDE}} =&~\E[Y(1, Z(0)) - Y(0, Z(0))] \\
  =& \sum_w \sum_z
  [\underbrace{\E(Y \mid A = 1, z, w)}_{\overline{Q}_Y(A = 1, z, w)} -
  \underbrace{\E(Y \mid A = 0, z, w)}_{\overline{Q}_Y(A = 0, z, w)}] \\
  &\times \underbrace{p(z \mid A = 0, w)}_{q_Z(Z \mid 0, w)}
  \underbrace{p(w)}_{q_W},
\end{align*}
where the likelihood factors arise from a factorization of the joint likelihood:
\begin{equation*}
  p(w, a, z, y) = \underbrace{p(y \mid w, a, z)}_{q_Y(A, W, Z)}
  \underbrace{p(z \mid w, a)}_{q_Z(Z \mid A, W)}
  \underbrace{p(a \mid w)}_{g(A \mid W)}
  \underbrace{p(w)}_{q_W}.
\end{equation*}
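The identification formula above can be evaluated directly when the mediator is discrete. The sketch below is a simple plug-in (substitution) estimator on simulated data with a binary mediator, using parametric regressions for the outcome regression and the mediator density. It is not the targeted estimator implemented in `tmle3mediate`, and the data-generating process, variable names, and model choices are all assumptions made for illustration.

```r
set.seed(7491)
n <- 5000

# hypothetical data-generating process with binary W, A, and Z
W <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, plogis(-0.3 + 0.6 * W))
Z <- rbinom(n, 1, plogis(-0.5 + 1.0 * A + 0.4 * W))
Y <- 1 + 0.8 * A + 1.2 * Z + 0.5 * A * Z + 0.5 * W + rnorm(n)
dat <- data.frame(W, A, Z, Y)

# outcome regression Q_bar_Y(a, z, w) and mediator density q_Z(z | a, w)
Q_fit <- glm(Y ~ A * Z + W, data = dat)
qZ_fit <- glm(Z ~ A + W, family = binomial(), data = dat)

# predicted mean outcome with A and Z set to fixed values, W as observed
Q_bar <- function(a, z) {
  predict(Q_fit, newdata = data.frame(A = a, Z = z, W = dat$W))
}

# q_Z(1 | A = 0, W): mediator distribution under control
pZ1_A0 <- predict(
  qZ_fit, newdata = data.frame(A = 0, W = dat$W), type = "response"
)

# sum over z in {0, 1}, then average over the empirical distribution of W
nde_z1 <- (Q_bar(1, 1) - Q_bar(0, 1)) * pZ1_A0
nde_z0 <- (Q_bar(1, 0) - Q_bar(0, 0)) * (1 - pZ1_A0)
nde_plugin <- mean(nde_z1 + nde_z0)

# true NDE under this simulation: 0.8 + 0.5 * P(Z(0) = 1)
nde_true <- 0.8 + 0.5 * mean(plogis(-0.5 + 0.4 * c(0, 1)))
c(plugin = nde_plugin, truth = nde_true)
```

The plug-in estimator is only as good as the two regressions it relies on, which motivates the targeted approach described next.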
The process of estimating the NDE begins by constructing
A procedure for constructing a targeted maximum likelihood (TML) estimator of
the NDE treats $\overline{Q}_{\text{diff}}$ itself as a nuisance parameter,
regressing its estimate $\overline{Q}_{\text{diff}, n}$ on baseline covariates
\begin{equation*}
  C_Y(q_Z, g)(O) = \Bigg\{\frac{\mathbb{I}(A = 1)}{g(1 \mid W)}
  \frac{q_Z(Z \mid 0, W)}{q_Z(Z \mid 1, W)} -
  \frac{\mathbb{I}(A = 0)}{g(0 \mid W)} \Bigg\} \ .
\end{equation*}
Breaking this down,
This subtle appearance of a ratio of conditional densities is concerning --
tools to estimate such quantities are sparse in the statistics literature
[@diaz2011super; @hejazi2020haldensify], unfortunately, and the problem is still
more complicated (and computationally taxing) when
Underneath the hood, the mean outcome difference
Derivation and estimation of the NIE is analogous to that of the NDE. Recall
that the NIE is the effect of
As with the NDE, re-parameterization can be used to replace
At times, the natural direct and indirect effects may prove too limiting, as
these effect definitions are based on static interventions (i.e., setting
We previously discussed stochastic interventions when considering how to
intervene on continuous-valued treatments; however, these intervention
schemes may be applied to all manner of treatment variables.
A particular type of stochastic intervention well-suited to working with binary
treatments is the incremental propensity score intervention (IPSI), first
proposed by @kennedy2019nonparametric. Such interventions do not
deterministically set the treatment level of an observed unit to a fixed
quantity (i.e., setting
\begin{equation*}
  g_{\delta}(1 \mid w) = \frac{\delta g(1 \mid w)}{\delta g(1 \mid w) + 1
  - g(1\mid w)},
\end{equation*}
where the scalar $0 < \delta < \infty$ specifies a change in the odds of receiving treatment.

As described by @diaz2020causal in the context of causal mediation analysis, the identification assumptions required for the population intervention direct and indirect effects (PIDE and PIIE) are considerably weaker than those required for the NDE and NIE. Importantly, the assumption of cross-world counterfactual independence is not required at all. The identification assumptions are the following.
::: {.definition name="Conditional Exchangeability of Treatment and Mediators"}
Assume that $\E\{Y(a, z) \mid Z, A, W\} = \E\{Y(a, z) \mid Z,
W\}~\forall~(a, z) \in \mathcal{A} \times \mathcal{Z}$. This assumption is
weaker than and implied by the assumption
::: {.definition name="Common Support of Treatment and Mediators"}
Assume that $\text{supp}\{g_{\delta}(\cdot \mid w)\} \subseteq
\text{supp}\{g(\cdot \mid w)\}~\forall~w \in \mathcal{W}$. This assumption is
standard and requires only that the post-intervention value of
We may decompose the population intervention effect (PIE) in terms of the population intervention direct effect (PIDE) and the population intervention indirect effect (PIIE):
\begin{equation*}
  \mathbb{E}\{Y(A_\delta)\} - \mathbb{E}Y =
    \overbrace{\mathbb{E}\{Y(A_\delta, Z(A_\delta))
      - Y(A_\delta, Z)\}}^{\text{PIIE}} +
    \overbrace{\mathbb{E}\{Y(A_\delta, Z) - Y(A, Z)\}}^{\text{PIDE}}.
\end{equation*}
This decomposition of the PIE as the sum of the population intervention direct and indirect effects has an interpretation analogous to the corresponding standard decomposition of the average treatment effect. In the sequel, we will compute each component of the direct and indirect effects above using an appropriate estimator, as follows:
- For $\E\{Y(A, Z)\}$, the sample mean $\frac{1}{n}\sum_{i=1}^n Y_i$ is consistent;
- for $\E\{Y(A_{\delta}, Z)\}$, a TML estimator for the effect of a joint intervention altering the treatment mechanism but not the mediation mechanism, based on the proposal in @diaz2020causal; and,
- for $\E\{Y(A_{\delta}, Z_{A_{\delta}})\}$, an efficient estimator for the effect of a joint intervention on both the treatment and mediation mechanisms, as per @kennedy2019nonparametric.
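To build intuition for these three quantities, the sketch below computes simple plug-in analogues on simulated data under an incremental propensity score intervention with $\delta = 2$. The plug-in estimators are substituted here for the TML and efficient estimators named in the list, purely to keep the example in base `R`. The data-generating process, the parametric working models, and all object names are hypothetical.

```r
set.seed(5309)
n <- 4000

# hypothetical data-generating process: binary A, continuous Z and Y
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Z <- 0.8 * A + 0.5 * W + rnorm(n)
Y <- 1 + 1.0 * A + 0.6 * Z + 0.3 * W + rnorm(n)
dat <- data.frame(W, A, Z, Y)

# propensity score g(1 | w) and its IPSI-shifted counterpart g_delta(1 | w)
delta <- 2
g_fit <- glm(A ~ W, family = binomial(), data = dat)
g1 <- predict(g_fit, type = "response")
g1_delta <- delta * g1 / (delta * g1 + 1 - g1)

# outcome regressions with and without the mediator
Q_fit <- lm(Y ~ A + Z + W, data = dat)  # E(Y | A, Z, W)
mu_fit <- lm(Y ~ A + W, data = dat)     # E(Y | A, W)
Q_a <- function(a) predict(Q_fit, newdata = transform(dat, A = a))
mu_a <- function(a) predict(mu_fit, newdata = transform(dat, A = a))

# plug-in analogues of the three terms in the decomposition
psi_obs <- mean(dat$Y)                                            # E{Y(A, Z)}
psi_AZ <- mean(Q_a(1) * g1_delta + Q_a(0) * (1 - g1_delta))       # E{Y(A_delta, Z)}
psi_A <- mean(mu_a(1) * g1_delta + mu_a(0) * (1 - g1_delta))      # E{Y(A_delta, Z(A_delta))}

pide <- psi_AZ - psi_obs
piie <- psi_A - psi_AZ
pie <- psi_A - psi_obs
c(PIDE = pide, PIIE = piie, PIE = pie)
```

By construction the two effects sum to the PIE. Unlike these plug-in versions, the estimators described next are built to remain valid when the nuisance parameters are estimated with flexible machine learning.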
As described by @diaz2020causal, the statistical functional identifying the
decomposition term that appears in both the PIDE and PIIE
- D^A_{\delta}(o) + D^{Z,W}_{\delta}(o) - \psi(\delta)$, where the orthogonal components of the EIF are defined as follows:
- $D^Y_{\delta}(o) = \{g_{\delta}(a \mid w) / e(a \mid z, w)\} \{y - \overline{Q}_{Y}(z,a,w)\}$,
- $D^A_{\delta}(o) = \{\delta\phi(w) (a - g(1 \mid w))\} / \{(\delta g(1 \mid w) + g(0 \mid w))^2\}$, where $\phi(w) := \E\{\overline{Q}_{Y}(1, Z, W) - \overline{Q}_{Y}(0, Z, W) \mid W = w\}$, and
- $D^{Z,W}_{\delta}(o) = \int_{\mathcal{A}} \overline{Q}_{Y}(z, a, w) g_{\delta}(a \mid w) d\kappa(a)$.
The TML estimator may be computed by fluctuating initial estimates of the
nuisance parameters so as to solve the EIF estimating equation. The resultant
TML estimator is
\begin{equation*}
\psi_{n}^{\star}(\delta) = \int_{\mathcal{A}} \frac{1}{n} \sum_{i=1}^n
  \overline{Q}_{Y,n}^{\star}(Z_i, a, W_i)
  g_{\delta, n}^{\star}(a \mid W_i) d\kappa(a),
\end{equation*}
where tmle3mediate
package. We demonstrate the use of tmle3mediate
to obtain
We now turn to estimating the natural direct and indirect effects, as well as the population intervention direct effect, using the WASH Benefits data, introduced in earlier chapters. Let's first load the data:
```r
library(data.table)
library(sl3)
library(tmle3)
library(tmle3mediate)

# download data
washb_data <- fread(
  paste0(
    "https://raw.githubusercontent.com/tlverse/tlverse-data/master/",
    "wash-benefits/washb_data.csv"
  ),
  stringsAsFactors = TRUE
)

# make intervention node binary and subsample
washb_data <- washb_data[sample(.N, 600), ]
washb_data[, tr := as.numeric(tr != "Control")]
```
We'll next define the baseline covariates
```r
node_list <- list(
  W = c(
    "momage", "momedu", "momheight", "hfiacat", "Nlt18", "Ncomp", "watmin",
    "elec", "floor", "walls", "roof"
  ),
  A = "tr",
  Z = c("sex", "month", "aged"),
  Y = "whz"
)
```
Here, the node_list
encodes the parents of each node -- for example, process_missing
:
```r
processed <- process_missing(washb_data, node_list)
washb_data <- processed$data
node_list <- processed$node_list
```
We'll now construct an ensemble learner using a handful of popular machine learning algorithms:
```r
# SL learners used for continuous data (the nuisance parameter Z)
enet_contin_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "gaussian", nfolds = 3
)
lasso_contin_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "gaussian", nfolds = 3
)
fglm_contin_learner <- Lrnr_glm_fast$new(family = gaussian())
mean_learner <- Lrnr_mean$new()
contin_learner_lib <- Stack$new(
  enet_contin_learner, lasso_contin_learner, fglm_contin_learner, mean_learner
)
sl_contin_learner <- Lrnr_sl$new(learners = contin_learner_lib)

# SL learners used for binary data (nuisance parameters G and E in this case)
enet_binary_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "binomial", nfolds = 3
)
lasso_binary_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "binomial", nfolds = 3
)
fglm_binary_learner <- Lrnr_glm_fast$new(family = binomial())
binary_learner_lib <- Stack$new(
  enet_binary_learner, lasso_binary_learner, fglm_binary_learner, mean_learner
)
sl_binary_learner <- Lrnr_sl$new(learners = binary_learner_lib)

# create list for treatment and outcome mechanism regressions
learner_list <- list(
  Y = sl_contin_learner,
  A = sl_binary_learner
)
```
We demonstrate calculation of the NIE below, starting by instantiating a "Spec"
object that encodes exactly which learners to use for the nuisance parameters
tmle3
function, alongside the data, the node list (created above), and a learner list
indicating which machine learning algorithms to use for estimating the nuisance
parameters based on