Vignette prior predictive #465

Open · wants to merge 5 commits into base: master
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -7,6 +7,7 @@ Encoding: UTF-8
Authors@R: c(person("Jonah", "Gabry", email = "jsg2201@columbia.edu", role = "aut"),
person("Imad", "Ali", role = "ctb"),
person("Sam", "Brilleman", role = "ctb"),
person("Lauren", "Kennedy", role = "ctb"),
person(given = "Jacqueline Buros", family = "Novik",
role = "ctb", comment = "R/stan_jm.R"),
person("AstraZeneca", role = "ctb", comment = "R/stan_jm.R"),
290 changes: 290 additions & 0 deletions vignettes/prior-pred.Rmd
@@ -0,0 +1,290 @@
---
title: "Choosing your rstanarm prior with prior predictive checks"
author: "Lauren Kennedy and Jonah Gabry"
date: "`r Sys.Date()`"
output:
html_vignette:
toc: yes
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Prior predictive checks}
-->
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

One important part of a Bayesian workflow is the selection of priors that encode
the analyst or researcher's domain specific expertise and other external
information, while also allowing the analysis to be sufficiently informed by the
data. However, because of the interaction between potentially complicated priors
and likelihoods it can be difficult to form an intuition about the relative
informativeness of a given prior. Gabry et al. (2019) argue that one way of
approaching this is to use prior predictive checks. In this vignette we describe
how and why prior predictive checks are done and work through an example
that explores the effects of different priors in the context of logistic
regression. For general information on specifying priors for **rstanarm** models
see the vignette [Prior Distributions for rstanarm Models](priors.html).

```{r packages, message=FALSE}
library(rstanarm)
library(dplyr)
library(tidyr)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default())
```

For the purposes of this vignette we focus on logistic regression, starting
with a simple intercept-only model:

$$
Pr(y=1) = {\rm logit}^{-1}(\beta_0)
$$
In a model this simple, if the number of observations $N$ is very large, the
prior will have limited impact on the estimate of $Pr(y=1)$. For small $N$, the
prior can have a substantial impact, depending on the choice of distribution
and how much information is included.

## Prior predictive distribution

Prior predictive checks involve interrogating data simulated from the model.
The idea is that the combination of the prior and the distribution for the
data should be able to generate any plausible data set, and should more often
generate data sets that are consistent with the prior information we have.
These simulated data sets are realizations from the _prior predictive
distribution_.

As we will see, **rstanarm** provides some shortcuts for simulating from the
prior predictive distribution, but this can also be done without **rstanarm**
using the following method:

1. Sample from the priors specified in your model. For example, if the prior is
$\beta_0 \sim N(0,1)$, take a number of draws from $N(0,1)$. Here we'll take
1000 draws:

```{r}
beta_0_ppc <- rnorm(1000, 0, 1)
```

2. For each draw from the prior, use it to draw from the distribution of the
outcome. In the case of a binary outcome it can often be more useful to simply
calculate $Pr(y = 1)$ instead of also then drawing from the Bernoulli distribution,
but for other types of outcomes the correct thing to do is to simulate _outcomes_
(e.g., for a normal distribution don't just calculate the mean $\mu$, draw
outcomes $\tilde{y}$ from $N(\mu, \sigma)$). If the model has predictors $X$
then the observed values of $X$, values of $X$ of particular interest, and/or
simulated values of $X$ can be used. In our example, the model only has an
intercept so to calculate $Pr(y = 1)$ we simply take the inverse logit of
each of the draws of the intercept $\beta_0$.

```{r}
pr_y_ppc <- invlogit(beta_0_ppc)
```

We can then visualize this using **ggplot2**.

```{r}
ggplot(data.frame(x = pr_y_ppc), aes(x)) +
  geom_histogram(binwidth = 0.01) +
  xlab("Pr(y = 1)") +
  yaxis_text(FALSE) +
  yaxis_title(FALSE)
```

# Prior predictive checks with rstanarm

This is easy for such a simple model, but how can we use the same idea to explore priors in more complex models? In the rest of this vignette we use **rstanarm** to explore the default and some alternative priors for logistic regression.

To begin with, we simulate some data. We won't be using these data to update
the priors (we concern ourselves entirely with prior predictive checks), but we
need them to specify the _form_ of the model to **rstanarm**.

```{r}
dat <- data.frame(
y = rbinom(1000, 1, 0.5),
x = rep(c(0,1), 500) # The x variable will be used in the second stage of our vignette
)
```

## Priors for intercept only models

To begin we focus on the simplest possible model: an intercept-only model.

### Default priors in rstanarm

We can begin by exploring the default priors used by **rstanarm**. We use the
argument `prior_PD=TRUE` to indicate that we don't want **rstanarm** to estimate
the parameters using the data but rather just draw from the prior.

```{r, message=FALSE}
m1 <- stan_glm(formula = y ~ 1, data = dat, family = binomial(link="logit"),
prior_PD = TRUE, refresh = 0)
```

We can confirm the priors used via the `prior_summary()` function.

```{r}
prior_summary(m1)
```

Once we have samples from the prior we can use the `posterior_epred()` function
to calculate the implied $Pr(y = 1)$. Because we specified a binomial model with
logit link, `posterior_epred()` will apply the inverse logit transformation for us.

```{r}
m1_predict <- posterior_epred(m1)
dim(m1_predict)
```

The dimensions are `4000 x 1000`: by default **rstanarm** takes 4000 draws
from the prior, and there are 1000 observations (inferred from the number of
rows in the data, although the outcome values themselves are not otherwise
used). Each row corresponds to a different draw of $\beta_0$ from the
prior and each column corresponds to a different data point.


```{r}
m1_long <- pivot_longer(data.frame(m1_predict), cols = everything())
ggplot(m1_long, aes(x=value, group=name))+
geom_histogram(binwidth = 0.01, alpha = 0.5) +
xlab("Pr(y = 1)") +
yaxis_text(FALSE) +
yaxis_title(FALSE)
```

As we can see, the default priors in **rstanarm** place a lot of weight on
$Pr(y=1)$ being close to 0 or 1 and less weight on intermediate values. While
there are some reasons to have such extreme priors, we can use prior predictive
checks to explore alternative priors for the intercept.
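We can quantify this concentration near the extremes with a quick simulation in base R. This is a rough sketch that assumes the `normal(0, 2.5)` default for the intercept used by recent versions of **rstanarm**; check the `prior_summary()` output above for the scale actually used.

```{r}
# Draw from the assumed default intercept prior and transform to the
# probability scale (plogis is the inverse logit).
set.seed(123)
p_draws <- plogis(rnorm(1e5, 0, 2.5))
# Fraction of prior mass on probabilities below 0.1 or above 0.9:
mean(p_draws < 0.1 | p_draws > 0.9)  # roughly 0.38
```

Over a third of the prior mass falls on quite extreme probabilities.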

### Uniform probability

Say, for example, that we have no a priori information suggesting that any
values for $Pr(y = 1)$ are more likely than others. In this case we might want
to specify a uniform prior. To do this exactly we would need to use $\beta_0
\sim {\rm Logistic}(0,1)$ --- using the standard logistic distribution on the
logit scale results in a uniform distribution on the probability scale --- but
**rstanarm** doesn't offer the logistic distribution as an option. As an
alternative we use ${\rm Normal}(0, 1.6)$, which
[closely approximates](https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/)
the ${\rm Logistic}(0,1)$ distribution. The rationale for this is discussed in
Gelman and Hill (2007).
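We can check the quality of this approximation directly with the normal and logistic distribution functions in base R; on the CDF scale the two distributions never differ by more than about 0.02.

```{r}
# Compare the Normal(0, 1.6) CDF to the standard logistic CDF over a grid.
x <- seq(-6, 6, by = 0.01)
max(abs(pnorm(x, 0, 1.6) - plogis(x)))  # about 0.017
```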

To explore how this impacts the probability values, we repeat the steps above
but specify `prior_intercept = normal(0, 1.6)`.

```{r}
m1b <- stan_glm(formula = y ~ 1, data = dat, family=binomial(link="logit"),
prior_PD = TRUE, prior_intercept = normal(0, 1.6),
refresh = 0)
```

As expected, we can see visually that this prior implies that $Pr(y = 1)$ could
be any value between 0 and 1 with roughly equal probability.

```{r}
m1b_predict <- posterior_epred(m1b)
m1b_long <- pivot_longer(data.frame(m1b_predict), cols = everything())

ggplot(m1b_long, aes(x=value, group=name))+
geom_histogram(binwidth = 0.01, alpha = 0.5) +
xlab("Pr(y = 1)") +
yaxis_text(FALSE) +
yaxis_title(FALSE)
```

### Priors for rare events

If we are in a situation where it is reasonable to expect that the outcome $y=1$
is very unlikely, we might like to use a prior that a priori places much more weight
on small probabilities relative to large probabilities. To do this, we need to
set a prior that is mostly negative on the logit scale, like ${\rm Normal}(-3,1)$.
Using the same code as above but substituting this prior, we can see that
now the prior probability that $y = 1$ is mostly concentrated below 20%.
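Before fitting, we can sanity check this choice with the logistic CDF and quantile functions in base R:

```{r}
# Prior median of Pr(y = 1) under beta_0 ~ Normal(-3, 1):
plogis(-3)                     # about 0.047
# Prior probability that Pr(y = 1) exceeds 0.2:
1 - pnorm(qlogis(0.2), -3, 1)  # about 0.05
```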

```{r}
m1c <- stan_glm(formula = y ~ 1, data = dat, family=binomial(link="logit"),
prior_PD = TRUE, prior_intercept = normal(-3, 1),
refresh = 0)
```

```{r}
m1c_predict <- posterior_epred(m1c)
m1c_long <- pivot_longer(data.frame(m1c_predict), cols = everything())

ggplot(m1c_long, aes(x=value, group=name))+
geom_histogram(binwidth = 0.01, alpha = 0.5) +
xlab("Pr(y = 1)") +
yaxis_text(FALSE) +
yaxis_title(FALSE)
```

## Priors for the slope

One of the challenges of picking priors is that they act in concert with the rest of the model, including the priors on other parameters. To explore this, we extend the intercept-only model to include a coefficient for a binary predictor (such as an experimental condition). We start with the default priors from **rstanarm**.

```{r, echo= TRUE, message=FALSE, results="hide"}
m2 <- stan_glm(formula = y ~ x, data = dat, family = binomial(link = "logit"),
               prior_PD = TRUE)
prior_summary(m2)
```
There are a few different ways to visualize the impact of this prior; here we consider it in terms of potential outcomes. We create two data frames to predict for using our priors: one supposes that everyone is treated, the other that everyone is a control. The idea is to simulate $Pr(y = 1)$ under both conditions and then compare the difference.

```{r}
df_treat <- data.frame(y = dat$y, x = 1)
df_ctrl  <- data.frame(y = dat$y, x = 0)
```

To do this we use the `newdata` argument of `posterior_linpred()`, which allows us to predict for a new dataset (with `transform = TRUE` the linear predictor is transformed to the probability scale). We compute the predictions assuming treated and assuming control, and then compare the two.

```{r}
m2_predict_treat <- posterior_linpred(m2, newdata = df_treat, transform = TRUE)
m2_predict_ctrl  <- posterior_linpred(m2, newdata = df_ctrl, transform = TRUE)
m2_predict_dif   <- m2_predict_treat - m2_predict_ctrl
```

We then plot the prior distribution of the expected difference between the two conditions. While we might have expected from our prior that the treatment effect would be centered at zero, it is surprising that although the prior on the slope is relatively diffuse, the implied prior on the expected difference is quite concentrated.

```{r}
m2_long <- pivot_longer(data.frame(m2_predict_dif), cols = everything())

ggplot(m2_long, aes(x = value, group = name)) +
  geom_histogram(binwidth = 0.005, alpha = 0.5) +
  xlab("Expected difference in probability") +
  yaxis_text(FALSE) +
  yaxis_title(FALSE)
```

### A different intercept prior

Now, without changing the prior for the slope, we show that changing the prior on the intercept can have large implications for the expected difference between groups. Here we change the prior for the intercept to the ${\rm Normal}(0, 1.6)$ distribution we used before, and follow the same steps as above.

```{r, echo= TRUE, message=FALSE, results="hide"}
m2b <- stan_glm(formula = y ~ x, data = dat, family = binomial(link = "logit"),
                prior_PD = TRUE, prior_intercept = normal(0, 1.6))
prior_summary(m2b)
```

```{r}
m2b_predict_treat <- posterior_linpred(m2b, newdata = df_treat, transform = TRUE)
m2b_predict_ctrl  <- posterior_linpred(m2b, newdata = df_ctrl, transform = TRUE)
m2b_predict_dif   <- m2b_predict_treat - m2b_predict_ctrl
```

We then plot the distribution of the expected difference between the two conditions. Without changing the prior on the effect of condition, we obtain a vastly different prior distribution for the difference between conditions.

```{r}
m2b_long <- pivot_longer(data.frame(m2b_predict_dif), cols = everything())

ggplot(m2b_long, aes(x = value, group = name)) +
  geom_histogram(binwidth = 0.005, alpha = 0.5) +
  xlab("Expected difference in probability") +
  yaxis_text(FALSE) +
  yaxis_title(FALSE)
```

Priors are difficult to think about: not only can they only be understood in the context of the likelihood, they are also sensitive to the other priors in the model. **rstanarm** needs default priors for its functions to work out of the box, but the defaults should be used with caution. Ideally, prior predictive checks should be used before running an analysis to understand the a priori implications of the chosen priors. We deliberately do not make recommendations for any specific prior in this vignette; useful priors are specific to the context of the application. Rather than making broad statements about which priors should be chosen, we advocate careful exploration, and we have demonstrated with a simple model how the convenience functions in **rstanarm** make that exploration straightforward.

# Acknowledgements

Many kind thanks to Andrew Gelman, Michael Betancourt and Jonah Gabry for their helpful thoughts and suggestions on this vignette.

# References

Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., & Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(2), 389-402.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.