From eb0dd97a9da80d31d35d370ab6e5ddf2a43bea7a Mon Sep 17 00:00:00 2001
From: Jonah Gabry <jgabry@gmail.com>
Date: Wed, 18 Sep 2019 17:26:29 -0400
Subject: [PATCH 1/4] Create prior-pred.Rmd

---
 vignettes/prior-pred.Rmd | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 vignettes/prior-pred.Rmd

diff --git a/vignettes/prior-pred.Rmd b/vignettes/prior-pred.Rmd
new file mode 100644
index 000000000..8b1378917
--- /dev/null
+++ b/vignettes/prior-pred.Rmd
@@ -0,0 +1 @@
+

From 6fb3d1d740f4edee711d0bddd0ea81b89b53e8ff Mon Sep 17 00:00:00 2001
From: Lauren Kennedy <lauken13@users.noreply.github.com>
Date: Wed, 2 Oct 2019 14:17:59 -0400
Subject: [PATCH 2/4] Add initial content

---
 vignettes/prior-pred.Rmd | 198 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 198 insertions(+)

diff --git a/vignettes/prior-pred.Rmd b/vignettes/prior-pred.Rmd
index 8b1378917..5a66d0f6f 100644
--- a/vignettes/prior-pred.Rmd
+++ b/vignettes/prior-pred.Rmd
@@ -1 +1,199 @@
+---
+title: "Choosing your Rstanarm prior with prior predictive checks"
+author: "Lauren Kennedy"
+date: "6/23/2019"
+output: html_document
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+One important part of a Bayesian workflow is the selection of priors to include researcher and domain specific information while allowing the analysis to learn from the data. With complex likelihoods, it can be difficult to imagine the relative informativeness of the a given prior. Gabry et al. (2019) argue that one way of approaching this is to use prior predictive checks. In this vignette we describe how and why prior predictive checks are done, before working through an example with logisitic regression with a logit link function to explore the effects of different priors. For more information on the priors available, see \href{https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html}.
+
+```{r,echo=FALSE,message=FALSE}
+library(rstanarm)
+library(ggplot2)
+library(dplyr)
+library(tidyr)
+library(arm)
+```
+
+For the purposes of this vignette we focus on a logistic regression. To begin with we start with a simple intercept only model: 
+
+$$
+Pr(y=1) = logit^{-1}(\beta_0)
+$$
+For a very large $n$, the prior has limited impact on the estimate of $Pr(y=1)$, but for small $n$, the prior will impact the fitted values. Given that, we would like to understand the impact of prior choices on $Pr(y=1)$. In this vignette we demonstrate how to use rstanarm to do understand the impact of prior choices. 
+
+This, of course, can be done without rstanarm by the following method
+
+1. Sample from the priors specified in your model. If the prior for $\beta_0 \sim N(0,1)$, draw a number (in this case 1000) of samples from $N(0,1)$. 
+
+```{r}
+beta_0_ppc <- rnorm(1000,0,1)
+```
+
+2. Use those sampled priors with reasonable values of the covariates X (in this case the model has none) to understand the apriori suggested values for $Pr(y = 1)$. Here we simply take the inverse logit of the samples from the $\beta_0$ prior. 
+
+```{r}
+pr_y_ppc <- invlogit(beta_0_ppc) 
+```
+
+We can then visualize this using ggplot. 
+
+```{r}
+ggplot(data=data.frame(x=pr_y_ppc),aes(x=x, y=..density..))+
+  geom_histogram(binwidth=.005) + theme_bw() + xlab("Predicted Pr(y=1)") + ylab("Density")
+```
+
+However, rstanarm often uses complex priors, some of which may be unexpected. Priors in rstanarm often reflect the best use practices for priors at the current time, and can sometimes be complex to understand. 
+
+#Prior predictive checks with rstanarm
+
+This is relatively simple for this simple model, but how can we use this to explore priors in more complex models? In this vignette we use rstanarm to explore the default and alternative priors for logistic regression. To begin with, we simulate some data. We won't be using the data to update the priors (as we concern ourselves entirely with prior predictive checks), but we use the data to specify the form of the model to rstanarm. 
+
+```{r}
+dat_intercept <- data.frame(
+  y = rbinom(1000,1,.5),
+  x = rep(c(0,1),500) #The x variable will be used in the second stage of our vignette
+)
+```
+
+## Priors for intercept only models 
+
+To begin we focus on the simplest possible model---an intercept only model. 
+
+### Default priors in rstanarm
+We can begin by exploring the default priors used by rstanarm. We use the arguement prior_PD=TRUE to prevent rstanarm from updating the data given the data. 
+
+```{r, echo=TRUE, message=FALSE, results="hide"}
+m1 <- stan_glm(formula = y ~ 1, prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
+```
+And we can confirm the priors used
+
+```{r}
+prior_summary(m1)
+```
+
+Once we have sampled from the prior we can use the posterior_linpred function to observe expected probabilty. By default this function will provide a posterior distribution for each observation of $x$, even if the model didn't update based on the data. We use the transform = TRUE argument to ensure obtain predictions on the probability scale.
+
+```{r}
+m1_predict <- posterior_linpred(m1,transform = TRUE)
+dim(m1_predict)
+```
+
+We then take the prior median expected value for each observation, and plot the distribution of expected probabilites for $y=1$. 
+
+```{r}
+m1_long <- gather(data.frame(m1_predict))
+
+ggplot(m1_long, aes(x=value,group=key))+
+  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```
+
+As we can see, the default priors in rstanarm suggest that the probability ($Pr(y=1)$) is either very close to 1 or very close to 0. While there are some reasons to have such extreme priors, we use prior predictive checks to explore other alternative priors for the intercept. 
+
+### Uniform probability
+
+Say for example we have no apriori expectation for the likely probability for our outcome data. In this case we might want to specify a uniform prior so that every probability is equally likely. To do this exactly we would need the set $\beta_0 \sim logistic(0,1)$, but this is not a possible distribution in rstanarm. As an alternative we use  \href{an approximation to the logistic(0,1) distribution}{https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/}, normal(0,1.6). The rationale for this is discussed in Gelman and Hill (2007).
+
+To explore how this impacts the expected probability values, we repeat the steps above but specify the prior as normal(0,1.6).
+
+```{r, echo= TRUE, message=FALSE, results="hide"}
+m1b <- stan_glm(formula = y ~ 1,prior_intercept = normal(0,1.6), prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
+```
+
+```{r}
+m1b_predict <- posterior_linpred(m1b,transform = TRUE)
+
+m1b_long <- gather(data.frame(m1b_predict))
+
+ggplot(m1b_long, aes(x=value,group=key))+
+  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```
+
+This prior suggests that many of the probabilities that y=1 are equally likely. 
+
+### Priors for rare events
+However sometimes we expect that the outcome is very unlikely. Perhaps we might like to use a prior that apriori suggests only very small probabilities. To do this, we need to set a prior that is negative, like a normal(-3,1). We use the same code as above to see the expected distribution of the probability $Pr(y=1)$ and see that that most of the expected mass is less than 20%.
+
+```{r, , echo= FALSE, message=FALSE, results="hide"}
+m1c <- stan_glm(formula = y ~ 1,prior_intercept = normal(-3,1), prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
+```
+
+```{r}
+m1c_predict <- posterior_linpred(m1c,transform = TRUE)
+
+m1c_long <- gather(data.frame(m1c_predict))
+
+ggplot(m1c_long, aes(x=value,group=key))+
+  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```
+
+## Priors for the slope 
+One of the challenges of picking priors is that they work in symphony with the rest of the model, including priors for other parameters. To explore this, we use the extend the intercept model to include a parameter that represents a binary variable (such as experimental condition). We start out with the default priors from rstanarm. 
+
+```{r, echo= TRUE, message=FALSE, results="hide"}
+m2 <- stan_glm(formula = y ~ x, prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
+prior_summary(m2)
+```
+There's a few different ways to visualize the impact of this prior, but we choose to consider it in terms of potential outcomes. We create two dataframes, to predict for using our priors. One supposes everyone is treated. The other supposes everyone is a control. The concept is to simulate for both groups the effect if treated and if control, and then compare the difference. 
+```{r}
+df_treat <- data.frame(dat_intercept$y,x=1)
+df_ctrl <- data.frame(dat_intercept$y,x=0)
+```
+
+To do this we use the newdata arguement in posterior_linpred, which allows us to predict for a new dataset. We predict assuming treated and control, and then compare the two. 
+
+```{r}
+m2_predict_treat <- posterior_linpred(m2, newdata = df_treat, transform = TRUE)
+m2_predict_ctrl <- posterior_linpred(m2, newdata = df_ctrl, transform = TRUE)
+m2_predict_dif <- m2_predict_treat-m2_predict_ctrl
+```
+
+We then take the median expected value for each observation, and plot the apriori expected difference between the two conditions. While we may have expected from our prior that the treatment effect was centered by zero, it is surprising that althought the prior for the effect term is relatively diffuse, the prior expected effect is very tight. 
+
+```{r}
+m2_long <- gather(data.frame(m2_predict_dif))
+
+ggplot(m2_long, aes(x=value,group=key))+
+  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected difference in probability") + ylab("Density")
+```
+
+##Now with a different intercept prior
+
+Now without changing the prior for the slope parameter, we show that changing the prior of the intercept can have large implications for the expected difference between groups. Here we change the prior for the intercept to a normal(0,1.6) as we used before. We follow the same steps as above
+
+```{r, echo= TRUE, message=FALSE, results="hide"}
+m2b <- stan_glm(formula = y ~ x, prior_PD=TRUE,,prior_intercept = normal(0,1.6), family=binomial(link="logit"), data=dat_intercept)
+prior_summary(m2b)
+```
+
+```{r}
+m2b_predict_treat <- posterior_linpred(m2b, newdata = df_treat, transform = TRUE)
+m2b_predict_ctrl <- posterior_linpred(m2b, newdata = df_ctrl, transform = TRUE)
+m2b_predict_dif <- m2b_predict_treat-m2b_predict_ctrl
+```
+
+We then take the median expected value for each observation, and plot the distribution of the expected difference between the two conditions. Without changing the prior on the effect of condition we can obtain vastly different expected differences between conditions.
+
+```{r}
+m2b_long <- gather(data.frame(m2b_predict_dif))
+
+ggplot(m2b_long, aes(x=value,group=key))+
+  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected difference in probability") + ylab("Density")
+```
+
+Priors are really difficult to think about. Not only are they only understood in the context of the likelihood, but they are also sensitive to other priors in the model. Rstanarm had to have default priors for functionality, but they should be used with caution. Ideally before running an analysis prior predictive checks should be used to understand apriori the implications of the priors chosen. We do not make recommendations for any specific prior in this vignette, but rather demonstrate how priors can be explored using rstanarm. This is a deliberate choice. Often priors are useful but specific to the context of the application. Rather than making broad statements of priors that should be chosen, instead we advocate for careful exploration. In this vignette we demonstrate using a simple model how to use the convenience functions included in the rstanarm package to do just that. 
+
+# Acknowledgements
+
+Many kind thanks to Andrew Gelman, Michael Betancourt and Jonah Gabry for their helpful thoughts and suggestions on this vignette. 
+
+# References
+
+Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., & Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(2), 389-402.
+
+Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge university press.
 

From adee4736eff10d08160f445d95ba6602f29d2b9f Mon Sep 17 00:00:00 2001
From: jgabry <jgabry@gmail.com>
Date: Wed, 26 Aug 2020 13:28:48 -0600
Subject: [PATCH 3/4] Add Lauren to DESCRIPTION file

@lauken13 I should have already done this after we merged the MRP vignette but looks like I forgot, sorry!
---
 DESCRIPTION | 1 +
 1 file changed, 1 insertion(+)

diff --git a/DESCRIPTION b/DESCRIPTION
index 96c51232b..13a5c773a 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -7,6 +7,7 @@ Encoding: UTF-8
 Authors@R: c(person("Jonah", "Gabry", email = "jsg2201@columbia.edu", role = "aut"),
              person("Imad", "Ali", role = "ctb"),
              person("Sam", "Brilleman", role = "ctb"),
+             person("Lauren", "Kennedy", role = "ctb"),
              person(given = "Jacqueline Buros", family = "Novik", 
                     role = "ctb", comment = "R/stan_jm.R"),
              person("AstraZeneca", role = "ctb", comment = "R/stan_jm.R"),

From 41ab17ee0f5800258e81f8e0129bd22ee128d35b Mon Sep 17 00:00:00 2001
From: jgabry <jgabry@gmail.com>
Date: Wed, 26 Aug 2020 18:01:29 -0600
Subject: [PATCH 4/4] edits up to (not including) "Priors for the slope"
 section

---
 vignettes/prior-pred.Rmd | 207 ++++++++++++++++++++++++++++-----------
 1 file changed, 149 insertions(+), 58 deletions(-)

diff --git a/vignettes/prior-pred.Rmd b/vignettes/prior-pred.Rmd
index 5a66d0f6f..dd0561b40 100644
--- a/vignettes/prior-pred.Rmd
+++ b/vignettes/prior-pred.Rmd
@@ -1,40 +1,84 @@
 ---
-title: "Choosing your Rstanarm prior with prior predictive checks"
-author: "Lauren Kennedy"
-date: "6/23/2019"
-output: html_document
+title: "Choosing your rstanarm prior with prior predictive checks"
+author: "Lauren Kennedy and Jonah Gabry"
+date: "`r Sys.Date()`"
+output:
+  html_vignette:
+    toc: yes
 ---
-
+<!--
+%\VignetteEngine{knitr::rmarkdown}
+%\VignetteIndexEntry{Prior predictive checks}
+-->
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```
 
-One important part of a Bayesian workflow is the selection of priors to include researcher and domain specific information while allowing the analysis to learn from the data. With complex likelihoods, it can be difficult to imagine the relative informativeness of the a given prior. Gabry et al. (2019) argue that one way of approaching this is to use prior predictive checks. In this vignette we describe how and why prior predictive checks are done, before working through an example with logisitic regression with a logit link function to explore the effects of different priors. For more information on the priors available, see \href{https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html}.
-
-```{r,echo=FALSE,message=FALSE}
+# Introduction
+
+One important part of a Bayesian workflow is the selection of priors that encode
+the analyst or researcher's domain specific expertise and other external
+information, while also allowing the analysis to be sufficiently informed by the
+data. However, because of the interaction between potentially complicated priors
+and likelihoods it can be difficult to form an intuition about the relative
+informativeness of a given prior. Gabry et al. (2019) argue that one way of
+approaching this is to use prior predictive checks. In this vignette we describe
+how and why prior predictive checks are done and work through an example
+that explores the effects of different priors in the context of logistic
+regression. For general information on specifying priors for **rstanarm** models
+see the vignette [Prior Distributions for rstanarm Models](priors.html).
+
+```{r packages, message=FALSE}
 library(rstanarm)
-library(ggplot2)
 library(dplyr)
 library(tidyr)
-library(arm)
+library(ggplot2)
+library(bayesplot)
+theme_set(bayesplot::theme_default())
 ```
 
-For the purposes of this vignette we focus on a logistic regression. To begin with we start with a simple intercept only model: 
+For the purposes of this vignette we focus on a logistic regression. To begin
+with we start with a simple intercept only model:
 
 $$
-Pr(y=1) = logit^{-1}(\beta_0)
+Pr(y=1) = {\rm logit}^{-1}(\beta_0)
 $$
-For a very large $n$, the prior has limited impact on the estimate of $Pr(y=1)$, but for small $n$, the prior will impact the fitted values. Given that, we would like to understand the impact of prior choices on $Pr(y=1)$. In this vignette we demonstrate how to use rstanarm to do understand the impact of prior choices. 
+In a model this simple, if the number of observations $N$ is very large, the
+prior will have limited impact on the estimate of $Pr(y=1)$. For small $N$, the
+prior can have a substantial impact, depending on the choice of distribution 
+and how much information is included. 
+
+## Prior predictive distribution
 
-This, of course, can be done without rstanarm by the following method
+Prior predictive checks involve interrogating data that is simulated from 
+the model. The idea is that the combination of the prior and the
+distribution for the data should be able to simulate any plausible
+data set and should more often simulate data sets that are more consistent
+with the prior information we have. These simulated data sets are realizations
+from the _prior predictive distribution_. 
 
-1. Sample from the priors specified in your model. If the prior for $\beta_0 \sim N(0,1)$, draw a number (in this case 1000) of samples from $N(0,1)$. 
+As we will see, **rstanarm** provides some shortcuts for simulating from the
+prior predictive distribution, but this can also be done without **rstanarm**
+using the following method:
+
+1. Sample from the priors specified in your model. For example, if the prior is
+$\beta_0 \sim N(0,1)$, take a bunch of draws $N(0,1)$. Here we'll draw 1000
+times:
 
 ```{r}
-beta_0_ppc <- rnorm(1000,0,1)
+beta_0_ppc <- rnorm(1000, 0, 1)
 ```
 
-2. Use those sampled priors with reasonable values of the covariates X (in this case the model has none) to understand the apriori suggested values for $Pr(y = 1)$. Here we simply take the inverse logit of the samples from the $\beta_0$ prior. 
+2. For each draw from the prior, use it to draw from the distribution of the 
+outcome. In the case of a binary outcome it can often be more useful to simply
+calculate $Pr(y = 1)$ instead of also then drawing from the Bernoulli distribution, 
+but for other types of outcomes the correct thing to do is to simulate _outcomes_
+(e.g., for a normal distribution don't just calculate the mean $\mu$, draw 
+outcomes $\tilde{y}$ from $N(\mu, \sigma)$). If the model has predictors $X$ 
+then the observed values of $X$, values of $X$ of particular interest, and/or
+simulated values of $X$ can be used. In our example, the model only has an 
+intercept so to calculate $Pr(y = 1)$ we simply take the inverse logit of 
+each of the draws of the intercept $\beta_0$.
 
 ```{r}
 pr_y_ppc <- invlogit(beta_0_ppc) 
@@ -43,20 +87,25 @@ pr_y_ppc <- invlogit(beta_0_ppc)
 We can then visualize this using ggplot. 
 
 ```{r}
-ggplot(data=data.frame(x=pr_y_ppc),aes(x=x, y=..density..))+
-  geom_histogram(binwidth=.005) + theme_bw() + xlab("Predicted Pr(y=1)") + ylab("Density")
+ggplot(data=data.frame(x=pr_y_ppc), aes(x))+
+  geom_histogram(binwidth = 0.01) + 
+  xlab("Pr(y = 1)") + 
+  yaxis_text(FALSE) + 
+  yaxis_title(FALSE)
 ```
 
-However, rstanarm often uses complex priors, some of which may be unexpected. Priors in rstanarm often reflect the best use practices for priors at the current time, and can sometimes be complex to understand. 
+# Prior predictive checks with rstanarm
 
-#Prior predictive checks with rstanarm
+This is relatively simple for this simple model, but how can we use this to explore priors in more complex models? In this vignette we use **rstanarm** to explore the default and alternative priors for logistic regression. 
 
-This is relatively simple for this simple model, but how can we use this to explore priors in more complex models? In this vignette we use rstanarm to explore the default and alternative priors for logistic regression. To begin with, we simulate some data. We won't be using the data to update the priors (as we concern ourselves entirely with prior predictive checks), but we use the data to specify the form of the model to rstanarm. 
+To begin with, we simulate some data . We won't be using this data to update the
+priors (as we concern ourselves entirely with prior predictive checks), but we
+use it to specify the _form_ of the model to **rstanarm**.
 
 ```{r}
-dat_intercept <- data.frame(
-  y = rbinom(1000,1,.5),
-  x = rep(c(0,1),500) #The x variable will be used in the second stage of our vignette
+dat <- data.frame(
+  y = rbinom(1000, 1, 0.5),
+  x = rep(c(0,1), 500) # The x variable will be used in the second stage of our vignette
 )
 ```
 
@@ -65,70 +114,112 @@ dat_intercept <- data.frame(
 To begin we focus on the simplest possible model---an intercept only model. 
 
 ### Default priors in rstanarm
-We can begin by exploring the default priors used by rstanarm. We use the arguement prior_PD=TRUE to prevent rstanarm from updating the data given the data. 
 
-```{r, echo=TRUE, message=FALSE, results="hide"}
-m1 <- stan_glm(formula = y ~ 1, prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
+We can begin by exploring the default priors used by **rstanarm**. We use the
+argument `prior_PD=TRUE` to indicate that we don't want **rstanarm** to estimate
+the parameters using the data but rather just draw from the prior.
+
+```{r, message=FALSE}
+m1 <- stan_glm(formula = y ~ 1, data = dat, family = binomial(link="logit"), 
+               prior_PD = TRUE, refresh = 0)
 ```
-And we can confirm the priors used
+
+We can confirm the priors used via the `prior_summary()` function.
 
 ```{r}
 prior_summary(m1)
 ```
 
-Once we have sampled from the prior we can use the posterior_linpred function to observe expected probabilty. By default this function will provide a posterior distribution for each observation of $x$, even if the model didn't update based on the data. We use the transform = TRUE argument to ensure obtain predictions on the probability scale.
+Once we have samples from the prior we can use the `posterior_epred()` function 
+to calculate the implied $Pr(y = 1)$. Because we specified a binomial model with 
+logit link, `posterior_epred()` will apply the inverse logit transformation for us. 
 
 ```{r}
-m1_predict <- posterior_linpred(m1,transform = TRUE)
+m1_predict <- posterior_epred(m1)
 dim(m1_predict)
 ```
 
-We then take the prior median expected value for each observation, and plot the distribution of expected probabilites for $y=1$. 
+The dimensions are `4000 x 1000` because by default **rstanarm** will take 4000
+draws from the prior and there are 1000 observations (this is inferred from the
+number of rows in the data, although the data itself were not actually used
+anywhere). Each row corresponds to a different draw of $\beta_0$ from the 
+prior and each column corresponds to a different data point. 
 
-```{r}
-m1_long <- gather(data.frame(m1_predict))
 
-ggplot(m1_long, aes(x=value,group=key))+
-  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```{r}
+m1_long <- pivot_longer(data.frame(m1_predict), cols = everything())
+ggplot(m1_long, aes(x=value, group=name))+
+  geom_histogram(binwidth = 0.01, alpha = 0.5) + 
+  xlab("Pr(y = 1)") + 
+  yaxis_text(FALSE) + 
+  yaxis_title(FALSE)
 ```
 
-As we can see, the default priors in rstanarm suggest that the probability ($Pr(y=1)$) is either very close to 1 or very close to 0. While there are some reasons to have such extreme priors, we use prior predictive checks to explore other alternative priors for the intercept. 
+As we can see, the default priors in **rstanarm** place a lot of weight on
+$Pr(y=1)$ being close to 0 or 1 and less weight on intermediate values. While
+there are some reasons to have such extreme priors, we can use prior predictive
+checks to explore alternative priors for the intercept.
 
 ### Uniform probability
 
-Say for example we have no apriori expectation for the likely probability for our outcome data. In this case we might want to specify a uniform prior so that every probability is equally likely. To do this exactly we would need the set $\beta_0 \sim logistic(0,1)$, but this is not a possible distribution in rstanarm. As an alternative we use  \href{an approximation to the logistic(0,1) distribution}{https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/}, normal(0,1.6). The rationale for this is discussed in Gelman and Hill (2007).
+Say, for example, that we have no a priori information suggesting that any
+values for $Pr(y = 1)$ are more likely than others. In this case we might want
+to specify a uniform prior. To do this exactly we would need to use $\beta_0
+\sim {\rm Logistic}(0,1)$ --- using the standard logistic distribution on the
+logit scale results in a uniform distribution on the probability scale --- but
+**rstanarm** doesn't offer the logistic distribution as an option. As an
+alternative we use ${\rm Normal}(0, 1.6)$, which
+[closely approximates](https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/) 
+the ${\rm Logistic}(0,1)$ distribution. The rationale for this is discussed in
+Gelman and Hill (2007).
 
-To explore how this impacts the expected probability values, we repeat the steps above but specify the prior as normal(0,1.6).
-
-```{r, echo= TRUE, message=FALSE, results="hide"}
-m1b <- stan_glm(formula = y ~ 1,prior_intercept = normal(0,1.6), prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
-```
+To explore how this impacts the probability values, we repeat the steps above
+but specify `prior_intercept = normal(0, 1.6)`.
 
 ```{r}
-m1b_predict <- posterior_linpred(m1b,transform = TRUE)
+m1b <- stan_glm(formula = y ~ 1, data = dat, family=binomial(link="logit"), 
+                prior_PD = TRUE, prior_intercept = normal(0, 1.6), 
+                refresh = 0)
+```
 
-m1b_long <- gather(data.frame(m1b_predict))
+As expected, we can see visually that this prior implies that $Pr(y = 1)$ could
+be any value between 0 and 1 with roughly equal probability.
 
-ggplot(m1b_long, aes(x=value,group=key))+
-  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```{r}
+m1b_predict <- posterior_epred(m1b)
+m1b_long <- pivot_longer(data.frame(m1b_predict), cols = everything())
+
+ggplot(m1b_long, aes(x=value, group=name))+
+  geom_histogram(binwidth = 0.01, alpha = 0.5) + 
+  xlab("Pr(y = 1)") + 
+  yaxis_text(FALSE) + 
+  yaxis_title(FALSE)
 ```
 
-This prior suggests that many of the probabilities that y=1 are equally likely. 
-
 ### Priors for rare events
-However sometimes we expect that the outcome is very unlikely. Perhaps we might like to use a prior that apriori suggests only very small probabilities. To do this, we need to set a prior that is negative, like a normal(-3,1). We use the same code as above to see the expected distribution of the probability $Pr(y=1)$ and see that that most of the expected mass is less than 20%.
 
-```{r, , echo= FALSE, message=FALSE, results="hide"}
-m1c <- stan_glm(formula = y ~ 1,prior_intercept = normal(-3,1), prior_PD=TRUE, family=binomial(link="logit"), data=dat_intercept)
-```
+If we are in a situation where it is reasonable to expect that the outcome $y=1$ 
+is very unlikely, we might like to use a prior that a priori places much more weight 
+on small probabilities relative to large probabilities. To do this, we need to 
+set a prior that is mostly negative on the logit scale, like ${\rm Normal}(-3,1)$. 
+Using the same code as above but substituting this prior, we can see that 
+now the prior probability that $y = 1$ is mostly concentrated below 20%.
 
 ```{r}
-m1c_predict <- posterior_linpred(m1c,transform = TRUE)
-
-m1c_long <- gather(data.frame(m1c_predict))
+m1c <- stan_glm(formula = y ~ 1, data = dat, family=binomial(link="logit"), 
+                prior_PD = TRUE, prior_intercept = normal(-3, 1), 
+                refresh = 0)
+```
 
-ggplot(m1c_long, aes(x=value,group=key))+
-  geom_histogram(binwidth = .005, alpha=.5) + xlab("Expected probability") + ylab("Density")
+```{r}
+m1c_predict <- posterior_epred(m1c)
+m1c_long <- pivot_longer(data.frame(m1c_predict), cols = everything())
+
+ggplot(m1c_long, aes(x=value, group=name))+
+  geom_histogram(binwidth = 0.01, alpha = 0.5) + 
+  xlab("Pr(y = 1)") + 
+  yaxis_text(FALSE) + 
+  yaxis_title(FALSE)
 ```
 
 ## Priors for the slope