Commit 807f3d2

Copilot and mmcky committed
Fix: ar1_bayes style guide compliance (JAX Rule #1, Writing Rule #2, Code Rule #4)
Co-authored-by: mmcky <8263752+mmcky@users.noreply.github.com>
1 parent a6c3b1a commit 807f3d2

File tree

1 file changed

lectures/ar1_bayes.md

Lines changed: 65 additions & 59 deletions
@@ -19,7 +19,7 @@ kernelspec:
 ```{code-cell} ipython3
 :tags: [hide-output]

-!pip install numpyro jax
+!pip install numpyro
 ```

 In addition to what's included in base Anaconda, we need to install the following packages
@@ -53,9 +53,7 @@ logger.setLevel(logging.CRITICAL)

 This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.

-
-The model is a good laboratory for illustrating
-consequences of alternative ways of modeling the distribution of the initial $y_0$:
+The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial $y_0$:

 - As a fixed number
@@ -68,7 +66,8 @@ $$
 y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0
 $$ (eq:themodel)

-where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$;
+where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$.
+
 $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.

 The second component of the statistical model is
@@ -81,8 +80,7 @@ $$ (eq:themodel_2)

 Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.

-The model
-implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
+The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:

 $$
 f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
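As an aside for readers of this diff: the two treatments of $y_0$ that the lecture goes on to compare correspond to keeping or dropping the $f(y_0)$ term in this factorization. A minimal sketch, assuming `numpy` and `scipy` are available (the helper name `ar1_log_likelihood` is invented here for illustration, not part of the commit):

```python
# Illustrative sketch, not part of the commit: evaluate the factored
# AR(1) log likelihood, with or without the initial-condition term f(y_0).
import numpy as np
from scipy.stats import norm

def ar1_log_likelihood(y, ρ, σ_x, condition_on_y0=True):
    # Each conditional factor f(y_t | y_{t-1}) is Normal(ρ y_{t-1}, σ_x)
    ll = norm.logpdf(y[1:], loc=ρ * y[:-1], scale=σ_x).sum()
    if not condition_on_y0:
        # Under stationarity, f(y_0) is Normal(0, σ_x / sqrt(1 - ρ**2))
        ll += norm.logpdf(y[0], loc=0., scale=σ_x / np.sqrt(1 - ρ**2))
    return ll
```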
@@ -103,7 +101,9 @@ We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$

 Below, we study two widely used alternative assumptions:

-- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.
+- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$.
+
+  In effect, we are **conditioning on an observed initial value**.

 - $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.
@@ -115,16 +115,23 @@ Unknown parameters are $\rho, \sigma_x$.

 We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.

-The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain. Both of these libraries support NUTS samplers.
+The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.
+
+We will use NUTS samplers to generate samples from the posterior in a chain.

-NUTS is a form of Markov Chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly. This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.
+Both of these libraries support NUTS samplers.
+
+NUTS is a form of Markov Chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly.
+
+This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.

 Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

-- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
+- A first procedure is to condition on whatever value of $y_0$ is observed.

-- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
-so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$
+  This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
+
+- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel` so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$

 When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
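For a sense of scale (an illustrative sketch using the values $\rho = 0.5$, $\sigma_x = 1$, and $y_0 = 10$ that appear in the simulation code below), the initial condition the lecture chooses is very far out in the tail of the stationary distribution:

```python
# Illustrative sketch: how extreme is y0 = 10 under the stationary distribution?
import numpy as np

ρ, σ_x, y0 = 0.5, 1.0, 10.0
σ_y = σ_x / np.sqrt(1 - ρ**2)   # stationary standard deviation, ≈ 1.155
print(y0 / σ_y)                 # ≈ 8.7 standard deviations from the mean
```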
@@ -137,7 +144,9 @@ We begin by solving a **direct problem** that simulates an AR(1) process.

 How we select the initial value $y_0$ matters.

-* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.
+* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$.
+
+  Why? Because $y_0$ contains information about $\rho, \sigma_x$.

 * If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample have a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
@@ -146,25 +155,25 @@ To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far ou

 ```{code-cell} ipython3

-def ar1_simulate(rho, sigma, y0, T):
+def ar1_simulate(ρ, σ, y0, T):

     # Allocate space and draw epsilons
     y = np.empty(T)
-    eps = np.random.normal(0., sigma, T)
+    eps = np.random.normal(0., σ, T)

     # Initial condition and step forward
     y[0] = y0
     for t in range(1, T):
-        y[t] = rho * y[t-1] + eps[t]
+        y[t] = ρ * y[t-1] + eps[t]

     return y

-sigma = 1.
-rho = 0.5
+σ = 1.
+ρ = 0.5
 T = 50

 np.random.seed(145353452)
-y = ar1_simulate(rho, sigma, 10, T)
+y = ar1_simulate(ρ, σ, 10, T)
 ```

 ```{code-cell} ipython3
@@ -180,8 +189,7 @@ First we'll use **pymc4**.

 ## PyMC Implementation

-For a normal distribution in `pymc`,
-$var = 1/\tau = \sigma^{2}$.
+For a normal distribution in `pymc`, $var = 1/\tau = \sigma^{2}$.

 ```{code-cell} ipython3
@@ -190,14 +198,14 @@ AR1_model = pmc.Model()
 with AR1_model:

     # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.)  # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
+    ρ = pmc.Uniform('ρ', lower=-1., upper=1.)  # Assume stable ρ
+    σ = pmc.HalfNormal('σ', sigma=np.sqrt(10))

-    # Expected value of y at the next period (rho * y)
-    yhat = rho * y[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * y[:-1]

     # Likelihood of the actual realization
-    y_like = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
+    y_like = pmc.Normal('y_obs', mu=yhat, sigma=σ, observed=y[1:])
 ```

 [pmc.sample](https://www.pymc.io/projects/docs/en/v5.10.0/api/generated/pymc.sample.html#pymc-sample) by default uses the NUTS samplers to generate samples as shown in the below cell:
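A usage sketch for that cell (the `arviz` import and the draw counts here are assumptions for illustration, not the commit's actual settings):

```python
# Illustrative sketch: sample the model above; pmc.sample uses NUTS by default.
import arviz as az

with AR1_model:
    trace = pmc.sample(draws=2000, tune=1000)

az.plot_trace(trace)   # trace and density plots for ρ and σ
```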
@@ -216,7 +224,7 @@ with AR1_model:

 Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.

-This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`.)
+This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).

 The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
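The bias is easy to reproduce by simulation. A sketch (the seed, replication count, and least-squares formula are illustrative choices, not part of the commit; it reuses `ar1_simulate` from the hunk above):

```python
# Illustrative sketch: small-sample (Hurwicz) bias of the least-squares
# estimate of ρ in an AR(1) with T = 50, averaged over many replications.
import numpy as np

np.random.seed(0)
ρ, σ, T, reps = 0.5, 1.0, 50, 2000
estimates = np.empty(reps)
for i in range(reps):
    y = ar1_simulate(ρ, σ, 0.0, T)
    estimates[i] = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(estimates.mean())   # comes out below the true ρ = 0.5 on average
```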
@@ -246,15 +254,15 @@ AR1_model_y0 = pmc.Model()
 with AR1_model_y0:

     # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.)  # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
+    ρ = pmc.Uniform('ρ', lower=-1., upper=1.)  # Assume stable ρ
+    σ = pmc.HalfNormal('σ', sigma=np.sqrt(10))

     # Standard deviation of ergodic y
-    y_sd = sigma / np.sqrt(1 - rho**2)
+    y_sd = σ / np.sqrt(1 - ρ**2)

     # yhat
-    yhat = rho * y[:-1]
-    y_data = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
+    yhat = ρ * y[:-1]
+    y_data = pmc.Normal('y_obs', mu=yhat, sigma=σ, observed=y[1:])
     y0_data = pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])
 ```
@@ -284,8 +292,7 @@ Please note how the posterior for $\rho$ has shifted to the right relative to wh
 Think about why this happens.

 ```{hint}
-It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values
-that make observations more likely.
+It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values that make observations more likely.
 ```

 We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.
@@ -302,38 +309,38 @@ def plot_posterior(sample):
     Plot trace and histogram
     """
     # To np array
-    rhos = sample['rho']
-    sigmas = sample['sigma']
-    rhos, sigmas, = np.array(rhos), np.array(sigmas)
+    ρs = sample['ρ']
+    σs = sample['σ']
+    ρs, σs, = np.array(ρs), np.array(σs)

     fig, axs = plt.subplots(2, 2, figsize=(17, 6))
     # Plot trace
-    axs[0, 0].plot(rhos)    # rho
-    axs[1, 0].plot(sigmas)  # sigma
+    axs[0, 0].plot(ρs)    # ρ
+    axs[1, 0].plot(σs)    # σ

     # Plot posterior
-    axs[0, 1].hist(rhos, bins=50, density=True, alpha=0.7)
+    axs[0, 1].hist(ρs, bins=50, density=True, alpha=0.7)
     axs[0, 1].set_xlim([0, 1])
-    axs[1, 1].hist(sigmas, bins=50, density=True, alpha=0.7)
+    axs[1, 1].hist(σs, bins=50, density=True, alpha=0.7)

-    axs[0, 0].set_title("rho")
-    axs[0, 1].set_title("rho")
-    axs[1, 0].set_title("sigma")
-    axs[1, 1].set_title("sigma")
+    axs[0, 0].set_title("ρ")
+    axs[0, 1].set_title("ρ")
+    axs[1, 0].set_title("σ")
+    axs[1, 1].set_title("σ")
     plt.show()
 ```

 ```{code-cell} ipython3
 def AR1_model(data):
     # set prior
-    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
-    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
+    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
+    σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))

-    # Expected value of y at the next period (rho * y)
-    yhat = rho * data[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * data[:-1]

     # Likelihood of the actual realization.
-    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
+    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=σ), obs=data[1:])
 ```
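A sketch of how a model like this is typically sampled with numpyro's NUTS (the seed and draw counts are illustrative assumptions, not the commit's settings; `plot_posterior` is the helper defined in the hunk above):

```python
# Illustrative sketch: run NUTS on the numpyro model and plot the draws.
import jax
from numpyro.infer import MCMC, NUTS

mcmc = MCMC(NUTS(AR1_model), num_warmup=1000, num_samples=2000)
mcmc.run(jax.random.PRNGKey(0), data=y)
plot_posterior(mcmc.get_samples())
```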
339346
@@ -370,17 +377,17 @@ Here's the new code to achieve this.
370377
```{code-cell} ipython3
371378
def AR1_model_y0(data):
372379
# Set prior
373-
rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
374-
sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
380+
ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
381+
σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))
375382
376383
# Standard deviation of ergodic y
377-
y_sd = sigma / jnp.sqrt(1 - rho**2)
384+
y_sd = σ / jnp.sqrt(1 - ρ**2)
378385
379-
# Expected value of y at the next period (rho * y)
380-
yhat = rho * data[:-1]
386+
# Expected value of y at the next period (ρ * y)
387+
yhat = ρ * data[:-1]
381388
382389
# Likelihood of the actual realization.
383-
y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
390+
y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=σ), obs=data[1:])
384391
y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])
385392
```
386393
@@ -408,8 +415,7 @@ mcmc2.print_summary()

 Look what happened to the posterior!

-It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability)
-is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.
+It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability) is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.

 Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
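The mechanism is visible in the formula for the stationary standard deviation, $\sigma_y = \sigma_x / \sqrt{1 - \rho^2}$, which grows without bound as $\rho \rightarrow 1$ (a quick illustrative check, not part of the commit):

```python
# Illustrative sketch: the stationary standard deviation explodes as ρ → 1.
import numpy as np

σ_x = 1.0
for ρ in (0.5, 0.9, 0.99, 0.999):
    print(ρ, σ_x / np.sqrt(1 - ρ**2))
```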
