This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial $y_0$:
We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the parameters $\mu_0, \sigma_0$ of the distribution of $y_0$.
Below, we study two widely used alternative assumptions:
- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$.
In effect, we are **conditioning on an observed initial value**.
- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.
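To spell out the second case: for a first-order autoregression of the form $y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}$ with $\epsilon_{t+1} \sim {\mathcal N}(0, 1)$ and $|\rho| < 1$ (this is what the simulation code below implements), requiring the mean and variance of $y_0$ to reproduce themselves under the law of motion gives

$$
\mu_0 = \rho \mu_0 \implies \mu_0 = 0, \qquad
\sigma_0^2 = \rho^2 \sigma_0^2 + \sigma_x^2 \implies \sigma_0^2 = \frac{\sigma_x^2}{1 - \rho^2} .
$$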
Unknown parameters are $\rho, \sigma_x$.
We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
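Writing the prior densities as $p(\rho)$ and $p(\sigma_x)$ (notation introduced here just for this formula), Bayes' Law says that the posterior is proportional to the likelihood of the sample times the priors:

$$
p(\rho, \sigma_x \mid y_0, y_1, \ldots, y_T) \propto f(y_0, y_1, \ldots, y_T \mid \rho, \sigma_x) \, p(\rho) \, p(\sigma_x) .
$$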
The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.
We will use NUTS samplers to generate samples from the posterior in a chain.
Both of these libraries support NUTS samplers.
NUTS is a form of Markov Chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly.
This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.
Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:
- A first procedure is to condition on whatever value of $y_0$ is observed.
This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`, so that $y_0 \sim {\mathcal N} \left(0, \frac{\sigma_x^2}{1-\rho^2} \right)$.
When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
We begin by solving a **direct problem** that simulates an AR(1) process.
How we select the initial value $y_0$ matters.
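It matters because, for this first-order Markov process, the likelihood of the sample factors into a product of conditional densities times a marginal density for the initial observation,

$$
f(y_T, y_{T-1}, \ldots, y_0 \mid \rho, \sigma_x) = f(y_T \mid y_{T-1}) \, f(y_{T-1} \mid y_{T-2}) \cdots f(y_1 \mid y_0) \, f(y_0),
$$

where each conditional density $f(y_t \mid y_{t-1})$ is ${\mathcal N}(\rho y_{t-1}, \sigma_x^2)$, so the two procedures differ only in the factor $f(y_0)$ attached to the initial observation.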
* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$.
Why? Because $y_0$ contains information about $\rho, \sigma_x$.
* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
```{code-cell} ipython3
def ar1_simulate(ρ, σ, y0, T):

    # Allocate space and draw epsilons
    y = np.empty(T)
    eps = np.random.normal(0., σ, T)

    # Initial condition and step forward
    y[0] = y0
    for t in range(1, T):
        y[t] = ρ * y[t-1] + eps[t]

    return y

σ = 1.
ρ = 0.5
T = 50

np.random.seed(145353452)
y = ar1_simulate(ρ, σ, 10, T)
```
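Plotting the simulated path makes the transient visible: the unusually large initial value $y_0 = 10$ decays toward the stationary region over the first several periods. Here is a minimal sketch of such a plot (the plotting choices are illustrative, not necessarily the lecture's):

```{code-cell} ipython3
import matplotlib.pyplot as plt

# The early observations decay from the large initial value toward
# the stationary region around zero.
plt.plot(y)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
```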
First we'll use **pymc4**.
## PyMC Implementation
For a normal distribution in `pymc`, $var = 1/\tau = \sigma^{2}$.
[pymc.sample](https://www.pymc.io/projects/docs/en/v5.10.0/api/generated/pymc.sample.html#pymc-sample) by default uses a NUTS sampler to generate samples, as shown in the cell below:
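Here is a minimal sketch of such a model, conditioning on the observed initial value $y_0$; the prior choices and variable names below are illustrative assumptions rather than the lecture's exact code:

```{code-cell} ipython3
import pymc as pm
import arviz as az

with pm.Model() as AR1_model:

    # Illustrative priors for the unknown parameters
    rho = pm.Uniform('rho', lower=-1., upper=1.)
    sigma = pm.HalfNormal('sigma', sigma=10.)

    # Expected value of y at t given y at t-1
    yhat = rho * y[:-1]

    # Likelihood of the later observations, conditioning on the observed y[0]
    y_like = pm.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])

with AR1_model:
    trace = pm.sample(2000, tune=1000)

az.plot_trace(trace, figsize=(17, 6));
```

Because only the observations after $y_0$ enter the likelihood, this amounts to setting $f(y_0) = 1$, i.e. the first procedure described above.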
Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).
The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
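The posterior is then recomputed under the second procedure, in which $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$. A minimal sketch of one way to express this in `pymc` (again, the model name and priors are illustrative assumptions):

```{code-cell} ipython3
with pm.Model() as AR1_model_y0:

    # Illustrative priors for the unknown parameters
    rho = pm.Uniform('rho', lower=-1., upper=1.)
    sigma = pm.HalfNormal('sigma', sigma=10.)

    # Standard deviation of the stationary distribution implied by rho and sigma
    y_sd = sigma / pm.math.sqrt(1 - rho**2)

    # Likelihood contribution of the initial observation under the stationary distribution
    y0_like = pm.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])

    # Likelihood of the remaining observations given their predecessors
    yhat = rho * y[:-1]
    y_like = pm.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])

with AR1_model_y0:
    trace_y0 = pm.sample(2000, tune=1000)

az.plot_trace(trace_y0, figsize=(17, 6));
```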
Please note how the posterior for $\rho$ has shifted to the right relative to the posterior we computed when we conditioned on the observed initial value $y_0$.
Think about why this happens.
```{hint}
It is connected to how Bayes' Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values that make observations more likely.
```
We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.
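The `numpyro` implementation proceeds analogously, using a NUTS kernel. A minimal sketch of the version that assumes $y_0$ is drawn from the stationary distribution (the function name, priors, and sampler settings below are illustrative assumptions):

```{code-cell} ipython3
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def ar1_model_y0(data):
    # Illustrative priors for the unknown parameters
    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=10.))

    # Standard deviation of the stationary distribution implied by rho and sigma
    y_sd = sigma / jnp.sqrt(1 - rho**2)

    # Likelihood contribution of the initial observation under the stationary distribution
    numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])

    # Likelihood of the remaining observations given their predecessors
    yhat = rho * data[:-1]
    numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])

# Sample from the posterior with a NUTS kernel
mcmc = MCMC(NUTS(ar1_model_y0), num_warmup=1000, num_samples=2000, progress_bar=False)
mcmc.run(random.PRNGKey(0), data=jnp.asarray(y))
mcmc.print_summary()
```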
The posterior has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability) is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.
Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
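For instance (illustrative numbers): at the true values $(\rho, \sigma_x) = (0.5, 1)$ the stationary standard deviation is $1/\sqrt{1 - 0.25} \approx 1.15$, so $y_0 = 10$ lies nearly nine standard deviations out in the tail; but at, say, $(\rho, \sigma_x) = (0.95, 1.5)$ the stationary standard deviation rises to $1.5/\sqrt{1 - 0.9025} \approx 4.8$, which puts $y_0$ only about two standard deviations out.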