From 89eb1f3113c05a096b893af64cefb314c98c006f Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 10:05:25 +1000 Subject: [PATCH 01/13] FIX: Add arviz to environment --- environment.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/environment.yml b/environment.yml index b1109c041..dba2f3790 100644 --- a/environment.yml +++ b/environment.yml @@ -14,6 +14,7 @@ dependencies: - ghp-import==1.1.0 - sphinxcontrib-youtube==1.1.0 - sphinx-togglebutton==0.3.1 + - arviz==0.12.1 # Sandpit Requirements - quantecon - array-to-latex From 0d50ed2d989047da8252e4b517ef547f30a93e07 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 10:20:03 +1000 Subject: [PATCH 02/13] FIX: headers in bayes_nonconj --- lectures/bayes_nonconj.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/lectures/bayes_nonconj.md b/lectures/bayes_nonconj.md index f92c1ff07..2d17b6f90 100644 --- a/lectures/bayes_nonconj.md +++ b/lectures/bayes_nonconj.md @@ -888,7 +888,7 @@ SVI_num_steps = 5000 true_theta = 0.8 ``` -#### Beta Prior and Posteriors +### Beta Prior and Posteriors: Let's compare outcomes when we use a Beta prior. @@ -944,7 +944,7 @@ Here the MCMC approximation looks good. But the VI approximation doesn't look so good. - * even though we use the beta distribution as our guide, the VI approximated posterior distributions do not closely resemble the posteriors that we had just computed analytically. +* even though we use the beta distribution as our guide, the VI approximated posterior distributions do not closely resemble the posteriors that we had just computed analytically. (Here, our initial parameter for Beta guide is (0.5, 0.5).) @@ -960,8 +960,6 @@ BayesianInferencePlot(true_theta, num_list, BETA_numpyro).SVI_plot(guide_dist='b ``` - - ## Non-conjugate Prior Distributions Having assured ourselves that our MCMC and VI methods can work well when we have conjugate prior and so can also compute analytically, we From 785e28d50d8ce3c9bab017b9cb87adfb113d3bd3 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 10:23:38 +1000 Subject: [PATCH 03/13] FIX: equation references --- lectures/lagrangian_lqdp.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lectures/lagrangian_lqdp.md b/lectures/lagrangian_lqdp.md index 1ab5e3442..1ce8eba87 100644 --- a/lectures/lagrangian_lqdp.md +++ b/lectures/lagrangian_lqdp.md @@ -160,7 +160,7 @@ For the undiscounted optimal linear regulator problem, form the Lagrangian $$ {\cal L} = - \sum^\infty_{t=0} \biggl\{ x^\prime_t R x_t + u_t^\prime Q u_t + 2 \mu^\prime_{t+1} [A x_t + B u_t - x_{t+1}]\biggr\} -$$ (eq1) +$$ (lag-lqdp-eq1) where $2 \mu_{t+1}$ is a vector of Lagrange multipliers on the time $t$ transition law $x_{t+1} = A x_t + B u_t$. @@ -172,16 +172,16 @@ $$ \begin{aligned} 2 Q u_t &+ 2B^\prime \mu_{t+1} = 0 \ ,\ t \geq 0 \cr \mu_t &= R x_t + A^\prime \mu_{t+1}\ ,\ t\geq 1.\cr \end{aligned} -$$ (eq2) +$$ (lag-lqdp-eq2) -Define $\mu_0$ to be a vector of shadow prices of $x_0$ and apply an envelope condition to {eq}`eq1` +Define $\mu_0$ to be a vector of shadow prices of $x_0$ and apply an envelope condition to {eq}`lag-lqdp-eq1` to deduce that $$ \mu_0 = R x_0 + A' \mu_1, $$ -which is a time $t=0 $ counterpart to the second equation of system {eq}`eq2`. +which is a time $t=0 $ counterpart to the second equation of system {eq}`lag-lqdp-eq2`. An important fact is that @@ -199,11 +199,11 @@ corresponds to the **state** vector $x_t$. 
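We can check this fact numerically on a small example before exploiting it. The sketch below is illustrative only: the matrices `A`, `B`, `R`, `Q` are made-up values, and we lean on `scipy.linalg.solve_discrete_are` as one convenient way to compute $P$ from the associated algebraic matrix Riccati equation. It then verifies that setting $\mu_t = P x_t$ satisfies both equations in {eq}`lag-lqdp-eq2` along an optimally controlled path.

```{code-cell} ipython3
import numpy as np
from scipy.linalg import solve_discrete_are

# made-up matrices for a small two-state, one-control example
A = np.array([[0.95, 0.1],
              [0.0,  0.9]])
B = np.array([[0.0],
              [1.0]])
R = np.eye(2)            # state cost matrix
Q = np.array([[0.5]])    # control cost matrix

# P solves the undiscounted algebraic matrix Riccati equation
P = solve_discrete_are(A, B, R, Q)

# optimal feedback rule u_t = -F x_t
F = np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)

# one step along the optimal path starting from an arbitrary x_0
x0 = np.array([1.0, -0.5])
u0 = -F @ x0
x1 = A @ x0 + B @ u0
μ0, μ1 = P @ x0, P @ x1

# check the two first-order conditions in {eq}`lag-lqdp-eq2`
print(np.allclose(2 * Q @ u0 + 2 * B.T @ μ1, 0))
print(np.allclose(μ0, R @ x0 + A.T @ μ1))
```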
It is useful to proceed with the following steps: -* solve the first equation of {eq}`eq2` for $u_t$ in terms of $\mu_{t+1}$. +* solve the first equation of {eq}`lag-lqdp-eq2` for $u_t$ in terms of $\mu_{t+1}$. * substitute the result into the law of motion $x_{t+1} = A x_t + B u_t$. -* arrange the resulting equation and the second equation of {eq}`eq2` into the form +* arrange the resulting equation and the second equation of {eq}`lag-lqdp-eq2` into the form $$ L\ \begin{pmatrix}x_{t+1}\cr \mu_{t+1}\cr\end{pmatrix}\ = \ N\ \begin{pmatrix}x_t\cr \mu_t\cr\end{pmatrix}\ @@ -271,7 +271,7 @@ The rank of $J$ is $2n$. $$ MJM^\prime = J. -$$ (eq3) +$$ (lag-lqdp-eq3) Salient properties of symplectic matrices that are readily verified include: @@ -280,14 +280,14 @@ Salient properties of symplectic matrices that are readily verified include: It can be verified directly that $M$ in equation {eq}`Mdefn` is symplectic. -It follows from equation {eq}`eq3` and from the fact $J^{-1} = J^\prime = -J$ that for any symplectic +It follows from equation {eq}`lag-lqdp-eq3` and from the fact $J^{-1} = J^\prime = -J$ that for any symplectic matrix $M$, $$ M^\prime = J^{-1} M^{-1} J. -$$ (eq4) +$$ (lag-lqdp-eq4) -Equation {eq}`eq4` states that $M^\prime$ is related to the inverse of $M$ +Equation {eq}`lag-lqdp-eq4` states that $M^\prime$ is related to the inverse of $M$ by a **similarity transformation**. For square matrices, recall that @@ -298,7 +298,7 @@ For square matrices, recall that * a matrix and its transpose share eigenvalues -It then follows from equation {eq}`eq4` that +It then follows from equation {eq}`lag-lqdp-eq4` that the eigenvalues of $M$ occur in reciprocal pairs: if $\lambda$ is an eigenvalue of $M$, so is $\lambda^{-1}$. @@ -809,7 +809,7 @@ $$ which is a time $t=0 $ counterpart to the second equation of system {eq}`eq662`. -Proceeding as we did above with the undiscounted system {eq}`eq2`, we can rearrange the first-order conditions into the +Proceeding as we did above with the undiscounted system {eq}`lag-lqdp-eq2`, we can rearrange the first-order conditions into the system $$ @@ -821,7 +821,7 @@ $$ \left[\begin{matrix} x_t \cr \mu_t \end{matrix}\right] $$ (eq663) -which in the special case that $\beta = 1$ agrees with equation {eq}`eq2`, as expected. +which in the special case that $\beta = 1$ agrees with equation {eq}`lag-lqdp-eq2`, as expected. 
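As a numerical sanity check of the undiscounted case, the following sketch assembles $L$, $N$, and $M = L^{-1} N$ from the same made-up matrices used in the earlier sketch, with the blocks of $L$ and $N$ filled in as one natural way of carrying out the steps listed above. It then confirms that $M J M' = J$ as in {eq}`lag-lqdp-eq3` and that the eigenvalues of $M$ occur in reciprocal pairs.

```{code-cell} ipython3
import numpy as np

# the same made-up matrices as in the earlier sketch
A = np.array([[0.95, 0.1],
              [0.0,  0.9]])
B = np.array([[0.0],
              [1.0]])
R = np.eye(2)
Q = np.array([[0.5]])
n = A.shape[0]

# carry out the steps above: u_t = -Q^{-1} B' μ_{t+1}, substitute, and stack
L = np.block([[np.eye(n),        B @ np.linalg.inv(Q) @ B.T],
              [np.zeros((n, n)), A.T]])
N = np.block([[A,  np.zeros((n, n))],
              [-R, np.eye(n)]])
M = np.linalg.solve(L, N)

# a conformable J; the check M J M' = J is unaffected by the sign convention for J
J = np.block([[np.zeros((n, n)), -np.eye(n)],
              [np.eye(n),        np.zeros((n, n))]])

# M is symplectic
print(np.allclose(M @ J @ M.T, J))

# eigenvalues of M occur in reciprocal pairs
λ = np.linalg.eigvals(M)
print(np.allclose(np.sort(np.abs(λ)), np.sort(1 / np.abs(λ))))
```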
+++ From 3d4a95550456c51bc943cdcaeb23e7c99cd66c9c Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 10:23:47 +1000 Subject: [PATCH 04/13] FIX: headers in prob_matrix --- lectures/prob_matrix.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index f2fdaa784..c771630a3 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -1002,7 +1002,7 @@ Below we draw some samples confirm that the "sampling" distribution agrees well +++ -#### Sample results +**Sample results:** ```{code-cell} ipython3 # specify parameters From 4c6d7bd1ee29b15fec07eeaa677d2d70b9d71631 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 10:28:16 +1000 Subject: [PATCH 05/13] FIX: add install for packages outside env --- lectures/ar1_turningpts.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lectures/ar1_turningpts.md b/lectures/ar1_turningpts.md index c1f0e78b0..28d3e534d 100644 --- a/lectures/ar1_turningpts.md +++ b/lectures/ar1_turningpts.md @@ -13,6 +13,12 @@ kernelspec: # Forecasting an AR(1) process +```{code-cell} ipython3 +:tags: [hide-output] + +!pip install arviz pymc +``` + This lecture describes methods for forecasting statistics that are functions of future values of a univariate autogressive process. The methods are designed to take into account two possible sources of uncertainty about these statistics: From a5b2b6cf5c3de9e5b33d614326b9f20e83309875 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:07:05 +1000 Subject: [PATCH 06/13] FIX: Adjust prob_matrix for PDF --- lectures/prob_matrix.md | 199 +++++++++++++--------------------------- 1 file changed, 66 insertions(+), 133 deletions(-) diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index c771630a3..a68cd3d91 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -19,17 +19,17 @@ After providing somewhat informal definitions of the underlying objects, we'll u Among concepts that we'll be studying include - - a joint probability distribution - - marginal distributions associated with a given joint distribution - - conditional probability distributions - - statistical independence of two random variables - - joint distributions associated with a prescribed set of marginal distributions - - couplings - - copulas - - the probability distribution of a sum of two independent random variables - - convolution of marginal distributions - - parameters that define a probability distribution - - sufficient statistics as data summaries +- a joint probability distribution +- marginal distributions associated with a given joint distribution +- conditional probability distributions +- statistical independence of two random variables +- joint distributions associated with a prescribed set of marginal distributions + - couplings + - copulas +- the probability distribution of a sum of two independent random variables + - convolution of marginal distributions +- parameters that define a probability distribution +- sufficient statistics as data summaries We'll use a matrix to represent a bivariate probability distribution and a vector to represent a univariate probability distribution @@ -57,13 +57,8 @@ We'll briefly define what we mean by a **probability space**, a **probability me For most of this lecture, we sweep these objects into the background, but they are there underlying the other objects that we'll mainly focus on. 
- - - Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcomes. - - Let $\mathcal{G} \subset \Omega$ be a subset of $\Omega$. Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$. @@ -72,7 +67,7 @@ The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$ - - this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$. +- this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$. A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$. @@ -89,8 +84,6 @@ where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$. We call this the induced probability distribution of random variable $X$. - - ## Digression: What Does Probability Mean? Before diving in, we'll say a few words about what probability theory means and how it connects to statistics. @@ -103,7 +96,6 @@ These are purely mathematical objects. To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts: - * A single draw from a probability distribution * Repeated independently and identically distributed (i.i.d.) draws of "samples" or "realizations" from the same probability distribution * A **statistic** defined as a function of a sequence of samples @@ -111,7 +103,7 @@ To appreciate how statisticians connect probabilities to data, the key is to und * The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** - **Law of Large Numbers (LLN)** - **Central Limit Theorem (CLT)** -+++ + **Scalar example** @@ -136,41 +128,38 @@ $$ \end{aligned} $$ - - Consider the **empirical distribution**: -\begin{align} +$$ +\begin{aligned} i & = 0,\dots,I-1,\\ N_i & = \text{number of times} \ X = i,\\ N & = \sum^{I-1}_{i=0} N_i \quad \text{total number of draws},\\ \tilde {f_i} & = \frac{N_i}{N} \sim \ \text{frequency of draws for which}\ X=i -\end{align} +\end{aligned} +$$ Key ideas that justify connecting probability theory with statistics are laws of large numbers and central limit theorems **LLN:** - - A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i \text{ as } N \to \infty$ +- A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i \text{ as } N \to \infty$ **CLT:** - - A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$ +- A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$ **Remarks** - - For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. - - - But for a Bayesian it means something more or different. +- For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. +- But for a Bayesian it means something more or different. 
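The small sketch below illustrates the LLN idea with a made-up population distribution $f$: as the number of i.i.d. draws $N$ grows, the empirical frequencies $\tilde f_i$ settle down on the population probabilities $f_i$.

```{code-cell} ipython3
import numpy as np

# a made-up population distribution over I = 3 outcomes
f = np.array([0.2, 0.5, 0.3])
I = len(f)

rng = np.random.default_rng(12345)

for N in [100, 10_000, 1_000_000]:
    draws = rng.choice(I, size=N, p=f)             # N i.i.d. draws of X
    f_tilde = np.bincount(draws, minlength=I) / N  # empirical frequencies
    print(f"N = {N:>9}: ", f_tilde)
```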
## Representing Probability Distributions - - A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** $$ @@ -194,9 +183,9 @@ When a probability density exists, a probability distribution can be characteriz For a **discrete-valued** random variable - * the number of possible values of $X$ is finite or countably infinite - * we replace a **density** with a **probability mass function**, a non-negative sequence that sums to one - * we replace integration with summation in the formula like {eq}`eq:CDFfromdensity` that relates a CDF to a probability mass function +* the number of possible values of $X$ is finite or countably infinite +* we replace a **density** with a **probability mass function**, a non-negative sequence that sums to one +* we replace integration with summation in the formula like {eq}`eq:CDFfromdensity` that relates a CDF to a probability mass function In this lecture, we mostly discuss discrete random variables. @@ -204,9 +193,7 @@ In this lecture, we mostly discuss discrete random variables. Doing this enables us to confine our tool set basically to linear algebra. Later we'll briefly discuss how to approximate a continuous random variable with a discrete random variable. - -+++ ## Univariate Probability Distributions @@ -259,11 +246,12 @@ where $\theta $ is a vector of parameters that is of much smaller dimension than **Remarks:** - - The concept of **parameter** is intimately related to the notion of **sufficient statistic**. - - Sufficient statistic are nonlinear function of a data set. - - Sufficient statistics are designed to summarize all **information** about the parameters that is contained in the big data set. - - They are important tools that AI uses to reduce the size of a **big data** set - - R. A. Fisher provided a sharp definition of **information** -- see + +- The concept of **parameter** is intimately related to the notion of **sufficient statistic**. +- Sufficient statistic are nonlinear function of a data set. +- Sufficient statistics are designed to summarize all **information** about the parameters that is contained in the big data set. +- They are important tools that AI uses to reduce the size of a **big data** set +- R. A. Fisher provided a sharp definition of **information** -- see @@ -283,8 +271,6 @@ $$ f_i( \theta)\ge0, \sum_{i=0}^{\infty}f_i(\theta)=1 $$ -+++ - ### Continuous random variable Let $X$ be a continous random variable that takes values $X \in \tilde{X}\equiv[X_U,X_L]$ whose distributions have parameters $\theta$. @@ -299,15 +285,12 @@ $$ \textrm{Prob}\{X\in \tilde{X}\} =1 $$ -+++ - ## Bivariate Probability Distributions We'll now discuss a bivariate **joint distribution**. To begin, we restrict ourselves to two discrete random variables. 
- Let $X,Y$ be two discrete random variables that take values: $$ @@ -335,7 +318,6 @@ where $$ \sum_{i}\sum_{j}f_{ij}=1 $$ -+++ ## Marginal Probability Distributions @@ -349,8 +331,6 @@ $$ \textrm{Prob}\{Y=j\}= \sum_{i=0}^{I-1}f_{ij} = \nu_i, i=0,\ldots,J-1 $$ - - For example, let the joint distribution over $(X,Y)$ be $$ @@ -362,12 +342,11 @@ F = \left[ \right] $$ (eq:example101discrete) - Then marginal distributions are: - $$ -\begin{aligned} \textrm{Prob} \{X=0\}&=.25+.1=.35\\ +\begin{aligned} +\textrm{Prob} \{X=0\}&=.25+.1=.35\\ \textrm{Prob}\{X=1\}& =.15+.5=.65\\ \textrm{Prob}\{Y=0\}&=.25+.15=.4\\ \textrm{Prob}\{Y=1\}&=.1+.5=.6 @@ -376,7 +355,7 @@ $$ -**Digression:** If two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by +**Digression:** If two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by $$ \begin{aligned} @@ -385,8 +364,6 @@ f(y)& = \int_{\mathbb{R}} f(x,y) dx \end{aligned} $$ -+++ - ## Conditional Probability Distributions Conditional probabilities are defined according to @@ -425,8 +402,6 @@ $$ \textrm{Prob}\{X=0|Y=1\} =\frac{ .1}{.1+.5}=\frac{.1}{.6} $$ -+++ - ## Statistical Independence Random variables X and Y are statistically **independent** if @@ -438,28 +413,24 @@ $$ where $$ -\begin{align} +\begin{aligned} \textrm{Prob}\{X=i\} &=f_i\ge0, \sum{f_i}=1 \cr \textrm{Prob}\{Y=j\} & =g_j\ge0, \sum{g_j}=1 -\end{align} +\end{aligned} $$ Conditional distributions are $$ -\begin{align} +\begin{aligned} \textrm{Prob}\{X=i|Y=j\} & =\frac{f_ig_i}{\sum_{i}f_ig_j}=\frac{f_ig_i}{g_i}=f_i \\ \textrm{Prob}\{Y=j|X=i\} & =\frac{f_ig_i}{\sum_{j}f_ig_j}=\frac{f_ig_i}{f_i}=g_i -\end{align} +\end{aligned} $$ -+++ ## Means and Variances -+++ - - The mean and variance of a discrete random variable $X$ are $$ @@ -480,10 +451,6 @@ $$ $$ - - - - ## Classic Trick for Generating Random Numbers Suppose we have at our disposal a pseudo random number that draws a uniform random variable, i.e., one with probability distribution @@ -613,19 +580,23 @@ plt.show() Let $X$ distributed geometrically, that is -\begin{align} +$$ +\begin{aligned} \textrm{Prob}(X=i) & =(1-\lambda)\lambda^i,\quad\lambda\in(0,1), \quad i=0,1,\dots \\ & \sum_{i=0}^{\infty}\textrm{Prob}(X=i)=1\longleftrightarrow(1- \lambda)\sum_{i=0}^{\infty}\lambda^i=\frac{1-\lambda}{1-\lambda}=1 -\end{align} +\end{aligned} +$$ Its CDF is given by -\begin{align} +$$ +\begin{aligned} \textrm{Prob}(X\le i)& =(1-\lambda)\sum_{j=0}^{i}\lambda^i\\ & =(1-\lambda)[\frac{1-\lambda^{i+1}}{1-\lambda}]\\ & =1-\lambda^{i+1}\\ & =F(X)=F_i \quad -\end{align} +\end{aligned} +$$ Again, let $\tilde{U}$ follow a uniform distribution and we want to find $X$ such that $F(X)=\tilde{U}$. @@ -688,8 +659,6 @@ plt.hist(x_g, bins=150, density=True, alpha=0.6) plt.show() ``` - - ## Some Discrete Probability Distributions @@ -703,11 +672,17 @@ We'll use our code to ## Geometric distribution -$$ \textrm{Prob}(X=k)=(1-p)^{k-1}p,k=1,2, \ldots $$ +$$ +\textrm{Prob}(X=k)=(1-p)^{k-1}p,k=1,2, \ldots +$$ + $\implies$ -$$\begin{align} -\mathbb{E}(X) & =\frac{1}{p}\\\mathbb{D}(X) & =\frac{1-p}{p^2} \end{align}$$ +$$ +\begin{aligned} +\mathbb{E}(X) & =\frac{1}{p}\\\mathbb{D}(X) & =\frac{1-p}{p^2} +\end{aligned} +$$ We draw observations from the distribution and compare the sample mean and variance with the theoretical results. 
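A minimal sketch of that comparison, with an arbitrarily chosen $p$ and sample size, uses `numpy`'s geometric sampler, which draws from $\{1, 2, \dots\}$ with exactly the probability mass function above.

```{code-cell} ipython3
import numpy as np

p, n = 0.3, 1_000_000   # arbitrary parameter and sample size

rng = np.random.default_rng(0)
x = rng.geometric(p, size=n)

print("sample mean:    ", x.mean(), "   theory:", 1 / p)
print("sample variance:", x.var(),  "   theory:", (1 - p) / p**2)
```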
@@ -797,7 +772,8 @@ Its distribution is $$ \begin{aligned} X & \sim NB(r,p) \\ -\textrm{Prob}(X=k;r,p) & = \begin{pmatrix}k+r-1 \\ r-1 \end{pmatrix}p^r(1-p)^{k} \end{aligned} +\textrm{Prob}(X=k;r,p) & = \begin{pmatrix}k+r-1 \\ r-1 \end{pmatrix}p^r(1-p)^{k} +\end{aligned} $$ Here, we choose from among $k+r-1$ possible outcomes because the last draw is by definition a success. @@ -928,8 +904,6 @@ $$ f(x) = 0.0005 $$ -+++ - Let's start by generating a random sample and computing sample moments. ```{code-cell} ipython3 @@ -969,19 +943,10 @@ print("mean: ", mean) print("variance: ", var) ``` - - - - ## Matrix Representation of Some Bivariate Distributions -+++ - Let's use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and variance of a bivariate random variable. -+++ - - The table below illustrates a probability distribution for a bivariate random variable. $$ @@ -996,12 +961,8 @@ Marginal distributions are $$ \textrm{Prob}(X=i)=\sum_j{f_{ij}}=u_i $$ $$ \textrm{Prob}(Y=j)=\sum_i{f_{ij}}=v_j $$ - - Below we draw some samples confirm that the "sampling" distribution agrees well with the "population" distribution. -+++ - **Sample results:** ```{code-cell} ipython3 @@ -1129,9 +1090,6 @@ $$ These population objects closely resemble sample counterparts computed above. -+++ - - Let's wrap some of the functions we have used in a Python class for a general discrete bivariate joint distribution. ```{code-cell} ipython3 @@ -1287,9 +1245,6 @@ d_new.marg_dist() d_new.cond_dist() ``` - -+++ - ## A Continuous Bivariate Random Vector @@ -1301,9 +1256,7 @@ $$ $$ -\begin{equation} \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right)\right] -\end{equation} $$ We start with a bivariate normal distribution pinned down by @@ -1494,7 +1447,8 @@ Define a new random variable $Z=X+Y$. Evidently, $Z$ takes values from $\bar{Z}$ defined as follows: $$ -\begin{aligned} \bar{X} & =\{0,1,\ldots,I-1\};\qquad f_i= \textrm{Prob} \{X=i\}\\ +\begin{aligned} +\bar{X} & =\{0,1,\ldots,I-1\};\qquad f_i= \textrm{Prob} \{X=i\}\\ \bar{Y} & =\{0,1,\ldots,J-1\};\qquad g_j= \textrm{Prob}\{Y=j\}\\ \bar{Z}& =\{0,1,\ldots,I+J-2\};\qquad h_k= \textrm{Prob} \{X+Y=k\} \end{aligned} @@ -1523,11 +1477,7 @@ $$ f_{Z}(z)=\int_{-\infty}^{\infty} f_{X}(x) f_{Y}(z-x) dx \equiv f_{X}*g_{Y} $$ -where $ f_{X}*g_{Y}$ denotes the **convolution** of the $f_X$ and $g_Y$ functions. - - -+++ - +where $ f_{X}*g_{Y} $ denotes the **convolution** of the $f_X$ and $g_Y$ functions. ## Transition Probability Matrix @@ -1538,6 +1488,7 @@ Let $X,Y$ be discrete random variables with joint distribution $$ \textrm{Prob}\{X=i,Y=j\} = \rho_{ij} $$ + where $i = 0,\dots,I-1; j = 0,\dots,J-1$ and $$ @@ -1551,11 +1502,8 @@ $$ = \frac{\textrm{Prob}\{Y=j, X=i\}}{\textrm{Prob}\{ X=i\}} $$ -+++ - We can define a transition probability matrix - $$ p_{ij}=\textrm{Prob}\{Y=j|X=i\}= \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} $$ @@ -1578,11 +1526,8 @@ The second row is the probability of $Y=j, j=0,1$ conditional on $X=1$. Note that - $\sum_{j}\rho_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of $\rho$ is a probability distribution (not so for each column. 
-+++ - ## Coupling - Start with a joint distribution $$ @@ -1614,8 +1559,6 @@ We'll find that from two marginal distributions, can we usually construct more t Each of these joint distributions is called a **coupling** of the two martingal distributions. -+++ - Let's start with marginal distributions $$ @@ -1641,9 +1584,7 @@ $$ \end{aligned} $$ -+++ - -We construct two couplings. +We construct two couplings. The first coupling if our two marginal distributions is the joint distribution @@ -1668,8 +1609,6 @@ $$ \end{aligned} $$ -+++ - A second coupling of our two marginal distributions is the joint distribution @@ -1704,15 +1643,13 @@ Thus, multiple joint distributions $[f_{ij}]$ can have the same marginals. **Remark:** - Couplings are important in optimal transport problems and in Markov processes. -+++ - ## Copula Functions Suppose that $X_1, X_2, \dots, X_n$ are $N$ random variables and that - * their marginal distributions are $F_1(x_1), F_2(x_2),\dots, F_N(x_N)$, and - - * their joint distribution is $H(x_1,x_2,\dots,x_N)$ +* their marginal distributions are $F_1(x_1), F_2(x_2),\dots, F_N(x_N)$, and + +* their joint distribution is $H(x_1,x_2,\dots,x_N)$ Then there exists a **copula function** $C(\cdot)$ that verifies @@ -1734,8 +1671,6 @@ Thus, for given marginal distributions, we can use a copula function to determi Copula functions are often used to characterize **dependence** of random variables. -+++ - **Discrete marginal distribution** TOM -- REWRITE OR MAYBE DROP PARTS OF @@ -1962,8 +1897,6 @@ We have verified that both joint distributions, $c_1$ and $c_2$, have identical So they are both couplings of $X$ and $Y$. -+++ - ## Time Series Suppose that there are two time periods. @@ -1976,10 +1909,10 @@ Let $X(0)$ be a random variable to be realized at $t=0$, $X(1)$ be a random var Suppose that $$ -\begin{align} +\begin{aligned} \textrm{\textrm{Prob}}\{X(0)=i,X(1)=j\} &=f_{ij}≥0,i=0,……,I-1\\ \sum_{i}\sum_{j}f_{ij}&=1 -\end{align} +\end{aligned} $$ $f_{ij} $ is a joint distribution over $[X(0), X(1)]$. From 8d82c1b032a222184db0ccb7a85e225b69527457 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:27:05 +1000 Subject: [PATCH 07/13] fix equations in ar1_turningpoint --- lectures/ar1_bayes.md | 2 -- lectures/ar1_turningpts.md | 26 +++++++++++++------------- 2 files changed, 13 insertions(+), 15 deletions(-) diff --git a/lectures/ar1_bayes.md b/lectures/ar1_bayes.md index 53c3d5b58..66b7a85f3 100644 --- a/lectures/ar1_bayes.md +++ b/lectures/ar1_bayes.md @@ -13,8 +13,6 @@ kernelspec: ## Posterior Distributions for AR(1) Parameters - - We'll begin with some Python imports. diff --git a/lectures/ar1_turningpts.md b/lectures/ar1_turningpts.md index 28d3e534d..a8b256884 100644 --- a/lectures/ar1_turningpts.md +++ b/lectures/ar1_turningpts.md @@ -67,45 +67,45 @@ Consider the univariate AR(1) model: $$ y_{t+1} = \rho y_t + \sigma \epsilon_{t+1}, \quad t \geq 0 -$$ (eq1) +$$ (ar1-tp-eq1) where the scalars $\rho$ and $\sigma$ satisfy $|\rho| < 1$ and $\sigma > 0$; $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$. The initial condition $y_{0}$ is a known number. 
-Equation {eq}`eq1` implies that for $t \geq 0$, the conditional density of $y_{t+1}$ is +Equation {eq}`ar1-tp-eq1` implies that for $t \geq 0$, the conditional density of $y_{t+1}$ is $$ f(y_{t+1} | y_{t}; \rho, \sigma) \sim {\mathcal N}(\rho y_{t}, \sigma^2) \ -$$ (eq2) +$$ (ar1-tp-eq2) -Further, equation {eq}`eq1` also implies that for $t \geq 0$, the conditional density of $y_{t+j}$ for $j \geq 1$ is +Further, equation {eq}`ar1-tp-eq1` also implies that for $t \geq 0$, the conditional density of $y_{t+j}$ for $j \geq 1$ is $$ f(y_{t+j} | y_{t}; \rho, \sigma) \sim {\mathcal N}\left(\rho^j y_{t}, \sigma^2 \frac{1 - \rho^{2j}}{1 - \rho^2} \right) -$$ (eq3) +$$ (ar1-tp-eq3) -The predictive distribution {eq}`eq3` that assumes that the parameters $\rho, \sigma$ are known, which we express +The predictive distribution {eq}`ar1-tp-eq3` that assumes that the parameters $\rho, \sigma$ are known, which we express by conditioning on them. We also want to compute a predictive distribution that does not condition on $\rho,\sigma$ but instead takes account of our uncertainty about them. -We form this predictive distribution by integrating {eq}`eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t )$ +We form this predictive distribution by integrating {eq}`ar1-tp-eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t )$ that conditions on an observed history $y^t = \{y_s\}_{s=0}^t$: $$ f(y_{t+j} | y^t) = \int f(y_{t+j} | y_{t}; \rho, \sigma) \pi_t(\rho,\sigma | y^t ) d \rho d \sigma -$$ (eq4) +$$ (ar1-tp-eq4) -Predictive distribution {eq}`eq3` assumes that parameters $(\rho,\sigma)$ are known. +Predictive distribution {eq}`ar1-tp-eq3` assumes that parameters $(\rho,\sigma)$ are known. -Predictive distribution {eq}`eq4` assumes that parameters $(\rho,\sigma)$ are uncertain, but have known probability distribution $\pi_t(\rho,\sigma | y^t )$. +Predictive distribution {eq}`ar1-tp-eq4` assumes that parameters $(\rho,\sigma)$ are uncertain, but have known probability distribution $\pi_t(\rho,\sigma | y^t )$. We also want to compute some predictive distributions of "sample path statistics" that might include, for example @@ -129,7 +129,7 @@ First, we'll simulate a sample path from which to launch our forecasts. In addition to plotting the sample path, under the assumption that the true parameter values are known, we'll plot $.9$ and $.95$ coverage intervals using conditional distribution -{eq}`eq3` described above. +{eq}`ar1-tp-eq3` described above. We'll also plot a bunch of samples of sequences of future values and watch where they fall relative to the coverage interval. @@ -461,9 +461,9 @@ plt.show() ## Extended Wecker Method Now we apply we apply our "extended" Wecker method based on predictive densities of $y$ defined by -{eq}`eq4` that acknowledge posterior uncertainty in the parameters $\rho, \sigma$. +{eq}`ar1-tp-eq4` that acknowledge posterior uncertainty in the parameters $\rho, \sigma$. -To approximate the intergration on the right side of {eq}`eq4`, we repeately draw parameters from the joint posterior distribution each time we simulate a sequence of future values from model {eq}`eq1`. +To approximate the intergration on the right side of {eq}`ar1-tp-eq4`, we repeately draw parameters from the joint posterior distribution each time we simulate a sequence of future values from model {eq}`ar1-tp-eq1`. 
```{code-cell} ipython3 def plot_extended_Wecker(post_samples, initial_path, N, ax): From e22cc2f6d572ed091a921ef1f4a3d5b34750ebcc Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:29:12 +1000 Subject: [PATCH 08/13] FIX: header in ar1_bayes --- lectures/ar1_bayes.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lectures/ar1_bayes.md b/lectures/ar1_bayes.md index 66b7a85f3..f49adc3de 100644 --- a/lectures/ar1_bayes.md +++ b/lectures/ar1_bayes.md @@ -11,7 +11,7 @@ kernelspec: name: python3 --- -## Posterior Distributions for AR(1) Parameters +# Posterior Distributions for AR(1) Parameters We'll begin with some Python imports. @@ -172,7 +172,7 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning First we'll use **pymc4**. -### `PyMC` Implementation +## `PyMC` Implementation For a normal distribution in `pymc`, $var = 1/\tau = \sigma^{2}$. @@ -284,7 +284,7 @@ We'll return to this issue after we use `numpyro` to compute posteriors under ou We'll now repeat the calculations using `numpyro`. -### `Numpyro` Implementation +## `Numpyro` Implementation ```{code-cell} ipython3 From 6be8a4c842bc7f7ac04fb13fa67ceda156214bd2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:36:09 +1000 Subject: [PATCH 09/13] FIX: $ duplicate --- lectures/prob_matrix.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index a68cd3d91..229da6bbf 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -353,8 +353,6 @@ $$ \end{aligned} $$ - - **Digression:** If two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by $$ @@ -493,7 +491,7 @@ Thus, suppose that It turns out that if we use draw uniform random numbers $U$ and then compute $X$ from $$ -X=F^{-1}(U)$, +X=F^{-1}(U), $$ then $X$ ia a random variable with CDF $F_X(x)=F(x)=\textrm{Prob}\{X\le x\}$. From 6c9e8f60c935d89bea7803c9250bd30429ee3d01 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:42:35 +1000 Subject: [PATCH 10/13] FIX: math --- lectures/prob_matrix.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 229da6bbf..6d22133a0 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -657,7 +657,7 @@ plt.hist(x_g, bins=150, density=True, alpha=0.6) plt.show() ``` -## Some Discrete Probability Distributions +## Some Discrete Probability Distributions Let's write some Python code to compute means and variances of soem univariate random variables. 
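One direct way to do this for a random variable with finitely many values is to apply the definitions $\mu = \sum_k k f_k$ and $\sigma^2 = \sum_k (k - \mu)^2 f_k$ to its probability mass function, as in the following sketch with a made-up pmf.

```{code-cell} ipython3
import numpy as np

# a made-up probability mass function on k = 0, 1, 2, 3
f = np.array([0.1, 0.4, 0.3, 0.2])
k = np.arange(len(f))

μ = k @ f                   # mean:     sum_k k f_k
σ2 = (k - μ)**2 @ f         # variance: sum_k (k - μ)^2 f_k
print(μ, σ2)
```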
@@ -1359,8 +1359,8 @@ The population conditional distribution is $$ \begin{aligned} -[X|Y & =y ]\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ -[Y|X= &x ]\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] +[X|Y &= y ]\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ +[Y|X &= x ]\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] \end{aligned} $$ From 743c2c38ebe37de4ae78e355f41643ae62a1d5ea Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 11:50:30 +1000 Subject: [PATCH 11/13] fix math --- lectures/prob_matrix.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 6d22133a0..4a511b657 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -1358,7 +1358,7 @@ plt.show() The population conditional distribution is $$ -\begin{aligned} +\begin{aligned} \\ [X|Y &= y ]\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ [Y|X &= x ]\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] \end{aligned} From 51c78b3edbea77dc2239878633b96b389542f551 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 13:55:39 +1000 Subject: [PATCH 12/13] Adjust ar1_turningpts --- lectures/ar1_turningpts.md | 42 ++++++++++++-------------------------- 1 file changed, 13 insertions(+), 29 deletions(-) diff --git a/lectures/ar1_turningpts.md b/lectures/ar1_turningpts.md index a8b256884..d85fc5e52 100644 --- a/lectures/ar1_turningpts.md +++ b/lectures/ar1_turningpts.md @@ -31,7 +31,7 @@ We consider two sorts of statistics: - prospective values $y_{t+j}$ of a random process $\{y_t\}$ that is governed by the AR(1) process -- sample path properties that are defined as non-linear functions of future values $\{y_{t+j}\}_{j\geq 1}$ at time $t$. +- sample path properties that are defined as non-linear functions of future values $\{y_{t+j}\}_{j \geq 1}$ at time $t$. **Sample path properties** are things like "time to next turning point" or "time to next recession" @@ -60,8 +60,7 @@ logger = logging.getLogger('pymc') logger.setLevel(logging.CRITICAL) ``` -## A Univariate First-Order Autoregressive Process - +## A Univariate First-Order Autoregressive Process Consider the univariate AR(1) model: @@ -72,7 +71,7 @@ $$ (ar1-tp-eq1) where the scalars $\rho$ and $\sigma$ satisfy $|\rho| < 1$ and $\sigma > 0$; $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$. -The initial condition $y_{0}$ is a known number. +The initial condition $y_{0}$ is a known number. Equation {eq}`ar1-tp-eq1` implies that for $t \geq 0$, the conditional density of $y_{t+1}$ is @@ -80,34 +79,29 @@ $$ f(y_{t+1} | y_{t}; \rho, \sigma) \sim {\mathcal N}(\rho y_{t}, \sigma^2) \ $$ (ar1-tp-eq2) - Further, equation {eq}`ar1-tp-eq1` also implies that for $t \geq 0$, the conditional density of $y_{t+j}$ for $j \geq 1$ is - $$ f(y_{t+j} | y_{t}; \rho, \sigma) \sim {\mathcal N}\left(\rho^j y_{t}, \sigma^2 \frac{1 - \rho^{2j}}{1 - \rho^2} \right) $$ (ar1-tp-eq3) - The predictive distribution {eq}`ar1-tp-eq3` that assumes that the parameters $\rho, \sigma$ are known, which we express by conditioning on them. We also want to compute a predictive distribution that does not condition on $\rho,\sigma$ but instead takes account of our uncertainty about them. 
-We form this predictive distribution by integrating {eq}`ar1-tp-eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t )$ +We form this predictive distribution by integrating {eq}`ar1-tp-eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t)$ that conditions on an observed history $y^t = \{y_s\}_{s=0}^t$: $$ f(y_{t+j} | y^t) = \int f(y_{t+j} | y_{t}; \rho, \sigma) \pi_t(\rho,\sigma | y^t ) d \rho d \sigma $$ (ar1-tp-eq4) - - Predictive distribution {eq}`ar1-tp-eq3` assumes that parameters $(\rho,\sigma)$ are known. Predictive distribution {eq}`ar1-tp-eq4` assumes that parameters $(\rho,\sigma)$ are uncertain, but have known probability distribution $\pi_t(\rho,\sigma | y^t )$. -We also want to compute some predictive distributions of "sample path statistics" that might include, for example +We also want to compute some predictive distributions of "sample path statistics" that might include, for example - the time until the next "recession", - the minimum value of $Y$ over the next 8 periods, @@ -121,8 +115,6 @@ To accomplish that for situations in which we are uncertain about parameter valu - for each draw $n=0,1,...,N$, simulate a "future path" of length $T_1$ with parameters $\left(\rho_n,\sigma_n\right)$ and compute our three "sample path statistics"; - finally, plot the desired statistics from the $N$ samples as an empirical distribution. - - ## Implementation First, we'll simulate a sample path from which to launch our forecasts. @@ -133,8 +125,6 @@ we'll plot $.9$ and $.95$ coverage intervals using conditional distribution We'll also plot a bunch of samples of sequences of future values and watch where they fall relative to the coverage interval. - - ```{code-cell} ipython3 def AR1_simulate(rho, sigma, y0, T): @@ -204,13 +194,13 @@ Wecker {cite}`wecker1979predicting` proposed using simulation techniques to char He called these functions "path properties" to contrast them with properties of single data points. -He studied two special prospective path properties of a given series $\{y_t\}$. +He studied two special prospective path properties of a given series $\{y_t\}$. -The first was **time until the next turning point** +The first was **time until the next turning point** - * he defined a **"turning point"** to be the date of the second of two successive declines in $y$. +* he defined a **"turning point"** to be the date of the second of two successive declines in $y$. -To examine this statistic, let $Z$ be an indicator process +To examine this statistic, let $Z$ be an indicator process $$ Z_t(Y(\omega)) := \left\{ @@ -226,8 +216,8 @@ $$ W_t(\omega):= \inf \{ k\geq 1 \mid Z_{t+k}(\omega) = 1\} $$ -Wecker {cite}`wecker1979predicting` also studied **the minimum value of $Y$ over the next 8 quarters** -which can be defined as the random variable +Wecker {cite}`wecker1979predicting` also studied **the minimum value of $Y$ over the next 8 quarters** +which can be defined as the random variable $$ M_t(\omega) := \min \{ Y_{t+1}(\omega); Y_{t+2}(\omega); \dots; Y_{t+8}(\omega)\} @@ -258,14 +248,12 @@ $$ This is designed to express the event - - ``after one or two decrease(s), $Y$ will grow for two consecutive quarters'' - +- ``after one or two decrease(s), $Y$ will grow for two consecutive quarters'' Following {cite}`wecker1979predicting`, we can use simulations to calculate probabilities of $P_t$ and $N_t$ for each period $t$. 
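As a warm-up, the sketch below applies the turning-point definitions above to a single simulated path: a small helper, with made-up names and parameter values, that returns the time until the next turning point $W_t$, i.e., the smallest $k \geq 1$ for which $Z_{t+k} = 1$.

```{code-cell} ipython3
import numpy as np

def time_to_next_turning_point(y, t):
    """
    Smallest k >= 1 with Z_{t+k} = 1, i.e. date t+k completes two
    successive declines y[t+k-2] > y[t+k-1] > y[t+k].  Returns np.inf
    if no turning point occurs before the end of the path.
    """
    for k in range(1, len(y) - t):
        s = t + k
        if s >= 2 and y[s] < y[s - 1] < y[s - 2]:
            return k
    return np.inf

# a made-up AR(1) path, just to exercise the helper
ρ, σ, T = 0.9, 1.0, 50
rng = np.random.default_rng(1)
y = np.zeros(T)
for s in range(1, T):
    y[s] = ρ * y[s - 1] + σ * rng.normal()

print(time_to_next_turning_point(y, t=10))
```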
## A Wecker-Like Algorithm - The procedure consists of the following steps: * index a sample path by $\omega_i$ @@ -278,9 +266,7 @@ $$ * for each path $\omega_i$, compute the associated value of $W_t(\omega_i), W_{t+1}(\omega_i), \dots$ -* consider the sets $ -\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1} -$ as samples from the predictive distributions $f(W_{t+1} \mid \mathcal y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$. +* consider the sets $\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1}$ as samples from the predictive distributions $f(W_{t+1} \mid \mathcal y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$. ## Using Simulations to Approximate a Posterior Distribution @@ -334,7 +320,6 @@ The graphs on the left portray posterior marginal distributions. ## Calculating Sample Path Statistics - Our next step is to prepare Python codeto compute our sample path statistics. ```{code-cell} ipython3 @@ -531,4 +516,3 @@ plot_extended_Wecker(post_samples, initial_path, 1000, ax) plt.legend() plt.show() ``` - From 516d0c11d48dea50a8d378d5be93e028dc984c97 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 28 Jun 2022 14:51:34 +1000 Subject: [PATCH 13/13] Remove offending TeX for pdflatex builds --- lectures/ar1_turningpts.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/lectures/ar1_turningpts.md b/lectures/ar1_turningpts.md index d85fc5e52..34d3eb490 100644 --- a/lectures/ar1_turningpts.md +++ b/lectures/ar1_turningpts.md @@ -202,13 +202,13 @@ The first was **time until the next turning point** To examine this statistic, let $Z$ be an indicator process -$$ + Then the random variable **time until the next turning point** is defined as the following **stopping time** with respect to $Z$: @@ -226,25 +226,25 @@ $$ It is interesting to study yet another possible concept of a **turning point**. Thus, let - + Define a **positive turning point today or tomorrow** statistic as -$$ + This is designed to express the event @@ -264,9 +264,9 @@ $$ Y(\omega_i) = \left\{ Y_{t+1}(\omega_i), Y_{t+2}(\omega_i), \dots, Y_{t+N}(\omega_i)\right\}_{i=1}^I $$ -* for each path $\omega_i$, compute the associated value of $W_t(\omega_i), W_{t+1}(\omega_i), \dots$ +* for each path $\omega_i$, compute the associated value of $W_t(\omega_i), W_{t+1}(\omega_i), \dots$ -* consider the sets $\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1}$ as samples from the predictive distributions $f(W_{t+1} \mid \mathcal y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$. +* consider the sets $\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1}$ as samples from the predictive distributions $f(W_{t+1} \mid \mathcal y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$. ## Using Simulations to Approximate a Posterior Distribution @@ -275,7 +275,6 @@ The next code cells use `pymc` to compute the time $t$ posterior distribution of Note that in defining the likelihood function, we choose to condition on the initial value $y_0$. - ```{code-cell} ipython3 def draw_from_posterior(sample): """