Extend optimization section and fix wrong link
krasserm committed Dec 12, 2020
1 parent a3479de commit ac5b273
Showing 1 changed file with 3 additions and 3 deletions.
gaussian-processes/gaussian_processes_sparse.ipynb
@@ -54,11 +54,11 @@
"p(\\mathbf{f}_* \\mid \\mathbf{y}) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f}) p(\\mathbf{f} \\mid \\mathbf{y}) d\\mathbf{f} \\tag{4}\n",
"$$\n",
"\n",
"where the conditioning on inputs $\\mathbf{X}$ and $\\mathbf{X}_*$ has been made implicit. The second term inside the integral is the posterior over the training latent variables $\\mathbf{f}$ conditioned on observations $\\mathbf{y}$, the first term is the posterior over predictions $\\mathbf{f}_*$ conditioned on latent training variables $\\mathbf{f}$ (see also equation $(3)$ in [this article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_classification.ipynb)). Both terms are intractable to compute for larger training datasets for reasons explained above.\n",
"where the conditioning on inputs $\\mathbf{X}$ and $\\mathbf{X}_*$ has been made implicit. The second term inside the integral is the posterior over the training latent variables $\\mathbf{f}$ conditioned on observations $\\mathbf{y}$, the first term is the posterior over predictions $\\mathbf{f}_*$ conditioned on latent training variables $\\mathbf{f}$ (see also equation $(3)$ in [this article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb)). Both terms are intractable to compute for larger training datasets for reasons explained above.\n",
"\n",
"### Sparse Gaussian processes\n",
"\n",
"Suppose there is a small set of $m$ inducing variables $\\mathbf{f}_m$ evaluated at inputs $\\mathbf{X}_m$ that describe the function to be modeled \"sufficiently well\" then we could use them as approximation to $\\mathbf{f}$ and $\\mathbf{X}$ and define an approximate posterior:\n",
"Suppose there is a small set of $m$ inducing variables $\\mathbf{f}_m$ evaluated at inputs $\\mathbf{X}_m$ that describe the function to be modeled \"sufficiently well\" then we could use them as approximation to $\\mathbf{f}$ and $\\mathbf{X}$ and define an approximate posterior\n",
"\n",
"$$\n",
"q(\\mathbf{f}_*) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f}_m) \\phi(\\mathbf{f}_m) d\\mathbf{f}_m \\tag{5}\n",
@@ -258,7 +258,7 @@
"\\end{align*}\n",
"$$\n",
"\n",
"where $\\mathbf{c} = \\mathbf{L_B}^{-1} \\mathbf{A} \\mathbf{y} \\sigma_y^{-1}$, $\\mathbf{B} = \\mathbf{I} + \\mathbf{A}\\mathbf{A}^T$ and $\\mathbf{A} = \\mathbf{L}^{-1} \\mathbf{K}_{mn} \\sigma_y^{-1}$. Lower-triangular matrices $\\mathbf{L_B}$ and $\\mathbf{L}$ are obtained from a [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\mathbf{B} = \\mathbf{L_B} \\mathbf{L_B}^T$ and $\\mathbf{K}_{mm} = \\mathbf{L} \\mathbf{L}^T$, respectively. $\\mathbf{c}$ and $\\mathbf{A}$ are obtained by solving the equations $\\mathbf{L_B} \\mathbf{c} = \\mathbf{A} \\mathbf{y} \\sigma_y^{-1}$ and $\\mathbf{L} \\mathbf{A} = \\mathbf{K}_{mn} \\sigma_y^{-1}$, respectively. Using these definitions, a numerically stable implementation of a negative lower bound (`nlb`) is straightforward."
"where $\\mathbf{c} = \\mathbf{L_B}^{-1} \\mathbf{A} \\mathbf{y} \\sigma_y^{-1}$, $\\mathbf{B} = \\mathbf{I} + \\mathbf{A}\\mathbf{A}^T$ and $\\mathbf{A} = \\mathbf{L}^{-1} \\mathbf{K}_{mn} \\sigma_y^{-1}$. Lower-triangular matrices $\\mathbf{L_B}$ and $\\mathbf{L}$ are obtained from a [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\mathbf{B} = \\mathbf{L_B} \\mathbf{L_B}^T$ and $\\mathbf{K}_{mm} = \\mathbf{L} \\mathbf{L}^T$, respectively. $\\mathbf{c}$ and $\\mathbf{A}$ are obtained by solving the equations $\\mathbf{L_B} \\mathbf{c} = \\mathbf{A} \\mathbf{y} \\sigma_y^{-1}$ and $\\mathbf{L} \\mathbf{A} = \\mathbf{K}_{mn} \\sigma_y^{-1}$, respectively. The log determinant of $\\mathbf{B}$ is $2 \\sum_{i=1}^m \\log {L_B}_{ii}$ as explained in [this post](https://math.stackexchange.com/a/3211219/648651), for example. Using these definitions, a numerically stable implementation of a negative lower bound (`nlb`) is straightforward."
]
},
{
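
With these definitions, a sketch of such a numerically stable `nlb` might look as follows. This assumes the collapsed variational lower bound of Titsias (2009), to which the quantities $\mathbf{A}$, $\mathbf{B}$, $\mathbf{L_B}$ and $\mathbf{c}$ above correspond; `kernel`, `X`, `X_m`, `y`, `sigma_y` and the `jitter` term are assumptions for illustration rather than the notebook's actual code:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def nlb(X, X_m, y, sigma_y, kernel, jitter=1e-6):
    """Negative lower bound via the Cholesky-based quantities defined above.
    Assumes kernel(A, B) returns the kernel matrix of two input sets."""
    n, m = X.shape[0], X_m.shape[0]
    K_mm = kernel(X_m, X_m) + jitter * np.eye(m)  # jitter: common stabilization (assumption)
    K_mn = kernel(X_m, X)

    L = cholesky(K_mm, lower=True)                          # K_mm = L L^T
    A = solve_triangular(L, K_mn, lower=True) / sigma_y     # L A = K_mn / sigma_y
    B = np.eye(m) + A @ A.T                                 # B = I + A A^T
    L_B = cholesky(B, lower=True)                           # B = L_B L_B^T
    c = solve_triangular(L_B, A @ y, lower=True) / sigma_y  # L_B c = A y / sigma_y

    log_det_B = 2.0 * np.sum(np.log(np.diag(L_B)))  # log|B| = 2 sum_i log (L_B)_ii

    # Collapsed lower bound, expressed in terms of the stable quantities:
    lb = -0.5 * n * np.log(2.0 * np.pi)
    lb -= n * np.log(sigma_y) + 0.5 * log_det_B                       # -1/2 log|Q_nn + sigma_y^2 I|
    lb -= 0.5 * np.sum(y ** 2) / sigma_y ** 2 - 0.5 * np.sum(c ** 2)  # quadratic term via c
    lb -= 0.5 * np.trace(kernel(X, X)) / sigma_y ** 2                 # -1/(2 sigma_y^2) tr(K_nn)
    lb += 0.5 * np.sum(A ** 2)                                        # +1/(2 sigma_y^2) tr(Q_nn) = 1/2 tr(A A^T)
    return -lb
```

In practice, `nlb` would typically be minimized with respect to the kernel parameters, $\sigma_y$ and the inducing inputs $\mathbf{X}_m$.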
