13 changes: 7 additions & 6 deletions lectures/Discriminative Classification.jl
@@ -1,5 +1,5 @@
### A Pluto.jl notebook ###
-# v0.20.19
+# v0.20.21

#> [frontmatter]
#> image = "https://github.com/bmlip/course/blob/v2/assets/figures/Figure4.9.png?raw=true"
@@ -351,25 +351,26 @@ md"""
The **Laplace Approximation** approximates a function by a Gaussian-shaped function. In this case, we will approximate the weight posterior ``p(w|D)`` by a Gaussian distribution

```math
-q(w) = \mathcal{N}\left(w\,|\, w_{N}, S_N\right) \tag{B-4.144}
+q(w) = \mathcal{N}\left(w\,|\, m_{N}, S_N\right) \tag{B-4.144}
```

with

```math
\begin{align}
-w_N &= \arg\max_w \log p(w|D) \\
+m_N &= \arg\max_w \log p(w|D) \\
S_N^{-1} &= S_0^{-1} + \sum_n \sigma_n (1-\sigma_n) x_n x_n^T \tag{B-4.143}
\end{align}
```
-where we used short-hand ``\sigma_n = \sigma\left((2y_n-1) w_{N}^T x_n\right)``.
+where we used short-hand ``\sigma_n = \sigma\left((2y_n-1) w^T x_n\right)``.
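
A minimal Julia sketch of Eq. B-4.143, assuming the mode ``m_N`` is already known; `laplace_covariance`, `xs`, `ys`, and the ``\{0,1\}`` label convention are illustrative assumptions, not code from either notebook:

```julia
using LinearAlgebra

logistic(a) = 1 / (1 + exp(-a))

# S_N from Eq. B-4.143, assuming the mode m_N is already known.
# xs is a vector of input vectors x_n, ys are labels in {0,1}, S_0 is the prior covariance.
function laplace_covariance(m_N, S_0, xs, ys)
    S_N_inv = inv(S_0)                            # S_0^{-1}
    for (x, y) in zip(xs, ys)
        σn = logistic((2y - 1) * dot(m_N, x))     # σ_n, using the short-hand above
        S_N_inv += σn * (1 - σn) * x * x'         # rank-one term x_n x_nᵀ
    end
    return inv(S_N_inv)                           # S_N
end
```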

If we substitute the Gaussian approximation from Eq. B-4.143 into the expression for the predictive class distribution (Eq. B-4.145), we obtain (after some additional approximations):

```math
\begin{align*}
p(y_\bullet = 1 \mid x_\bullet, D) &= \int p(y_\bullet = 1 \,|\, x_\bullet, w) \cdot p(w\,|\, D) \,\mathrm{d}w \\
-&\approx \int \sigma(w^T x_\bullet) \cdot \mathcal{N}\left(w \,|\, w_N, S_N\right) \,\mathrm{d}w \tag{B-4.145} \\
+&\approx \int p(y_\bullet = 1 \,|\, x_\bullet, w) \cdot q(w) \,\mathrm{d}w \\
+&= \int \sigma(w^T x_\bullet) \cdot \mathcal{N}\left(w \,|\, m_N, S_N\right) \,\mathrm{d}w \tag{B-4.145} \\
&\approx \Phi\left( \frac{\mu_a}{\sqrt{\lambda^{-2} +\sigma_a^2}}\right) \tag{B-4.152}
\end{align*}
```
@@ -379,7 +380,7 @@ where
```math
\begin{align}
\lambda^2 &= \pi / 8 \\
-\mu_a &= w^T_{N} x_\bullet \tag{B-4.149} \\
+\mu_a &= m^T_{N} x_\bullet \tag{B-4.149} \\
\sigma_a^2 &= x^T_\bullet S_N x_\bullet \tag{B-4.150}
\end{align}
```
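
The probit approximation above reduces prediction to two inner products and a standard-normal CDF. Below is a minimal Julia sketch, assuming ``m_N`` and ``S_N`` are available from the Laplace step and using Distributions.jl for ``\Phi``; the function name `predict_class1` is an illustrative assumption:

```julia
using LinearAlgebra
using Distributions: Normal, cdf

# Predictive p(y=1 | x, D) via the probit approximation (Eqs. B-4.149, B-4.150, B-4.152).
function predict_class1(m_N, S_N, x)
    λ²   = π / 8                                     # λ² = π/8
    μ_a  = dot(m_N, x)                               # B-4.149
    σ_a² = dot(x, S_N * x)                           # B-4.150
    return cdf(Normal(), μ_a / sqrt(1 / λ² + σ_a²))  # B-4.152; cdf(Normal(), ⋅) is Φ
end
```

Compared with plugging the point estimate ``m_N`` straight into the logistic function, the ``\sigma_a^2`` term moderates the prediction toward ``0.5`` in regions where the posterior over ``w`` is uncertain.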
10 changes: 5 additions & 5 deletions minis/Laplace Approximation.jl
@@ -1,5 +1,5 @@
### A Pluto.jl notebook ###
-# v0.20.19
+# v0.20.21

#> [frontmatter]
#> image = "https://i.imgur.com/AUaQapv.png"
@@ -299,10 +299,10 @@ OK, not as straightforward as promised, but, in principle, this is standard matr

# ╔═╡ 2096c0e1-c4ff-48ea-92d7-901600a81478
md"""
-We can now use the gradient ``\nabla_w \log p(w | D)`` to find the **mode** ``w_{N}`` of ``\log p(w|D)`` (e.g., by some gradient-based optimization procedure) and then use the Hessian ``-\left. \nabla\nabla_w \log p(w | D)\right|_{w=w_N}``, evaluated at ``w=w_N``, to get the variance of ``q(w)``, leading to a **Gaussian weights posterior**
+We can now use the gradient ``\nabla_w \log p(w | D)`` to find the **mode** ``m_{N}`` of ``\log p(w|D)`` (e.g., by some gradient-based optimization procedure) and then use the Hessian ``-\left. \nabla\nabla_w \log p(w | D)\right|_{w=m_N}``, evaluated at ``w=m_N``, to get the variance of ``q(w)``, leading to a **Gaussian weights posterior**

```math
-q(w) = \mathcal{N}\left(w\,|\, w_{N}, S_N\right) \tag{B-4.144}
+q(w) = \mathcal{N}\left(w\,|\, m_{N}, S_N\right) \tag{B-4.144}
```

with
@@ -324,7 +324,7 @@ For a new observation ``x_\bullet``, the class probability then evaluates to
```math
\begin{align*}
p(y_\bullet = 1 \mid x_\bullet, D) &= \int p(y_\bullet = 1 \,|\, x_\bullet, w) \cdot p(w\,|\, D) \,\mathrm{d}w \\
-&\approx \int \sigma(w^T x_\bullet) \cdot \mathcal{N}\left(w \,|\, w_N, S_N\right) \,\mathrm{d}w \tag{B-4.145} \\
+&\approx \int \sigma(w^T x_\bullet) \cdot \mathcal{N}\left(w \,|\, m_N, S_N\right) \,\mathrm{d}w \tag{B-4.145} \\
&\approx \Phi\left( \frac{\mu_a}{\sqrt{\lambda^{-2} +\sigma_a^2}}\right) \tag{B-4.152}
\end{align*}
```
@@ -334,7 +334,7 @@ where
```math
\begin{align}
\lambda^2 &= \pi / 8 \\
-\mu_a &= w^T_{N} x_\bullet \tag{B-4.149} \\
+\mu_a &= m^T_{N} x_\bullet \tag{B-4.149} \\
\sigma_a^2 &= x^T_\bullet S_N x_\bullet \tag{B-4.150}
\end{align}
```
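
Putting the pieces together, here is a hedged end-to-end sketch of finding the mode ``m_N`` with a few Newton steps (one concrete choice of the "gradient-based optimization procedure" mentioned above) and reading off ``S_N`` from the negative Hessian at the mode. The zero-mean prior, the ``\{0,1\}`` labels, and all names (`laplace_fit`, `xs`, `ys`, `S_0`) are assumptions for illustration, not code from this PR:

```julia
using LinearAlgebra

logistic(a) = 1 / (1 + exp(-a))

# Laplace approximation for logistic regression with prior w ~ N(0, S_0) and labels ys ∈ {0,1}:
# Newton steps on log p(w|D) locate the mode m_N; the inverse negative Hessian there gives S_N.
function laplace_fit(xs, ys, S_0; iters = 10)
    w = zeros(length(first(xs)))
    S_0_inv = inv(S_0)
    H = S_0_inv                         # will hold -∇∇ log p(w|D)
    for _ in 1:iters
        σs = [logistic(dot(w, x)) for x in xs]
        g = -S_0_inv * w + sum((y - s) * x for (x, y, s) in zip(xs, ys, σs))      # ∇ log p(w|D)
        H =  S_0_inv + sum(s * (1 - s) * x * x' for (x, s) in zip(xs, σs))        # -∇∇ log p(w|D)
        w += H \ g                      # Newton step toward the mode
    end
    return w, inv(H)                    # m_N ≈ w, S_N = inverse negative Hessian at the mode
end

# Hypothetical usage on a toy problem:
# xs = [randn(3) for _ in 1:100];  ys = rand(0:1, 100);  S_0 = 100.0 * Matrix(I, 3, 3)
# m_N, S_N = laplace_fit(xs, ys, S_0)
```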