Sam Foreman 2024-04-03

MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory

Sam Foreman, Xiao-Yong Jin, James C. Osborn

saforem2/{lattice23, l2hmc-qcd}

2023-07-31 @ Lattice 2023
[!NOTE]
- Generate independent samples $\{x_{i}\}$, such that[^1] $$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$ where $S(x)$ is the action (or potential energy)
- Want to calculate observables $\mathcal{O}$: $$\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} \mathcal{O}(x)\, p(x)$$
- If these were independent, we could approximate $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum_{n=1}^{N}\mathcal{O}(x_{n})$
- Instead, nearby configurations are correlated, and we incur a factor of $\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}$, the integrated autocorrelation time: $$\sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}\left[\mathcal{O}(x)\right]$$ (a minimal estimator is sketched below)
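For concreteness, a minimal numpy sketch (not from the talk) for estimating $\tau_{\mathrm{int}}$ from a chain of measurements, using the convention $\tau_{\mathrm{int}} = 1 + 2\sum_{t}\rho(t)$ so that it matches the error formula above; the fixed summation `window` is an assumption here, where a production analysis would use automatic windowing:

```python
import numpy as np

def tau_int(obs: np.ndarray, window: int = 100) -> float:
    """Estimate the integrated autocorrelation time of a 1D chain."""
    x = obs - obs.mean()
    n = len(x)
    # Normalized autocorrelation function rho(t), t = 0, 1, ..., n - 1
    rho = np.correlate(x, x, mode='full')[n - 1:] / (n * x.var())
    # tau_int = 1 + 2 * sum_t rho(t), truncated at a fixed window
    return 1.0 + 2.0 * rho[1:window].sum()
```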
- Want to (sequentially) construct a chain of states: $$x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{N}$$ such that, as $N \rightarrow \infty$: $$\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N}\right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}$$
[!TIP]
- Introduce fictitious momentum $v \sim \mathcal{N}(0, \mathbb{1})$, normally distributed and independent of $x$, i.e. $$\begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)\, p(v) \propto e^{-S(x)}\, e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}v\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}$$
- Idea: evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$
- Write the joint distribution $p(x, v)$: $$p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}$$
[!TIP]
Hamilton's equations: $\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$
[!NOTE]
Leapfrog step: input $\left(x, v\right) \rightarrow \left(x', v'\right)$ output $$\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v})\, = x + \varepsilon\, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}$$
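In code, one leapfrog step is a direct transcription of these three maps; a minimal sketch, where `grad_S` computes $\partial_{x} S(x)$:

```python
def leapfrog(x, v, grad_S, eps):
    """One leapfrog step: half-kick (Gamma), full drift (Lambda), half-kick (Gamma)."""
    v_tilde = v - 0.5 * eps * grad_S(x)              # Gamma(x, v)
    x_prime = x + eps * v_tilde                      # Lambda(x, v_tilde)
    v_prime = v_tilde - 0.5 * eps * grad_S(x_prime)  # Gamma(x', v_tilde)
    return x_prime, v_prime
```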
[!WARNING]
- Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$ at the beginning of each trajectory
Note:
- We build a trajectory of $N_{\mathrm{LF}}$ leapfrog steps[^3] $$(x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v')$$
- And propose $x'$ as the next state in our chain
- We then accept / reject $x'$ using the Metropolis-Hastings criterion (sketched below): $$A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$$
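A minimal sketch of the accept / reject step, evaluated in log space for numerical stability (names are illustrative):

```python
import numpy as np

def metropolis_hastings(x, x_prime, log_p, log_det_jac=0.0, rng=None):
    """A(x'|x) = min{1, p(x')/p(x) |dx'/dx|}; for vanilla HMC the Jacobian
    term is zero, since the leapfrog update is volume-preserving."""
    rng = rng or np.random.default_rng()
    log_A = min(0.0, log_p(x_prime) - log_p(x) + log_det_jac)
    return x_prime if np.log(rng.random()) < log_A else x
```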
- What do we want in a good sampler?
  - Fast mixing (small autocorrelations)
  - Fast burn-in (quick convergence)
- Problems with HMC:
  - Energy levels selected randomly $\rightarrow$ slow mixing
  - Cannot easily traverse low-density zones $\rightarrow$ slow convergence
Topological Charge:
Note:
[!IMPORTANT]
$Q$ gets stuck!
- as $\beta \longrightarrow \infty$: $Q \longrightarrow \text{const.}$ and $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0$ $\textcolor{#FF5252}{\Longrightarrow}$
- the # of configs required to estimate errors grows exponentially: $\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$ (a sketch for measuring $Q$ follows)
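For the 2D $U(1)$ model used later in the talk (link angles $x$ with shape `[Nb, 2, Nt, Nx]`), the integer-valued charge can be measured with a short sketch like this (my own illustration, not the l2hmc-qcd implementation):

```python
import math
import torch

def topological_charge(x: torch.Tensor) -> torch.Tensor:
    """Q = (1/2pi) * sum_P arg(x_P) for 2D U(1); x.shape = [Nb, 2, Nt, Nx]."""
    x0, x1 = x[:, 0], x[:, 1]
    # Plaquette angle: x_P = x_0(n) + x_1(n + mu_0) - x_0(n + mu_1) - x_1(n)
    xP = x0 + torch.roll(x1, -1, dims=1) - torch.roll(x0, -1, dims=2) - x1
    # Wrap into [-pi, pi), then count windings
    xP = torch.remainder(xP + math.pi, 2 * math.pi) - math.pi
    return (xP.sum(dim=(1, 2)) / (2 * math.pi)).round()
```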
- Introduce two (invertible) networks, vNet and xNet:
  - vNet: $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$
  - xNet: $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$
- Use these $(s, t, q)$ in the generalized MD update:
  - $\Gamma_{\theta}^{\pm}: ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})$
  - $\Lambda_{\theta}^{\pm}: (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)$
[!NONE]
- Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update:
  - $\textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v})$ $\hspace{46pt}$ update $v$
  - $\textcolor{#AE81FF}{x'} = x_{B}\, +\, \Lambda^{\pm}(x_{A}, {v'})$ $\hspace{10pt}$ update first half: $x_{A}$
  - $\textcolor{#AE81FF}{x''} = x'_{A}\, +\, \Lambda^{\pm}(x'_{B}, {v'})$ $\hspace{8pt}$ update other half: $x_{B}$
  - $\textcolor{#07B875}{v''} = \Gamma^{\pm}({x''}, \textcolor{#07B875}{v'})$ $\hspace{36pt}$ update $v$
[!NONE]
- Resample both $v \sim \mathcal{N}(0, 1)$ and $d \sim \mathcal{U}(\pm)$ at the beginning of each trajectory
- To ensure ergodicity + reversibility, we split the $x$ update into sequential (complementary) updates
- Introduce directional variable $d \sim \mathcal{U}(\pm)$, resampled at the beginning of each trajectory
- Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $$\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$$ (one full layer is sketched below)
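Putting these pieces together, a sketch of one full forward-direction leapfrog layer. The `x_net` / `v_net` callables returning $(s, t, q)$ and the binary `mask` splitting $x$ into halves are hypothetical stand-ins, and the exponential-scaling form of $\Lambda$ is taken from the L2HMC construction rather than spelled out on this slide:

```python
import torch

def leapfrog_layer(x, v, eps, x_net, v_net, force, mask):
    """One generalized (d = +) leapfrog layer; mask selects x_A, (1 - mask) selects x_B."""
    mb = 1.0 - mask
    # v' = Gamma^+(x, v), conditioned on (x, F)
    s, t, q = v_net(x, force(x))
    v = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (force(x) * torch.exp(eps * q) + t)
    # x' = x_B + Lambda^+(x_A, v'): update the masked half, freeze the rest
    s, t, q = x_net(mb * x, v)
    x = mb * x + mask * (x * torch.exp(eps * s) + eps * (v * torch.exp(eps * q) + t))
    # x'' = x'_A + Lambda^+(x'_B, v'): update the complementary half
    s, t, q = x_net(mask * x, v)
    x = mask * x + mb * (x * torch.exp(eps * s) + eps * (v * torch.exp(eps * q) + t))
    # v'' = Gamma^+(x'', v')
    s, t, q = v_net(x, force(x))
    v = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (force(x) * torch.exp(eps * q) + t)
    return x, v
```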
[!NONE]
input: $x$
- Resample: $\textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1})$; $\,d \sim \mathcal{U}(\pm)$
- Construct initial state: $\textcolor{#939393}{\xi} = (\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})$

forward: Generate proposal $\xi'$ by passing the initial $\xi$ through $N_{\mathrm{LF}}$ leapfrog layers: $$\textcolor{#939393}{\xi} \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})$$
- Accept / Reject: $$A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}}) = \min\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left|\mathcal{J}\left(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}\right)\right|\right\}$$

backward (if training):
- Evaluate the loss function[^5] $\mathcal{L} \gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi})$ and backprop

return: $\textcolor{#AE81FF}{x}_{i+1}$
- Evaluate the MH criterion above and return the accepted config: $$\textcolor{#AE81FF}{{x}_{i+1}} \gets \begin{cases} \textcolor{#AE81FF}{x''} & \text{with prob. } A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{10pt} ✅ \\ \textcolor{#AE81FF}{x} & \text{with prob. } 1 - A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{10pt} 🚫 \end{cases}$$
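A sketch of how these steps compose into one training iteration; `dynamics`, `log_prob`, and `loss_fn` are hypothetical stand-ins (not the exact l2hmc-qcd API), and $x$ is assumed flattened to shape `[Nb, D]`:

```python
import torch

def train_step(x, dynamics, log_prob, loss_fn, optimizer):
    """One l2hmc training iteration: propose, accept/reject, backprop."""
    v = torch.randn_like(x)                     # resample momentum
    d = 1 if torch.rand(()) < 0.5 else -1       # resample direction d ~ U(+/-)
    x_prop, v_prop, logdet = dynamics(x, v, d)  # N_LF leapfrog layers
    # A(xi'|xi) = min{1, pi(xi')/pi(xi) |J|}, per chain, in log space
    log_A = (log_prob(x_prop, v_prop) - log_prob(x, v) + logdet).clamp(max=0.0)
    A = log_A.exp()
    loss = loss_fn(x, x_prop, A)                # e.g. -E[dQ^2 * A]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Metropolis-Hastings: accept each chain with probability A
    accept = (torch.rand_like(A) < A).to(x.dtype).unsqueeze(-1)
    return (accept * x_prop + (1.0 - accept) * x).detach(), loss.item()
```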
[!NOTE]
Write link variables $U_{\mu}(x) \in SU(3)$: $$\begin{align*} U_{\mu}(x) &= \exp\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)}\, \lambda^{k}}\right] \\ &= e^{i \textcolor{#AE81FF}{Q}}, \quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}$$
where $\omega^{k}_{\mu}(x) \in \mathbb{R}$, and $\lambda^{k}$ are the generators of $SU(3)$
[!TIP]
- Introduce the momentum $P_{\mu}(x) = P^{k}_{\mu}(x)\, \lambda^{k}$, conjugate to $\omega^{k}_{\mu}(x)$
[!IMPORTANT]
$$S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]$$
where $U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x+\hat{\mu})\, U^{\dagger}_{\mu}(x+\hat{\nu})\, U^{\dagger}_{\nu}(x)$ is the plaquette
Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U]$ $\Longrightarrow$
[!NONE]
$U$ update: $\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}}$ $$\frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$$ $$\begin{align*} Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \Longrightarrow \\ -i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\ U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)}\, U(0) \Longrightarrow \\ \textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U \end{align*}$$
[!NONE]
$P$ update: $\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}}$ $$\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow$$ $$\begin{align*} P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} \\ &= P(0) - \textcolor{#FFEE58}{\varepsilon}\, \textcolor{#E599F7}{F[U]} \\ \textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U] \end{align*}$$
- Momentum update: $$\textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$$
- Link update: $$\textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U$$
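Both updates are essentially one-liners in PyTorch; a minimal sketch, where the batched matrix exponential acts on the trailing $3 \times 3$ axes:

```python
import torch

def update_P(P, F, eps):
    """Momentum update Gamma: P' = P - (eps / 2) * F[U]."""
    return P - 0.5 * eps * F

def update_U(U, P, eps):
    """Link update Lambda: U' = exp(i * eps * P) @ U."""
    return torch.matrix_exp(1j * eps * P) @ U
```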
- We maintain a batch of Nb lattices, all updated in parallel:
  - $U$.dtype = complex128
  - $U$.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
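For example, allocating such a batch (cold start, all links set to the identity; sizes are illustrative):

```python
import torch

Nb, Nt, Nx, Ny, Nz = 8, 8, 8, 8, 8
# One SU(3) matrix per direction per site: [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
U = torch.eye(3, dtype=torch.complex128).expand(Nb, 4, Nt, Nx, Ny, Nz, 3, 3).clone()
assert U.dtype == torch.complex128
```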
- Two networks, UNet and PNet, produce the $(s, t, q)$ for the link ($U$) and momentum ($P$) updates, respectively; let's look at the PNet:
- input[^6]: $\hspace{7pt}\left(U, F\right) := (e^{i Q}, F)$ $$\begin{align*} h_{0} &= \sigma\left(w_{Q} Q + w_{F} F + b\right) \\ h_{1} &= \sigma\left(w_{1} h_{0} + b_{1}\right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\ \textcolor{#FF5252}{z} &:= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow \end{align*}$$
- output[^7]: $\hspace{7pt} (s_{P}, t_{P}, q_{P})$, where $$s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s), \quad t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}, \quad q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})$$
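A minimal PyTorch sketch of such a network, assuming $Q$ and $F$ have already been flattened to real feature vectors; layer names and sizes are illustrative, not the actual l2hmc-qcd module:

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """(Q, F) -> (s_P, t_P, q_P), following the layer structure above."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.w_Q = nn.Linear(dim, hidden)     # w_Q Q + b
        self.w_F = nn.Linear(dim, hidden)     # w_F F
        self.w_1 = nn.Linear(hidden, hidden)  # hidden layer
        self.w_s = nn.Linear(hidden, dim)
        self.w_t = nn.Linear(hidden, dim)
        self.w_q = nn.Linear(hidden, dim)
        self.lambda_s = nn.Parameter(torch.ones(dim))  # trainable lambda_s
        self.lambda_q = nn.Parameter(torch.ones(dim))  # trainable lambda_q

    def forward(self, Q, F):
        h = torch.relu(self.w_Q(Q) + self.w_F(F))  # sigma(w_Q Q + w_F F + b)
        z = torch.relu(self.w_1(h))
        s_P = self.lambda_s * torch.tanh(self.w_s(z))
        t_P = self.w_t(z)
        q_P = self.lambda_q * torch.tanh(self.w_q(z))
        return s_P, t_P, q_P
```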
- Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)$[^8]:
  - forward $(d = \textcolor{#FF5252}{+})$: $$\Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]$$
  - backward $(d = \textcolor{#1A8FFF}{-})$: $$\Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]\right\}$$
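These two maps are exact inverses of each other by construction, since $(s_{P}, t_{P}, q_{P})$ depend only on $(U, F)$ and not on $P$; a direct transcription:

```python
import torch

def gamma_plus(P, F, s, t, q, eps):
    """Forward (d = +) momentum update."""
    return P * torch.exp(0.5 * eps * s) - 0.5 * eps * (F * torch.exp(eps * q) + t)

def gamma_minus(P, F, s, t, q, eps):
    """Backward (d = -) update: gamma_minus(gamma_plus(P)) == P."""
    return torch.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * torch.exp(eps * q) + t))
```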
- Figure panels: "Deviation in …", "Topological charge mixing", "Artificial influx of energy"
- Distribution of $\log|\mathcal{J}|$ over all chains, at each leapfrog step $N_{\mathrm{LF}}$ ($= 0, 1, \ldots, 8$) during training
- Further code development
- Continue to use / test different network architectures
  - Gauge equivariant NNs for the $U_{\mu}(x)$ update
- Continue to test different loss functions for training
- Scaling:
  - Lattice volume
  - Network size
  - Batch size
  - # of GPUs
Note
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
- Links:
  - 📊 slides (GitHub: saforem2/lattice23)
- References:
  - (Foreman et al. 2022; Foreman, Jin, and Osborn 2022, 2021)
  - (Boyda et al. 2022; Shanahan et al. 2022)
- Huge thank you to:
- Yannick Meurice
- Norman Christ
- Akio Tomiya
- Nobuyuki Matsumoto
- Richard Brower
- Luchang Jin
- Chulwoo Jung
- Peter Boyle
- Taku Izubuchi
- Denis Boyda
- Dan Hackett
- ECP-CSD group
- ALCF Staff + Datascience Group
Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.
Figure: "Deviation from Average"
- Want to maximize the expected squared charge difference[^9]: $$\mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\big[-\textcolor{#FA5252}{\delta Q}^{2}\left(\xi^{\ast}, \xi\right) \cdot A(\xi^{\ast}|\xi)\big]$$
- Where:
  - $\delta Q$ is the tunneling rate: $$\textcolor{#FA5252}{\delta Q}(\xi^{\ast}, \xi) = \left|Q^{\ast} - Q\right|$$
  - $A(\xi^{\ast}|\xi)$ is the probability[^10] of accepting the proposal $\xi^{\ast}$: $$A(\xi^{\ast}|\xi) = \min\left(1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$$ (a one-line implementation is sketched below)
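Given per-chain tensors for $\delta Q$ and $A$, the loss itself is one line; a minimal sketch:

```python
import torch

def loss_fn(dQ: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """L_theta = E[-dQ^2 * A]; minimizing it maximizes the expected squared
    charge difference, weighted by the acceptance probability."""
    return -(dQ ** 2 * A).mean()
```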
- Stack gauge links as shape$\left(U_{\mu}\right)$ = [Nb, 2, Nt, Nx] $\in \mathbb{C}$: $$x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$$ with shape$\left(x_{\mu}\right)$ = [Nb, 2, Nt, Nx, 2] $\in \mathbb{R}$ (see the sketch after this list)
- $x$-Network: $\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$
- $v$-Network: $\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$ $\hspace{2pt}\longleftarrow$ let's look at this
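A short sketch of the stacking described above (sizes are illustrative):

```python
import math
import torch

Nb, Nt, Nx = 64, 16, 16
x = 2 * math.pi * torch.rand(Nb, 2, Nt, Nx) - math.pi  # link angles in [-pi, pi)
# Real-valued network input: stack [cos(x), sin(x)] along a trailing axis
x_stack = torch.stack([torch.cos(x), torch.sin(x)], dim=-1)
assert x_stack.shape == (Nb, 2, Nt, Nx, 2)
```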
- $v$-Update[^11]:
  - forward $(d = \textcolor{#FF5252}{+})$:
  - backward $(d = \textcolor{#1A8FFF}{-})$:
- $x$-Update:
  - forward $(d = \textcolor{#FF5252}{+})$:
  - backward $(d = \textcolor{#1A8FFF}{-})$:
[!NOTE]
$$U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C}, \quad \text{where} \quad x_{\mu}(n) \in [-\pi, \pi)$$
[!IMPORTANT]
$$S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},$$ $$\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]$$
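A vectorized sketch of this action for link angles of shape `[Nb, 2, Nt, Nx]`, using `torch.roll` for the shifted links (my own transcription, following the sign convention on this slide):

```python
import torch

def wilson_action(x: torch.Tensor, beta: float) -> torch.Tensor:
    """S_beta(x) = beta * sum_P cos(x_P); x.shape = [Nb, 2, Nt, Nx]."""
    x0, x1 = x[:, 0], x[:, 1]
    # x_P = x_0(n) + x_1(n + mu_0) - x_0(n + mu_1) - x_1(n)
    xP = x0 + torch.roll(x1, -1, dims=1) - torch.roll(x0, -1, dims=2) - x1
    return beta * torch.cos(xP).sum(dim=(1, 2))
```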
Note:
- Introduce an annealing schedule during the training phase: $$\left\{\gamma_{t}\right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N}\right\}$$ where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$
- Note:
  - for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the height of the energy barriers $\Longrightarrow$ it is easier for our sampler to explore previously inaccessible regions of phase space (one simple realization is sketched below)
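One simple realization of such a schedule, training against the rescaled action $\gamma_{t}\, S(x)$; the endpoints, step count, and target coupling are illustrative:

```python
import numpy as np

beta_final = 4.0                        # illustrative target coupling
# gamma_0 < gamma_1 < ... < gamma_N = 1, with |gamma_{t+1} - gamma_t| << 1
gammas = np.linspace(0.1, 1.0, num=1000)
for gamma_t in gammas:
    beta_t = gamma_t * beta_final       # run one training step at S_{beta_t}
```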
- To estimate physical quantities, we:
  - Calculate physical observables at increasing spatial resolution
  - Perform an extrapolation to the continuum limit
Footnotes

[^1]: Here, $\sim$ means "is distributed according to".

[^3]: We always start by resampling the momentum, $v_{0} \sim \mathcal{N}(0, \mathbb{1})$.

[^5]: For a simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} = A(\xi^{\ast}|\xi) \cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$.

[^6]: $\sigma(\cdot)$ denotes an activation function.

[^7]: $\lambda_{s},\, \lambda_{q} \in \mathbb{R}$ are trainable parameters.

[^8]: Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$.

[^9]: Where $\xi^{\ast}$ is the proposed configuration (prior to Accept / Reject).

[^10]: And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the Jacobian of the transformation from $\xi \rightarrow \xi^{\ast}$.

[^11]: Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$.