# Documentation #27

**Merged** · 46 commits · Aug 23, 2023

## Commits
All commits are by zuhengxu.

- `ec8c4c3` add NF intro (Aug 10, 2023)
- `c609b6e` Merge branch 'main' of github.com:zuhengxu/NormalizingFlows.jl into d… (Aug 10, 2023)
- `b623258` set up doc files (Aug 10, 2023)
- `1d26737` add gitignore (Aug 10, 2023)
- `81a6514` minor update to readme (Aug 10, 2023)
- `980487e` update home page (Aug 10, 2023)
- `128da10` update docs for each funciton (Aug 13, 2023)
- `9926538` update docs (Aug 13, 2023)
- `f056311` src (Aug 13, 2023)
- `eec2362` update function docs (Aug 13, 2023)
- `f26b90b` update docs (Aug 13, 2023)
- `36bd72d` fix readme math rendering issue (Aug 13, 2023)
- `37eb101` update docs (Aug 14, 2023)
- `b6f21c2` update example doc (Aug 14, 2023)
- `39e49a7` update customize layer docs (Aug 15, 2023)
- `87932a6` finish docs (Aug 15, 2023)
- `7d22c5c` finish docs (Aug 15, 2023)
- `5d911f9` Update README.md (Aug 17, 2023)
- `f2db21e` Update README.md (Aug 17, 2023)
- `84975a0` Update README.md (Aug 17, 2023)
- `70024c2` Update docs/src/index.md (Aug 17, 2023)
- `39913b7` Update README.md (Aug 17, 2023)
- `9e63832` Update docs/src/index.md (Aug 17, 2023)
- `a762a59` Update docs/src/customized_layer.md (Aug 17, 2023)
- `0146a79` Update docs/src/customized_layer.md (Aug 17, 2023)
- `eb8b6b9` Update docs/src/customized_layer.md (Aug 17, 2023)
- `183ea30` Update docs/src/customized_layer.md (Aug 17, 2023)
- `f359a32` Update docs/src/customized_layer.md (Aug 17, 2023)
- `51e9ec2` Update docs/src/index.md (Aug 17, 2023)
- `37be7ea` Update docs/src/index.md (Aug 17, 2023)
- `c457b13` Update docs/src/index.md (Aug 17, 2023)
- `63d51ca` Update docs/src/example.md (Aug 17, 2023)
- `732a1ae` Update docs/src/example.md (Aug 17, 2023)
- `48a3111` Update docs/src/example.md (Aug 17, 2023)
- `19443f0` Update docs/src/example.md (Aug 17, 2023)
- `6b437cc` Update docs/src/example.md (Aug 17, 2023)
- `52bfb9e` Update docs/src/example.md (Aug 17, 2023)
- `6e01169` Update docs/src/example.md (Aug 17, 2023)
- `b10a0e1` Update docs/src/customized_layer.md (Aug 17, 2023)
- `791f398` Update docs/src/customized_layer.md (Aug 17, 2023)
- `535ff50` Update docs/src/customized_layer.md (Aug 17, 2023)
- `ca5606a` Update docs/src/customized_layer.md (Aug 17, 2023)
- `b492957` minor ed (Aug 17, 2023)
- `4d13c3a` Merge branch 'documentation' of github.com:zuhengxu/NormalizingFlows.… (Aug 17, 2023)
- `c53f306` minor ed to fix latex issue (Aug 17, 2023)
- `7115e28` minor update (Aug 20, 2023)
---

**README.md** (86 additions, 0 deletions)

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://turinglang.github.io/NormalizingFlows.jl/dev/)
[![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)


A normalizing flow library for Julia.

The purpose of this package is to provide a simple and flexible interface for variational inference (VI) and normalizing flows (NF) for Bayesian computation or generative modeling.
The key focus is to ensure modularity and extensibility, so that users can easily
construct (e.g., define customized flow layers) and combine various components
(e.g., choose different VI objectives or gradient estimators)
for variational approximation of general target distributions,
without being tied to specific probabilistic programming frameworks or applications.

See the [documentation](https://turinglang.org/NormalizingFlows.jl/dev/) for more.

## Installation
To install the package, run the following command in the Julia REPL:
```julia
] # enter Pkg mode
(@v1.9) pkg> add git@github.com:TuringLang/NormalizingFlows.jl.git
```
Then simply run the following command to use the package:
```julia
using NormalizingFlows
```

## Quick recap of normalizing flows
Normalizing flows transform a simple reference distribution $q_0$ (sometimes known as the base distribution) into
a complex distribution $q$ using invertible functions.

In more detail, given the base distribution, usually a standard Gaussian distribution, i.e., $q_0 = \mathcal{N}(0, I)$,
we apply a series of parameterized invertible transformations (called flow layers), $T_{1, \theta_1}, \cdots, T_{N, \theta_N}$, yielding
```math
Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim q_0,\quad Z_N \sim q_{\theta},
```
where $\theta = (\theta_1, \dots, \theta_N)$ are the parameters to be learned and $q_{\theta}$ is the variational distribution (the flow distribution). This describes the **sampling procedure** of a normalizing flow: draws from $q_0$ are sent through a forward pass of the flow layers.

Since all the transformations are invertible (technically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variables formula:
```math
q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ
T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ
T_N^{-1}(x)\right)} \quad J_n(x)=\left|\operatorname{det} \nabla_x
T_n(x)\right|.
```
Here we drop the parameter subscripts $\theta_n$, $n = 1, \dots, N$, for simplicity.
Density evaluation of a normalizing flow requires computing the **inverse** and the
**Jacobian determinant** of each flow layer.
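
For a concrete (illustrative) example of both operations, here is a minimal sketch using `Bijectors.jl` with a single affine layer; the `Shift`/`Scale`/`transformed` calls follow the same pattern as the examples later in this PR, but the one-dimensional flow itself is a toy assumption:
```julia
using Distributions, Bijectors

# a one-layer flow: q₀ = N(0, 1), T(z) = 2z + 1
q₀ = Normal(0.0, 1.0)
T = Bijectors.Shift(1.0) ∘ Bijectors.Scale(2.0)
q = Bijectors.transformed(q₀, T)   # the flow distribution q_θ

x = rand(q)   # sampling: a forward pass through T

# density evaluation via change of variables:
# log q(x) = log q₀(T⁻¹(x)) + log |det ∇T⁻¹(x)|
z, logjac = Bijectors.with_logabsdet_jacobian(Bijectors.inverse(T), x)
logpdf(q₀, z) + logjac ≈ logpdf(q, x)   # true
```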

Given the feasibility of i.i.d. sampling and density evaluation, normalizing flows can be trained by minimizing a statistical divergence to the target distribution $p$. The typical choices are the reverse and forward Kullback-Leibler (KL) divergences, which lead to the following optimization problems:
```math
\begin{aligned}
\text{Reverse KL:}\quad
&\argmin _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
&= \argmin _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
&= \argmax _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(Z_{n-1}\right)\right]
\end{aligned}
```
where $Z_{n-1} = T_{n-1} \circ \cdots \circ T_1(Z_0)$ denotes the output of the first $n-1$ flow layers (with $Z_0$ itself for $n=1$), and
```math
\begin{aligned}
\text{Forward KL:}\quad
&\argmin _{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \\
&= \argmax _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
\end{aligned}
```
Both problems can be solved via standard stochastic optimization algorithms,
such as stochastic gradient descent (SGD) and its variants.

Reverse KL minimization is typically used for **Bayesian computation**, where one
wants to approximate a posterior distribution $p$ that is only known up to a
normalizing constant.
In contrast, forward KL minimization is typically used for **generative modeling**, where one wants to learn a complex distribution $p$ from which only samples (e.g., images) are available.

## Current status and TODOs

- [x] general interface development
- [x] documentation
- [ ] more flow examples
- [ ] GPU compatibility
- [ ] benchmarking

## Related packages
- [Bijectors.jl](https://github.com/TuringLang/Bijectors.jl): a package for defining bijective transformations, which can be used for defining customized flow layers.
- [Flux.jl](https://fluxml.ai/Flux.jl/stable/): a deep learning library, useful for defining the neural networks inside flow layers.
- [Optimisers.jl](https://github.com/FluxML/Optimisers.jl): optimization rules (e.g., Adam) used to train flows.
- [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl): a general variational inference library in the Turing ecosystem.


---

**docs/.gitignore** (2 additions, 0 deletions)

```
build/
site/
```
---

**docs/make.jl** (6 additions, 1 deletion; the resulting file)

```julia
makedocs(;
    repo="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
    sitename="NormalizingFlows.jl",
    format=Documenter.HTML(),
    pages=[
        "Home" => "index.md",
        "API" => "api.md",
        "Example" => "example.md",
        "Customize your own flow layer" => "customized_layer.md",
    ],
)

deploydocs(; repo="github.com/TuringLang/NormalizingFlows.jl", devbranch="main")
```
---

**docs/src/api.md** (93 additions, 0 deletions)
## API

```@index
```


## Main Function

```@docs
NormalizingFlows.train_flow
```

The flow object can be constructed using the `transformed` function from the `Bijectors.jl` package.
For example, for Gaussian VI, we can construct the flow as follows:
```julia
using Distributions, Bijectors
T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))
```
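The resulting `flow` is a `Distributions.jl` object, so i.i.d. sampling and density evaluation come for free; a small illustrative check:
```julia
x = rand(flow, 5)       # a 2×5 matrix: 5 draws from the flow
logpdf(flow, x[:, 1])   # log-density of the flow at the first draw
```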
To train the Gaussian VI targeting the distribution $p$ via ELBO maximization, we can run:
```julia
using NormalizingFlows, Optimisers

# `logp` is assumed to be defined by the user:
# the (possibly unnormalized) log-density of the target p
sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,
    sample_per_iter;
    max_iters=2_000,
    optimiser=Optimisers.ADAM(0.01 * one(T)),
)
```
## Variational Objectives
We have implemented two variational objectives: the ELBO and the log-likelihood objective.
Users can also define their own objective function and pass it to the [`train_flow`](@ref) function;
`train_flow` will optimize the flow parameters by maximizing `vo`.
The objective function `vo` should take the following general form:
```julia
vo(rng, flow, args...)
```
where `rng` is the random number generator, `flow` is the flow object, and `args...` are any
additional arguments the objective needs.
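
For example, a custom Monte Carlo ELBO estimator could look like the following sketch; the name `my_elbo` and the assumption that `logp` is the target's (unnormalized) log-density are illustrative, not part of the package:
```julia
using Random, Statistics, Distributions

# single-batch ELBO estimate: E_{q_θ}[logp(X) - log q_θ(X)] over n_samples draws
function my_elbo(rng::Random.AbstractRNG, flow, logp, n_samples::Int)
    xs = rand(rng, flow, n_samples)   # draws from the flow distribution q_θ
    return mean(logp(x) - logpdf(flow, x) for x in eachcol(xs))
end
```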

#### Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
&\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
& = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ
T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(Z_{n-1}\right)\right] \quad \text{(ELBO)}
\end{aligned}
```
where $Z_{n-1} = T_{n-1} \circ \cdots \circ T_1(Z_0)$ is the output of the first $n-1$ flow layers.
Reverse KL minimization is typically used for **Bayesian computation**,
where one only has access to the (unnormalized) log-density of the target distribution $p$ (e.g., a Bayesian posterior),
and hopes to generate approximate samples from it.

```@docs
NormalizingFlows.elbo
```
#### Log-likelihood

Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
& \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\
& = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
\end{aligned}
```
Forward KL minimization is typically used for **generative modeling**,
where one is given a set of samples from the target distribution $p$ (e.g., images)
and aims to learn the density or a generative process that outputs high-quality samples.

```@docs
NormalizingFlows.loglikelihood
```


## Training Loop

```@docs
NormalizingFlows.optimize
```


## Utility Functions for Taking Gradient
```@docs
NormalizingFlows.grad!
NormalizingFlows.value_and_gradient!
```

---

**docs/src/banana.png** (binary file added)

---

**docs/src/comparison.png** (binary file added)
---

**docs/src/customized_layer.md** (180 additions, 0 deletions)
# Defining Your Own Flow Layer

In practice, users might want to define their own normalizing flow layers.
As briefly noted in [What are normalizing flows?](@ref), the key is to define a
customized normalizing flow layer, including its transformation and inverse,
as well as the log-determinant of the Jacobian of the transformation.
`Bijectors.jl` offers a convenient interface to define a customized bijection.
We refer users to [the documentation of
`Bijectors.jl`](https://turinglang.org/Bijectors.jl/dev/transforms/#Implementing-a-transformation)
for more details.
`Flux.jl` is also a useful package, offering a convenient interface to define neural networks.


In this tutorial, we demonstrate how to define a customized normalizing flow
layer -- an `Affine Coupling Layer` (Dinh *et al.*, 2016) -- using `Bijectors.jl` and `Flux.jl`.

## Affine Coupling Flow

Given an input vector $\boldsymbol{x}$, the general *coupling transformation* splits it into two
parts: $\boldsymbol{x}_{I_1}$ and $\boldsymbol{x}_{I\setminus I_1}$. Only one
part (e.g., $\boldsymbol{x}_{I_1}$) undergoes a bijective transformation $f$, called the *coupling law*,
which is parameterized by the values of the other part (e.g., $\boldsymbol{x}_{I\setminus I_1}$); that other part remains unchanged.
```math
\begin{array}{llll}
c_{I_1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d & c_{I_1}^{-1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d \\
& \boldsymbol{x}_{I \backslash I_1} \mapsto \boldsymbol{x}_{I \backslash I_1} & & \boldsymbol{y}_{I \backslash I_1} \mapsto \boldsymbol{y}_{I \backslash I_1} \\
& \boldsymbol{x}_{I_1} \mapsto f\left(\boldsymbol{x}_{I_1} ; \theta\left(\boldsymbol{x}_{I\setminus I_1}\right)\right) & & \boldsymbol{y}_{I_1} \mapsto f^{-1}\left(\boldsymbol{y}_{I_1} ; \theta\left(\boldsymbol{y}_{I\setminus I_1}\right)\right)
\end{array}
```
Here $\theta$ can be an arbitrary function, e.g., a neural network.
As long as $f(\cdot; \theta(\boldsymbol{x}_{I\setminus I_1}))$ is invertible, $c_{I_1}$ is invertible, and the
Jacobian determinant of $c_{I_1}$ is easy to compute:
```math
\left|\text{det} \nabla_x c_{I_1}(x)\right| = \left|\text{det} \nabla_{x_{I_1}} f(x_{I_1}; \theta(x_{I\setminus I_1}))\right|
```

The affine coupling layer is a special case of the coupling transformation, where the coupling law $f$ is an affine function:
```math
\begin{aligned}
\boldsymbol{x}_{I_1} &\mapsto \boldsymbol{x}_{I_1} \odot s\left(\boldsymbol{x}_{I\setminus I_1}\right) + t\left(\boldsymbol{x}_{I \setminus I_1}\right) \\
\boldsymbol{x}_{I \backslash I_1} &\mapsto \boldsymbol{x}_{I \backslash I_1}
\end{aligned}
```
Here, $s$ and $t$ are arbitrary functions (often neural networks) called the "scaling" and "translation" functions, respectively.
They produce vectors of the
same dimension as $\boldsymbol{x}_{I_1}$.


## Implementing Affine Coupling Layer

We start by defining a simple 3-layer multi-layer perceptron (MLP) using `Flux.jl`,
which will be used to define the scaling function $s$ and the translation function $t$ in the affine coupling layer.
```@example afc
using Flux

function MLP_3layer(input_dim::Int, hdims::Int, output_dim::Int; activation=Flux.leakyrelu)
    return Chain(
        Flux.Dense(input_dim, hdims, activation),
        Flux.Dense(hdims, hdims, activation),
        Flux.Dense(hdims, output_dim),
    )
end
```

#### Construct the Object

Following the user interface of `Bijectors.jl`, we define a struct `AffineCoupling` as a subtype of `Bijectors.Bijector`.
The functions `partition` and `combine` are used to split a vector into 3 disjoint subvectors
and to reassemble them, and `PartitionMask` stores the partition rule.
All three are defined in `Bijectors.jl`; see the [documentation](https://github.com/TuringLang/Bijectors.jl/blob/49c138fddd3561c893592a75b211ff6ad949e859/src/bijectors/coupling.jl#L3) for more details.

```@example afc
using Functors
using Bijectors
using Bijectors: partition, combine, PartitionMask

struct AffineCoupling <: Bijectors.Bijector
    dim::Int
    mask::Bijectors.PartitionMask
    s::Flux.Chain
    t::Flux.Chain
end

# To apply functions to the parameters contained in AffineCoupling.s and AffineCoupling.t,
# and to re-build the struct from the parameters, we use the functor interface of Functors.jl.
# See https://fluxml.ai/Flux.jl/stable/models/functors/#Functors.functor
@functor AffineCoupling (s, t)

function AffineCoupling(
    dim::Int,                 # dimension of the input
    hdims::Int,               # dimension of the hidden units for s and t
    mask_idx::AbstractVector, # indices of the dimensions to be transformed
)
    cdims = length(mask_idx)  # dimension of the part used to construct the coupling law
    s = MLP_3layer(cdims, hdims, cdims)
    t = MLP_3layer(cdims, hdims, cdims)
    mask = PartitionMask(dim, mask_idx)
    return AffineCoupling(dim, mask, s, t)
end
```
By default, we define $s$ and $t$ using the `MLP_3layer` function, a
3-layer MLP with leaky ReLU activations.

#### Implement the Forward and Inverse Transformations


```@example afc
function Bijectors.transform(af::AffineCoupling, x::AbstractVector)
    # partition the input vector using `af.mask::PartitionMask`
    x₁, x₂, x₃ = partition(af.mask, x)
    y₁ = x₁ .* af.s(x₂) .+ af.t(x₂)
    return combine(af.mask, y₁, x₂, x₃)
end

function Bijectors.transform(iaf::Inverse{<:AffineCoupling}, y::AbstractVector)
    af = iaf.orig
    # partition the output vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # apply the inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    return combine(af.mask, x_1, y_2, y_3)
end
```
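
Before moving on, a quick (illustrative) sanity check that the inverse transformation undoes the forward pass; the layer sizes here are arbitrary:
```julia
af = AffineCoupling(4, 16, 1:2)
x = randn(Float32, 4)
y = Bijectors.transform(af, x)
x̂ = Bijectors.transform(inverse(af), y)
x ≈ x̂   # true, up to floating-point error
```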

#### Implement the Log-determinant of the Jacobian
Notice that we compute the transformation and the log-determinant of its Jacobian together in a single function, `with_logabsdet_jacobian`, following the `Bijectors.jl` interface.

```@example afc
function Bijectors.with_logabsdet_jacobian(af::AffineCoupling, x::AbstractVector)
    x_1, x_2, x_3 = Bijectors.partition(af.mask, x)
    y_1 = af.s(x_2) .* x_1 .+ af.t(x_2)
    logjac = sum(log ∘ abs, af.s(x_2))  # log |det ∂y/∂x| = Σ log |s(x₂)|
    return combine(af.mask, y_1, x_2, x_3), logjac
end

function Bijectors.with_logabsdet_jacobian(
    iaf::Inverse{<:AffineCoupling}, y::AbstractVector
)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # apply the inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    logjac = -sum(log ∘ abs, af.s(y_2))
    return combine(af.mask, x_1, y_2, y_3), logjac
end
```
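Similarly, an (illustrative) consistency check: the forward and inverse log-Jacobians at corresponding points should cancel:
```julia
af = AffineCoupling(4, 16, 1:2)   # arbitrary sizes, for the check only
x = randn(Float32, 4)
y, logjac_fwd = Bijectors.with_logabsdet_jacobian(af, x)
_, logjac_inv = Bijectors.with_logabsdet_jacobian(inverse(af), y)
abs(logjac_fwd + logjac_inv) < 1f-4   # true, up to floating-point error
```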
#### Construct Normalizing Flow

Now, with all the above implementations, we are ready to construct a normalizing flow by
composing `AffineCoupling` layers and applying them to a base distribution $q_0$.

```@example afc
using Random, Distributions, LinearAlgebra
dim = 4
hdims = 10
Ls = [
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
]
ts = reduce(∘, Ls)
q₀ = MvNormal(zeros(Float32, dim), I)
flow = Bijectors.transformed(q₀, ts)
```
We can now sample from the flow:
```@example afc
x = rand(flow, 10)
```
And evaluate the density of the flow:
```@example afc
logpdf(flow, x[:,1])
```
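
This flow can now be trained with `train_flow` like any other; a minimal sketch, assuming a user-supplied target log-density `logp` (here an unnormalized standard normal, purely for illustration):
```julia
using NormalizingFlows, Optimisers

logp(x) = -sum(abs2, x) / 2   # hypothetical target log-density
flow_trained, stats, _ = train_flow(
    elbo, flow, logp, 10;     # 10 samples per iteration
    max_iters=1_000,
    optimiser=Optimisers.Adam(1f-2),
)
```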


## Reference
Dinh, L., Sohl-Dickstein, J., and Bengio, S., 2016. *Density estimation using Real NVP.*
arXiv:1605.08803.
---

**docs/src/elbo.png** (binary file added)