Skip to content

update manuals for Laplace approximation, optimization allows Jacobian adjustment #606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 3, 2023
Merged
2 changes: 1 addition & 1 deletion src/bibtex/all.bib
Original file line number Diff line number Diff line change
Expand Up @@ -1755,6 +1755,6 @@ @article{PullinEtAl:2021
author = {Pullin, Jeffrey and Gurrin, Lyle and Vukcevic, Damjan},
year = {2021},
journal = {arXiv},
volume = {2010.09335}
volume = {2010.09335},
url = {https://arxiv.org/abs/2010.09335},
}
1 change: 1 addition & 0 deletions src/cmdstan-guide/_bookdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ rmd_files: [
"optimize_config.Rmd",
"variational_config.Rmd",
"generate_quantities_config.Rmd",
"laplace_sample_config.Rmd",
"log_prob_config.Rmd",
"diagnose_config.Rmd",
"parallelization.Rmd",
Expand Down
28 changes: 3 additions & 25 deletions src/cmdstan-guide/command_line_options.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ Stan’s interfaces manage the chain identifier arguments automatically.

For complete reproducibility, every aspect of the environment needs to be locked down
from the OS and version to the C++ compiler and version to the version of Stan and all dependent libraries.
See the [Stan Reference Manual Reproducibility chapter](https://mc-stan.org/docs/reference-manual/reproducibility-chapter.html)
See the [Stan Reference Manual Reproducibility chapter](https://mc-stan.org/docs/reference-manual/reproducibility.html)
for further details.


Expand All @@ -158,31 +158,9 @@ for further details.
CmdStan provides a `help` and `help-all` mechanism that displays either the
available top-level or keyword-specific key-value argument pairs.
To display top-level help, call the CmdStan executable with keyword `help`:
```
Usage: ./bernoulli <arg1> <subarg1_1> ... <subarg1_m> ... <arg_n> <subarg_n_1> ... <subarg_n_m>

Begin by selecting amongst the following inference methods and diagnostics,
sample Bayesian inference with Markov Chain Monte Carlo
optimize Point estimation
variational Variational inference
diagnose Model diagnostics
generate_quantities Generate quantities of interest
log_prob Return the log density up to a constant and its gradients, given supplied parameters

Or see help information with
help Prints help
help-all Prints entire argument tree

Additional configuration available by specifying
id Unique process identifier
data Input data options
init Initialization method: "x" initializes randomly between [-x, x], "0" initializes to 0, anything else identifies a file of values
random Random number configuration
output File output options
num_threads Number of threads available to the program. To use this argument, re-compile this model with STAN_THREADS=true.

See ./bernoulli <arg1> [ help | help-all ] for details on individual arguments.

```
./bernoulli help
```

## Error messages and return codes
Expand Down
4 changes: 2 additions & 2 deletions src/cmdstan-guide/compiling_stan_programs.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ Identifier 'thata' not in scope.
make: *** [bernoulli.hpp] Error 1
```

Stan is a [strongly-typed language](https://mc-stan.org/docs/reference-manual/data-types-chapter.html);
Stan is a [strongly-typed language](https://mc-stan.org/docs/reference-manual/data-types.html);
and the compiler will throw an error if statements or expressions violate the type rules.
The following trivial program `foo.stan` contains an illegal assignment statement:
```stan
Expand All @@ -111,7 +111,7 @@ Semantic error in 'foo.stan', line 5, column 2 to column 12:
Ill-typed arguments supplied to assignment operator =: lhs has type int and rhs has type real
```

The [Stan Reference Manual](https://mc-stan.org/docs/reference-manual/language.html#language)
The [Stan Reference Manual](https://mc-stan.org/docs/reference-manual/language.html)
provides a complete specification of the Stan programming language. The
[Stan User's Guide](https://mc-stan.org/docs/stan-users-guide/understanding-stanc3-errors-and-warnings.html)
also contains a full description of the errors and warnings stanc can emit.
Expand Down
6 changes: 3 additions & 3 deletions src/cmdstan-guide/diagnose_utility.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ with possible ways to mitigate it.
To fully exercise the `diagnose` command, we run 4 chains to
sample from the Neal's funnel distribution,
discussed in the Stan User's Guide reparameterization section
https://mc-stan.org/docs/stan-users-guide/reparameterization-section.html.
https://mc-stan.org/docs/stan-users-guide/reparameterization.html.
This program defines a distribution which
exemplifies the difficulties of sampling from some hierarchical models:
```stan
Expand Down Expand Up @@ -162,7 +162,7 @@ write the model that is logically equivalent but simplifies the
geometry of the posterior distribution. This problem occurs frequently
with hierarchical models and one of the simplest examples is Neal's
Funnel, which is discussed in the
[reparameterization section](https://mc-stan.org/docs/stan-users-guide/reparameterization-section.html)
[reparameterization section](https://mc-stan.org/docs/stan-users-guide/reparameterization.html)
of the Stan User's Guide.

### Maximum treedepth exceeded
Expand All @@ -174,7 +174,7 @@ concern. Configuring the No-U-Turn-Sampler (the variant of
HMC used by Stan) requires putting a cap on the depth of the trees
that it evaluates during each iteration (for details on this see the
*Hamiltonian Monte Carlo Sampling* chapter in the
[Stan Reference Manual](https://mc-stan.org/docs/reference-manual/hmc-chapter.html).
[Stan Reference Manual](https://mc-stan.org/docs/reference-manual/hmc.html).
When the maximum allowed tree depth is reached it
indicates that NUTS is terminating prematurely to avoid excessively
long execution time.
Expand Down
6 changes: 4 additions & 2 deletions src/cmdstan-guide/index.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,10 @@ http://mc-stan.org/
#### Licensing {-}

CmdStan, Stan, and the Stan Math Library are licensed under the new
BSD license (3-clause). See the Stan Reference Manual [Licenses section](https://mc-stan.org/docs/reference-manual/licensing-appendix.html) for licensing terms for Stan and the dependent packages Boost, Eigen,
Sundials, and Intel TBB.
BSD license (3-clause). See the Stan Reference Manual
[Licenses section](https://mc-stan.org/docs/reference-manual/licensing.html)
for licensing terms for Stan.


#### Stan documentation: user's guide and reference manuals {-}

Expand Down
112 changes: 112 additions & 0 deletions src/cmdstan-guide/laplace_sample_config.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Laplace sampling

The `laplace` method produces a sample from a normal approximation
centered at the mode of a distribution in the unconstrained space.
If the mode is a maximum a posteriori (MAP) estimate,
the samples provide an estimate of the mean and standard deviation
of the posterior distribution.
If the mode is a maximum likelihood estimate (MLE),
the sample provides an estimate of the standard error of the likelihood.
In general, the posterior mode in the unconstrained space doesn't correspond
to the mean (nor mode) in the constrained space, and thus the sample is needed
to infer the mean as well as the standard deviation.
(See [this case study](https://avehtari.github.io/casestudies/Jacobian/jacobian.html)
for a visual illustration.)

This is computationally inexpensive compared to exact Bayesian inference with MCMC.
The goodness of this estimate depends on both the estimate of the mode
and how much the true posterior in the unconstrained space resembles a Gaussian.


## Configuration

This method takes 2 arguments:

- `jacobian` - Whether or not the
[Jacobian adjustment](https://mc-stan.org/docs/stan-users-guide/changes-of-variables.html#changes-of-variables)
should be included in the gradient. The default value is 1 (include adjustment).
(Note: in optimization, the default value is `0`, for historical reasons.)

- `mode` - Input file of parameters values on the constrained scale.
When Stan's `optimize` method is used to estimate the modal values,
the value of boolean argument `jacobian` should be 0 if `optimize` was
run with default settings, i.e., the input is the MLE estimate;
if `optimize` was run with argument `jacobian=1`, then the `laplace`
method default setting, `jacobian=1`, should be used.



## CSV output

The output file consists of the following pieces of information:

- The full set of configuration options available for the `log_prob` method is
reported at the beginning of the output file as CSV comments.

- Output columns for all model parameters on the constrained scale,
followed by columns `log_p` and `log_q`,
the unnormalized log density and the unnormalized density of the
Laplace approximation, respectively.
These can be used for diagnostics and importance sampling.

## Example

To get an approximate estimate of the mode and standard deviation of the
example Bernoulli model given the example dataset:

- find the MAP estimate by running optimization with argument `jacobian=1`

- run the Laplace estimator using the MAP estimate as the `mode` argument.

Because the default output file name from all methods is `output.csv`,
a more informative name is used for the output of optimization.
We run the commands from the CmdStan home directory.
This results in a sample with mean 2.7 and standard deviation 0.12.
In comparison, running the NUTS-HMC sampler results in mean 2.6 and standard deviation 0.12.


```
./examples/bernoulli/bernoulli optimize jacobian=1 \
data file=examples/bernoulli/bernoulli.data.json \
output file=bernoulli_optimize_lbfgs.csv random seed=1234


./examples/bernoulli/bernoulli laplace mode=bernoulli_optimize_lbfgs.csv \
data file=examples/bernoulli/bernoulli.data.json random seed=1234
```

The header and first few data rows of the output sample are shown below.

```
# stan_version_major = 2
# stan_version_minor = 31
# stan_version_patch = 0
# model = bernoulli_model
# start_datetime = 2022-12-20 01:01:14 UTC
# method = laplace
# laplace
# mode = bernoulli_lbfgs.csv
# jacobian = 1 (Default)
# draws = 1000 (Default)
# id = 1 (Default)
# data
# file = examples/bernoulli/bernoulli.data.json
# init = 2 (Default)
# random
# seed = 875960551 (Default)
# output
# file = output.csv (Default)
# diagnostic_file = (Default)
# refresh = 100 (Default)
# sig_figs = -1 (Default)
# profile_file = profile.csv (Default)
# num_threads = 1 (Default)
# stanc_version = stanc3 v2.31.0-7-g20444266
# stancflags =
theta,log_p,log_q
0.0498545,-9.4562,-2.33997
0.182898,-6.9144,-0.0117349
0.376428,-7.18171,-0.746034
...
```

9 changes: 6 additions & 3 deletions src/cmdstan-guide/log_prob_config.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,19 @@ the results here are not compared with those from finite differences.
efficient way to calculate these quantities. It is provided only for
convenience and should not be used for serious computation.


## Configuration

This method takes 3 arguments:

- `jacobian` - Whether or not the [Jacobian adjustment for constrained](https://mc-stan.org/docs/stan-users-guide/changes-of-variables.html#changes-of-variables)
parameters should be included in the gradient. Default value is 1 (include adjustment).

- `constrained_params` - Input file (JSON or R dump) of parameter values on
constrained scale. These files are in the same format as those used in the
`inits` argument for MCMC.
- `constrained_params` - Input file of parameters values on the constrained scale.
A single set of constrained parameters can be specified using
[JSON](json_apdx.html) or [Rdump](rdump_apdx.html) format.
Alternatively, the input file can be set of draws in [StanCSV](stan_csv_apdx.html) format.


- `unconstrained_params` - Input file (JSON or R dump) of parameter values
on unconstrained scale. These files should contain a single variable, called
Expand Down
6 changes: 3 additions & 3 deletions src/cmdstan-guide/mcmc_config.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ The `sample` method provides Bayesian inference over the model conditioned on da
using Hamiltonian Monte Carlo (HMC) sampling. By default, the inference engine used is
the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo sampling.
For details on HMC and NUTS, see the Stan Reference Manual chapter on
[MCMC Sampling](https://mc-stan.org/docs/reference-manual/hmc-chapter.html).
[MCMC Sampling](https://mc-stan.org/docs/reference-manual/hmc.html).

The full set of configuration options available for the `sample` method is
reported at the beginning of the sampler output file as CSV comments.
Expand Down Expand Up @@ -110,7 +110,7 @@ Since the default argument is `engaged=1`, this keyword-value pair can be omitte

There are two sets of adaptation sub-arguments: step size optimization parameters and the warmup schedule.
These are described in detail in the Reference Manual section
[Automatic Parameter Tuning](https://mc-stan.org/docs/2_23/reference-manual/hmc-algorithm-parameters.html).
[Automatic Parameter Tuning](https://mc-stan.org/docs/reference-manual/hmc-algorithm-parameters.html).

### Step size optimization configuration

Expand All @@ -136,7 +136,7 @@ Models with difficult posterior geometries may required increasing the `delta` a
closer to $1$; we recommend first trying to raise it to $0.9$ or at most $0.95$.
Values about $0.95$ are strong indication of bad geometry;
the better solution is to change the model geometry through
[reparameterization](https://mc-stan.org/docs/stan-users-guide/reparameterization-section.html)
[reparameterization](https://mc-stan.org/docs/stan-users-guide/reparameterization.html)
which could yield both more efficient and faster sampling.

- `gamma` - Adaptation regularization scale.
Expand Down
8 changes: 2 additions & 6 deletions src/cmdstan-guide/optimization_intro.Rmd
Original file line number Diff line number Diff line change
@@ -1,8 +1,7 @@
# Optimization

The CmdStan executable can run Stan’s optimization algorithms
for penalized maximum likelihood estimation which provide a
deterministic method to find the posterior mode.
which provide a deterministic method to find the posterior mode.
If the posterior is not convex, there is no guarantee Stan
will be able to find the global mode as opposed to a local optimum
of log probability.
Expand Down Expand Up @@ -100,10 +99,7 @@ and a line for the values. Here, the header indicates the unnormalized
log probability with `lp__` and the model parameter
`theta`. The maximum log probability is -5.0 and the posterior
mode for `theta` is 0.20. The mode exactly matches what we would
expect from the data.^[The Jacobian adjustment included for the sampler's log
probability function is not applied during optimization, because it
can change the shape of the posterior and hence the solution.]

expect from the data.
Because the prior was uniform, the result 0.20 represents the maximum
likelihood estimate (MLE) for the very simple Bernoulli model. Note
that no uncertainty is reported.
Expand Down
20 changes: 19 additions & 1 deletion src/cmdstan-guide/optimize_config.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,28 @@ of default configuration options:
# tol_rel_grad = 10000000 (Default)
# tol_param = 1e-08 (Default)
# history_size = 5 (Default)
# jacobian = 0 (Default)
# iter = 2000 (Default)
# save_iterations = 0 (Default)
```

## Jacobian adjustments

The `jacobian` argument specifies whether or not the call to the model's
log probability function should include
the log absolute Jacobian determinant of inverse parameter transforms.
Without the Jacobian adjustment, optimization
returns the (regularized) maxiumum likelihood estimate (MLE),
$\mathrm{argmax}_{\theta}\ p(y | \theta)$,
the value which maximizes the likelihood of the data given the parameters,
(including prior terms).
Applying the Jacobian adjustment produces the maximum a posteriori estimate (MAP),
the maximum value of the posterior distribution,
$\mathrm{argmax}_{\theta}\ p(y | \theta)\,p(\theta)$.
By default this value is `0` (false),
do not include the Jacobian adjustment.


## Optimization algorithms

The `algorithm` argument specifies the optimization algorithm.
Expand All @@ -44,7 +62,7 @@ and also much faster than the other optimizers.
but has the advantage of setting its own stepsize.

See the Stan Reference Manual's
[Optimization chapter](https://mc-stan.org/docs/reference-manual/optimization-algorithms-chapter.html)
[Optimization chapter](https://mc-stan.org/docs/reference-manual/optimization-algorithms.html)
for a description of these algorithms.

All of the optimizers stream per-iteration intermediate approximations to the command line console.
Expand Down
2 changes: 1 addition & 1 deletion src/cmdstan-guide/parallelization.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Stan provides three ways of parallelizing execution of a Stan model:

In order to exploit multi-threading in a Stan model, the models must be
rewritten to use the `reduce_sum` and `map_rect` functions. For instructions
on how to rewrite Stan models to use these functions see [Stan's User guide chapter on parallelization](https://mc-stan.org/docs/stan-users-guide/parallelization-chapter.html), [the reduce_sum case study](https://mc-stan.org/users/documentation/case-studies/reduce_sum_tutorial.html) or the [Multithreading and Map-Reduce tutorial](https://github.com/rmcelreath/cmdstan_map_rect_tutorial).
on how to rewrite Stan models to use these functions see [Stan's User guide chapter on parallelization](https://mc-stan.org/docs/stan-users-guide/parallelization.html), [the reduce_sum case study](https://mc-stan.org/users/documentation/case-studies/reduce_sum_tutorial.html) or the [Multithreading and Map-Reduce tutorial](https://github.com/rmcelreath/cmdstan_map_rect_tutorial).

### Compiling

Expand Down
4 changes: 2 additions & 2 deletions src/cmdstan-guide/stansummary.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,13 @@ Therefore MCSE column is placed next to the sample mean column,
in order to make it easy to compare this sample with others.

For more information, see the
[Posterior Analysis](https://mc-stan.org/docs/reference-manual/analysis-chapter.html)
[Posterior Analysis](https://mc-stan.org/docs/reference-manual/analysis.html)
chapter of the Stan Reference Manual which describes both the theory and practice of MCMC
estimation techniques.
The summary statistics and the algorithms used to compute them are described in sections
[Notation for samples](https://mc-stan.org/docs/reference-manual/notation-for-samples-chains-and-draws.html)
and
[Effective Sample Size](https://mc-stan.org/docs/reference-manual/effective-sample-size-section.html).
[Effective Sample Size](https://mc-stan.org/docs/reference-manual/effective-sample-size.html).

## Building the stansummary command

Expand Down
2 changes: 1 addition & 1 deletion src/cmdstan-guide/variational_config.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ ADVI optimizes the ELBO in the real-coordinate space using
[stochastic gradient ascent](https://mc-stan.org/docs/reference-manual/stochastic-gradient-ascent.html).
The measures of convergence are similar to the
relative tolerance scheme of Stan's
[optimization algorithms](https://mc-stan.org/docs/reference-manual/optimization-algorithms-chapter.html).
[optimization algorithms](https://mc-stan.org/docs/reference-manual/optimization-algorithms.html).

The algorithm progression consists of an adaptation phase followed by a sampling phase.
The adaptation phase finds a good value for the step size scaling parameter `eta`.
Expand Down
1 change: 1 addition & 0 deletions src/reference-manual/_bookdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ rmd_files: [
"analysis.Rmd",
"optimization.Rmd",
"variational.Rmd",
"laplace.Rmd",
"diagnostics.Rmd",

"usage.Rmd",
Expand Down
Loading