Laplace approximation sampler #3146

Closed
bob-carpenter opened this issue Oct 14, 2022 · 4 comments

@bob-carpenter (Contributor) commented Oct 14, 2022

Summary:

Provide a service function for sampling from a Laplace approximation. This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.

Description:

The following signature should work. It requires an estimate theta_hat of the mode (for example, as computed by optimization).

template <class Model>
int laplace_sample(const Model& m,
  bool include_jacobian,
  const Eigen::VectorXd& theta_hat, 
  int num_draws,
  unsigned int random_seed,
  int refresh,
  callbacks::interrupt& interrupt,
  callbacks::logger& logger,
  callbacks::writer& sample_writer);
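
For context, a hypothetical call site could look like the sketch below. This is only an illustration under the proposed signature: run_laplace, my_model, and theta_hat are made-up names; the model is assumed to be an already-constructed instance of the generated model class, theta_hat is assumed to come from the optimizer, and the callbacks use the standard stream-based implementations.

#include <stan/callbacks/interrupt.hpp>
#include <stan/callbacks/stream_logger.hpp>
#include <stan/callbacks/stream_writer.hpp>
#include <Eigen/Dense>
#include <fstream>
#include <iostream>

// Hypothetical driver fragment: Model is the generated model class and
// theta_hat is the mode estimate returned by the optimizer.
template <class Model>
int run_laplace(const Model& my_model, const Eigen::VectorXd& theta_hat) {
  std::ofstream out("laplace_draws.csv");
  stan::callbacks::stream_writer sample_writer(out);
  stan::callbacks::stream_logger logger(std::cout, std::cout, std::cout,
                                        std::cerr, std::cerr);
  stan::callbacks::interrupt interrupt;
  return laplace_sample(my_model, /* include_jacobian = */ true, theta_hat,
                        1000 /* num_draws */, 1234u /* random_seed */,
                        100 /* refresh */, interrupt, logger, sample_writer);
}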

The computation will follow RStan. The Hessian will be computed using central finite differences of the model functor. It cannot use autodiff because not all of our functions have second-order derivatives.

  • central finite differences Hessian functional: stan::math::finite_diff_hessian_auto in stan/math/rev/functor/finite_diff_hessian_auto.hpp

  • model functor: stan::model::model_functional in stan/model/model_functional.hpp
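
To make the finite-difference scheme concrete, here is a minimal self-contained sketch (illustrative only, not the stan-math implementation, which chooses step sizes more carefully): each column of the Hessian is formed from central differences of the gradient, which is where the $2N$ gradient evaluations mentioned below come from.

#include <Eigen/Dense>
#include <functional>

// Illustrative sketch: approximate the Hessian of log p(theta) at theta_hat
// by central finite differences of its gradient (2N gradient evaluations).
// `grad` stands in for the reverse-mode autodiff gradient of the model's
// log density; a fixed epsilon is used here for simplicity.
Eigen::MatrixXd finite_diff_hessian(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& grad,
    const Eigen::VectorXd& theta_hat, double epsilon = 1e-4) {
  int N = theta_hat.size();
  Eigen::MatrixXd H(N, N);
  for (int i = 0; i < N; ++i) {
    Eigen::VectorXd theta_plus = theta_hat;
    Eigen::VectorXd theta_minus = theta_hat;
    theta_plus(i) += epsilon;
    theta_minus(i) -= epsilon;
    H.col(i) = (grad(theta_plus) - grad(theta_minus)) / (2 * epsilon);
  }
  return 0.5 * (H + H.transpose());  // symmetrize finite-difference noise
}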

The algorithm has a high constant overhead, requiring a one-time cost of $\mathcal{O}(N^3)$ in $N$ dimensions to compute and invert the Cholesky factor of the negative Hessian. It also requires $2N$ gradient evaluations to form the Hessian by central finite differences, which scales at best as $\mathcal{O}(N^2)$ and is worse for models whose gradient calculation is super-linear in dimension.

The algorithm also has a high per-draw overhead, requiring $N$ standard normal pseudorandom numbers and $\mathcal{O}(N^2)$ work per draw (to multiply by the Cholesky factor).

For $M$ draws, the total cost is proportional to $\mathcal{O}(N^3 + M \cdot N^2)$. It generates output of size $\mathcal{O}(M \cdot N)$.

Current Version:

v2.30.0

@bob-carpenter (Contributor, Author)

This is not a duplicate of #2522, which is about specific functional forms. This one is black-box.

@bob-carpenter (Contributor, Author)

The relevant RStan code we will follow is:

if (draws > 0 && is.matrix(R <- try(chol(-H)))) {
  K <- ncol(R)
  R_inv <- backsolve(R, diag(K))
  Z <- matrix(rnorm(K * draws), K, draws)
  theta_tilde <- t(theta + R_inv %*% Z)
}

Converting this to mathematical notation, first suppose that the mode is $\widehat{\theta}$ and $H(\widehat{\theta})$ is the Hessian at the mode calculated with finite differences. Then comes the high constant-overhead part, where we (once) set

$R^{-1} = \textrm{chol}(-H(\widehat{\theta})) \backslash \textrm{I}$, where $\textrm{I}$ is the identity matrix.

Then we can generate each draw $m$ on the unconstrained scale by sampling

$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$

and defining draw $m$ to be

$\theta^{(m)} = \widehat{\theta} + R^{-1} \cdot \theta^{\textrm{std}(m)}$

We then just need to inverse transform each $\theta^{(m)}$ back to the constrained scale and compute the transformed parameters and generated quantities using the Stan model method write_array().

[GitHub hasn't sorted rendering for LaTeX properly, so it still looks like a ransom note!]
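
Putting these steps together, the following is a minimal Eigen-based sketch of the draw generation on the unconstrained scale (an illustration of the math above, not the actual service implementation; it skips the constraining transform and the write_array() step).

#include <Eigen/Dense>
#include <random>
#include <stdexcept>

// Illustrative sketch: generate num_draws samples from the Laplace
// approximation normal(theta_hat, (-H)^{-1}), given the mode theta_hat and
// the finite-difference Hessian H of the log density at the mode.
Eigen::MatrixXd laplace_draws(const Eigen::VectorXd& theta_hat,
                              const Eigen::MatrixXd& H, int num_draws,
                              unsigned int random_seed) {
  int N = theta_hat.size();
  // One-time O(N^3) cost: Cholesky factorization -H = U^T U.
  Eigen::LLT<Eigen::MatrixXd> llt((-H).eval());
  if (llt.info() != Eigen::Success)
    throw std::domain_error("negative Hessian is not positive definite");
  std::mt19937 rng(random_seed);
  std::normal_distribution<double> std_normal(0.0, 1.0);
  Eigen::MatrixXd draws(num_draws, N);
  for (int m = 0; m < num_draws; ++m) {
    // N standard normals and an O(N^2) triangular solve per draw:
    // theta_hat + U^{-1} z has covariance (U^T U)^{-1} = (-H)^{-1}.
    Eigen::VectorXd z(N);
    for (int i = 0; i < N; ++i)
      z(i) = std_normal(rng);
    draws.row(m) = (theta_hat + llt.matrixU().solve(z)).transpose();
  }
  return draws;
}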

@avehtari (Collaborator)

I missed this earlier.

This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.

Currently, RStan does the wrong thing here: it does not apply the Jacobian adjustment. See the illustration of the RStan approach versus the correct Laplace approximation in https://avehtari.github.io/casestudies/Jacobian/jacobian.html

The following signature should work. It requires an estimate theta of the mode (for example, as computed by optimization).

Optimization does not use the Jacobian, so the estimate it computes is not the right one here (see the notebook linked above). We want the mode of the Jacobian-adjusted target in the unconstrained space.
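
As a simple illustration: for a parameter $\sigma > 0$ with unconstrained variable $u = \log \sigma$, plain optimization maximizes $\log p(\sigma)$, while the Laplace approximation should be centered at the mode of the Jacobian-adjusted target $\log p(e^u) + u$ (the added $u$ is $\log |d\sigma/du|$). For example, with $\sigma \sim \textrm{exponential}(1)$ the unadjusted mode is at $\sigma = 0$, whereas the adjusted mode is at $u = 0$, i.e. $\sigma = 1$, so the two approximations are centered at different points.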

Furthermore, it would be useful to have the option of returning the computed Hessian instead of draws.

@bob-carpenter (Contributor, Author)

Thanks, @avehtari. You're in luck: I just made a PR to add a Jacobian-adjusted optimizer, so we can get proper Laplace approximations over the Bayesian posterior. This should all be in the next release of CmdStan and hopefully CmdStanPy and CmdStanR. That gives us the MLE plus standard-error approximation without the Jacobian, and the MAP plus an approximate posterior in the Bayesian case.
