Switch to turn Jacobian on/off for optimization #2473
Adding the Jacobian here won't do MAP, as it is already doing MAP without the Jacobian. It would only be maximum likelihood if there were no priors specified in the model. What exactly is the use case for this feature? I'm hard pressed to understand why someone would want this, as the results depend on exactly how we define the constraining transformations, which could be changed.
@avehtari requested this feature, so maybe he can comment. It's a bit opaque to users what's going on now, so one thing we can do is provide better doc. I tried in a case study, but I think I need to try harder. If I'm thinking about this right, what we have is: without the Jacobian, optimization finds the posterior mode on the constrained scale (the current behavior); with the Jacobian, it would find the mode of the posterior density on the unconstrained scale (sketched below).
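A sketch of that distinction in formulas (notation mine, added for clarity, not from the thread; g maps unconstrained parameters η to constrained parameters θ, with Jacobian J_g):

```latex
\text{Jacobian off (current behavior):}\quad
\hat{\theta} = \arg\max_{\theta}\, \log p(\theta \mid y)

\text{Jacobian on:}\quad
\hat{\eta} = \arg\max_{\eta}\, \Big[\log p\big(g(\eta) \mid y\big)
  + \log \big|\det J_g(\eta)\big|\Big],
\qquad \hat{\theta}_J = g(\hat{\eta})
```

In general g(η̂) ≠ θ̂, which is exactly why the switch matters.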
I agree that what most users are going to expect is what we have now. I'm guessing Aki may want the latter in order to do a Laplace approximation of the unconstrained posterior, draws from which would then be translated back to the constrained space as an approximate Bayesian method.
Good summary, which would be good to have in the case study, too.
I'm not the only one expecting Jacobian. It seems it's what machine learning people expect.
Yes. The posterior is often closer to a normal distribution in the unconstrained space. This will be especially helpful when we get, for example, compound GLVM+Laplace functions. In these cases it's possible that marginal likelihood evaluations are costly, but the marginal posterior is low dimensional and close to normal in the unconstrained space. This is what INLA, GPstuff, etc. use.
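For concreteness, a sketch of the unconstrained-scale Laplace approximation being described (notation mine, continuing the sketch above; η̂ is the mode found with the Jacobian included):

```latex
q(\eta) = \mathcal{N}\Big(\eta \,\Big|\, \hat{\eta},\;
  \Big[-\nabla^2_{\eta}\big(\log p(g(\eta) \mid y)
  + \log\big|\det J_g(\eta)\big|\big)\Big|_{\eta=\hat{\eta}}\Big]^{-1}\Big)
```

Draws η_s ~ q(η) are then mapped back as θ_s = g(η_s) to give approximate posterior draws on the constrained scale.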
Right, ok, I get this. My main concern was the MLE vs MAP framing in the original issue description. Perhaps the flag could be something about "optimize posterior in constrained space" vs "optimize posterior in unconstrained space". I can see why some ML people would expect this behaviour, but I still think it's the wrong behaviour for most "normal" (i.e., non-ML) users, so we want to make sure the language/description of the option is super clear.
Thanks, @mbrubake, I'll clarify in the issue---it was my sloppy language.
I did mention INLA, and INLA is definitely non-ML (and full of normal :)
This is a very good proposal.
Let me chime in here as another user. Sometimes I have to settle for MAP estimates for predictive tasks because I can't afford the computational burden of full MCMC, and VB is still experimental. I'm not trying to do a Laplace approximation, I just want a better point estimate for prediction. In this case I prefer a MAP estimate on the unconstrained space, since I assume that the posterior will be less skewed, and hence the MAP estimate will be more representative of the posterior.
Thanks for chiming in. It would be interesting to compare the differences from Andrew's perspective of taking MAP plus Hessian to be a kind of approximate Bayes. There are two different ways to do this, on the constrained or the unconstrained scale.
Stan doesn't give you an easy way of calculating the Hessian on the constrained space yet. That's another feature we'd like to be able to get to in order to provide traditional error approximations.
We'll definitely get to this feature.
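For context, the "traditional error approximations" mentioned here are the usual curvature-based standard errors at the mode (notation mine):

```latex
\widehat{\mathrm{se}}(\hat{\theta}_i)
  = \sqrt{\big[(-H)^{-1}\big]_{ii}},
\qquad
H = \nabla^2_{\theta}\, \log p(\theta \mid y)\,\Big|_{\theta=\hat{\theta}}
```

which is why an easy way to get the Hessian on the constrained scale would be needed.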
With the new distributional approximation diagnostics (https://arxiv.org/abs/1802.02538) we can also find out when the posterior is simple enough (normal enough) that MAP is good, and using PSIS we can then also correct the estimates. It's not going to be common that the normal approximation at the mode is good enough for importance sampling, but in those specific cases the user would certainly be happy to be confident that there is no need to run MCMC for better accuracy.
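A sketch of the importance-sampling correction being referred to (notation mine; q is the normal approximation at the mode and θ_s ~ q are draws from it):

```latex
w_s = \frac{p(\theta_s, y)}{q(\theta_s)},
\qquad
\mathbb{E}\big[h(\theta) \mid y\big]
  \approx \frac{\sum_{s=1}^{S} \tilde{w}_s\, h(\theta_s)}
               {\sum_{s=1}^{S} \tilde{w}_s}
```

where the smoothed weights \tilde{w}_s come from stabilizing the largest raw ratios w_s with Pareto smoothed importance sampling (PSIS); the fitted Pareto shape \hat{k} diagnoses whether the approximation is close enough for the correction to be reliable (roughly, \hat{k} < 0.7).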
One consideration worth mentioning in this context is that the results of "MAP in the unconstrained space" will depend on exactly how we parameterize the constraints. That means that the results of this mode could change if the parameterization were changed under the hood. This has already happened once, when we changed how `unit_vector` is parameterized.
MAP with Jacobian on the unconstrained scale is easy to explain formally. I agree with Marcus that it's going to be a lot harder to get across conceptually. I was never really happy with the case study in which I tried to explain these things, but that's the kind of content I'd be aiming for in the manual, only done more clearly.
The `unit_vector` thing was a bug fix---it never worked properly until the most recent update from Ben.
Closed by #3152
Summary:
Add an option to the services to include or exclude the log Jacobian determinant of the unconstrained-to-constrained transform.
Description
Right now, we automatically disable the Jacobian for optimization, which produces posterior mode estimates on the constrained scale. This is the maximum a posteriori (MAP) estimate most users are going to expect.
It would be nice to be able to compute posterior mode estimates on the unconstrained scale. The primary motivation is Laplace approximations on the unconstrained scale, which can be sampled from and then mapped back to the constrained scale.
This will have to start by adding a boolean argument `apply_jacobian` to the service function `lbfgs()` in `lbfgs.hpp` for turning the Jacobian off or on. The default value should be `false`.

Then the logic will have to be traced down through the optimizer, which is an instance of `optimization::BFGSLineSearch<Model, optimization::LBFGSUpdate<> >` (confusingly defined in `bfgs.hpp`, not in `bfgs_linesearch.hpp`).

From the line search, the flag has to get passed down to instantiate the call to `log_prob_propto`, which is now hard coded to `log_prob_propto<false>`. That will have to be done through the class `ModelAdaptor`, which should get a template parameter for applying the Jacobian.

I'm not sure how high up that template parameter should go. At some point, the boolean variable in the services call will have to be used in a conditional to determine which version of `log_prob_propto` to call, because we can't use a dynamic variable as a template parameter.
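Here is a minimal sketch of that conditional (a toy illustration, not the actual Stan code: `toy_model` and `eval_log_prob` are hypothetical stand-ins; only the pattern of dispatching a runtime boolean to the `log_prob_propto<true>` / `log_prob_propto<false>` instantiations mirrors what's described above):

```cpp
#include <iostream>
#include <vector>

// Toy stand-in for a Stan model class (illustrative only). The real
// log_prob_propto is templated on whether the log Jacobian determinant
// of the constraining transform is added to the log density.
struct toy_model {
  template <bool apply_jacobian>
  double log_prob_propto(const std::vector<double>& params_r) const {
    double lp = -0.5 * params_r[0] * params_r[0];  // toy log density
    if (apply_jacobian) {
      lp += params_r[0];  // toy stand-in for log |det J|
    }
    return lp;
  }
};

// A template parameter must be a compile-time constant, so the dynamic
// flag coming from the services layer has to be branched on once to
// select the instantiation; this is the conditional described above.
template <typename Model>
double eval_log_prob(const Model& model,
                     const std::vector<double>& params_r,
                     bool apply_jacobian) {
  if (apply_jacobian) {
    return model.template log_prob_propto<true>(params_r);
  } else {
    return model.template log_prob_propto<false>(params_r);
  }
}

int main() {
  toy_model model;
  std::vector<double> params_r = {1.0};
  std::cout << eval_log_prob(model, params_r, false) << "\n";  // -0.5
  std::cout << eval_log_prob(model, params_r, true) << "\n";   // 0.5
}
```

In the real code path, the branch would presumably live wherever `ModelAdaptor` is instantiated, so that everything below it can stay templated on the flag.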
Current Version:
v2.17.1