Switch to turn Jacobian on/off for optimization #2473
Adding the Jacobian here won't do MAP, as it is already doing MAP without the Jacobian. It would only be maximum likelihood if there were no priors specified in the model. What exactly is the use case for this feature? I'm hard pressed to understand why someone would want this, as the results depend on exactly how we define the constraining transformations, which could be changed.
@avehtari requested this feature, so maybe he can comment. It's a bit opaque to users what's going on now, so one thing we can do is provide better doc. I tried in a case study, but I think I need to try harder. If I'm thinking about this right, what we have is: without the Jacobian, optimization finds the posterior mode on the constrained scale (the current behavior); with the Jacobian, it would find the mode of the posterior density on the unconstrained scale (sketched below).
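A sketch of that distinction in formulas (notation mine, added for clarity, not from the thread; g maps unconstrained parameters η to constrained parameters θ, with Jacobian J_g):

```latex
\text{Jacobian off (current behavior):}\quad
\hat{\theta} = \arg\max_{\theta}\, \log p(\theta \mid y)

\text{Jacobian on:}\quad
\hat{\eta} = \arg\max_{\eta}\, \Big[\log p\big(g(\eta) \mid y\big)
  + \log \big|\det J_g(\eta)\big|\Big],
\qquad \hat{\theta}_J = g(\hat{\eta})
```

In general g(η̂) ≠ θ̂, which is exactly why the switch matters.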
I agree that what most users are going to expect is what we have now. I'm guessing Aki may want the latter in order to do a Laplace approximation of the unconstrained posterior, draws from which would then be translated back to the constrained space as an approximate Bayesian method.
Good summary, which would be good to have in the case study, too.
I'm not the only one expecting Jacobian. It seems it's what machine learning people expect.
Yes. The posterior is often closer to a normal distribution in the unconstrained space. This will be especially helpful when we get, for example, compound GLVM+Laplace functions. In these cases it's possible that marginal likelihood evaluations are costly, but the marginal posterior is low dimensional and close to normal in the unconstrained space. This is what INLA, GPstuff, etc. use.
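For concreteness, a sketch of the unconstrained-scale Laplace approximation being described (notation mine, continuing the sketch above; η̂ is the mode found with the Jacobian included):

```latex
q(\eta) = \mathcal{N}\Big(\eta \,\Big|\, \hat{\eta},\;
  \Big[-\nabla^2_{\eta}\big(\log p(g(\eta) \mid y)
  + \log\big|\det J_g(\eta)\big|\big)\Big|_{\eta=\hat{\eta}}\Big]^{-1}\Big)
```

Draws η_s ~ q(η) are then mapped back as θ_s = g(η_s) to give approximate posterior draws on the constrained scale.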
Right, ok, I get this. My main concern was the MLE vs MAP framing in the original issue description. Perhaps the flag could be something about "optimize posterior in constrained space" vs "optimize posterior in unconstrained space". I can see why some ML people would expect this behaviour, but I still think it's the wrong behaviour for most "normal" (i.e., non-ML) users, so we want to make sure the language/description of the option is super clear.
Thanks, @mbrubake, I'll clarify in the issue---it was my sloppy language.
I did mention INLA, and INLA is definitely non-ML (and full of normal :)
This is a very good proposal.
Let me chime in here as another user. Sometimes I have to settle for MAP estimates for predictive tasks because I can't afford the computational burden of full MCMC, and VB is still experimental. I'm not trying to do a Laplace approximation, I just want a better point estimate for prediction. In this case I prefer a MAP estimate on the unconstrained space, since I assume that the posterior will be less skewed, and hence the MAP estimate will be more representative of the posterior.
Thanks for chiming in. It would be interesting to compare the differences from Andrew's perspective of taking MAP plus Hessian to be a kind of approximate Bayes. There are two different ways to do this, on the constrained or the unconstrained scale.
Stan doesn't give you an easy way of calculating the Hessian on the constrained space yet. That's another feature we'd like to be able to get to in order to provide traditional error approximations.
We'll definitely get to this feature.
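For context, the "traditional error approximations" mentioned here are the usual curvature-based standard errors at the mode (notation mine):

```latex
\widehat{\mathrm{se}}(\hat{\theta}_i)
  = \sqrt{\big[(-H)^{-1}\big]_{ii}},
\qquad
H = \nabla^2_{\theta}\, \log p(\theta \mid y)\,\Big|_{\theta=\hat{\theta}}
```

which is why an easy way to get the Hessian on the constrained scale would be needed.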
With the new distributional approximation diagnostics (https://arxiv.org/abs/1802.02538) we can also find out when the posterior is simple enough (normal enough) that MAP is good, and using PSIS we can then also correct the estimates. It's not going to be common that the normal approximation at the mode is good enough for importance sampling, but in those specific cases the user would certainly be happy to be confident that there is no need to run MCMC for better accuracy.
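A sketch of the importance-sampling correction being referred to (notation mine; q is the normal approximation at the mode and θ_s ~ q are draws from it):

```latex
w_s = \frac{p(\theta_s, y)}{q(\theta_s)},
\qquad
\mathbb{E}\big[h(\theta) \mid y\big]
  \approx \frac{\sum_{s=1}^{S} \tilde{w}_s\, h(\theta_s)}
               {\sum_{s=1}^{S} \tilde{w}_s}
```

where the smoothed weights \tilde{w}_s come from stabilizing the largest raw ratios w_s with Pareto smoothed importance sampling (PSIS); the fitted Pareto shape \hat{k} diagnoses whether the approximation is close enough for the correction to be reliable (roughly, \hat{k} < 0.7).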
One consideration worth mentioning in this context is that the results of "MAP in the unconstrained space" will depend on exactly how we parameterize the constraints. That means that the results of this mode could change if the parameterization were changed under the hood. This has already happened once, when we changed how `unit_vector` is parameterized.
MAP with Jacobian on the unconstrained scale is easy to explain formally. I agree with Marcus that it's going to be a lot harder to get across conceptually. I was never really happy with the case study in which I tried to explain these things, but that's the kind of content I'd be aiming for in the manual, only done more clearly.
The `unit_vector` thing was a bug fix---it never worked properly until the most recent update from Ben.
Closed by #3152
Summary:
Add an option to the services to include or exclude the log Jacobian determinant of the unconstrained-to-constrained transform.
Description
Right now, we automatically disable the Jacobian for optimization, which produces posterior mode estimates on the constrained scale. This is the maximum a posteriori (MAP) estimate most users are going to expect.
It would be nice to be able to compute posterior mode estimates on the unconstrained scale. The primary motivation is Laplace approximations on the unconstrained scale, which can be sampled from and then mapped back to the constrained scale.
This will have to start by adding a boolean argument `apply_jacobian` to the service function `lbfgs()` in `lbfgs.hpp` for turning the Jacobian off or on. The default value should be `false`.

Then the logic will have to be traced down through the optimizer, which is an instance of `optimization::BFGSLineSearch<Model, optimization::LBFGSUpdate<> >` (confusingly defined in `bfgs.hpp`, not in `bfgs_linesearch.hpp`).

From the line search, the flag has to get passed down to instantiate the call to `log_prob_propto`, which is now hard coded to `log_prob_propto<false>`. That will have to be done through the class `ModelAdaptor`, which should get a template parameter for applying the Jacobian.

I'm not sure how high up that template parameter should go. At some point, the boolean variable in the services call will have to be used in a conditional to determine which version of `log_prob_propto` to call, because we can't use a dynamic variable as a template parameter.
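Here is a minimal sketch of that conditional (a toy illustration, not the actual Stan code: `toy_model` and `eval_log_prob` are hypothetical stand-ins; only the pattern of dispatching a runtime boolean to the `log_prob_propto<true>` / `log_prob_propto<false>` instantiations mirrors what's described above):

```cpp
#include <iostream>
#include <vector>

// Toy stand-in for a Stan model class (illustrative only). The real
// log_prob_propto is templated on whether the log Jacobian determinant
// of the constraining transform is added to the log density.
struct toy_model {
  template <bool apply_jacobian>
  double log_prob_propto(const std::vector<double>& params_r) const {
    double lp = -0.5 * params_r[0] * params_r[0];  // toy log density
    if (apply_jacobian) {
      lp += params_r[0];  // toy stand-in for log |det J|
    }
    return lp;
  }
};

// A template parameter must be a compile-time constant, so the dynamic
// flag coming from the services layer has to be branched on once to
// select the instantiation; this is the conditional described above.
template <typename Model>
double eval_log_prob(const Model& model,
                     const std::vector<double>& params_r,
                     bool apply_jacobian) {
  if (apply_jacobian) {
    return model.template log_prob_propto<true>(params_r);
  } else {
    return model.template log_prob_propto<false>(params_r);
  }
}

int main() {
  toy_model model;
  std::vector<double> params_r = {1.0};
  std::cout << eval_log_prob(model, params_r, false) << "\n";  // -0.5
  std::cout << eval_log_prob(model, params_r, true) << "\n";   // 0.5
}
```

In the real code path, the branch would presumably live wherever `ModelAdaptor` is instantiated, so that everything below it can stay templated on the flag.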
Current Version:
v2.17.1