WIP Improve NUTS tuning #1738
Conversation
```diff
- def sample(draws, step=None, init='advi', n_init=200000, start=None,
+ def sample(draws, step=None, n_advi=200000, n_nuts=2000, start=None,
```
I'm a little uncomfortable encoding particular algorithm names in the arguments of `sample`. Can't we just condition internally depending on which step method is used?
Good point. I'm starting to think about these as init-programs. This new one uses ADVI+NUTS. Perhaps an `init_kwargs` dict that gets passed through is the most general option?
I think so. It limits what we can provide in terms of a docstring, but it keeps things general. Another option would be a more explicit `init_params` argument whose contents vary depending on the init method chosen.
Yep, exactly. There could be different init functions, and we would always forward `**init_params`.
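To make that concrete, here is a minimal sketch of the dispatch pattern under discussion; the names (`init_advi`, `init_map`, `INIT_METHODS`) and the stubbed return values are hypothetical illustrations, not the actual PyMC3 API:

```python
# Hypothetical sketch: each init method is a plain function, and
# sample() forwards **init_params without knowing their contents.
def init_advi(n_init=200000):
    """Stub: run ADVI, return a start point and a covariance estimate."""
    return {'mu': 0.0}, [[1.0]]

def init_map():
    """Stub: return the MAP estimate as a start point, no covariance."""
    return {'mu': 0.0}, None

INIT_METHODS = {'advi': init_advi, 'map': init_map}

def sample(draws, init='advi', init_params=None):
    # Look up the init function by name and forward its kwargs, so
    # sample() itself stays agnostic about algorithm-specific options.
    start, cov = INIT_METHODS[init](**(init_params or {}))
    return start, cov

# Usage: algorithm-specific settings travel inside init_params.
sample(2000, init='advi', init_params={'n_init': 5000})
```

The point of the pattern is that `sample` never names algorithm-specific arguments itself; they all travel inside `init_params`.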
@twiecki I think this can be closed now?
Closed in favor of #2327.
I started experimenting with better NUTS initialization. What Stan does is run a warm-up NUTS chain and estimate the posterior covariance from that run; this covariance matrix is then used as the mass matrix for the actual NUTS run.
This PR removes some initialization options (like MAP) that I think are not useful and replaces them with a single path. We also apply Ledoit-Wolf regularization to the covariance estimate, like Stan does. I get better sampling for the tricky hierarchical GLM model.
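For illustration, here is a minimal sketch of that warm-up procedure, assuming the PyMC3 API of the time (the existing `scaling`/`is_cov` kwargs on `NUTS`) and scikit-learn's `LedoitWolf`; the toy model and the warm-up length are made up for the example, not taken from this PR:

```python
import pymc3 as pm
from sklearn.covariance import LedoitWolf

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1., shape=2)

    # Short warm-up run to get draws near the typical set.
    warmup = pm.sample(500, step=pm.NUTS())

    # Estimate the posterior covariance from the warm-up draws,
    # with Ledoit-Wolf shrinkage to regularize the estimate.
    draws = warmup['mu']                      # (n_samples, 2) array
    cov = LedoitWolf().fit(draws).covariance_

    # Use the regularized covariance as the mass matrix for the
    # actual run.
    step = pm.NUTS(scaling=cov, is_cov=True)
    trace = pm.sample(2000, step=step)
```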
This PR also makes it obvious that NUTS' hyperparameters can't be set properly from the outside: internally we compute the integration times in log-space, but we accept them in non-log form, so setting NUTS parameters externally is a bit clunky.
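As a toy illustration of that mismatch (the variable names here are illustrative, not PyMC3 internals): the caller supplies a value in linear space, while tuning operates on its logarithm, so the externally set value and the internally tuned quantity drift apart:

```python
import numpy as np

step_scale = 0.25               # what the caller passes in (linear space)
log_step = np.log(step_scale)   # what the sampler actually works with

# Internal tuning happens on log_step, so after adaptation the effective
# value no longer corresponds to the argument that was passed in:
log_step += 0.1                 # stand-in for an adaptation update
effective = np.exp(log_step)
print(step_scale, '->', effective)
```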
Suggestions welcome.