Draft of Nutpie/Nuts-rs mass matrix adaption #312
Conversation
From a very quick look:
If you experiment with those, I'd love to see the results. :-) I don't think the cost of updating the mass matrix will matter much in real-world problems, however. It still only happens once per draw, i.e. every couple of gradient evals, and gradient evals are usually significantly more expensive than a diagonal mass matrix update.
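To make the cost argument above concrete, here is a minimal sketch of a per-draw diagonal mass matrix update. The Welford running-variance estimator is standard; the combination rule in `diag_mass_matrix` (elementwise square root of draw variance over gradient variance) is an assumption about the Nutpie-style estimator, not a quote of its actual code. Either way, the update is a handful of elementwise vector operations, far cheaper than a typical gradient evaluation.

```python
import numpy as np

class RunningVariance:
    """Welford-style running variance estimator, updated once per draw."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Unbiased sample variance (zero until we have at least two draws).
        return self.m2 / max(self.n - 1, 1)

def diag_mass_matrix(draw_var, grad_var, eps=1e-10):
    """Hypothetical Nutpie-style diagonal (inverse) mass matrix:
    elementwise sqrt(var(draws) / var(grads)), lightly regularized.
    The exact combination rule is an assumption for illustration."""
    return np.sqrt((draw_var + eps) / (grad_var + eps))
```

Per draw this is O(dim) work, which is why the once-per-draw update is dominated by the several gradient evaluations each NUTS trajectory performs.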
Quick update:
Results: poor! I'm unable to get Nutpie to sample properly.
Initial results for the Diamonds model are below (incl. the code to reproduce the example). Code to reproduce the above chart
I don't know for sure, but bad step size adaptation might explain what you are seeing. If the mass matrix itself is good but the final step size doesn't match it, you might see lots of divergences. And shouldn't this be a final step size adaptation, instead of a mass matrix adaptation?
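The point about the step size needing to match the mass matrix can be sketched with a Stan-style Nesterov dual-averaging adapter: whenever the metric changes, the adapter should be re-run so the final step size is tuned against the new metric. This is an illustrative implementation under the standard defaults, not the code in this PR.

```python
import math

class DualAveraging:
    """Nesterov dual-averaging step size adapter (Stan-style defaults).
    If the mass matrix changes, a fresh round of this adaptation is
    needed so epsilon matches the new metric."""
    def __init__(self, initial_step, target_accept=0.8,
                 gamma=0.05, t0=10.0, kappa=0.75):
        self.mu = math.log(10.0 * initial_step)
        self.target = target_accept
        self.gamma, self.t0, self.kappa = gamma, t0, kappa
        self.t = 0
        self.h_bar = 0.0          # running error in acceptance probability
        self.log_eps_bar = 0.0    # averaged iterate, used as final step size

    def update(self, accept_prob):
        self.t += 1
        eta = 1.0 / (self.t + self.t0)
        self.h_bar = (1 - eta) * self.h_bar + eta * (self.target - accept_prob)
        log_eps = self.mu - math.sqrt(self.t) / self.gamma * self.h_bar
        w = self.t ** (-self.kappa)
        self.log_eps_bar = w * log_eps + (1 - w) * self.log_eps_bar
        return math.exp(log_eps)  # step size to use on the next iteration

    def final_step_size(self):
        return math.exp(self.log_eps_bar)
```

Acceptance probabilities persistently below the target drive the step size down, and vice versa; a step size frozen before the mass matrix settled would miss this correction, which is consistent with the divergences described above.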
Good tip! Thanks - I'll look into it.
I've removed these files; they were just background artefacts. The one you referenced is a ChatGPT rewrite of your codebase - I think the line you referenced is this one. The actual implementation is here, where it first updates the step size (same as the default in this package) and then updates the mass_matrix as per Nutpie here (it's decoupled from the …).
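The ordering described above (step size first, then the mass matrix) can be sketched as a warm-up loop. The function and callback names here are illustrative placeholders, not this package's actual API.

```python
def warmup(sample_one, adapt_step_size, update_mass_matrix, n_adapt):
    """Hypothetical warm-up loop: per draw, adapt the step size first,
    then fold the draw into the Nutpie-style mass matrix estimate."""
    for _ in range(n_adapt):
        draw, grad, accept_prob = sample_one()
        adapt_step_size(accept_prob)     # first: step size update
        update_mass_matrix(draw, grad)   # then: mass matrix update
```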
@yebai: This may be the easiest thing for me to review or reimplement.
Sure, please feel free to give it a go.
Hi there! Sorry, but this is abandoned; when I benchmarked it, I didn't see enough benefit to complete it. So it's probably not worth reviewing?
No worries! I'd agree with @aseyboldt that there has to be something going wrong with the step size adaptation - even with a garbage (but finite) mass matrix, the sampler should never diverge for the diamonds posterior, as it's (IIRC) essentially a normal posterior.
I'll double-check things, and if it's too much work to fix I might just use this PR as a template :) In any case, thanks for the work so far. (Tagging @sethaxen in case he has comments)
I'd think this could be closed then - the full set of additions intended here would land in two separate PRs: part 1 is #473, and part 2 (allowing for different warm-up schedules) is (potentially) forthcoming.



NOT READY!
Quick draft of the new mass matrix adaptation being tested/used by the PyMC team, posted for discussion. Attempt at #311.
Notes:
It's a draft -- my VS Code made several formatting changes that I'd need to unwind, and I need to take out the changes to Project.toml (used for running the examples).