Improve tuning by skipping the first samples + add new experimental tuning method #5004
Conversation
@@ -342,6 +360,8 @@ def __init__(

    def add_sample(self, x, weight):
        x = np.asarray(x)
        if weight != 1:
            raise ValueError("weight is unused and broken")
raise ValueError("weight is unused and broken") | |
raise ValueError("Setting weight != 1 is not supported.") |
Or maybe we should just remove it altogether.
done
pymc3/tests/test_sampling.py (outdated)
@@ -87,7 +87,7 @@ def test_sample(self):

    def test_sample_init(self):
        with self.model:
-           for init in ("advi", "advi_map", "map"):
+           for init in ("advi", "advi_map", "map", "jitter+adapt_diag_grad"):
Should we add all the others here too?
done
Needs a line in the release-notes.
I noticed something similar when debugging.
    def add_sample(self, x, weight):
        x = np.asarray(x)
        if weight != 1:
            raise ValueError("Setting weight != 1 is not supported.")
Suggested change:
-    def add_sample(self, x, weight):
-        x = np.asarray(x)
-        if weight != 1:
-            raise ValueError("Setting weight != 1 is not supported.")
+    def add_sample(self, x, weight=None):
+        if weight is not None:
+            warnings.warn(
+                "Setting weight is no longer supported and will raise an error in the future.",
+                DeprecationWarning,
+            )
+        x = np.asarray(x)
I think a hard break is fine here. This really was internal, unused and wrong
Then I would suggest removing the weight argument altogether
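For illustration only, removing the argument would reduce the method to something like the sketch below. The class name and docstring are hypothetical stand-ins, not the code that ended up in the PR; the elided body is whatever running-statistics update the real estimator performs.

```python
import numpy as np


class VarianceEstimatorSketch:
    """Hypothetical stand-in for the estimator class being discussed."""

    def add_sample(self, x):
        # No weight argument anymore; every draw counts equally.
        x = np.asarray(x)
        # ... update the running mean/variance with x, as before ...
```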
Left a couple of comments
Thanks @aseyboldt
In the current implementation, tuning of the mass matrix starts right away, but in most models the first couple of samples are only moving towards the typical set, so they carry no information about the posterior variance at all. In the worst case we learn a mass matrix that doesn't match the posterior, so that sampling during the first adaptation window is very slow (visible as a slowdown of sampling after step 100). Usually we recover from this, but it seems better to just skip those early samples during adaptation in the first place.
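To make the idea concrete, here is a minimal sketch, not the actual PyMC3 code: the class name, the `discard_window` parameter, and the Welford-style update are illustrative assumptions, but the mechanism (ignore the first draws of tuning before adapting) is the one described above.

```python
import numpy as np


class SkippingRunningVariance:
    """Running variance estimate that ignores the first few tuning draws."""

    def __init__(self, ndim, discard_window=50):
        self.discard_window = discard_window  # draws to ignore at the start
        self.n = 0          # number of draws actually used for adaptation
        self.seen = 0       # total draws offered, including skipped ones
        self.mean = np.zeros(ndim)
        self.m2 = np.zeros(ndim)

    def add_sample(self, x):
        x = np.asarray(x)
        self.seen += 1
        if self.seen <= self.discard_window:
            return  # still moving towards the typical set, don't adapt yet
        # Standard Welford update on the remaining draws.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def current_variance(self):
        if self.n < 2:
            return np.ones_like(self.mean)  # fall back to an identity mass matrix
        return self.m2 / (self.n - 1)
```

With something like this in place, the mass matrix produced by the first adaptation window is based only on draws taken after the chain has (hopefully) reached the typical set.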
In an example model by @ricardoV94 we can see this behavior clearly when we look at the distance between the currently used mass matrix and the final mass matrix (plot not reproduced here).
This PR also contains an experimental tuning implementation that uses both gradients and samples; it can be enabled with init="jitter+adapt_diag_grad". In tests on a few models this seems to be more stable than the purely sample-based tuning we use right now, but there are also a few cases where it performs worse. For normal posteriors it should converge to the same mass matrix as our current implementation (and much faster), but for non-normal posteriors the result can differ. Unfortunately I don't know of any way to tell which is better other than trying it on a large number of models. An example notebook can be found here:
https://gist.github.com/aseyboldt/7897fbddacacaa0c86efc917afe9ce3f
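For completeness, a hedged usage sketch of the new option: the toy model below is arbitrary and only the `init` string comes from this PR.

```python
import numpy as np
import pymc3 as pm

# Any model works the same way; this is just a placeholder.
data = np.random.randn(100)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu, sigma, observed=data)

    # The default initialization is unchanged; the experimental
    # gradient-based tuning has to be requested explicitly.
    trace = pm.sample(1000, tune=1000, init="jitter+adapt_diag_grad")
```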