Make Metropolis cope better with multiple dimensions #5823
Conversation
Force-pushed from a7ab0e4 to cec9b5c
Codecov Report
@@            Coverage Diff             @@
##             main    #5823      +/-   ##
==========================================
- Coverage   89.40%   89.38%   -0.02%
==========================================
  Files          74       74
  Lines       13769    13783      +14
==========================================
+ Hits        12310    12320      +10
- Misses       1459     1463       +4
Force-pushed from cec9b5c to 616cd90
Force-pushed from 616cd90 to 53fa596
# This used to be a strict zero equality!
assert np.isclose(Q_1_0.mean(axis=1), 0.0, atol=1e-4)
assert np.isclose(Q_2_1.mean(axis=1), 0.0, atol=1e-4)
I don't know who is maintaining the MLDA step method, but I had to change this strict equality... Is that okay?
Nobody is actively maintaining it. Also, didn't we want to move it to pymc-experimental?
I think it is still maintained, as it was ported to V4 by the original team.
pymc/step_methods/metropolis.py (Outdated)
return scale * 1.1

return scale
return np.exp(np.log(scale) + (acc_rate - 0.234))
@aloctavodia Is this correct?
Yes, also the initial guess for the scale should be something like this:

Lines 461 to 462 in 5703a9d
ndim = self.tempered_posterior.shape[1]
self.proposal_scales = np.full(self.draws, min(1, 2.38**2 / ndim))
We can make the scaling changes in a different PR.
Yeah, let's do it in a separate PR, also because of the MLDA complaining here
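For context, a minimal sketch of the two tuning rules under discussion. The band thresholds in `tune_switch` follow the existing `tune` helper in `pymc/step_methods/metropolis.py`; the function names and the `ndim = 5` example are illustrative only:

```python
import numpy as np

def tune_switch(scale, acc_rate):
    # Band-based rule: nudge the proposal scale up or down depending on
    # which acceptance-rate band the last tuning window landed in.
    if acc_rate < 0.001:
        return scale * 0.1   # reduce drastically
    elif acc_rate < 0.05:
        return scale * 0.5
    elif acc_rate < 0.2:
        return scale * 0.9
    elif acc_rate > 0.95:
        return scale * 10.0  # increase drastically
    elif acc_rate > 0.75:
        return scale * 2.0
    elif acc_rate > 0.5:
        return scale * 1.1
    return scale

def tune_smc_like(scale, acc_rate):
    # SMC-style rule from the diff above: move log(scale) proportionally
    # to the distance from the 0.234 target acceptance rate.
    return np.exp(np.log(scale) + (acc_rate - 0.234))

# SMC-like initial guess, following the quoted SMC lines:
ndim = 5  # example dimensionality
initial_scale = min(1.0, 2.38**2 / ndim)
```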
Force-pushed from 53fa596 to 0d8d49b
I am going to temporarily revert the last commit to see if the tests pass. The Metropolis related tests seem to be performing a bit worse with the proposed alternative tune rule (or it could very well just be a seeding thing).
Dirichlet.dist(a=np.ones(3) * (np.arange(20) + 1)[:, None], shape=(2, 20, 3)),
    ),
)
def test_elemwise_update(self, batched_dist):
I ran these new tests 10 times each, with the current new algorithm and the one proposed by @aloctavodia that we use in SMC. Things look sensible in both cases:
10 runs each with different seeds
Tuning method based on switch statement:
1. (0D Bin) max_rhat(mean=1.005, std=0.005), min_ess=(mean=327.4, std=29.9)
2. (1D Bin) max_rhat(mean=1.031, std=0.007), min_ess=(mean=115.5, std=20.9)
3. (2D Bin) max_rhat(mean=1.021, std=0.005), min_ess=(mean=180.0, std=22.5)
4. (2D Dir) max_rhat(mean=1.041, std=0.010), min_ess=(mean=91.2, std=24.7)
5. (3D Dir) max_rhat(mean=1.037, std=0.008), min_ess=(mean=110.4, std=24.4)
Tuning method based on distance from 0.234:
1. (0D Bin) max_rhat(mean=1.009, std=0.008), min_ess=(mean=231.3, std=45.6) # Worse
2. (1D Bin) max_rhat(mean=1.026, std=0.008), min_ess=(mean=206.9, std=21.5) # Better
3. (2D Bin) max_rhat(mean=1.026, std=0.005), min_ess=(mean=188.7, std=22.3)
4. (2D Dir) max_rhat(mean=1.043, std=0.012), min_ess=(mean=98.9, std=27.8)
5. (3D Dir) max_rhat(mean=1.039, std=0.009), min_ess=(mean=102.6, std=21.5)
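For reference, summaries like these can be computed with `arviz`; the `run_case` helper below is hypothetical and stands in for sampling one test case with a given seed:

```python
import arviz as az
import numpy as np

rhats, esss = [], []
for seed in range(10):
    idata = run_case(seed)  # hypothetical: returns an InferenceData for one test case
    rhats.append(float(az.rhat(idata).max().to_array().max()))
    esss.append(float(az.ess(idata).min().to_array().min()))
print(
    f"max_rhat(mean={np.mean(rhats):.3f}, std={np.std(rhats):.3f}), "
    f"min_ess=(mean={np.mean(esss):.1f}, std={np.std(esss):.1f})"
)
```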
But the MLDA complains a lot about the SMC-like tuning... especially this test, which fails even if I increase the number of draws: https://github.com/ricardoV94/pymc/blob/3a84db6b91035a7c7f68633bdbb18c5f11efd46f/pymc/tests/test_step.py#L1184
Maybe I am just misunderstanding what that test is supposed to check.
Looking at the standard deviations I'm not that impressed, tbh.
(Considering that the code is now more complicated...)
Or do you have another application that motivated this?
@michaelosthege this is not the change from now vs. before the PR; it's the change between this PR and an alternative procedure that @aloctavodia suggested during this PR.
Before this PR there were zero accepted transitions in all the new tests (except the first scalar case)
I updated the headings to make it obvious
Metropolis now updates each dimension sequentially and tunes a proposal scale parameter per dimension
Force-pushed from f27dc98 to f006eda
@aloctavodia I removed the new tune strategy. I'll open a new PR for that. Do you care to review again?
# is not safe for discrete multivariate distributions (looking at you Multinomial),
# due to high dependency among the support dimensions. For continuous multivariate
# distributions we assume they are being transformed in a way that makes each
# dimension semi-independent.
Suggested change:
# dimension semi-independent.
# Consequently we'd like to tune the scaling based on
# acceptance rate in each dimension independently.
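To see why the comment above singles out discrete multivariate distributions: changing a single entry of a Multinomial draw violates the sum-to-n constraint, so every intermediate state gets a -inf logp. A small sketch with made-up values, assuming the v4 `pm.logp` helper:

```python
import pymc as pm

# Hypothetical values: a Multinomial with n=10 trials over 3 categories.
d = pm.Multinomial.dist(n=10, p=[0.3, 0.3, 0.4])
print(pm.logp(d, [3, 3, 4]).eval())  # finite: counts sum to n=10
print(pm.logp(d, [5, 3, 4]).eval())  # -inf: counts sum to 12, constraint violated
```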
# remember initial settings before tuning so they can be reset
self._untuned_settings = dict(
    scaling=self.scaling, steps_until_tune=tune_interval, accepted=self.accepted
# Metropolis will try to handle one batched dimension at a time This, however,
Suggested change:
- # Metropolis will try to handle one batched dimension at a time This, however,
+ # Metropolis will try to handle one batched dimension at a time. This, however,
if self.elemwise_update:
    dims = int(sum(np.prod(ivs) for ivs in initial_values_shape))
- Should this be an internal attribute?
- Could you come up with a more informative name?

Right now (reading the diff top to bottom) I'm confused because this smells like CompoundStep, but it's only about elementwise tuning, right?
See my comment below. CompoundStep has nothing to do with this.
for i in self.enum_dims:
    q_temp[i] = q[i]
    accept_rate_i = self.delta_logp(q_temp, q0)
    q_temp_, accepted_i = metrop_select(accept_rate_i, q_temp, q0)
    q_temp[i] = q_temp_[i]
    self.accept_rate_iter[i] = accept_rate_i
    self.accepted_iter[i] = accepted_i
    self.accepted_sum[i] += accepted_i
q = q_temp
Is this doing what we usually do with CompoundStep? If not, maybe explain what the if/else blocks do in a code comment.
CompoundStep assigns one variable per step; here we are sampling one dimension within a variable (or within multiple variables) semi-independently.
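For intuition, a stripped-down NumPy sketch of this kind of per-dimension update, independent of the actual pymc internals:

```python
import numpy as np

rng = np.random.default_rng(123)

def elemwise_metropolis_sweep(q0, logp, scales):
    """One sweep that proposes and accepts/rejects each dimension in turn,
    so a single badly scaled dimension cannot veto the whole proposal."""
    q = q0.copy()
    accepted = np.zeros(q.size, dtype=bool)
    for i in range(q.size):
        q_prop = q.copy()
        q_prop[i] += rng.normal() * scales[i]  # perturb only dimension i
        if np.log(rng.uniform()) < logp(q_prop) - logp(q):
            q = q_prop
            accepted[i] = True
    return q, accepted

# Example usage: 10-dimensional standard normal target
logp = lambda x: -0.5 * np.sum(x**2)
q, accepted = elemwise_metropolis_sweep(np.zeros(10), logp, np.full(10, 2.38))
```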
Ugh, can't we do this in a for iteration with a break?
We can probably index; that should be faster for large arrays because all branches of np.where are evaluated by default.
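A sketch of what indexing-based tuning could look like (illustrative, not the merged code; the thresholds mirror the switch rule sketched earlier):

```python
import numpy as np

def tune_vectorized(scaling, acc_rate):
    # Each boolean mask selects the dimensions in one acceptance-rate band,
    # so unlike nested np.where we never materialize every branch over the
    # full array.
    scaling = scaling.copy()
    scaling[acc_rate < 0.001] *= 0.1
    scaling[(0.001 <= acc_rate) & (acc_rate < 0.05)] *= 0.5
    scaling[(0.05 <= acc_rate) & (acc_rate < 0.2)] *= 0.9
    scaling[acc_rate > 0.95] *= 10.0
    scaling[(0.75 < acc_rate) & (acc_rate <= 0.95)] *= 2.0
    scaling[(0.5 < acc_rate) & (acc_rate <= 0.75)] *= 1.1
    return scaling
```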
LGTM!
Metropolis now updates each dimension sequentially and tunes a proposal scale parameter per dimension
The performance of the Metropolis sampler with multiple dimensions was incredibly poor, as highlighted in the old closed issue #4223. After these changes, performance should be similar between the scalar and batched cases.
Some care had to be taken with multivariate discrete distributions, where we cannot safely update one dimension at a time (e.g., Multinomial), because any intermediate change would result in a -inf logp. We should probably implement a smarter sampler just for this distribution, but for the time being we avoid elemwise updates when multivariate discrete variables are assigned.
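As a rough illustration of the kind of model that motivated the change (shapes and sampler settings here are hypothetical, not taken from the PR's test suite):

```python
import pymc as pm

# A batched model that previously mixed very poorly under Metropolis
# (compare issue #4223): 20 Binomial dimensions updated by a single step.
with pm.Model():
    x = pm.Binomial("x", n=100, p=0.3, shape=(20,))
    idata = pm.sample(step=pm.Metropolis(), draws=2000, tune=2000, chains=4)

print(idata.posterior["x"].mean(("chain", "draw")))
```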