
Make Metropolis cope better with multiple dimensions #5823

Merged
merged 2 commits into from
May 31, 2022

Conversation

ricardoV94 (Member) commented May 30, 2022

Metropolis now updates each dimension sequentially and tunes a proposal scale parameter per dimension

The performance of the Metropolis sampler with multiple dimensions was incredibly poor, as highlighted in the old closed issue #4223. After these changes, performance should be similar between the scalar and batched cases.

Some care had to be taken with multivariate discrete distributions, where we cannot safely update one dimension at a time (e.g., Multinomial), because any intermediate change would result in -inf logp. We should probably implement a smarter sampler just for such distributions, but for the time being we avoid elemwise updates when multivariate discrete variables are assigned.
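For intuition, here is a minimal NumPy-only sketch of the per-dimension update scheme described above. This is illustrative only; the function name `rw_metropolis_elemwise` and its signature are made up for this example and do not match the actual PyMC implementation:

```python
import numpy as np

def rw_metropolis_elemwise(logp, q, scales, rng):
    """One sweep of random-walk Metropolis, updating one dimension at a time.

    logp   : callable returning the log-probability of a point
    q      : current position, 1-D array
    scales : per-dimension proposal standard deviations
    rng    : np.random.Generator
    """
    q = q.copy()
    accepted = np.zeros(q.size, dtype=bool)
    logp_curr = logp(q)
    for i in range(q.size):
        q_prop = q.copy()
        q_prop[i] += rng.normal(0.0, scales[i])  # perturb only dimension i
        logp_prop = logp(q_prop)
        # accept with probability min(1, exp(logp_prop - logp_curr))
        if np.log(rng.uniform()) < logp_prop - logp_curr:
            q, logp_curr = q_prop, logp_prop
            accepted[i] = True
    return q, accepted

# Toy usage: standard normal target in 3 dimensions
rng = np.random.default_rng(123)
logp = lambda x: -0.5 * np.sum(x**2)
q = np.zeros(3)
for _ in range(100):
    q, acc = rw_metropolis_elemwise(logp, q, np.ones(3), rng)
```

Because each dimension is accepted or rejected independently, a separate acceptance rate (and hence a separate proposal scale) can be tracked per dimension during tuning.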

codecov bot commented May 30, 2022

Codecov Report

Merging #5823 (f27dc98) into main (57654dc) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head f27dc98 differs from the pull request's most recent head f006eda. Consider uploading reports for the commit f006eda to get more accurate results.


@@            Coverage Diff             @@
##             main    #5823      +/-   ##
==========================================
- Coverage   89.40%   89.38%   -0.02%     
==========================================
  Files          74       74              
  Lines       13769    13783      +14     
==========================================
+ Hits        12310    12320      +10     
- Misses       1459     1463       +4     
| Impacted Files | Coverage Δ |
| --- | --- |
| pymc/step_methods/metropolis.py | 83.36% <100.00%> (+0.38%) ⬆️ |
| pymc/step_methods/mlda.py | 96.37% <100.00%> (ø) |
| pymc/parallel_sampling.py | 86.46% <0.00%> (-1.00%) ⬇️ |
| pymc/step_methods/hmc/base_hmc.py | 89.68% <0.00%> (-0.80%) ⬇️ |
| pymc/distributions/logprob.py | 97.65% <0.00%> (ø) |
| pymc/aesaraf.py | 91.95% <0.00%> (+0.06%) ⬆️ |

@ricardoV94 ricardoV94 force-pushed the improve_multidim_metropolis branch from cec9b5c to 616cd90 Compare May 30, 2022 16:55
@ricardoV94 ricardoV94 force-pushed the improve_multidim_metropolis branch from 616cd90 to 53fa596 Compare May 31, 2022 07:24
Comment on lines +1345 to +1347
# This used to be a strict zero equality!
assert np.isclose(Q_1_0.mean(axis=1), 0.0, atol=1e-4)
assert np.isclose(Q_2_1.mean(axis=1), 0.0, atol=1e-4)
ricardoV94 (Member Author)

I don't know who is maintaining the MLDA step method, but I had to change this strict equality... Is that okay?

Member

Nobody is actively maintaining it. Also, didn't we want to move it to pymc-experimental?

ricardoV94 (Member Author)

I think it is still maintained, as it was ported to V4 by the original team.

return scale * 1.1

return scale
return np.exp(np.log(scale) + (acc_rate - 0.234))
ricardoV94 (Member Author)

@aloctavodia Is this correct?

aloctavodia (Member)

Yes, also the initial guess for the scale should be something like this

pymc/pymc/smc/smc.py

Lines 461 to 462 in 5703a9d

ndim = self.tempered_posterior.shape[1]
self.proposal_scales = np.full(self.draws, min(1, 2.38**2 / ndim))
(for problems with only a few dimensions this is the same as starting from 1)

We can make the scaling changes in a different PR.
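For reference, the SMC-style rule being discussed can be sketched as below. This is a toy sketch, not PyMC's code: the update formula and the 2.38**2 / ndim initial guess come from the snippets quoted above, while the function names `initial_scale` and `tune_scale` are made up here:

```python
import numpy as np

def initial_scale(ndim):
    # Optimal random-walk scaling heuristic 2.38**2 / ndim, capped at 1
    return min(1.0, 2.38**2 / ndim)

def tune_scale(scale, acc_rate):
    # Multiplicative update toward the 0.234 target acceptance rate:
    # exp(log(scale) + (acc_rate - 0.234)) == scale * exp(acc_rate - 0.234)
    return np.exp(np.log(scale) + (acc_rate - 0.234))

# The scale grows when acceptance is above target and shrinks when below
s = initial_scale(ndim=10)
s_up = tune_scale(s, acc_rate=0.5)   # acceptance too high -> widen proposals
s_down = tune_scale(s, acc_rate=0.1) # acceptance too low  -> narrow proposals
```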

ricardoV94 (Member Author)

Yeah, let's do it in a separate PR, also because of the MLDA complaining here

@ricardoV94 ricardoV94 requested a review from aloctavodia May 31, 2022 07:26
@ricardoV94 ricardoV94 force-pushed the improve_multidim_metropolis branch from 53fa596 to 0d8d49b Compare May 31, 2022 07:36
@ricardoV94 ricardoV94 marked this pull request as ready for review May 31, 2022 07:36
ricardoV94 (Member Author) commented May 31, 2022

I am going to temporarily revert the last commit to see if the tests pass. The Metropolis-related tests seem to perform a bit worse with the proposed alternative tune rule (or it could very well just be a seeding thing).

@ricardoV94 ricardoV94 marked this pull request as draft May 31, 2022 10:01
Dirichlet.dist(a=np.ones(3) * (np.arange(20) + 1)[:, None], shape=(2, 20, 3)),
),
)
def test_elemwise_update(self, batched_dist):
ricardoV94 (Member Author) commented May 31, 2022

I ran these new tests 10 times each, with the current new algorithm and the one proposed by @aloctavodia that we use in SMC. Things look sensible in both cases:

10 runs each with different seeds

Tuning method based on switch statement:

1. (0D Bin) max_rhat(mean=1.005, std=0.005), min_ess=(mean=327.4, std=29.9)
2. (1D Bin) max_rhat(mean=1.031, std=0.007), min_ess=(mean=115.5, std=20.9)
3. (2D Bin) max_rhat(mean=1.021, std=0.005), min_ess=(mean=180.0, std=22.5)
4. (2D Dir) max_rhat(mean=1.041, std=0.010), min_ess=(mean=91.2, std=24.7)
5. (3D Dir) max_rhat(mean=1.037, std=0.008), min_ess=(mean=110.4, std=24.4)

Tuning method based on distance from 0.234:

1. (0D Bin) max_rhat(mean=1.009, std=0.008), min_ess=(mean=231.3, std=45.6)  # Worse
2. (1D Bin) max_rhat(mean=1.026, std=0.008), min_ess=(mean=206.9, std=21.5)  # Better
3. (2D Bin) max_rhat(mean=1.026, std=0.005), min_ess=(mean=188.7, std=22.3)
4. (2D Dir) max_rhat(mean=1.043, std=0.012), min_ess=(mean=98.9, std=27.8)
5. (3D Dir) max_rhat(mean=1.039, std=0.009), min_ess=(mean=102.6, std=21.5)

ricardoV94 (Member Author) commented May 31, 2022

But the MLDA complains a lot about the SMC-like tuning... especially this test, which fails even if I increase the number of draws: https://github.com/ricardoV94/pymc/blob/3a84db6b91035a7c7f68633bdbb18c5f11efd46f/pymc/tests/test_step.py#L1184

Maybe I am just misunderstanding what that test is supposed to check

michaelosthege (Member)

Looking at the standard deviations I'm not that impressed tbh.
(Considering that the code is now more complicated...)

Or do you have another application that motivated this?

ricardoV94 (Member Author) commented May 31, 2022

@michaelosthege this is not a comparison of before vs. after this PR; it's a comparison between this PR and an alternative procedure that @aloctavodia suggested during review.

Before this PR there were zero accepted transitions in all the new tests (except the first scalar case)

I updated the headings to make it obvious

Metropolis now updates each dimension sequentially and tunes a proposal scale parameter per dimension
@ricardoV94 ricardoV94 force-pushed the improve_multidim_metropolis branch from f27dc98 to f006eda Compare May 31, 2022 13:52
@ricardoV94 ricardoV94 marked this pull request as ready for review May 31, 2022 13:53
ricardoV94 (Member Author)

@aloctavodia I removed the new tune strategy. I'll open a new PR for that. Do you care to review again?

# is not safe for discrete multivariate distributions (looking at you Multinomial),
# due to high dependency among the support dimensions. For continuous multivariate
# distributions we assume they are being transformed in a way that makes each
# dimension semi-independent.
Member

Suggested change
# dimension semi-independent.
# dimension semi-independent.
# Consequently we'd like to tune the scaling based on
# acceptance rate in each dimension independently.

# remember initial settings before tuning so they can be reset
self._untuned_settings = dict(
scaling=self.scaling, steps_until_tune=tune_interval, accepted=self.accepted
# Metropolis will try to handle one batched dimension at a time This, however,
Member

Suggested change
# Metropolis will try to handle one batched dimension at a time This, however,
# Metropolis will try to handle one batched dimension at a time. This, however,

Comment on lines +213 to +214
if self.elemwise_update:
dims = int(sum(np.prod(ivs) for ivs in initial_values_shape))
Member

  1. Should this be an internal attribute?
  2. Could you come up with a more informative name?

Right now (reading the diff top to bottom) I'm confused because this smells like CompoundStep, but it's only about elementwise tuning, right?

ricardoV94 (Member Author)

See my comment below. CompoundStep has nothing to do with this.

Comment on lines +268 to +276
for i in self.enum_dims:
q_temp[i] = q[i]
accept_rate_i = self.delta_logp(q_temp, q0)
q_temp_, accepted_i = metrop_select(accept_rate_i, q_temp, q0)
q_temp[i] = q_temp_[i]
self.accept_rate_iter[i] = accept_rate_i
self.accepted_iter[i] = accepted_i
self.accepted_sum[i] += accepted_i
q = q_temp
Member

Is this doing what we usually do with CompoundStep?

If not, maybe explain what the if/else blocks do in a code comment

ricardoV94 (Member Author)

CompoundStep assigns one step method per variable; here we are sampling one dimension within a variable (or within multiple variables) semi-independently.

),
),
),
)
Member

Ugh, can't we do this in a for iteration with a break?

ricardoV94 (Member Author)

We can probably index instead; that should be faster for large arrays, because all branches of np.where are always evaluated in full.
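A toy illustration of this point (not from the PR, just plain NumPy semantics): np.where evaluates both branch arrays over the whole input before selecting, whereas boolean indexing only computes on the selected subset:

```python
import numpy as np

x = np.array([-1.0, 0.5, 2.0])

# np.where evaluates BOTH branches over the whole array before selecting:
# np.log(x) is computed even where x <= 0 (which is why it would emit a
# RuntimeWarning without the errstate guard).
with np.errstate(invalid="ignore"):
    y_where = np.where(x > 0, np.log(x), 0.0)

# Boolean indexing computes np.log only on the selected subset,
# so no invalid values are ever produced.
mask = x > 0
y_index = np.zeros_like(x)
y_index[mask] = np.log(x[mask])
```

Both approaches produce the same result here; the indexing version simply avoids the wasted (and potentially warning-producing) work on the unselected entries.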


aloctavodia (Member) left a comment

LGTM!

@twiecki twiecki merged commit 1e7d91f into pymc-devs:main May 31, 2022
@ricardoV94 ricardoV94 deleted the improve_multidim_metropolis branch June 6, 2023 03:02