
Refactor SMC and properly compute marginal likelihood #3124

Merged: 8 commits, Aug 18, 2018

Conversation

@aloctavodia (Member) commented Jul 28, 2018

This PR introduces many changes to the SMC sampler. As a result the code is simpler (although many features are still missing compared to the current SMC), the marginal likelihood is computed properly, and I think this could also help with the implementation of SMC-ABC. The sampler should also be faster, although I did not run benchmarks.

Tests will certainly fail at this stage; I did not even try to check them. This code does not yet work with variables of shape larger than 1 :-)

I tested using a Beta-binomial model because it is simple and the marginal likelihood can also be computed analytically: for z successes in N Bernoulli trials under a Beta(a, b) prior, p(y) = B(a + z, b + N - z) / B(a, b).

In case someone wants to try:

import numpy as np
import pymc3 as pm
from scipy.special import betaln

data = np.repeat([1, 0], [50, 50])

def bern_beta(prior, data):
    # Analytic marginal likelihood: B(a + z, b + N - z) / B(a, b)
    a, b = prior
    z = np.sum(data)
    N = len(data)
    p_data = np.exp(betaln(a + z, b + N - z) - betaln(a, b))
    return p_data

marginals = []
traces = []

a_prior_0 = 2.
b_prior_0 = 5.
a_prior_1 = 10.
b_prior_1 = 10.
for alfa, beta in ((a_prior_0, b_prior_0), (a_prior_1, b_prior_1)):
    with pm.Model() as model:
        a = pm.Beta('a', alfa, beta)
        y = pm.Bernoulli('y', a, observed=data)
        trace = pm.sample(2000, step=pm.SMC(), random_seed=42)
        marginals.append(model.marginal_likelihood)
        traces.append(trace)
        trace = pm.sample(1000)  # NUTS run, for comparison
        traces.append(trace)

print(marginals[1] / marginals[0])
print(bern_beta([a_prior_1, b_prior_1], data) / bern_beta([a_prior_0, b_prior_0], data))

I get a Bayes factor of 3.35 (with SMC) and 3.40 (analytically). I tried different combinations of priors and the results look OK. I get larger variance when one of the priors is very different from the posterior and the other is very close (like (1, 10) and (50, 50)), but I think that is expected. Intuitively, if the prior and the posterior are very similar you will get very few stages (even just one), and hence the estimate should be noisier; and if the prior and the posterior are very different you have to take many intermediate steps, and if you do not take enough the estimate will also be noisy (but I am not sure about all this; I should check something I read once about the variance of the marginal likelihood estimated by SMC).
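For reference, here is a minimal sketch (not the PR's actual code; the function name and inputs are hypothetical) of the standard SMC evidence estimator being discussed: the marginal likelihood is the product over tempering stages of the mean incremental importance weight.

import numpy as np
from scipy.special import logsumexp

def smc_marginal_likelihood(stage_loglikes, betas):
    # stage_loglikes[s]: log-likelihood of each particle at the start of stage s
    # betas: tempering schedule with betas[0] == 0.0 and betas[-1] == 1.0
    log_ml = 0.0
    for loglike, b_old, b_new in zip(stage_loglikes, betas[:-1], betas[1:]):
        # log of the mean incremental weight exp((b_new - b_old) * loglike),
        # accumulated stably via log-sum-exp
        log_ml += logsumexp((b_new - b_old) * loglike) - np.log(len(loglike))
    return np.exp(log_ml)

Each factor is an importance-sampling estimate of the ratio of normalizing constants between consecutive stages, so fewer stages mean fewer but cruder factors, which matches the intuition above.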

This is the result of the previous model comparing the estimates from NUTS and SMC (I know it is a simple model, but it might not have worked :-) and the results are also close to the analytic posterior).

[figure: new_smc_vs_nuts]

The code is clunky and feedback is very much appreciated!


-def _initial_population(samples, chains, model, variables):
+# FIXME!!!!
+def _initial_population(samples, model, variables):
Member

You can use the sample_prior_predictive function here right?

Member Author


That's an option. Currently I need an array of samples, not a dictionary, but this is something I need to change anyway in order to work with multidimensional variables.
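For example, something along these lines (a hypothetical sketch, not this PR's code; the toy model and names are illustrative) would flatten the dictionary returned by sample_prior_predictive into a (samples, n_parameters) array:

import numpy as np
import pymc3 as pm

samples = 1000
with pm.Model() as model:
    a = pm.Normal('a', 0., 1.)
    b = pm.Normal('b', 0., 1., shape=3)
    prior = pm.sample_prior_predictive(samples=samples)

# prior maps variable names to arrays of shape (samples, *var_shape);
# flatten each free variable and stack the columns. Note that transformed
# variables (e.g. a Beta's log-odds) would need extra handling here.
init_population = np.column_stack(
    [prior[v.name].reshape(samples, -1) for v in model.free_RVs]
)  # shape (1000, 4)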

beta : float
    Tempering parameter of the current stage.
weights : numpy array
    Importance weights (floats).
Member


This needs updating (beta also appears twice above).

@twiecki (Member) commented Aug 10, 2018

[image]

My favorite kind of PR.

-# FIXME!!!!
-def _initial_population(samples, model, variables):
+def _initial_population(chains, model, variables):
Member


Any reason not to use sample_prior_predictive?

Member Author


Not really, I will change it.

@junpenglao junpenglao changed the title [WIP] refactor SMC and properly compute marginal likelihood Refactor SMC and properly compute marginal likelihood Aug 17, 2018
@junpenglao (Member)

Don't forget the release note.

@junpenglao (Member)

Q: I see the sampling alternating between fast stages and slow stages; why is that?

Sample initial stage: ...
Beta: 0.065125 Stage: 0
100%|██████████| 10000/10000 [00:02<00:00, 4300.95it/s]
Beta: 0.172089 Stage: 1
100%|██████████| 10000/10000 [00:15<00:00, 649.18it/s]
Beta: 0.340472 Stage: 2
100%|██████████| 10000/10000 [00:01<00:00, 5214.46it/s]
Beta: 0.540063 Stage: 3
100%|██████████| 10000/10000 [00:15<00:00, 663.20it/s]
Beta: 0.771252 Stage: 4
100%|██████████| 10000/10000 [00:01<00:00, 5239.14it/s]
Beta: 1.000000 Stage: 5
100%|██████████| 10000/10000 [00:19<00:00, 507.09it/s]

@junpenglao (Member)

Marginal likelihood estimation is way more accurate, but still different compared to the result from the bridge sampler (see the last few cells of the notebook).

@aloctavodia (Member Author)

The number of steps at each stage (and the proposal distribution) is tuned using the acceptance rate of the previous stage, thus not all stages take the same time. I guess the oscillations are the result of the tuning predicting an easier stage than the real one. I will keep testing, but it seems that the tuning effectively reduces the runtime without sacrificing the quality of the results.
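For context, here is a minimal sketch of one common tuning scheme of this kind (e.g. in the style of Muto and Beck, 2008); the exact rule and constants in this PR may differ. The proposal scale is adapted from the previous stage's acceptance rate, and the number of MCMC steps is chosen so that each particle accepts at least one move with high probability, which is why an easy (high-acceptance) stage is followed by a short, fast stage and a hard one by a long, slow stage.

import numpy as np

def tune_scaling(acc_rate):
    # Map the previous stage's acceptance rate to a proposal scale factor:
    # low acceptance shrinks the proposal, high acceptance widens it
    return (1.0 / 9 + 8.0 / 9 * acc_rate) ** 2

def tune_n_steps(acc_rate, p_target=0.99):
    # Pick enough MCMC steps per particle so that the probability of at
    # least one accepted move is roughly p_target
    acc_rate = np.clip(acc_rate, 1e-6, 1 - 1e-6)
    return 1 + int(np.log(1 - p_target) / np.log(1 - acc_rate))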

I have been comparing the marginal likelihood with the analytical result for the beta-binomial model; at this point it seems to me that any difference is just due to the SMC approximation (and the intrinsic difficulty of computing the marginal likelihood). Nevertheless, I am almost sure I read something about improving/stabilizing the computation of the marginal likelihood using SMC (I just need to remember where!). I would say SMC produces a marginal likelihood estimate within roughly plus/minus 20% of the true value; this is an impression from running a couple of examples. Maybe there is a way to report an estimate of the variance or a confidence interval (without running SMC several times!).

@aloctavodia (Member Author)

Ohhh, I will run your example too and see if there is something I can do to improve the code. I guess we can merge this at this point (I will add a warning message to the sampler) and then I/we can try to polish any rough edges before 3.6.

@junpenglao (Member)

Yep, that sounds like a good plan.
