idioms for mixture models and inference convergence/speed? #1776

@yarden

Description

I'm having trouble fitting simple Gaussian mixture models in PyMC3. On only 500 data points, the default inference (NUTS initialized with ADVI) takes 3-4 minutes, and Metropolis inference doesn't seem to work well on this problem either. It's unclear whether this is a failure (or bug) of inference, or an issue with the way I've defined the model. The code is here; the core definition of the mixture uses NormalMixture (as suggested by @twiecki here):

    with pm.Model() as basic_model:
        # symmetric Dirichlet prior on the mixture weights
        weights = pm.Dirichlet("weights", np.ones(num_mixtures))
        # uniform prior on the component means (mu)
        mu = pm.Uniform("mu", mu_min, mu_max, shape=(num_mixtures,))
        # uniform prior on the component standard deviations (sd)
        sd = pm.Uniform("sd", sd_min, sd_max, shape=(num_mixtures,))
        # marginalized mixture likelihood over the observed data
        obs = pm.NormalMixture("obs", weights, mu, sd=sd, observed=data)
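For context, here's a minimal sketch of how data with this structure can be simulated (the ground-truth parameter values below are illustrative placeholders, not the ones from my linked code):

```python
import numpy as np

# Illustrative two-component ground truth (hypothetical values)
rng = np.random.RandomState(42)
true_weights = np.array([0.3, 0.7])
true_mu = np.array([-2.0, 3.0])
true_sd = np.array([0.5, 1.0])

n = 500
# Draw a component assignment for each point, then sample from that component
z = rng.choice(len(true_weights), size=n, p=true_weights)
data = rng.normal(true_mu[z], true_sd[z])
```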

Part of the confusion for me in debugging this is that there are several very different ways to define (finite) mixture models in PyMC3 in the documentation and other resources, and it's not clear which one is the "preferred" one -- or at least, which inference approach should be used with each model writing idiom. The different ways of writing simple mixtures I spotted with PyMC3 were:

  1. Using explicit categorical variables for assignments (with pm.Categorical), which appears to make the simple Metropolis() step method fail (as described here in detail). (It seems that with categorical variables for assignments in the model, one needs either Gibbs sampling or categorical variable specific steps to make sampling-based inference work?)
  2. Using DensityDist and custom log-likelihood functions -- here, ADVI inference is used.
  3. Using categorical pm.Potential functions (to solve label-switching problem, it seems).
  4. Using NormalMixture (or the Mixture class), without explicitly modeling categorical assignments, as suggested by @twiecki here.
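As I understand it, approach 4 sidesteps discrete sampling by marginalizing out the assignment variable: the mixture log-density is a log-sum-exp over components, which keeps the model fully continuous so gradient-based samplers like NUTS can apply. A minimal NumPy sketch of that marginalized likelihood (my own illustration, not PyMC3 internals):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_logp(data, weights, mu, sd):
    """Marginalized Gaussian-mixture log-likelihood:
    log p(x) = log sum_k w_k * N(x | mu_k, sd_k),
    computed stably via log-sum-exp."""
    data = np.asarray(data)[:, None]             # shape (N, 1)
    comp = norm.logpdf(data, loc=mu, scale=sd)   # shape (N, K)
    return logsumexp(np.log(weights) + comp, axis=1).sum()

# Equal-weight two-component example
x = np.array([0.0, 1.0, -1.0])
lp = mixture_logp(x, [0.5, 0.5], np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
```

No categorical variable appears anywhere, so there's nothing for a discrete step method to sample, which (if I understand correctly) is why this idiom avoids the Metropolis failures from approach 1.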

All these approaches encode essentially the same Gaussian mixture model. Since these models are so common and familiar, it'd be helpful to have guidelines on the "canonical" way to write them, along with suggestions for the right inference method to use with each idiom. For instance, why is the default (NUTS initialized with ADVI) so slow given the way I've written the mixture model?

Thanks very much.

Best, Yarden
