idioms for mixture models and inference convergence/speed? #1776

@yarden

Description

I'm having trouble fitting simple Gaussian mixture models in PyMC3. On only 500 data points, the default inference (NUTS initialized with ADVI) takes 3-4 minutes, and Metropolis inference doesn't seem to work well on this problem either. It's unclear whether this is a failure (or bug) of inference, or an issue with the way I've defined the model. The code is here; the core definition of the mixture uses NormalMixture (as suggested by @twiecki here):

    with pm.Model() as basic_model:
        # symmetric Dirichlet prior on the mixture weights
        weights = pm.Dirichlet("weights", np.ones(num_mixtures))
        # uniform prior on the component means (mu)
        mu = pm.Uniform("mu", mu_min, mu_max, shape=(num_mixtures,))
        # uniform prior on the component standard deviations (sd)
        sd = pm.Uniform("sd", sd_min, sd_max, shape=(num_mixtures,))
        # marginalized mixture likelihood over the observed data
        obs = pm.NormalMixture("obs", weights, mu, sd=sd, observed=data)
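For context, here's a minimal sketch of how data with this structure can be simulated (the ground-truth parameter values below are illustrative placeholders, not the ones from my linked code):

```python
import numpy as np

# Illustrative two-component ground truth (hypothetical values)
rng = np.random.RandomState(42)
true_weights = np.array([0.3, 0.7])
true_mu = np.array([-2.0, 3.0])
true_sd = np.array([0.5, 1.0])

n = 500
# Draw a component assignment for each point, then sample from that component
z = rng.choice(len(true_weights), size=n, p=true_weights)
data = rng.normal(true_mu[z], true_sd[z])
```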

Part of the confusion for me in debugging this is that there are several very different ways to define (finite) mixture models in PyMC3 in the documentation and other resources, and it's not clear which one is the "preferred" one -- or at least, which inference approach should be used with each model writing idiom. The different ways of writing simple mixtures I spotted with PyMC3 were:

  1. Using explicit categorical variables for assignments (with pm.Categorical), which appears to make the simple Metropolis() step method fail (as described here in detail). (It seems that with categorical variables for assignments in the model, one needs either Gibbs sampling or categorical variable specific steps to make sampling-based inference work?)
  2. Using DensityDist and custom log-likelihood functions -- here, ADVI inference is used.
  3. Using categorical pm.Potential functions (to solve label-switching problem, it seems).
  4. Using NormalMixture (or the Mixture class), without explicitly modeling categorical assignments, as suggested by @twiecki here.
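As I understand it, approach 4 sidesteps discrete sampling by marginalizing out the assignment variable: the mixture log-density is a log-sum-exp over components, which keeps the model fully continuous so gradient-based samplers like NUTS can apply. A minimal NumPy sketch of that marginalized likelihood (my own illustration, not PyMC3 internals):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_logp(data, weights, mu, sd):
    """Marginalized Gaussian-mixture log-likelihood:
    log p(x) = log sum_k w_k * N(x | mu_k, sd_k),
    computed stably via log-sum-exp."""
    data = np.asarray(data)[:, None]             # shape (N, 1)
    comp = norm.logpdf(data, loc=mu, scale=sd)   # shape (N, K)
    return logsumexp(np.log(weights) + comp, axis=1).sum()

# Equal-weight two-component example
x = np.array([0.0, 1.0, -1.0])
lp = mixture_logp(x, [0.5, 0.5], np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
```

No categorical variable appears anywhere, so there's nothing for a discrete step method to sample, which (if I understand correctly) is why this idiom avoids the Metropolis failures from approach 1.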

All these approaches encode essentially the same Gaussian mixture model. Since these models are so common and familiar, it'd be helpful to have guidelines on the "canonical" way to write them, along with suggestions for the right inference method to use with each idiom. For instance, why is the default (NUTS initialized with ADVI) so slow given the way I've written the mixture model?

Thanks very much.

Best, Yarden
