Description
I'm having trouble fitting simple Gaussian mixture models in PyMC3. On only 500 data points, the default inference (NUTS initialized with ADVI) takes 3-4 minutes, and Metropolis inference doesn't seem to work well on this problem. It's unclear whether this is a failure (or bug) of inference, or an issue with the way I've defined the model. The code is here; the core definition of the mixture uses NormalMixture (as suggested by @twiecki here):
import numpy as np
import pymc3 as pm

# num_mixtures, mu_min, mu_max, sd_min, sd_max, and data are defined earlier in the script
with pm.Model() as basic_model:
    # symmetric Dirichlet prior on the mixture weights
    weights = pm.Dirichlet("weights", np.ones(num_mixtures))
    # uniform prior on mean (mu)
    mu = pm.Uniform("mu", mu_min, mu_max, shape=(num_mixtures,))
    # uniform prior on standard deviation (sd)
    sd = pm.Uniform("sd", sd_min, sd_max, shape=(num_mixtures,))
    obs = pm.NormalMixture("obs", weights, mu, sd=sd,  # tau=tau,
                           observed=data)
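For reference, the inference calls look roughly like this (a sketch of what I'm running; the draw counts and keyword arguments here are approximate, not copied verbatim from my script):

# Default inference: NUTS initialized with ADVI (the slow case, ~3-4 minutes)
with basic_model:
    trace_nuts = pm.sample(2000, init="advi")

# Metropolis inference (the case that doesn't seem to mix well)
with basic_model:
    trace_mh = pm.sample(2000, step=pm.Metropolis())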
Part of the confusion for me in debugging this is that the documentation and other resources show several very different ways to define (finite) mixture models in PyMC3, and it's not clear which one is the "preferred" one -- or at least, which inference approach should be used with each model-writing idiom. The different ways of writing simple mixtures I spotted in PyMC3 are:
- Using explicit categorical variables for the assignments (with pm.Categorical), which appears to make the simple Metropolis() step method fail (as described here in detail). It seems that with categorical assignment variables in the model, one needs either Gibbs sampling or categorical-variable-specific step methods to make sampling-based inference work? (A sketch of this formulation appears after this list.)
- Using DensityDist and custom log-likelihood functions -- here, ADVI inference is used.
- Using pm.Potential functions (to solve the label-switching problem, it seems).
- Using NormalMixture (or the Mixture class), without explicitly modeling the categorical assignments, as suggested by @twiecki here.
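For concreteness, here is roughly what I mean by the first (explicit-assignment) formulation. This is a sketch modeled on the mixture examples I've seen rather than code I've verified; it reuses the same priors and data as above, and the step-method choices are my guess at what such a model requires:

# Explicit-assignment formulation of the same mixture (first bullet above);
# num_mixtures, mu_min, mu_max, sd_min, sd_max, and data as defined earlier.
with pm.Model() as assignment_model:
    weights = pm.Dirichlet("weights", np.ones(num_mixtures))
    mu = pm.Uniform("mu", mu_min, mu_max, shape=(num_mixtures,))
    sd = pm.Uniform("sd", sd_min, sd_max, shape=(num_mixtures,))
    # one latent component label per observation
    z = pm.Categorical("z", p=weights, shape=len(data))
    obs = pm.Normal("obs", mu=mu[z], sd=sd[z], observed=data)

    # NUTS cannot update the discrete z, so a compound step seems to be needed:
    # a categorical-specific step for z plus a step for the continuous variables.
    step1 = pm.ElemwiseCategorical(vars=[z], values=list(range(num_mixtures)))
    step2 = pm.Metropolis(vars=[weights, mu, sd])
    trace = pm.sample(2000, step=[step1, step2])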
All these approaches encode essentially the same Gaussian mixture model. Since these models are so frequently used and familiar, it'd be helpful to have some guidelines on the "canonical" way to write them, along with suggestions about the right inference method to use with each formulation. For instance, why is the default (NUTS + ADVI) so slow given the way I've written the mixture model?
Thanks very much.
Best, Yarden