Description
Right now the Mixture class allows mixing distributions of any kind, leading to improper logp evaluations where probability densities are mixed with probability masses:
```python
import numpy as np
import pymc3 as pm

norm = pm.Normal.dist(0, 1)
pois = pm.Poisson.dist(1)
mix = pm.Mixture.dist([0.5, 0.5], comp_dists=[norm, pois], shape=1)

# Observed integer could only have come from the Poisson component, so logp should be:
np.log(0.5 * np.exp(pois.logp(0).eval()))  # -1.69
# But it's higher:
mix.logp(0).eval()  # -0.958

# Observed float could only have come from the Normal component, so logp should be:
np.log(0.5 * np.exp(norm.logp(1 / 3).eval()))  # -1.66
# But it's higher:
mix.logp(1 / 3).eval()  # -0.93

# This is not a problem when evaluating points where the domains do not overlap:
mix.logp(-1).eval()  # -2.11
np.log(0.5 * np.exp(norm.logp(-1).eval()))  # -2.11
```
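For reference, the numbers above can be reproduced with plain numpy/scipy (my own check, not part of the original report). It also shows why the float case is inflated: I'm assuming `Poisson.logp` uses the gamma-function form `value * log(mu) - gammaln(value + 1) - mu`, which is nonzero even at non-integer values where the true pmf is zero.

```python
# Standalone reproduction of the values above (my own verification):
# Mixture adds the Normal *density* and the Poisson *mass* at the same point.
import numpy as np
from scipy import stats
from scipy.special import gammaln

norm_pdf = stats.norm(0, 1).pdf

def pois_logp(value, mu=1.0):
    # Assumed to match PyMC3's Poisson.logp: the pmf with the factorial
    # replaced by a gamma function, hence nonzero at non-integer values.
    return value * np.log(mu) - gammaln(value + 1) - mu

# At the integer 0 the Normal density and the Poisson mass are simply averaged:
np.log(0.5 * norm_pdf(0) + 0.5 * np.exp(pois_logp(0)))          # -0.958
# At the float 1/3 the Poisson term should contribute nothing,
# but the gamma-based logp still does:
np.log(0.5 * norm_pdf(1 / 3) + 0.5 * np.exp(pois_logp(1 / 3)))  # -0.93
```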
Questions related to mixtures of continuous and discrete distributions crop up now and then on the PyMC Discourse:
- https://discourse.pymc.io/t/sampling-from-a-learned-mixture-of-zeros-and-lognormal/3671
- https://discourse.pymc.io/t/zero-inflated-student-t/6170
- https://discourse.pymc.io/t/zero-inflated-normal/6857
- https://discourse.pymc.io/t/mixture-of-continuous-and-discrete-logp/2392
The Stan forum also has an informative discussion on this topic.
It seems to me that this is an area where we could easily nudge users in the right direction by raising an informative ValueError.
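A minimal sketch of what such a check could look like (my own illustration, not a concrete implementation proposal; the helper name and error message are made up, and I'm assuming components can be classified with isinstance checks against pm.Continuous / pm.Discrete):

```python
import pymc3 as pm

def _check_comp_dists(comp_dists):
    """Hypothetical helper: reject mixtures of densities and masses."""
    has_continuous = any(isinstance(d, pm.Continuous) for d in comp_dists)
    has_discrete = any(isinstance(d, pm.Discrete) for d in comp_dists)
    if has_continuous and has_discrete:
        raise ValueError(
            "Mixtures of continuous and discrete components are not supported, "
            "because the resulting logp would add probability densities and "
            "probability masses. Consider modelling the discrete part explicitly, "
            "e.g. with a zero-inflated formulation."
        )

_check_comp_dists([pm.Normal.dist(0, 1), pm.Poisson.dist(1)])  # raises ValueError
```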