Zero inflated binomial #2251

fonnesbeck · 2017-05-31T21:23:20Z

Adds a zero-inflated binomial distribution, following the other zero-inflated distributions in discrete.py. It will eventually be superseded by #2246, but that PR is not yet working. This will make the main zero-inflated distributions available for 3.1.

aseyboldt · 2017-05-31T22:00:44Z

pymc3/distributions/discrete.py

+    p : float
+        Probability of success in each trial (0 < p < 1).
+    psi : float
+        Expected proportion of Poisson variates (0 < psi < 1)


Poisson -> Binomial

aseyboldt · 2017-05-31T22:00:47Z

pymc3/distributions/discrete.py

+        g = generate_samples(stats.binom.rvs, n, p,
+                             dist_shape=self.shape,
+                             size=size)
+        sampled = g * (np.random.random(np.squeeze(g.shape)) < psi)


What is the squeeze about? Doesn't this break broadcasting with g?

Doesn't appear to. I'm just mirroring what is happening in the other ZI dists.

It passes the test_distribution_random test.
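For comparison, here is a minimal NumPy/SciPy sketch of zero-inflated binomial sampling that sidesteps the broadcasting question by drawing the Bernoulli mask with exactly `g.shape`. It assumes `psi` is the proportion of Binomial variates, as in the docstring under review. `sample_zib` is a hypothetical helper, not the pymc3 `random` method. In the diff, `np.squeeze` is applied to the shape tuple rather than to `g` itself, which may be why broadcasting was unaffected in the tests.

```python
import numpy as np
from scipy import stats


def sample_zib(n, p, psi, size, rng=None):
    """Hypothetical helper: draw zero-inflated binomial variates.

    Each draw is a Binomial(n, p) variate with probability psi and a
    structural zero otherwise (psi = proportion of Binomial variates).
    """
    rng = np.random.default_rng() if rng is None else rng
    g = stats.binom.rvs(n, p, size=size, random_state=rng)
    # The mask has exactly the same shape as g, so the elementwise
    # product needs no broadcasting at all.
    keep = rng.random(g.shape) < psi
    return g * keep
```

With `psi=0` every draw is a structural zero, and with `psi=1` the draws are plain Binomial variates.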

aseyboldt · 2017-05-31T22:47:46Z

pymc3/distributions/discrete.py

+    def logp(self, value):
+        return tt.switch(value > 0,
+                         tt.log(self.psi) + self.bin.logp(value),
+                         tt.log((1. - self.psi) + self.psi * tt.pow(1 - self.p, self.n)))


Does this work numerically for the gradient? Maybe a test that this is reasonable for corner cases for p and psi and large n would be nice.

Isn't that what the tests do?

I think this works for the gradient but is numerically unstable. Why not use the log-sum-exp trick as in pm.Mixture? https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/mixture.py#L110-L115

Ah, right. I was just following the implementations of the other ZI distributions, but they should be "robustified", yes.

Should be good to go now
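For reference, this is the zero-inflated binomial pmf implied by the `logp` above (with ψ the proportion of Binomial variates), together with the log-add-exp form of the zero branch. The zero branch is the only place where the stability concern applies, because it is the log of a sum of two terms.

```latex
P(X = k \mid n, p, \psi) =
\begin{cases}
(1 - \psi) + \psi \, (1 - p)^{n}, & k = 0, \\
\psi \binom{n}{k} p^{k} (1 - p)^{n - k}, & k = 1, \dots, n,
\end{cases}

\log P(X = 0) = \log\!\left( e^{\log(1 - \psi)} + e^{\log \psi + n \log(1 - p)} \right)
             = \operatorname{logaddexp}\!\big( \log(1 - \psi),\ \log \psi + n \log(1 - p) \big)
```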

aseyboldt · 2017-06-01T16:25:39Z

pymc3/distributions/discrete.py

+        n = self.n
+
+        logp_val = tt.switch(value > 0,
+                 logsumexp(tt.log(psi) + self.bin.logp(value)),


Shouldn't this still be just tt.log(self.psi) + self.bin.logp(value)?

Doesn't logsumexp help here?

The exp is missing :-)
It helps if you have an expression like log(exp(a) + exp(b)). But we are only doing log(a + b) here.
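A small SciPy illustration of the point (using `scipy.special.logsumexp`, not the pymc3/Theano helper): wrapping a single log-term in log-sum-exp is just the identity, while a genuine sum of exponentials is where the trick prevents overflow.

```python
import numpy as np
from scipy.special import logsumexp

# A single term: log(exp(x)) == x, so logsumexp changes nothing.
x = -1234.5
single = logsumexp([x])

# A genuine sum of exponentials: log(exp(a) + exp(b)) with large a, b.
a, b = 1000.0, 1000.0
with np.errstate(over="ignore"):
    naive = np.log(np.exp(a) + np.exp(b))   # exp overflows to inf
stable = logsumexp([a, b])                 # equals a + log(2), finite
```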

aseyboldt · 2017-06-01T16:35:35Z

pymc3/distributions/discrete.py

+
+        logp_val = tt.switch(value > 0,
+                 logsumexp(tt.log(psi) + self.bin.logp(value)),
+                 logsumexp(tt.log((1. - psi) + psi * tt.pow(1 - p, n))))


I think this should be

logaddexp(tt.log1p(-psi), tt.log(psi) + n * tt.log1p(-p))

where

def logaddexp(a, b):
    diff = b - a
    return tt.switch(diff > 0,
                     b + tt.log1p(tt.exp(-diff)),
                     a + tt.log1p(tt.exp(diff)))
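A NumPy transcription of the proposed `logaddexp`, plus a comparison of the naive and stable zero branches, can serve as a quick sanity check outside Theano. `np.where` stands in for `tt.switch` (both branches are evaluated, the selected one is kept); `zib_logp_zero_naive` and `zib_logp_zero_stable` are hypothetical helper names. NumPy also ships a ready-made `np.logaddexp` ufunc, used here only as a reference.

```python
import numpy as np


def logaddexp(a, b):
    """log(exp(a) + exp(b)), computed relative to the larger argument."""
    diff = b - a
    # Like tt.switch, np.where evaluates both branches; the discarded one
    # may overflow or produce nan, so those warnings are silenced.
    with np.errstate(over="ignore", invalid="ignore"):
        return np.where(diff > 0,
                        b + np.log1p(np.exp(-diff)),
                        a + np.log1p(np.exp(diff)))


def zib_logp_zero_naive(n, p, psi):
    # log((1 - psi) + psi * (1 - p)**n): (1 - p)**n underflows for large n
    with np.errstate(divide="ignore"):
        return np.log((1.0 - psi) + psi * (1.0 - p) ** n)


def zib_logp_zero_stable(n, p, psi):
    # logaddexp(log(1 - psi), log(psi) + n * log(1 - p)), as proposed above
    with np.errstate(divide="ignore"):
        return logaddexp(np.log1p(-psi), np.log(psi) + n * np.log1p(-p))
```

With `n=2000, p=0.5, psi=1.0`, the naive version returns `-inf` because `0.5**2000` underflows, while the stable version returns `2000 * log(0.5)`.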

You are correct.

Actually, our logsumexp doesn't have the same signature, but you are right in principle.

I think we should just add the logaddexp function. It is nicer (and, I think, faster) when there are only two terms, and it comes in handy quite often. I'm surprised I can't find it in Theano already...

In fact, I think logsumexp is used incorrectly in Mixture, if I am reading it correctly.

logsumexp is getting a weight vector [psi, 1.-psi], maybe that's why?
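For context on how weights can enter a log-sum-exp, here is a SciPy sketch (not the pymc3 `Mixture` code, which this thread is still discussing) showing two equivalent ways to compute log(Σ wᵢ exp(lᵢ)) for mixture weights `[psi, 1 - psi]`. The component log-likelihoods are illustrative values chosen so that the naive computation underflows.

```python
import numpy as np
from scipy.special import logsumexp

psi = 0.3
w = np.array([psi, 1.0 - psi])            # mixture weights
comp_logp = np.array([-800.0, -900.0])    # illustrative component log-likelihoods

# Weights folded in on the log scale: logsumexp(log(w) + l).
mix_a = logsumexp(np.log(w) + comp_logp)
# Same quantity via SciPy's scaling-factor argument b.
mix_b = logsumexp(comp_logp, b=w)
# Naive: exp(-800) underflows to 0, so the result is -inf.
with np.errstate(divide="ignore"):
    mix_naive = np.log(np.sum(w * np.exp(comp_logp)))
```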

aseyboldt · 2017-06-01T16:37:32Z

pymc3/distributions/discrete.py

    """

-    def __init__(self, theta, psi, *args, **kwargs):
+    def __init__(self, psi, theta, *args, **kwargs):


Isn't this a backward-incompatible change?

This harmonizes them with the convention in Mixture, which is where they will eventually end up. So, a break is coming in one place or the other.

Then we should definitely put that in the release notes. And maybe also print a warning until 3.2?
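One way such a transition warning could look is sketched below: a hypothetical helper (not pymc3 API) that resolves `(psi, theta)` under the new order and emits a `FutureWarning` whenever the parameters are passed positionally, since positional calls are the only ones whose meaning changed.

```python
import warnings


def resolve_zip_args(*args, psi=None, theta=None):
    """Hypothetical helper: return (psi, theta) under the new (psi, theta)
    order, warning if the caller relied on positional order."""
    if args:
        warnings.warn(
            "The argument order changed from (theta, psi) to (psi, theta); "
            "pass psi= and theta= as keywords to avoid ambiguity.",
            FutureWarning, stacklevel=2)
        if len(args) > 2:
            raise TypeError("expected at most two positional arguments")
        given = {"psi": psi, "theta": theta}
        for name, value in zip(("psi", "theta"), args):
            if given[name] is not None:
                raise TypeError(f"got multiple values for argument {name!r}")
            given[name] = value
        psi, theta = given["psi"], given["theta"]
    if psi is None or theta is None:
        raise TypeError("both psi and theta are required")
    return psi, theta
```

Keyword calls such as `resolve_zip_args(psi=0.2, theta=3.0)` pass silently, so code that already uses keywords is unaffected by the swap.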

junpenglao · 2017-06-01T16:52:37Z

I agree with @aseyboldt; the logp is off. For example:

import numpy as np
import pymc3 as pm
from pymc3 import ZeroInflatedPoisson

# 180 Poisson(4) counts plus 20 structural zeros
x = np.concatenate([np.random.poisson(4, size=180), np.zeros(20)])
with pm.Model() as model0:
    ψ = pm.Beta('ψ', 1., 1.)
    θ = pm.Gamma('θ', 1., 1.)
    like = ZeroInflatedPoisson('like', psi=ψ, theta=θ, observed=x)
    tr0 = pm.sample(3000, init=None, njobs=2)
pm.traceplot(tr0);

gives:

fonnesbeck · 2017-06-01T18:33:01Z

OK, getting something more reasonable now:

aseyboldt · 2017-06-01T21:10:11Z

LGTM

junpenglao · 2017-06-02T05:16:36Z

Tested locally; works perfectly (even the case I showed you a few days ago in a notebook, @fonnesbeck).
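To sanity-check the robustified zero-inflated Poisson outside Theano, here is a NumPy/SciPy sketch of its log-pmf. It assumes the fix mirrors the zero-inflated binomial one (log-add-exp on the zero branch, `psi` = proportion of Poisson variates, and the Poisson zero probability `exp(-theta)`). `zip_logp` is a hypothetical helper, not the pymc3 method.

```python
import numpy as np
from scipy import stats


def zip_logp(value, psi, theta):
    """Log-pmf of a zero-inflated Poisson (psi = proportion of Poisson variates)."""
    value = np.asarray(value)
    with np.errstate(divide="ignore"):
        log_psi = np.log(psi)
        # log((1 - psi) + psi * exp(-theta)) via log-add-exp
        log_zero = np.logaddexp(np.log1p(-psi), log_psi - theta)
    log_pos = log_psi + stats.poisson.logpmf(value, theta)
    return np.where(value > 0, log_pos, log_zero)
```

The probabilities over `k = 0, 1, ...` should sum to one, and the zero branch stays finite even for `psi=1.0` with a large `theta`, where the naive `log((1 - psi) + psi * exp(-theta))` would be `-inf`.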

fonnesbeck added 6 commits May 27, 2017 14:46

Added zero-inflated binomial distribution

b407bdb

Fixed syntax error

524b88e

Added ZeroInflatedBinomial to global namespace

17b27b0

Fixed attribute error in call to scipy.stats

4fd07b8

Imported ZeroInflatedBinomial in test_distributions

999be1e

Merge branch 'master' into zero_inflated_binomial

0929f56

aseyboldt reviewed May 31, 2017


fonnesbeck added 2 commits May 31, 2017 18:44

Fixed cut and paste typo

a2f589c

Robustified logp calculations in zero-inflated distributions

a99e429

aseyboldt reviewed Jun 1, 2017


fonnesbeck added 4 commits June 1, 2017 13:17

Added logaddexp

beb489d

Fixing logp methods of zero-inflated distributions

b7daa3e

Replaced stray occurrence of logsumexp with logaddexp

30d4a0a

Replaced stray log with log1p

5184768

fonnesbeck merged commit 0974ecc into master Jun 2, 2017

fonnesbeck deleted the zero_inflated_binomial branch June 2, 2017 12:22
