Refactor missing discrete dists #4684

ricardoV94 · 2021-05-11T15:07:00Z

This PR refactors all missing univariate discrete distributions.

I decided to refactor the Constant distribution after some debate on Slack about whether it should be deprecated. Using it directly may be problematic without a custom step algorithm, but it can still be useful indirectly, for example in mixture distributions. I set the dtype of the RV to floatX so that it accepts non-integer outputs (even though this distribution should be treated as a pmf; see Mixture should not allow mixing of discrete and continuous distributions #4511 for why this might matter).
I created separate RandomOps for the ZeroInflated* distributions for now. As with Replace custom RandomVariable for WeibullBetaRV #4683, this is meant to be temporary, until Mixture distributions work on V4 (it seems such an approach was already attempted in the past: WIP: Convert zero-inflated distributions to Mixture subclasses #2246, WIP Implement ZeroInflatedPoisson as a subclass of Mixture #1459).

ricardoV94 · 2021-05-11T16:01:50Z

pymc3/distributions/discrete.py

+    @classmethod
+    def rng_fn(cls, rng, c, size=None):
+        ones = np.ones_like(c)
+        return rng.randint(ones, ones + 1, size=size) * c


There must be a less hacky way to get a filled array given c and size that behaves as other random generators would. np.full(size, c) doesn't seem to cut it, because it takes a shape rather than a size.

Aren't shape and size the same thing here?

No, because c could be a vector and size could be None, for instance.

Okay, actually I think the only problem was size=None. I added a check for that condition...
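For readers following the size discussion above, here is a minimal sketch of a deterministic sampler that handles size=None explicitly. It is not the code merged in this PR: constant_rng_fn is a hypothetical standalone name, and the sketch assumes that size, when given, is broadcast-compatible with the shape of c.

```python
import numpy as np


def constant_rng_fn(rng, c, size=None):
    """Hypothetical stand-in for a Constant rng_fn (not the PR's code).

    rng is accepted only to mirror the usual sampler signature; the
    draw is deterministic, so it is never used.
    """
    c = np.asarray(c)
    if size is None:
        # No size requested: the "draw" simply has the shape of c.
        return c.copy()
    # np.full broadcasts the fill value against the requested shape,
    # so a vector c fills the trailing dimension of the result.
    return np.full(size, c)
```

With a vector c of length 3 and size=(2, 3), every row of the result equals c; with size=None, the result has the shape of c.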

twiecki

This looks great!

twiecki · 2021-05-11T17:21:35Z

Thanks for taking this on @ricardoV94. Separately, when I tried V4, it seemed that some continuous distributions, like StudentT, are also missing?

ricardoV94 · 2021-05-11T17:25:27Z

Yes, they are missing.

Only distributions that have a rv_op at the top have been refactored. StudentT hasn't yet.

I won't have more time this week, so if someone wants to open a PR for the missing continuous distributions, that would be great. The logic should be pretty similar to what this PR does with the custom RandomOps.

brandonwillard

Looks good, but we really shouldn't rely on Distribution.logp functions, since those aren't officially part of the Distribution API—they're just a means of conveniently specifying dispatches.

pymc3/distributions/discrete.py

ricardoV94 · 2021-05-12T09:23:01Z

Looks good, but we really shouldn't rely on Distribution.logp functions, since those aren't officially part of the Distribution API—they're just a means of conveniently specifying dispatches.

@brandonwillard I am not sure what you have in mind. This looks wrong, right?

# at.log(psi) + Poisson.logp(value, theta)
at.log(psi) + _logp(poisson, value, {value: value}, theta)

I also tried this, but it fails following a change_rv_size operation:

at.log(psi) + logpt(Poisson.dist(theta), value)

brandonwillard · 2021-05-12T17:24:43Z

@brandonwillard I am not sure what you have in mind. This looks wrong, right?
# at.log(psi) + Poisson.logp(value, theta),
at.log(psi) + _logp(poisson, value, {value: value}, theta),

No, that looks like the correct approach.
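As a point of reference for the expression under discussion, the sketch below writes out the zero-inflated Poisson log-probability in plain Python. The function names are hypothetical, it works on scalars only, and it assumes 0 < psi <= 1 and theta > 0; it illustrates the math and is not the Aesara code in this PR. For positive counts it reduces to log(psi) plus the Poisson log-probability, which is the term both variants above compute.

```python
import math


def poisson_logp(value, theta):
    # log of theta**value * exp(-theta) / value!
    return value * math.log(theta) - theta - math.lgamma(value + 1)


def zero_inflated_poisson_logp(value, psi, theta):
    """Scalar ZIP log-probability (illustrative helper, not PyMC API)."""
    if value > 0:
        # A positive count can only come from the Poisson component.
        return math.log(psi) + poisson_logp(value, theta)
    # A zero is either a structural zero or a Poisson zero.
    return math.log((1.0 - psi) + psi * math.exp(-theta))
```

Summing exp(logp) over all non-negative counts should give 1, which is a quick sanity check on the two branches.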

ricardoV94 · 2021-05-12T17:43:13Z

Could you give some intuition why this is better than the original approach?

brandonwillard · 2021-05-12T18:33:55Z

Could you give some intuition why this is better than the original approach?

As I stated in my original review comments, these Distribution.logp functions are not a part of the Distribution API and they are neither designed nor intended to be used in this way. Their availability is questionable and subject to arbitrary change, so we don't want to use them outside of their intended purpose.

More importantly, we do not want to rely on the Distribution types in general, and especially not for log-likelihoods; they're currently just a legacy transition artifact. This is clearly demonstrated by the lack of Distribution in the log-likelihood code itself.

Simply put, this approach unnecessarily adds Distributions into the log-likelihood framework.

If this sort of use case needs to be simplified, it can be done in better ways that are based only on the underlying dispatch.
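To illustrate the general idea of keying log-probabilities on the random-variable op rather than on a Distribution class, here is a toy sketch built on the standard library's functools.singledispatch. Everything in it is an assumption made for illustration: the PoissonRV and ZeroInflatedPoissonRV classes are empty placeholders, and the logp function and its signature are not PyMC's actual _logp dispatcher.

```python
import math
from functools import singledispatch


class PoissonRV:
    """Placeholder op type (hypothetical, for illustration only)."""


class ZeroInflatedPoissonRV:
    """Placeholder op type (hypothetical, for illustration only)."""


@singledispatch
def logp(op, value, *params):
    # Fallback when no implementation is registered for this op type.
    raise NotImplementedError(f"No logp registered for {type(op).__name__}")


@logp.register(PoissonRV)
def _poisson_logp(op, value, theta):
    return value * math.log(theta) - theta - math.lgamma(value + 1)


@logp.register(ZeroInflatedPoissonRV)
def _zip_logp(op, value, psi, theta):
    if value > 0:
        # Reuse the Poisson term through the dispatcher itself,
        # without touching any Distribution class.
        return math.log(psi) + logp(PoissonRV(), value, theta)
    return math.log((1.0 - psi) + psi * math.exp(-theta))
```

In this toy version the zero-inflated implementation depends only on the dispatcher and the op type, so no Distribution class enters the log-likelihood code.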

twiecki · 2021-05-14T06:35:36Z

🥳

ricardoV94 force-pushed the refactor_discrete_dists branch from 12bfdfd to df0222e on May 11, 2021 15:58

ricardoV94 commented May 11, 2021

ricardoV94 force-pushed the refactor_discrete_dists branch from df0222e to cbd72dd on May 11, 2021 16:07

twiecki previously approved these changes May 11, 2021

twiecki requested a review from brandonwillard May 11, 2021 17:21

brandonwillard suggested changes May 11, 2021

ricardoV94 dismissed twiecki’s stale review via 6aae83e May 12, 2021 09:09

ricardoV94 force-pushed the refactor_discrete_dists branch 2 times, most recently from 6aae83e to e8ec87d on May 12, 2021 09:15

ricardoV94 added 10 commits May 12, 2021 13:45

Fix Uniform logp regression from pymc-devs#4541

fa0e930

Refactor DiscreteUniform

03e1df5

Refactor Constant

ea7afba

Refactor OrderedLogistic

2a1071d

Refactor OrderedProbit

9c40f8f

Add missing discrete distributions to API rst

5c723b3

Refactor ZeroInflatedPoisson

ce252fa

Refactor ZeroInflatedBinomial

7a7405f

Refactor ZeroInflatedNegativeBinomial

43f75dd

Update several test xfails

d13e871

ricardoV94 force-pushed the refactor_discrete_dists branch from e8ec87d to d13e871 on May 12, 2021 11:45

ricardoV94 added 2 commits May 13, 2021 12:26

Use _logp and _logcdf dispatcher in ZeroInflated* methods

a616e0a

Fix check_logcdf test regression

bc89287

ricardoV94 requested a review from brandonwillard May 13, 2021 10:31

brandonwillard approved these changes May 13, 2021

brandonwillard merged commit faed5f1 into pymc-devs:v4 May 13, 2021

brandonwillard mentioned this pull request May 14, 2021

Revert breaking size changes #4693

Merged

ricardoV94 deleted the refactor_discrete_dists branch September 23, 2021 08:44
