Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor missing discrete dists #4684

Merged
merged 12 commits into from
May 13, 2021

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented May 11, 2021

This PR refactors all missing univariate discrete distributions.

  1. I decided to refactor the Constant distribution after some debate on Slack as to whether this should be deprecated or not. Its direct use might be pretty problematic without a custom step algorithm, but could still be useful indirectly, such as in mixture distributions. I made the dtype of the RV be floatX to accept non-integer outputs (even though this distribution should be treated as a pmf, see Mixture should not allow mixing of discrete and continuous distributions #4511 for why this might matter)

  2. I created separate RandomOps for the ZeroInflated* distributions for now. Similarly to Replace custom RandomVariable for WeibullBetaRV #4683, this should only be done temporarily until we have Mixture Distributions working on V4 (it seems such approach was already attempted in the past WIP: Convert zero-inflated distributions to Mixture subclasses #2246, WIP Implement ZeroInflatedPoisson as a subclass of Mixture #1459)


Depending on what your PR does, here are a few things you might want to address in the description:

@classmethod
def rng_fn(cls, rng, c, size=None):
ones = np.ones_like(c)
return rng.randint(ones, ones + 1, size=size) * c
Copy link
Member Author

@ricardoV94 ricardoV94 May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be a less hacking way to get a filled array given c and size, that works as other random generators would. np.full(size, c) doesn't seem to cut it, because it works with shape and not with size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't shape and size the same thing here?

Copy link
Member Author

@ricardoV94 ricardoV94 May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because c could be a vector and size could be None for instance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, actually I think the only problem was size=None. I added a check for that condition...

twiecki
twiecki previously approved these changes May 11, 2021
Copy link
Member

@twiecki twiecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

@twiecki
Copy link
Member

twiecki commented May 11, 2021

Thanks for taking this on @ricardoV94. Separately, when I tried V4 it seems like other continuous distributions are also missing, like StudentT?

@ricardoV94
Copy link
Member Author

Yes they are missing.

Only distributions that have a rv_op at the top have been refactored. StudentT hasn't yet.

I won't have more time this week, so if someone wants to open a PR for the missing continuous distributions it would be great. The logic should be pretty similar to what is done in this PR with the custom RandomOps

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we really shouldn't rely on Distribution.logp functions, since those aren't officially part of the Distribution API—they're just a means of conveniently specifying dispatches.

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
@ricardoV94 ricardoV94 force-pushed the refactor_discrete_dists branch 2 times, most recently from 6aae83e to e8ec87d Compare May 12, 2021 09:15
@ricardoV94
Copy link
Member Author

ricardoV94 commented May 12, 2021

Looks good, but we really shouldn't rely on Distribution.logp functions, since those aren't officially part of the Distribution API—they're just a means of conveniently specifying dispatches.

@brandonwillard I am not sure what you have in mind. This looks wrong, right?

# at.log(psi) + Poisson.logp(value, theta)
at.log(psi) + _logp(poisson, value, {value: value}, theta)

I also tried this, but it fails following a change_rv_size operation:

at.log(psi) + logpt(Poisson.dist(theta), value)

@brandonwillard
Copy link
Contributor

@brandonwillard I am not sure what you have in mind. This looks wrong, right?

# at.log(psi) + Poisson.logp(value, theta),
at.log(psi) + _logp(poisson, value, {value: value}, theta),

No, that looks like the correct approach.

@ricardoV94
Copy link
Member Author

Could you give some intuition why this is better than the original approach?

@brandonwillard
Copy link
Contributor

Could you give some intuition why this is better than the original approach?

As I stated in my original review comments, these Distribution.logp functions are not a part of the Distribution API and they are neither designed nor intended to be used in this way. Their availability is questionable and subject to arbitrary change, so we don't want to use them outside of their intended purpose.

More importantly, we do not want to rely on the Distribution types in general, and especially not for log-likelihoods; they're currently just a legacy transition artifact. This is clearly demonstrated by the lack of Distribution in the log-likelihood code itself.

Simply put, this approach unnecessarily adds Distributions into the log-likelihood framework.

If this sort of use case needs to be simplified, it can be done in better ways that are based only on the underlying dispatch.

@brandonwillard brandonwillard merged commit faed5f1 into pymc-devs:v4 May 13, 2021
@twiecki
Copy link
Member

twiecki commented May 14, 2021

🥳

@ricardoV94 ricardoV94 deleted the refactor_discrete_dists branch September 23, 2021 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants