[WIP] Add Dirichlet-multinomial distribution. #3639
How can this be tested?
Need test +1. I think we can follow the test for BetaBinomial here.
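A rough sketch of what a BetaBinomial-based check could look like, written against NumPy/SciPy rather than the PyMC3 test helpers. With two categories, the Dirichlet-multinomial should reduce to a beta-binomial. `dirichlet_multinomial_logp` is a hypothetical helper made up for this illustration, not the code in this PR.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import betabinom


def dirichlet_multinomial_logp(x, a):
    """Log-pmf of a Dirichlet-multinomial for counts x and concentrations a.

    Hypothetical helper for illustration only; not the PyMC3 implementation.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    n = x.sum(axis=-1)
    sum_a = a.sum(axis=-1)
    # log[ n! * Gamma(sum_a) / Gamma(n + sum_a) ]
    const = gammaln(n + 1) + gammaln(sum_a) - gammaln(n + sum_a)
    # sum_k log[ Gamma(x_k + a_k) / (x_k! * Gamma(a_k)) ]
    terms = gammaln(x + a) - gammaln(x + 1) - gammaln(a)
    return const + terms.sum(axis=-1)


n = 10
a = np.array([2.0, 5.0])
k = np.arange(n + 1)
counts = np.stack([k, n - k], axis=-1)  # every possible two-category outcome

dm_logp = dirichlet_multinomial_logp(counts, a)
bb_logp = betabinom.logpmf(k, n, a[0], a[1])
print(np.allclose(dm_logp, bb_logp))
```

Enumerating all outcomes for a small `n` also lets the same check confirm that the probabilities sum to one.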
We should probably fix this, but you'll also have to add a line in
pymc3/distributions/multivariate.py
Parameters
----------
alpha : one- or two-dimensional array
As far as I can tell, Sphinx won't format these correctly with the `:` set apart by spaces. I think you need to change these to, for example, `alpha: one- or two-dimensional array`, or the online docs will come out wrong.
I'm matching the style found in all of the other docstrings for distribution classes. Is this wrong?
I agree, all the other docs in this file are doing this. I think it is OK if you leave it, and we can file an issue to fix these all in one go.
(@rpgoldman if that's ok with you)
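For reference while that issue is pending: as far as I know, the numpydoc style guide writes parameter entries as `name : type`, with a space before the colon. How the entry renders still depends on which Sphinx extension processes the docstring. Below is a small sketch of that layout; `dirichlet_multinomial_stub` and the parameter descriptions are placeholders invented for illustration, not text from this PR.

```python
def dirichlet_multinomial_stub(n, alpha):
    """Hypothetical stub used only to illustrate docstring layout.

    Parameters
    ----------
    n : int or array
        Total number of trials (placeholder description).
    alpha : one- or two-dimensional array
        Dirichlet concentration parameters (placeholder description).
    """
    raise NotImplementedError


# Print the raw docstring to inspect the "name : type" entries.
print(dirichlet_multinomial_stub.__doc__)
```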
Why do we have the assignment to
@rpgoldman -- that's exactly what we should do, but it is not in the scope of this PR, I think.
Should I rebase this branch with master as I develop the tests, and, if so, how do I then update the PR?
You don't have to, but it reduces the chance of merge conflicts. Yeah, if you rebase off master and push to your branch, it will update the PR, and run the test suite on CI. Please do ping when it is ready for another review!
As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.
59f03e0 to e932ae6
Codecov Report
@@            Coverage Diff             @@
##           master    #3639      +/-   ##
==========================================
- Coverage   89.93%   89.84%   -0.09%
==========================================
  Files         134      134
  Lines       20429    20458      +29
==========================================
+ Hits        18373    18381       +8
- Misses       2056     2077      +21
e932ae6 to 46ceefb
Just pushed some (relatively messy) commits that include at least basic tests. (@ColCarroll and others) Right now I can see a few reasons this PR is still a WIP, but I'd welcome comments.
Nonetheless, this implementation seems to do what it says on the label. I'd welcome feedback, but may not have time in the next month or two to implement the fancy, shape-handling logic that I see for e.g. the Multinomial distribution. Happy to squash and clean up some commits if that's wanted.
This looks pretty good to merge, what do you think @ColCarroll?
This would still be a nice addition IMO! What should we do to push it over the finish line?
Closing in favor of #4373.
* Add implementation of DM distribution.
* Fix class name mistake.
* Add DM dist to exported multivariate distributions.
* Export DirichletMultinomial in pymc3.distributions
  As suggested in #3639 (comment)
  Also see: #3639 (comment) but this seems to be part of a broader discussion.
* Attempt at matching Multinomial initialization.
* Add some simple tests for DM.
* Correctly deal with 1d n and 2d alpha.
* Fix typo in DM random.
* Fix faulty tests for DM.
* Drop redundant initialization test for DM.
* Add test that DM is normalized for n=1 case.
* Add DM test case based on BetaBinomial.
* Update pymc3/distributions/multivariate.py
* - Infer shape by default (copied code from Dirichlet Distribution)
  - Add default shape in `test_distributions_random.py`
* - Use size information in random method
  - Change random unittests
* - Restore merge accidental deletions
* - Underscore missing
* - More merge cleaning
* Bring DirichletMultinomial initialization into alignment with Multinomial.
* Align all DM tests with Multinomial.
* Align DirichletMultinomial random implementation with Multinomial.
* Match DM random method to Multinomial implementation.
* Change alpha -> a
  Remove _repr_latex_
* Run pre-commit
* Keep standard order of methods random and logp
* Update docstrings for valid input types.
  Progress on batch test.
* Add new test to ensure DM matches BetaBinom
* Change DM alpha -> a in docstrings.
* Test two additional parameterization shapes in `test_dirichlet_multinomial_random`.
* Revert debugging comments.
* Revert unrelated changes.
* Fix minor Black inconsistency.
* Drop no-longer-functional reshaping code.
* Assert shape of random samples is as expected.
* Explicitly test random sample shapes, including batch dimensions.
* Sort imports.
* Simplify _random
  It should be okay to not explicitly change the input dtype as in the multinomial, because the input to the np.random.dirichlet should be safe (it's fine to have float32 to float64 overflow from 1.00 to 1.01..., underflow from 0.01, to 0.0 would still be problematic, but we don't know if this is an issue yet...). The output of the numpy.random.dirichlet to numpy.random.multinomial should be safe since it is already in float64 by then. We still need to convert to the previous dtype, since numpy changes it by default. size_ argument was no longer being used.
* Reorder tests more logically
* Refactor tests
  Merged mode tests since shape must be given explicitly anyway
  Moved test_dirichlet_multinomial_random to test_distributions_random.py and renamed it to test_dirichlet_multinomial_shapes
* Require shape argument
  Also allow more forgiveness if user passes lists instead of arrays (WIP/suggestion only)
* Remove unused import `to_tuple`
* Simplify logic to handle list as input for `a`
* Raise ShapeError in random()
* Finish batch and repr unittests
* Add note about mode
* Tiny rewording
* Change mode to _defaultval
* Revert comment for Multinomial mode
* Update shape check logic
* Add DM to release notes.
* Minor docstring revisions as suggested by @AlexAndorra.
* Revise the revision.
* Add comment clarifying bounds checking in logp()
* Address review suggestions
* Update `matches_beta_binomial` to take into consideration float precision
* Add DM to multivariate distributions docs.

Co-authored-by: Byron Smith <me@byronjsmith.com>
Co-authored-by: Colin <ColCarroll@users.noreply.github.com>
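To make the "Simplify _random" note above concrete, here is a minimal sketch of the two-stage draw it describes: Dirichlet probabilities first, then multinomial counts. It is a standalone illustration using NumPy's `Generator` API, not the `_random` method from this PR; `sample_dirichlet_multinomial` and its signature are assumptions made up for the example.

```python
import numpy as np


def sample_dirichlet_multinomial(n, a, size, rng=None):
    """Draw `size` count vectors by compounding a Dirichlet with a multinomial.

    Hypothetical helper for illustration only; not the PyMC3 implementation.
    """
    rng = np.random.default_rng(rng)
    a = np.asarray(a)
    # The Dirichlet draws are float64 even when `a` is float32, so the
    # probabilities handed to the multinomial are already double precision.
    p = rng.dirichlet(a, size=size)
    # One multinomial draw per row of probabilities.
    return np.stack([rng.multinomial(n, row) for row in p])


a = np.array([0.5, 1.5, 3.0], dtype=np.float32)
samples = sample_dirichlet_multinomial(20, a, size=5, rng=0)
print(samples.shape)        # one row of counts per requested draw
print(samples.sum(axis=1))  # each row sums to n
```

The result here is an integer array. Casting it back to the distribution's own dtype, as the commit note mentions, would be a separate final step.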
Dirichlet multinomial distribution.
The self.random implementation is non-standard, not well tested, and may be broken, and the documentation is currently lacking. But the log-likelihood has been working for me for a few years without obvious issues.