Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add Dirichlet-multinomial distribution. #3639

Closed
wants to merge 14 commits into from

Conversation

bsmith89
Copy link
Contributor

@bsmith89 bsmith89 commented Oct 1, 2019

Dirichlet multinomial distribution.

The self.random implementation is non-standard, not well tested, and may be broken, and the documentation is currently lacking. But the log-likelihood has been working for me for a few years without obvious issues.

@twiecki
Copy link
Member

twiecki commented Oct 2, 2019

How can this be tested?

@twiecki twiecki closed this Oct 2, 2019
@twiecki twiecki reopened this Oct 2, 2019
@twiecki twiecki added the WIP label Oct 2, 2019
@junpenglao
Copy link
Member

Need test +1. I think we can follow the test for BetaBinomial here.

@ColCarroll
Copy link
Member

We should probably fix this, but you'll also have to add a line in pymc3/distributions/__init__.py to import this...


Parameters
----------
alpha : one- or two-dimensional array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, sphinx won't format these correctly with the : set apart by spaces. I think you need to change these to, for example alpha: one- or two-dimensional array or the online docs will come out wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm matching the style found in all of the other docstrings for distribution classes. Is this wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree all the other docs in this file are doing this. I think it is ok if you leave it, and we can file an issue to fix these all in one go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@rpgoldman if that's ok with you)

@rpgoldman
Copy link
Contributor

We should probably fix this, but you'll also have to add a line in pymc3/distributions/__init__.py to import this...

Why do we have the assignment to __add__ in this file, if we aren't going to simply do from .multinomial import * in distributions/__init__.py?

@ColCarroll
Copy link
Member

@rpgoldman -- that's exactly what we should do, but it is not in the scope of this PR, I think.

@bsmith89
Copy link
Contributor Author

bsmith89 commented Dec 7, 2019

Should I rebase this branch with master as I develop the tests, and, if so, how do I then update the PR?

@ColCarroll
Copy link
Member

You don't have to, but it reduces the chance of merge conflicts.

Yeah, if you rebase off master and push to your branch, it will update the PR, and run the test suite on CI.

Please do ping when it is ready for another review!

bsmith89 added a commit to bsmith89/pymc3 that referenced this pull request Dec 7, 2019
As suggested in
pymc-devs#3639 (comment)

Also see:
pymc-devs#3639 (comment)
but this seems to be part of a broader discussion.
@bsmith89 bsmith89 force-pushed the dirichlet-multinomial branch from 59f03e0 to e932ae6 Compare December 7, 2019 03:14
@codecov
Copy link

codecov bot commented Dec 7, 2019

Codecov Report

Merging #3639 into master will decrease coverage by 0.08%.
The diff coverage is 21.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3639      +/-   ##
==========================================
- Coverage   89.93%   89.84%   -0.09%     
==========================================
  Files         134      134              
  Lines       20429    20458      +29     
==========================================
+ Hits        18373    18381       +8     
- Misses       2056     2077      +21
Impacted Files Coverage Δ
pymc3/distributions/__init__.py 100% <100%> (ø) ⬆️
pymc3/distributions/multivariate.py 76.78% <18.51%> (-2.16%) ⬇️
pymc3/tuning/starting.py 77.86% <0%> (-1.3%) ⬇️
pymc3/variational/inference.py 79.54% <0%> (-0.11%) ⬇️
pymc3/tests/test_step.py 100% <0%> (ø) ⬆️
pymc3/tests/test_tuning.py 100% <0%> (ø) ⬆️
pymc3/smc/smc.py 92.6% <0%> (ø) ⬆️
pymc3/sampling.py 83.09% <0%> (+0.08%) ⬆️
pymc3/parallel_sampling.py 86.71% <0%> (+0.09%) ⬆️
pymc3/step_methods/arraystep.py 94.32% <0%> (+0.7%) ⬆️
... and 1 more

@bsmith89 bsmith89 force-pushed the dirichlet-multinomial branch from e932ae6 to 46ceefb Compare December 16, 2019 23:20
@bsmith89
Copy link
Contributor Author

Just pushed some (relatively messy) commits that include at least basic tests.

(@ColCarroll and others)

Right now I can see a few reasons this PR is still a WIP, but I'd welcome comments.

  1. This implementation of the DM distribution only handles a 1d vector for n and a 2d vector for alpha. Some users may want to be able to specify a scalar n or higher numbers of dimensions. I'm having trouble figuring out best to make it polymorphic over possible dimensions of these parameters.
  2. This implementation seems to require that the shape passed explicitly to __init__, even if it could be inferred from n.shape and alpha.shape. It's not clear to me what best practices are for passing the inferred shape to super().__init__.
  3. Tests are not as extensive as for other distributions.

Nonetheless, this implementation seems to do what it says on the label. I'd welcome feedback, but may not have time in the next month or two to implement the fancy, shape-handling logic that I see for e.g. the Multinomial distribution.

Happy to squash and clean up some commits if that's wanted.

@twiecki
Copy link
Member

twiecki commented Jul 27, 2020

This looks pretty good to merge, what do you think @ColCarroll?

@AlexAndorra
Copy link
Contributor

This would still be a nice addition IMO! What should we do to push it over the finish line?
Just rebase on master and let CI run, or are there still some important features to add?

ricardoV94 pushed a commit to ricardoV94/pymc that referenced this pull request Dec 22, 2020
As suggested in
pymc-devs#3639 (comment)

Also see:
pymc-devs#3639 (comment)
but this seems to be part of a broader discussion.
@ricardoV94 ricardoV94 mentioned this pull request Dec 22, 2020
15 tasks
@twiecki
Copy link
Member

twiecki commented Dec 23, 2020

Closing in favor of #4373.

@twiecki twiecki closed this Dec 23, 2020
bsmith89 added a commit to bsmith89/pymc3 that referenced this pull request Dec 29, 2020
As suggested in
pymc-devs#3639 (comment)

Also see:
pymc-devs#3639 (comment)
but this seems to be part of a broader discussion.
ricardoV94 pushed a commit to ricardoV94/pymc that referenced this pull request Jan 4, 2021
As suggested in
pymc-devs#3639 (comment)

Also see:
pymc-devs#3639 (comment)
but this seems to be part of a broader discussion.
AlexAndorra pushed a commit that referenced this pull request Jan 16, 2021
* Add implementation of DM distribution.

* Fix class name mistake.

* Add DM dist to exported multivariate distributions.

* Export DirichletMultinomial in pymc3.distributions

As suggested in
#3639 (comment)

Also see:
#3639 (comment)
but this seems to be part of a broader discussion.

* Attempt at matching Multinomial initialization.

* Add some simple tests for DM.

* Correctly deal with 1d n and 2d alpha.

* Fix typo in DM random.

* Fix faulty tests for DM.

* Drop redundant initialization test for DM.

* Add test that DM is normalized for n=1 case.

* Add DM test case based on BetaBinomial.

* Update pymc3/distributions/multivariate.py

* - Infer shape by default (copied code from Dirichlet Distribution)
- Add default shape in `test_distributions_random.py`

* - Use size information in random method
- Change random unittests

* - Restore merge accidental deletions

* - Underscore missing

* - More merge cleaning

* Bring DirichletMultinomial initialization into alignment with Multinomial.

* Align all DM tests with Multinomial.

* Align DirichletMultinomial random implementation with Multinomial.

* Match DM random method to Multinomial implementation.

* Change alpha -> a
Remove _repr_latex_

* Run pre-commit

* Keep standard order of methods random and logp

* Update docstrings for valid input types.
Progress on batch test.

* Add new test to ensure DM matches BetaBinom

* Change DM alpha -> a in docstrings.

* Test two additional parameterization shapes in `test_dirichlet_multinomial_random`.

* Revert debugging comments.

* Revert unrelated changes.

* Fix minor Black inconsistency.

* Drop no-longer-functional reshaping code.

* Assert shape of random samples is as expected.

* Explicitly test random sample shapes, including batch dimensions.

* Sort imports.

* Simplify _random

It should be okay to not explicitly change the input dtype as in the multinomial, because the input to the np.random.dirichlet should be safe (it's fine to have float32 to float64 overflow from 1.00 to 1.01..., underflow from 0.01, to 0.0 would still be problematic, but we don't know if this is an issue yet...). The output of the numpy.random.dirichlet to numpy.random.multinomial should be safe since it is already in float64 by then. We still need to convert to the previous dtype, since numpy changes it by default.

size_ argument was no longer being used.

* Reorder tests more logically

* Refactor tests

Merged mode tests since shape must be given explicitly anyway
Moved test_dirichlet_multinomial_random to test_distributions_random.py and renamed it to test_dirichlet_multinomial_shapes

* Require shape argument

Also allow more forgiveness if user passes lists instead of arrays (WIP/suggestion only)

* Remove unused import `to_tuple`

* Simplify logic to handle list as input for `a`

* Raise ShapeError in random()

* Finish batch and repr unittests

* Add note about mode

* Tiny rewording

* Change mode to _defaultval

* Revert comment for Multinomial mode

* Update shape check logic

* Add DM to release notes.

* Minor docstring revisions as suggested by @AlexAndorra.

* Revise the revision.

* Add comment clarifying bounds checking in logp()

* Address review suggestions

* Update `matches_beta_binomial` to take into consideration float precision

* Add DM to multivariate distributions docs.

Co-authored-by: Byron Smith <me@byronjsmith.com>
Co-authored-by: Colin <ColCarroll@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants