
Tweak DirichletMultinomial logp and refactor some multivariate logp tests #5234

Merged: 6 commits merged into main on Dec 8, 2021

Conversation

ricardoV94
Member

@ricardoV94 ricardoV94 commented Dec 2, 2021

A couple of tweaks/fixes to the logp and docstrings of Multinomial and DirichletMultinomial.

Summary:

  1. Remove unnecessary constraints on n and a in DirichletMultinomial, related to the discussion in #5225 (Adds tests and mode for dirichlet multinomial distribution). This also fixes a minor issue where the logp returned a 1D vector for the base case instead of a scalar.
  2. Remove similar old reference about constraints on n and p in the docstrings of the Multinomial
  3. Simplify vectorized logp tests for the two distributions as well as the Dirichlet

Also:
4. Remove a couple of old/redundant random tests
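As context for point 1, here is a minimal sketch of the Dirichlet-multinomial log-pmf (the standard textbook form via scipy's gammaln, not PyMC's actual implementation): after reducing over the last axis, the base case of a 1D value and 1D a with scalar n should produce a scalar, not a length-1 vector.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logp(x, n, a):
    """Standard Dirichlet-multinomial log-pmf; the last axis holds the categories."""
    sum_a = a.sum(axis=-1)
    const = gammaln(n + 1) + gammaln(sum_a) - gammaln(n + sum_a)
    series = (gammaln(x + a) - gammaln(x + 1) - gammaln(a)).sum(axis=-1)
    return const + series

# Base case: 1D x and a, scalar n -> the result should be a scalar (ndim 0).
res = dirichlet_multinomial_logp(np.array([1, 0]), 1, np.array([1.0, 1.0]))
# With a = [1, 1] and n = 1 both outcomes are equally likely, so logp = log(0.5).
```
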

@codecov

codecov bot commented Dec 2, 2021

Codecov Report

Merging #5234 (0549f93) into main (a16ec4a) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #5234      +/-   ##
==========================================
+ Coverage   78.95%   78.98%   +0.02%     
==========================================
  Files          88       88              
  Lines       14232    14231       -1     
==========================================
+ Hits        11237    11240       +3     
+ Misses       2995     2991       -4     
Impacted Files                        Coverage Δ
pymc/distributions/multivariate.py    72.52% <100.00%> (+0.61%) ⬆️
pymc/parallel_sampling.py             86.33% <0.00%> (-1.00%) ⬇️
pymc/distributions/continuous.py      96.72% <0.00%> (+0.12%) ⬆️

@ricardoV94
Member Author

ricardoV94 commented Dec 2, 2021

I don't get why we would automatically normalize the vector p in the Multinomial, but I'm happy to revert those changes if they cause too much trouble. Implicit normalization hides the same problems that were raised in this old issue for the Categorical: #2082 (comment)
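A toy illustration in plain numpy (hypothetical values, not PyMC code) of why silent rescaling can mask user errors like the one discussed in #2082:

```python
import numpy as np

def check_simplex(p):
    """Return True when p sums to 1 along the last axis (what an explicit check would enforce)."""
    return bool(np.isclose(np.asarray(p).sum(axis=-1), 1.0).all())

# A user intends three probabilities but makes a typo: they sum to 0.9.
p = np.array([0.3, 0.3, 0.3])

# Implicit rescaling silently "repairs" the input, so the mistake goes unnoticed:
p_rescaled = p / p.sum(axis=-1, keepdims=True)

print(check_simplex(p))           # False: an explicit check would flag the typo
print(check_simplex(p_rescaled))  # True: after rescaling the error is invisible
```
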

@ricardoV94 ricardoV94 requested a review from lucianopaz December 2, 2021 14:55
@AlexAndorra
Contributor

Agreed, the implicit normalization is sneaky and can cause issues down the road

@ricardoV94
Member Author

If we changed this, it would make sense to also not normalize things in the Categorical by default...

@ricardoV94 ricardoV94 force-pushed the fix_dm_mn_logps branch 2 times, most recently from aefc1a5 to 8b5820e on December 6, 2021 17:44
@ricardoV94 ricardoV94 changed the title Tweaks to Multinomial and DirichletMultinomial logps Tweaks DirichletMultinomial logp and refactor some multivariate logp tests Dec 6, 2021
@ricardoV94 ricardoV94 changed the title Tweaks DirichletMultinomial logp and refactor some multivariate logp tests Tweak DirichletMultinomial logp and refactor some multivariate logp tests Dec 6, 2021
@ricardoV94
Member Author

I reverted the changes to the Multinomial normalization; I'll open a discussion on this for the Multinomial and the Categorical. In the meantime, this can hopefully be merged.

Probability of each one of the different outcomes. Elements must
be non-negative and sum to 1 along the last axis. They will be
automatically rescaled otherwise.
n: int
Contributor


But n can be a vector too, can't it? E.g. if the number of trials varies by observation

Member Author


Yeah, n can be a vector or a matrix; anything works as long as the dimensions broadcast properly. I put the int to indicate the base case, but perhaps it's best to remove any shape information and only mention the meaning of the parameters?
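A small numpy sketch (illustrative values, not PyMC's API) of the vector-n case described above, where the number of trials varies by observation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = np.array([10, 20, 30])     # one trial count per observation
p = np.array([0.2, 0.3, 0.5])  # shared category probabilities; last axis = categories

# Draw one multinomial sample per observation; n and p combine row by row,
# mirroring how a vector n broadcasts against the distribution's batch shape.
counts = np.array([rng.multinomial(ni, p) for ni in n])

print(counts.shape)             # (3, 3): one row of category counts per observation
print(counts.sum(axis=-1))      # each row sums to its own n: [10 20 30]
```
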

Contributor


Why not int, vector, matrix? That's more explicit

Member Author


I don't find that very helpful. Other multivariate distributions use an ambiguous "array" and most univariate distributions say only "int" or "float" even though they also support vectors and tensors of any shape.

a : one- or two-dimensional array
Dirichlet parameter. Elements must be strictly positive.
The number of categories is given by the length of the last axis.
n : int
Contributor


Same remark as for Multinomial

Contributor

@AlexAndorra AlexAndorra left a comment


Wow, those tests are much simpler now 🤩
LGTM @ricardoV94 -- I just had a question about the n parameter. Once that's settled we can merge

@ricardoV94
Member Author

@AlexAndorra I added a commit that harmonizes the docstrings of the 3 distributions. The emphasis is on the dtype of each parameter and the special meaning of the last axis of a and p, which determines the number of categories.

I'd rather not state explicitly that n can be multidimensional, and instead let users expect it to work as it does in other distributions. Otherwise I think it creates more confusion to have p: int, vector, matrix, ..., since these can be of arbitrary dimensionality as long as they broadcast with a.

Contributor

@AlexAndorra AlexAndorra left a comment


Yeah, that looks better now, thanks @ricardoV94
