Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase unittest check_logcdf coverage and fix issues with some distribution methods #4393

Merged
merged 10 commits into from
Jan 2, 2021

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Dec 29, 2020

Changes:
This PR proposes two changes to check_logcdf to:

  1. Test that values below or above finite domain edges are properly evaluated to -inf and 0.
  2. Test that multiple values can be computed in one call, or if not, that a standard informative TypeError is returned.

This extended test revealed some issues with current implementations of the logcdf method, which are also fixed by this PR:

  1. Uniform: Would incorrectly evaluate values above upper to -inf
  2. HalfNormal: Would return nan for negative values. It was also unclear to me what the previous logcdf logic was doing, as it seems to me that tt.lt(z, -1.0) could never evaluate to True given valid positive values and sigma. As such, I simplified the code as well as correcting the invalid logcdf evaluation for negative values. Am I missing something? (pinging @domenzain)
  3. Gamma, and InverseGamma were currently failing with invalid parameters due to InverseGamma logcdf method fails with invalid parameters when array is used #4340 and Return NaN in C implementations of SciPy Ops aesara-devs/aesara#224. I added a dirty hack to hide the issue for now. It can be simplified once those issues are solved.
  4. Beta and StudentT logcdf now raises informative TypeError when asked to evaluate multiple values (Beta logcdf method fails with array of values #4342)

As suggested by @AlexAndorra in #4387 (comment), the docstrings of the logcdf methods were updated to better reflect when multiple values can or cannot be evaluated.

It was necessary to change the test domains of the Flat and HalfFlat distributions to their actual domains.

This test is also relevant for the current PR #4387, were these same changes were helpful in debugging issues with the implementations of the logcdf methods. I thought it was best to have this as a separate PR as that one is quite large already.

Finally, I think this does meaningfully increase the overhead of the unit tests since, at most, only three extra logp evaluations are being done for each distribution (one for each finite edge of a domain plus a (2,) array for multiple values).

@codecov
Copy link

codecov bot commented Dec 29, 2020

Codecov Report

Merging #4393 (4929b64) into master (e2ce815) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4393      +/-   ##
==========================================
+ Coverage   88.09%   88.13%   +0.03%     
==========================================
  Files          88       88              
  Lines       14538    14550      +12     
==========================================
+ Hits        12807    12823      +16     
+ Misses       1731     1727       -4     
Impacted Files Coverage Δ
pymc3/distributions/discrete.py 96.00% <ø> (+1.00%) ⬆️
pymc3/distributions/continuous.py 94.45% <100.00%> (+0.07%) ⬆️

@ricardoV94
Copy link
Member Author

Failing test is irrelevant (flaky random MvNormal issues)

@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage branch from 1223c4e to 02a8619 Compare December 29, 2020 19:03
Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good, thanks @ricardoV94 ! I'll leave this open 24 hours before merging, in case someone wants to comment 😉

Copy link
Contributor

@domenzain domenzain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ricardoV94,

I've left a couple of comments in a code review for your changes.
The extra hoops for the logic in the original implementation of the logcdf for the HalfNormal distribution were actually about the erfc function for input outside reasonable parameters for the distribution itself.

Best,

pymc3/distributions/continuous.py Show resolved Hide resolved
pymc3/distributions/continuous.py Outdated Show resolved Hide resolved
pymc3/distributions/continuous.py Show resolved Hide resolved
pymc3/distributions/continuous.py Outdated Show resolved Hide resolved
pymc3/distributions/continuous.py Show resolved Hide resolved
@ricardoV94
Copy link
Member Author

ricardoV94 commented Dec 29, 2020

I've left a couple of comments in a code review for your changes.

Thank you for your review.

I disagree with the ValueError suggestion. This is an issue with the current implementation of the C implementation of the gamma functions in Theano.

My solution does not give an unexpected behavior. It gives the same results one gets when inputting invalid parameters/ values in the logcdf method of other distributions (i.e. -inf), which is also the behavior of Scipy. It exists simply to avoid the debilitating C-assertion that exits the Python process altogether when called with invalid parameters. However, if you see a more clean way to intercept/avoid that exception, that could be an alternative as well :)

I do agree with having a more useful comment describing the nature of the issue in the code. I will add it tomorrow!

@ricardoV94 ricardoV94 marked this pull request as draft December 31, 2020 11:40
…roperly evaluated.

Fix issues with `Uniform`, `HalfNormal`, `Gamma`, and `InverseGamma` distributions.
Add more informative comment for Gamma and InverseGamma hack.
Update Release note.
@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage branch from 02a8619 to 07f09ad Compare January 2, 2021 11:23
@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage branch from 7308533 to d0316b5 Compare January 2, 2021 11:45
@ricardoV94
Copy link
Member Author

Last couple of revisions are complete. I updated the first message to highlight everything that this PR does in one place, but here are the new changes:

  1. Add new test in check_logcdf to check whether method works with multiple values or, if not, whether it returns an informative standard TypeError.
  2. Changed Beta and StudentT acconding to 1.
  3. Updated docstrings as suggested by @AlexAndorra in Implement logcdf method for discrete distributions #4387 (comment)
  4. Updated comments as suggested by @domenzain

@ricardoV94 ricardoV94 marked this pull request as ready for review January 2, 2021 11:54
@ricardoV94 ricardoV94 marked this pull request as draft January 2, 2021 16:32
Move new checks to `check_logcdf`.
@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage branch from bd28eb9 to 8c51062 Compare January 2, 2021 17:07
@ricardoV94 ricardoV94 marked this pull request as ready for review January 2, 2021 19:28
@ricardoV94
Copy link
Member Author

I had to do 2 small changes:

  1. Skip upper bound check for Distributions with Nat domains (they don't have inf as the upper bound, but should be ignored anyway)
  2. Tweak the logcdf of the discrete distribution.

It is ready for review :)

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good now, thanks @ricardoV94 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants