Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HyperGeometric Distribution to pymc3.distributions.discrete #4108 #4249

Merged
merged 11 commits into from
Dec 3, 2020

Conversation

Harivallabha
Copy link
Contributor

Creating a separate PR and linking it to #4108 because I had a hard time rebasing the old commits on top of master, too many conflicts. So rather than rebase, I just pulled latest master and again added these commits on top of it.

  • Included suggestions by @tirthasheshpatel. Thank you! :D

  • Removed _repr_latex_ implementations.

@Spaak

twiecki
twiecki previously approved these changes Nov 24, 2020
@twiecki twiecki dismissed their stale review November 24, 2020 07:20

pre-commit failed

@Spaak Spaak added this to the 3.10 milestone Nov 24, 2020
@codecov
Copy link

codecov bot commented Nov 24, 2020

Codecov Report

Merging #4249 (3e9c589) into master (6ef0b2d) will increase coverage by 0.00%.
The diff coverage is 92.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4249   +/-   ##
=======================================
  Coverage   87.59%   87.60%           
=======================================
  Files          88       88           
  Lines       14316    14341   +25     
=======================================
+ Hits        12540    12563   +23     
- Misses       1776     1778    +2     
Impacted Files Coverage Δ
pymc3/distributions/discrete.py 96.78% <91.66%> (-0.43%) ⬇️
pymc3/distributions/__init__.py 100.00% <100.00%> (ø)

Copy link
Contributor

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Harivallabha

Pre-commit failures can be fixed with

pre-commit run --files pymc3/distributions/__init__.py pymc3/distributions/discrete.py # list other modified files here

, or by enabling pre-commit locally:

pre-commit install

so the checks run automatically when you commit

@@ -737,6 +742,9 @@ def ref_rand(size, alpha, mu):
def test_geometric(self):
pymc3_random_discrete(pm.Geometric, {"p": Unit}, size=500, fails=50, ref_rand=nr.geometric)

def test_hypergeometric(self):
pymc3_random_discrete(pm.HyperGeometric, {"N": Nat, "n": Nat, "k": Nat}, size=500, fails=50, ref_rand=nr.hypergeometric)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.random.hypergeometric takes arguments named differently than N, n, and k - you may need to define a helper ref_rand function here, as is done above in test_negative_binomial

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

- betaln(n - value + 1, bad - n + value + 1)
- betaln(tot + 1, 1)
)
lower = tt.switch(tt.gt(n - N + k, 0), n - N + k, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit of a theano noob, but would

Suggested change
lower = tt.switch(tt.gt(n - N + k, 0), n - N + k, 0)
lower = tt.clip(n - N + k, 0, n - N + k)

work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that diverges from the value returned by scipy.stats.hypergeom; However removing the bound entirely, and just returning the result seems to match

@Harivallabha Harivallabha force-pushed the Add-HyperGeometric-Dist-2 branch from fc8351c to 9882ff6 Compare November 25, 2020 20:50
@MarcoGorelli MarcoGorelli self-requested a review November 26, 2020 10:11
@Spaak
Copy link
Member

Spaak commented Dec 2, 2020

@Harivallabha great to see this moving forward; if I understand the code correctly we're almost there, right? Any thoughts on the failing tests?

(So far this PR is still listed to go into the 3.10 release, which I hope we'll be able to make soon. If you think it's better to wait, then let me know of course.)

@twiecki
Copy link
Member

twiecki commented Dec 2, 2020

I think we can leave this as option for 3.10 but not block on it.

@twiecki twiecki removed this from the 3.10 milestone Dec 2, 2020
@MarcoGorelli
Copy link
Contributor

Any thoughts on the failing tests?

Just for reference: the CI has been flaky recently (seems fixed now though!), but the error here seems related to the changes:

________________ TestScalarParameterSamples.test_hypergeometric ________________
[gw0] linux -- Python 3.6.11 /usr/share/miniconda/envs/testenv/bin/python

self = <pymc3.tests.test_distributions_random.TestScalarParameterSamples object at 0x7f701ddb5cf8>

    def test_hypergeometric(self):
        def ref_rand(size, N, k, n):
            return st.hypergeom.rvs(M=N, n=k, N=n, size=size)
    
        pymc3_random_discrete(
            pm.HyperGeometric,
            {"N": Nat, "k": Nat, "n": Nat},
            size=500,
            fails=100,
>           ref_rand=ref_rand,
        )

pymc3/tests/test_distributions_random.py:754: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dist = <class 'pymc3.distributions.discrete.HyperGeometric'>
paramdomains = {'N': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>, 'k': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>, 'n': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>}
valuedomain = <pymc3.tests.test_distributions.Domain object at 0x7f708acfbe10>
ref_rand = <function TestScalarParameterSamples.test_hypergeometric.<locals>.ref_rand at 0x7f701ddf3e18>
size = 500, alpha = 0.05, fails = 100

    def pymc3_random_discrete(
        dist, paramdomains, valuedomain=Domain([0]), ref_rand=None, size=100000, alpha=0.05, fails=20
    ):
        model = build_model(dist, valuedomain, paramdomains)
        domains = paramdomains.copy()
        for pt in product(domains, n_samples=100):
            pt = pm.Point(pt, model=model)
            p = alpha
            # Allow Chisq test to fail (i.e., the samples be different)
            # a certain number of times.
            f = fails
            while p <= alpha and f > 0:
                o = model.named_vars["value"].random(size=size, point=pt)
                e = ref_rand(size=size, **pt)
                o = np.atleast_1d(o).flatten()
                e = np.atleast_1d(e).flatten()
                observed = dict(zip(*np.unique(o, return_counts=True)))
                expected = dict(zip(*np.unique(e, return_counts=True)))
                for e in expected.keys():
                    expected[e] = (observed.get(e, 0), expected[e])
                k = np.array([v for v in expected.values()])
                if np.all(k[:, 0] == k[:, 1]):
                    p = 1.0
                else:
                    _, p = st.chisquare(k[:, 0], k[:, 1])
                f -= 1
>           assert p > alpha, str(pt)
E           AssertionError: {'N': array(1), 'k': array(1), 'n': array(1)}
E           assert nan > 0.05

pymc3/tests/test_distributions_random.py:111: AssertionError

@Harivallabha
Copy link
Contributor Author

Yes, please don't block on this.

So the match_with_scipy tests are passing but the random tests are still failing.

The reason for the failing random tests is that scipy.stats.hypergeom.rvs(M=N, n=k, N=n, size=size) returns a ValueError when N>M. So, for instance, when N, k, n = draw_values([self.N, self.k, self.n], point=point, size=size) returns [1, 1, 2], we have M < N (1 < 2), which results in a ValueError when the scipy rvs is called.

When I handle this, I keep getting the AssertionError (nan > 0.05) in pymc3_random_discrete that Marco has pointed out above. I'll push the latest code so that you can reproduce the error, but the random test is still failing, and I'm not sure why I'm getting the above AssertionError.

@Harivallabha Harivallabha force-pushed the Add-HyperGeometric-Dist-2 branch from 9882ff6 to 6cd35fd Compare December 3, 2020 13:05
Comment on lines 751 to 757
pymc3_random_discrete(
pm.HyperGeometric,
{"N": Nat, "k": Nat, "n": Nat},
size=500,
fails=50,
ref_rand=ref_rand,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pymc3_random_discrete(
pm.HyperGeometric,
{"N": Nat, "k": Nat, "n": Nat},
size=500,
fails=50,
ref_rand=ref_rand,
)
pymc3_random_discrete(
pm.HyperGeometric,
{"N": Domain([10, 11, 12, 13], "int64"),
"k": Domain([4, 5, 6, 7], "int64"),
"n": Domain([6, 7, 8, 9], "int64")},
size=500,
fails=50,
ref_rand=ref_rand,
)

You can create your own domain like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should avoid the error that you are facing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying domain fixed it. Wonderful, thanks!


======== =============================

Support :math:`x in [max(0, n - \mathbb{N} + k), min(k, n)]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Support :math:`x in [max(0, n - \mathbb{N} + k), min(k, n)]`
Support :math:`x \in \left[\max(0, n - N + k), \min(k, n)\right]`

Nitpick for proper latex rendering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@tirthasheshpatel tirthasheshpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very small final nitpick. Otherwise, it looks good to me!

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
Harivallabha and others added 2 commits December 3, 2020 21:24
Remove stray newline

Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com>
Copy link
Member

@Spaak Spaak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Spaak Spaak added this to the 3.10 milestone Dec 3, 2020
@Spaak Spaak merged commit 198d13e into pymc-devs:master Dec 3, 2020
@Spaak
Copy link
Member

Spaak commented Dec 3, 2020

Thanks @Harivallabha and @tirthasheshpatel! 🎉 (Edit: and @MarcoGorelli of course! :) )

@twiecki
Copy link
Member

twiecki commented Dec 3, 2020

Great job everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants