Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dist_math.incomplete_beta is very slow #4420

Closed
ricardoV94 opened this issue Jan 16, 2021 · 5 comments · Fixed by #4857
Closed

dist_math.incomplete_beta is very slow #4420

ricardoV94 opened this issue Jan 16, 2021 · 5 comments · Fixed by #4857

Comments

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 16, 2021

I notice this when running test_distributions.py. Any distribution logcdf method that uses incomplete_beta takes ages to run.

Right now it is used in the logcdf methods of Beta, StudentT, Binomial, NegativeBinomial, and ZeroInflated versions of the last two distributions.

Is there a reason why this cannot be implemented as a Theano C Op? Scipy has C implementation here, which seems to be doing exactly the same as our algorithm in dist_math.py:
https://github.com/scipy/scipy/blob/master/scipy/special/cephes/incbet.c

In addition, this would probably make it trivial to implement a vectorized version for tensors, allowing the logcdf methods of the distributions above to evaluate more than one value at a time.

@ricardoV94
Copy link
Member Author

ricardoV94 commented Feb 1, 2021

Looking at this #4454 CI run: https://github.com/pymc-devs/pymc3/pull/4454/checks?check_run_id=1807188616

The top 5 slowest distribution tests include all of the ones mentioned above (with the notable exception of the StudentT) and together took nearly 1 hour to complete.

@ricardoV94
Copy link
Member Author

The StutentT is not in that list because the number of values tested by the logcdf was already reduced to 10 before:

https://github.com/pymc-devs/pymc3/blob/89916ad724ae88ebb4c1af3d65919b87b318f180/pymc3/tests/test_distributions.py#L1131-L1137

@junpenglao
Copy link
Member

junpenglao commented Feb 3, 2021

Is the gradient available? Looks like it might be much faster create a theano Op to call scipy directly similar to
https://github.com/pymc-devs/pymc3/blob/3cd9eb3004956f3624f187ff428b8e33dc4aa729/pymc3/distributions/dist_math.py#L369-L382

@ricardoV94
Copy link
Member Author

I am not sure about the gradient, I will do some research. If anyone knows please chime in :)

@ricardoV94
Copy link
Member Author

ricardoV94 commented Feb 3, 2021

I found some formulae for the partial derivatives here: https://functions.wolfram.com/GammaBetaErf/Beta3/20/01/

The partial derivative with respect to x is trivial (given the beta is defined as an integral up to x)

The partial derivatives with respect to a, and b are more tricky. They require the Hypergeometric function 3F2, which is not given by scipy AFAIK. It seems to exist in mpmath though: https://mpmath.org/doc/current/functions/hypergeometric.html#hyp3f2.

ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jun 4, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jun 4, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jun 10, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 12, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 12, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 12, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 13, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 13, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 13, 2021
ricardoV94 added a commit to ricardoV94/pymc that referenced this issue Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment