-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use julia implementations for pdfs and some cdf-like functions #113
Conversation
Codecov Report
@@ Coverage Diff @@
## master #113 +/- ##
==========================================
+ Coverage 39.30% 48.64% +9.34%
==========================================
Files 13 13
Lines 430 516 +86
==========================================
+ Hits 169 251 +82
- Misses 261 265 +4
Continue to review full report at Codecov.
|
I've added methods for Poisson via the definitions for gamma so this now supersedes #112. It requires JuliaRegistries/General#37083 for the tests to pass so I'll restart once that PR is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments but I'm too tired for a thorough review now. I'll check it more thoroughly tomorrow.
@@ -3,6 +3,7 @@ uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c" | |||
version = "0.9.8" | |||
|
|||
[deps] | |||
HypergeometricFunctions = "34004b35-14d8-5ef3-9330-4cdb6864b03a" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires a [compat]
entry.
This also pulls in some additional dependencies since HypergeometricFunctions depends on DualNumbers, it seems? I am just wondering if people are fine with this since my ChainRules PR (#106) was not received positively mainly due to the additional dependency on ChainRulesCore.
On a side note, HypergeometricFunctions would be needed in SpecialFunctions to resolve JuliaMath/SpecialFunctions.jl#317 and JuliaMath/SpecialFunctions.jl#160. There was some discussion about depending on HypergeometricFunctions in JuliaMath/SpecialFunctions.jl#175, but I assume for the very same reasons not everyone would welcome such a dependency in SpecialFunctions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a compat entry (or CompatHelper will do it).
I can try to measure the effect on load time which I assume is the main concern here. I doubt that it will be much and more generally, I think that HypergeomericFunctions is a natural dependency here. Many CDFs have hypergeometric representations so I think this is a bit different from the AD discussion. I don't think it would make sense to have the CDFs defined in a separate package. (I think I'm in favor of #106 btw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that probably it is more central to the package than AD (even though, of course, I'm in favor of the ChainRules PR 😉 ). I am fine with the dependency but I wanted to raise it due to the reactions in the other PR.
src/distrs/beta.jl
Outdated
# The log version is currently based on non-log version. When the cdf is very small we shift | ||
# to a implementation based on the hypergeometric function ₂F₁ to avoid underflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reference for these cutoffs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not. My thinking is that using the non-log version is fine as long as it doesn't underflow. I realize that I'm using floatmin()
instead of eps()
in a similar situation for one of the other distributions.
Let me butt in here to comment. Andreas has functions (pdf cdf ccdf logpdf logccdf invcdf invccdf invlogcdf invlogccdf) for Beta, Binomial, Chi Squared, F dist, Gamma, Normal, Poisson, and Student's T distribution. A long ago PR in Distributions.jl has functions for Beta, Binomial, Cauchy, Chi Squared, F dist, Gamma, Geometric, HyperGeometric, logistic, lognormal, negative binomial, normal, poisson, student's t, uniform, weibull, and rayleigh distributions. These pull requests always seem to stall. Review comments from devmotion tend to be along the lines of 1) small code optimizations, 2) questions on the "source" to certain algorithm constants. Multiply that by 8 for the all the distributions that Andreas is promoting. It results in a lot of little issues to fix/improve/modify. Review comments are valuable and it is leads to improved code, but one must manage the feedback process. Perhaps it would be better to do a separate pull request for each distribution. It is more manageable to fix/modify one function at a time, rather than a set of 8. Learnings from fixing/modifying the first distribution would carry on to the second and subsequent distributions, so hopefully the process would be smoother. It is valuable to reference the code back to research papers; however, lacking that, we could depend on the test results matching the existing R functions. Also, one might consider a parallel release strategy. Keep the R functions for now. Release a pure Julia version in parallel -- this is the "Beta test" -- let the community use it, study it, and identify problems (such as corner cases). After the Beta period, swap them for the R functions. A long term goal has been to eliminate the dependence on the R functions. Don't let the perfect be the enemy of the good. These functions need to make it over the finish line. |
I think it's fine. @devmotion comments here are very valid and should be addressed. The main complication is that some of the changes require adjustments in SpecialFunctions which makes it slightly more work but it's not that much. I had to look into some other things for a while but I'll get this completed. |
Tried to fix support for betalogpdf, cdf can use a similar fix I think |
Chain rules for partial derivatives of incomplete beta. Code is by @arzwa and taken from https://github.com/arzwa/IncBetaDer/blob/main/src/beta_inc_grad.jl Related to effort to provide julia implementations of pdf/cdf like functions (JuliaStats/StatsFuns.jl#113) and JuliaStats/Distributions.jl#1334
Hello, PS: I have little experience in writing numerical software but am fluent in julia; given Rmath/similar implementations, I would be happy to port it to native julia :). |
Yay! All tests pass now and I believe I have addressed the review comments so I think this one is ready. It only took me about eight months. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the fixes in SpecialFunctions are necessary for the tests to pass, I guess it would be reasonable to update the SpecialFunctions compat entry (maybe the fixes should be backported to SpecialFunctions 1 if we want to keep support for SpecialFunctions 1.X)?
I theory we should probably backport the SpecialFunctions PRs since I consider all of them bug fixes. |
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@devmotion The fixes have been backported in JuliaRegistries/General#54891 so are we good to go here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Great PR 🎉
* Use Julia implementations for (log)pdfs. Adjust Rmath tests to use testsets to give more informative errors. * Use Julia implementations for most cdf-like functions of the Chisq distribution. * Change Beta distribution to be using native implementations. * Switch binomial cdfs to use native implementations (but not the inverses) * Use native implementations for most cdf like gamma functions * Just use the gamma defitions for the chisq * Use the beta definitions for binomial * Add native implementions for Poisson's cdf-like functions * Rely on SpecialFunctions promotion handling in beta functions * Rely on gamma_inc_inv promotion in gammainv(c)cdf * Apply suggestions from code review * Add compat for HypergeometricFunctions * Handle negative arguments in Poisson (log)(c)cdf * Handle negative x in Gamma (log)(c)cdf * Handle arguments outside support for Beta and Fdist (log)(c)cdf * Fix Fdist quantile functions * Handle arguments out of support in Binomial (log)(c)cdf and apply the floor function to make the results correct for non-integer arguments * Avoid accidental promotion to Float64 in fdist(log)(c)cdf * Fix typo in "x" calculation for fdist(log)(c)cdf * Drop redundant low tolerance tests for Beta(1000, 2). We are now generally more accurate than Rmath so failure are because of Rmath. * Handle some non-finite edge cases in gamma and binomial * Only switch to log-version of betalogcdf when the result is tiny * Handle the degenerate cases in beta(log)(c)cdf when alpha is zero * Handle the degenerate case of the gamma distribution when k==0. Also, only calculate the log(c)cdf based on the hypergeometric function when p is zero or subnormal * Address review comments * Avoid rational return type in gamma(log)(c)cdf * Handle the case when alpha and beta are zero in the betacdf
* Use julia implementations for pdfs and some cdf-like functions (#113) * Use Julia implementations for (log)pdfs. Adjust Rmath tests to use testsets to give more informative errors. * Use Julia implementations for most cdf-like functions of the Chisq distribution. * Change Beta distribution to be using native implementations. * Switch binomial cdfs to use native implementations (but not the inverses) * Use native implementations for most cdf like gamma functions * Just use the gamma defitions for the chisq * Use the beta definitions for binomial * Add native implementions for Poisson's cdf-like functions * Rely on SpecialFunctions promotion handling in beta functions * Rely on gamma_inc_inv promotion in gammainv(c)cdf * Apply suggestions from code review * Add compat for HypergeometricFunctions * Handle negative arguments in Poisson (log)(c)cdf * Handle negative x in Gamma (log)(c)cdf * Handle arguments outside support for Beta and Fdist (log)(c)cdf * Fix Fdist quantile functions * Handle arguments out of support in Binomial (log)(c)cdf and apply the floor function to make the results correct for non-integer arguments * Avoid accidental promotion to Float64 in fdist(log)(c)cdf * Fix typo in "x" calculation for fdist(log)(c)cdf * Drop redundant low tolerance tests for Beta(1000, 2). We are now generally more accurate than Rmath so failure are because of Rmath. * Handle some non-finite edge cases in gamma and binomial * Only switch to log-version of betalogcdf when the result is tiny * Handle the degenerate cases in beta(log)(c)cdf when alpha is zero * Handle the degenerate case of the gamma distribution when k==0. Also, only calculate the log(c)cdf based on the hypergeometric function when p is zero or subnormal * Address review comments * Avoid rational return type in gamma(log)(c)cdf * Handle the case when alpha and beta are zero in the betacdf * Fix compat entries * Remove precompile statement * Update version number Co-authored-by: Andreas Noack <andreas@noack.dk>
I think we should begin to use Julia implementations when they are available. There is is really no need to call out to Rmath when we have implementations available. Eventually, we should then also be able to drop Rmath as a package dependency and just use it for testing although there is still some work to do. However, SpecialFunctions now has some of the more tricky functions available so we can get quite far. This has a conflict with #112 so it would be good to get than one merged first and I can update this PR afterwards.