-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up median(Binomial) #1926
Comments
Good point. It should take no more than one cdf evaluation, right? And by the theorem you quote, sometimes no evaluations. function median(rv::Binomial)
bound = min(rv.p, 1-rv.p)
rv_mean = mean(rv)
floor_mean = floor(Int, rv_mean)
ceil_mean = ceil(Int, rv_mean)
if abs(floor_mean - rv_mean) <= bound
floor_mean
elseif abs(ceil_mean - rv_mean) <= bound
ceil_mean
elseif cdf(rv, floor_mean) > 0.5
floor_mean
else
ceil_mean
end
end
|
Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.
I thought it would be most important to fix the bug, and that we should improve performance separately 🙂 I had actually seen this result on Wikipedia, together with a few other properties of the median: https://en.m.wikipedia.org/wiki/Binomial_distribution#Median But I wasn't sure if that list is conclusive or whether there are additional properties that we could exploit, so I assumed it might take a bit more time to discuss and release such performance improvements. |
* Speed up median(Binomial) Follow suggestion from #1926 to reduce `cdf(Binomial)` evaluations. * Simplify bound condition Co-authored-by: Andreas Noack <andreas@noack.dk> * Fix name of argument variable Co-authored-by: David Widmann <devmotion@users.noreply.github.com> * Simplify b Co-authored-by: Andreas Noack <andreas@noack.dk> * Fix cummulative probability check Co-authored-by: David Widmann <devmotion@users.noreply.github.com> * Incorporate suggestions from PR review * Further simplifications following suggestions * Add more tests * Fixed checks for median bounds * Simplified tests a bit * Update src/univariate/discrete/binomial.jl Co-authored-by: David Widmann <devmotion@users.noreply.github.com> * Update src/univariate/discrete/binomial.jl Co-authored-by: David Widmann <devmotion@users.noreply.github.com> * Update src/univariate/discrete/binomial.jl Co-authored-by: David Widmann <devmotion@users.noreply.github.com> * Avoid test memory allocation with an iterator Co-authored-by: David Widmann <devmotion@users.noreply.github.com> --------- Co-authored-by: Andreas Noack <andreas@noack.dk> Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
While it's good that the result is now correct, it is at a significant price. The evaluation of the cdf is quite costly and I think the
quantile
approach might end up calling it multiple times. Of courseq=0.5
might be special cased in RMath but compareIt might be possible, at least in most cases, see https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.1980.tb00681.x, to compute the median without evaluating the cdf or even the pdf.
The old version just used
round
but, as pointed out in #1923, that wasn't always correct. The paper helps usIn #1923, the example had
X = Binomial(25, 0.3)
so the condition is clearly not satisfied sinceso we could just do a single (c)cdf evaluation and adjust the value if needed in the cases where the condition of Theorem 1 isn't satisfied. It is likely to be much faster than
quantile(..., 0.5)
The text was updated successfully, but these errors were encountered: