Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up median(Binomial) #1926

Closed
andreasnoack opened this issue Dec 14, 2024 · 2 comments · Fixed by #1928
Closed

Speed up median(Binomial) #1926

andreasnoack opened this issue Dec 14, 2024 · 2 comments · Fixed by #1928

Comments

@andreasnoack
Copy link
Member

andreasnoack commented Dec 14, 2024

While it's good that the result is now correct, it is at a significant price. The evaluation of the cdf is quite costly and I think the quantile approach might end up calling it multiple times. Of course q=0.5 might be special cased in RMath but compare

# Old version
julia> @btime median(X)
  15.030 ns (0 allocations: 0 bytes)
8

julia> @btime quantile(X, 0.5)
  242.180 ns (0 allocations: 0 bytes)
7

It might be possible, at least in most cases, see https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.1980.tb00681.x, to compute the median without evaluating the cdf or even the pdf.

The old version just used round but, as pointed out in #1923, that wasn't always correct. The paper helps us

Skærmbillede 2024-12-14 kl  16 09 24

In #1923, the example had X = Binomial(25, 0.3) so the condition is clearly not satisfied since

julia> mean(X)
7.5

so we could just do a single (c)cdf evaluation and adjust the value if needed in the cases where the condition of Theorem 1 isn't satisfied. It is likely to be much faster than quantile(..., 0.5)

@marcusps
Copy link
Contributor

Good point. It should take no more than one cdf evaluation, right? And by the theorem you quote, sometimes no evaluations.

function median(rv::Binomial)
    bound = min(rv.p, 1-rv.p)
    rv_mean = mean(rv)
    floor_mean = floor(Int, rv_mean)
    ceil_mean = ceil(Int, rv_mean)
    if abs(floor_mean - rv_mean) <= bound
        floor_mean
    elseif abs(ceil_mean - rv_mean) <= bound
        ceil_mean
    elseif cdf(rv, floor_mean) > 0.5
        floor_mean
    else
        ceil_mean
    end
end

marcusps added a commit to marcusps/Distributions.jl that referenced this issue Dec 15, 2024
Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.
@devmotion
Copy link
Member

I thought it would be most important to fix the bug, and that we should improve performance separately 🙂 I had actually seen this result on Wikipedia, together with a few other properties of the median: https://en.m.wikipedia.org/wiki/Binomial_distribution#Median But I wasn't sure if that list is conclusive or whether there are additional properties that we could exploit, so I assumed it might take a bit more time to discuss and release such performance improvements.

devmotion added a commit that referenced this issue Dec 17, 2024
* Speed up median(Binomial)

Follow suggestion from #1926 to reduce `cdf(Binomial)` evaluations.

* Simplify bound condition

Co-authored-by: Andreas Noack <andreas@noack.dk>

* Fix name of argument variable

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* Simplify b

Co-authored-by: Andreas Noack <andreas@noack.dk>

* Fix cummulative probability check

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* Incorporate suggestions from PR review

* Further simplifications following suggestions

* Add more tests

* Fixed checks for median bounds

* Simplified tests a bit

* Update src/univariate/discrete/binomial.jl

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* Update src/univariate/discrete/binomial.jl

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* Update src/univariate/discrete/binomial.jl

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

* Avoid test memory allocation with an iterator

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

---------

Co-authored-by: Andreas Noack <andreas@noack.dk>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants