Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Real arguments in R functions and use Julia implementations only #125

Merged
merged 6 commits into from
Sep 28, 2021

Conversation

devmotion
Copy link
Member

@devmotion devmotion commented Sep 6, 2021

This PR is based on #124 and contains some parts of @andreasnoack's PR #113. I thought it would be cleaner to open a new PR instead of adding commits to #124.

The main motivation for this PR is that I am not satisfied with the approach in #124. I agree with @nalimilan that the type union is somewhat arbitrary and ideally we would just allow Real and let convert(Float64, ...) (implicitly in the ccall) handle the arguments when we call into Rmath. The main issue is that currently some generic Julia implementations exist in StatsFuns that already allow Real but they are used only as fallbacks and not for arguments of type Float64 or Int.

In this PR,

  • R functions are defined for arguments of type Real,
  • and the Julia implementations do not extend the corresponding function in the RFunctions submodule anymore but are separate functions of the same name to which also Float64 and Int are dispatched.

The second point makes it also a bit easier to ensure that we actually compare the R with the Julia implementations in the tests since currently e.g. RFunctions.betalogpdf and betalogpdf are the same function and for arguments of type Union{Float64,Int} always the R implementation is used.

Currently, the Julia implementations do not correspond to the R versions exactly, e.g. betapdf errors for values of x outside of [0, 1] whereas RFunctions.betapdf returns 0. I used some of @andreasnoack's changes in #113 and tried to handle values outside of the support in this PR. I also extended the tests similar to #113, in addition to the tests with Float16 and Float32 from #124, to make sure that the Julia implementations are accurate enough. Maybe some additional tests should be added though.

The main difference from #113 is that this PR does not add any new Julia implementations (e.g. for CDF or other distributions). It only updates the existing implementations such that they can be used instead of the R functions also with Float64 and Int.

A positive side-effect of using the Julia implementation for Float64 is that it seems to improve performance. The example below is motivated by @cscherrer's observation (https://twitter.com/ChadScherrer/status/1433143576943337477) that MeasureTheory's pdf for student t distributions is faster than tdistpdf.

Current release

julia> using StatsFuns, BenchmarkTools, Random

julia> N = 100_000;

julia> out = Vector{Float64}(undef, N);

julia> @btime map!(betapdf, $out, $(randexp(N)), $(randexp(N)), $(rand(N)));
  32.881 ms (0 allocations: 0 bytes)

julia> @btime map!(betalogpdf, $out, $(randexp(N)), $(randexp(N)), $(rand(N)));
  29.748 ms (0 allocations: 0 bytes)

julia> @btime map!(tdistpdf, $out, $(randexp(N)), $(randn(N)));
  35.008 ms (0 allocations: 0 bytes)

julia> @btime map!(tdistlogpdf, $out, $(randexp(N)), $(randn(N)));
  31.259 ms (0 allocations: 0 bytes)

This PR

julia> using StatsFuns, BenchmarkTools, Random

julia> N = 100_000;

julia> out = Vector{Float64}(undef, N);

julia> @btime map!(betapdf, $out, $(randexp(N)), $(randexp(N)), $(rand(N)));
  10.250 ms (0 allocations: 0 bytes)

julia> @btime map!(betalogpdf, $out, $(randexp(N)), $(randexp(N)), $(rand(N)));
  9.373 ms (0 allocations: 0 bytes)

julia> @btime map!(tdistpdf, $out, $(randexp(N)), $(randn(N)));
  7.560 ms (0 allocations: 0 bytes)

julia> @btime map!(tdistlogpdf, $out, $(randexp(N)), $(randn(N)));
  6.860 ms (0 allocations: 0 bytes)

@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2021

Codecov Report

Merging #125 (66bb491) into master (fe65c00) will increase coverage by 7.17%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #125      +/-   ##
==========================================
+ Coverage   30.23%   37.41%   +7.17%     
==========================================
  Files          12       12              
  Lines         377      417      +40     
==========================================
+ Hits          114      156      +42     
+ Misses        263      261       -2     
Impacted Files Coverage Δ
src/chainrules.jl 100.00% <ø> (ø)
src/distrs/beta.jl 100.00% <100.00%> (ø)
src/distrs/binom.jl 100.00% <100.00%> (ø)
src/distrs/chisq.jl 100.00% <100.00%> (ø)
src/distrs/fdist.jl 100.00% <100.00%> (ø)
src/distrs/gamma.jl 100.00% <100.00%> (ø)
src/distrs/pois.jl 100.00% <100.00%> (ø)
src/distrs/tdist.jl 100.00% <100.00%> (ø)
src/rmath.jl 100.00% <100.00%> (ø)
src/distrs/norm.jl 97.29% <0.00%> (+2.70%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe65c00...66bb491. Read the comment docs.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I guess that switching progressively to Julia implementations can also make the process easier than waiting for a complete switch. I'll leave @andreasnoack comment on these implementations though.

src/distrs/pois.jl Show resolved Hide resolved
src/chainrules.jl Show resolved Hide resolved
((1f0, 1f0), 0f0:0.01f0:1f0),
((1.0, 1.0), 0f0:0.01f0:1f0),
((Float16(1), Float16(1)), Float16(0):Float16(0.01):Float16(1)),
((1f0, 1f0), Float16(0):Float16(0.01):Float16(1)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also test other types such as integers, rationals and/or a big type. Also, do we have tests somewhere mixing different input types to check that promotion works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add integers and rationals as well. Promotions are already tested, e.g. here Float32 and Float16 and Float32 and Float64 above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more tests with integers and rationals. I also added tests for the values outside of the support.

@devmotion
Copy link
Member Author

Bump 🙂

It would also be nice to merge this PR since it fixes the remaining test errors in JuliaStats/Distributions.jl#1387.

@devmotion
Copy link
Member Author

@andreasnoack it would be great if you could have a look at this PR 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants