Add fast multiplicative inverses #15357

Merged (4 commits) on Mar 21, 2016

timholy commented Mar 4, 2016

This is part 1 of 2 towards a new version of ReshapedArrays (#10507). I'm making them separate PRs because part 1 is best reviewed by the "numbers" crowd, and part 2 by the "arrays" crowd. I committed this as @simonster, since the core code here was developed by him.

I benchmarked this on 4 different machines against libdivide, which claims to have emphasized performance. We're doing pretty well, especially on master:

| machine | libdivide (g++ -O3) | julia-0.4 div | julia-0.4 fastdiv | julia-0.5 div | julia-0.5 fastdiv |
|---|---|---|---|---|---|
| core i7 L640 (Westmere) | 30.4ms | 114ms | 40.5ms | | |
| core i7 5500U (Broadwell-U) | 15.8ms | 92ms | 23.2ms | 91ms | 16.2ms |
| xeon ES-2650 (Sandybridge) | 34.2ms | 184ms | 34.6ms | 104ms | 23.7ms |
| opteron 8439 SE (Istanbul) | 95.7ms | 180ms | 53.1ms | | |

The machines with missing numbers don't have julia-0.5 installed: one does not have enough disk space left for a separate build, and the other runs Ubuntu 12.04, which has trouble building master.

The tests are in this gist: https://gist.github.com/timholy/1d21885f00ac066d7237.
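
For anyone who wants to try the idea without reading the gist, here is a minimal usage sketch. It assumes a current Julia release in which this code is reachable as the internal module `Base.MultiplicativeInverses` (an assumption about your Julia version; the module is not public API). `sumdiv` is a hypothetical helper written for illustration and is not taken from the gist.

```julia
using Base.MultiplicativeInverses: multiplicativeinverse

# Hypothetical helper: sum of n ÷ d over a collection.
# `d` can be a plain integer or a precomputed multiplicative inverse,
# because `div` has methods for both.
function sumdiv(ns, d)
    s = zero(eltype(ns))
    for n in ns
        s += div(n, d)
    end
    return s
end

ns = collect(1:1000)
mi = multiplicativeinverse(7)   # built once, outside the loop

slow = sumdiv(ns, 7)            # hardware division on every iteration
fast = sumdiv(ns, mi)           # multiply-and-shift on every iteration
```

The point of the pattern is that the inverse is constructed once and reused for many divisions by the same denominator.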

timholy commented Mar 4, 2016

More results on the Broadwell, for different denominators:

| d | libdivide (g++ -O3) | julia-0.5 div | julia-0.5 fastdiv |
|---|---|---|---|
| 1 | 13.2ms | 88.8ms | 6.5ms |
| 2 | 14.2ms | 91.0ms | 15.7ms |
| 3 | 17.0ms | 91.7ms | 15.8ms |
| 4 | 14.1ms | 91.6ms | 15.7ms |
| 5 | 14.4ms | 91.0ms | 15.7ms |
| 6 | 17.0ms | 91.1ms | 15.7ms |
| 7 | 14.4ms | 91.0ms | 15.7ms |
| 8 | 14.2ms | 91.7ms | 15.8ms |
| 9 | 17.1ms | 90.8ms | 15.7ms |
| 10 | 14.2ms | 91.0ms | 15.7ms |

using Base: LinearFast, LinearSlow, tail
export multiplicativeinverse

unsigned_type(::Int8) = UInt8

I wonder if these should just be methods of `unsigned{T<:Signed}(::Type{T})`?
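
A rough sketch of what dispatching on the type (rather than on a value) could look like. It is written in current Julia syntax (`where` clauses) instead of the `f{T}(...)` form used at the time of this PR, and `unsigned_type_of` is a hypothetical name chosen so the example does not touch any Base function.

```julia
# Hypothetical helper: map an integer *type* to its unsigned counterpart.
# One method per signed type, selected by dispatch on Type{...}.
unsigned_type_of(::Type{Int8})   = UInt8
unsigned_type_of(::Type{Int16})  = UInt16
unsigned_type_of(::Type{Int32})  = UInt32
unsigned_type_of(::Type{Int64})  = UInt64
unsigned_type_of(::Type{Int128}) = UInt128

# Unsigned types map to themselves.
unsigned_type_of(::Type{T}) where {T<:Unsigned} = T
```

With this shape, generic code can write `unsigned_type_of(T)` inside a method parameterized on `T` without needing an instance of `T`.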

tkelman commented Mar 5, 2016

Is there an algorithmic reference for this technique? The comments here aren't especially useful at explaining how this works.
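
For context, the general idea behind division by a precomputed inverse can be sketched as follows. This is a simplified illustration for unsigned 32-bit values only, using a 64-bit magic constant and a 128-bit product; it is not necessarily the scheme this PR implements, and `U32Inverse` and `fastdiv` are hypothetical names. With `magic = ceil(2^64 / d)`, the error `magic*d - 2^64` is smaller than `d`, so for any `n < 2^32` the high 64 bits of `magic * n` equal `floor(n / d)`.

```julia
# Hypothetical illustration type, not the type added in this PR.
struct U32Inverse
    divisor::UInt32
    magic::UInt64   # ceil(2^64 / divisor); fits in 64 bits only for divisor >= 2
end

function U32Inverse(d::Integer)
    d >= 2 || throw(ArgumentError("this sketch only handles divisors >= 2"))
    d32 = UInt32(d)
    # floor((2^64 - 1) / d) + 1 equals ceil(2^64 / d) for every d >= 2
    return U32Inverse(d32, div(typemax(UInt64), UInt64(d32)) + one(UInt64))
end

# floor(n / d), computed as the high 64 bits of the 128-bit product magic * n.
# No division instruction is needed here, only a widening multiply and a shift.
fastdiv(n::UInt32, mi::U32Inverse) = (widemul(mi.magic, UInt64(n)) >> 64) % UInt32
```

Production implementations typically avoid the 128-bit product by using a narrower magic number plus a shift and an optional correction step, and they handle signed values and `d == 1` as well.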

timholy commented Mar 5, 2016

@tkelman review comments addressed.

I made several improvements in performance. The cost of constructing one of these objects is now equivalent to approximately 14 div operations, or about half of what it used to be. This is likely to still be quite noticeable for creating small ReshapedArrays, but I suspect this is the best we can do.
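
A back-of-the-envelope break-even estimate, as a sketch only. It combines the construction cost quoted above with the Broadwell julia-0.5 timings from the first benchmark table (91ms for div, 16.2ms for fastdiv), and it assumes those loop timings are dominated by the division itself, which may not hold exactly.

```julia
construction_cost = 14.0        # cost of building the inverse, in units of one div
fastdiv_cost = 16.2 / 91.0      # cost of one fastdiv relative to one div (Broadwell, julia-0.5)

# Each fast division saves (1 - fastdiv_cost) of a div, so the construction
# cost is recovered after this many divisions by the same denominator:
breakeven = construction_cost / (1 - fastdiv_cost)
```

Under these assumptions the inverse pays for itself after roughly 17 to 18 divisions, which is consistent with the concern about small ReshapedArrays.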

musm commented Mar 5, 2016

Perhaps not fully relevant but potentially useful for other tricks: http://www.jjj.de/fxt/fxtbook.pdf
