timholy (Member) commented Mar 4, 2016

This is part 1 of 2 towards a new version of ReshapedArrays (#10507). I'm making them separate PRs because part 1 is best reviewed by the "numbers" crowd, and part 2 by the "arrays" crowd. I committed this as @simonster, since the core code here was developed by him.

I benchmarked this on 4 different machines against libdivide, which claims to have emphasized performance. We're doing pretty well, especially on master:

| Machine | libdivide (g++ -O3) | julia-0.4 div | julia-0.4 fastdiv | julia-0.5 div | julia-0.5 fastdiv |
|---|---|---|---|---|---|
| core i7 L640 (Westmere) | 30.4ms | 114ms | 40.5ms | | |
| core i7 5500U (Broadwell-U) | 15.8ms | 92ms | 23.2ms | 91ms | 16.2ms |
| xeon ES-2650 (Sandybridge) | 34.2ms | 184ms | 34.6ms | 104ms | 23.7ms |
| opteron 8439 SE (Istanbul) | 95.7ms | 180ms | 53.1ms | | |

julia-0.5 isn't installed on the two machines with missing numbers (one does not have enough disk space left for a separate build, and the other runs Ubuntu 12.04, which has trouble building master).

The tests are in this gist: https://gist.github.com/timholy/1d21885f00ac066d7237.

timholy (Member, Author) commented Mar 4, 2016

More results on the Broadwell, for different denominators:

| d | libdivide (g++ -O3) | julia-0.5 div | julia-0.5 fastdiv |
|---|---|---|---|
| 1 | 13.2ms | 88.8ms | 6.5ms |
| 2 | 14.2ms | 91.0ms | 15.7ms |
| 3 | 17.0ms | 91.7ms | 15.8ms |
| 4 | 14.1ms | 91.6ms | 15.7ms |
| 5 | 14.4ms | 91.0ms | 15.7ms |
| 6 | 17.0ms | 91.1ms | 15.7ms |
| 7 | 14.4ms | 91.0ms | 15.7ms |
| 8 | 14.2ms | 91.7ms | 15.8ms |
| 9 | 17.1ms | 90.8ms | 15.7ms |
| 10 | 14.2ms | 91.0ms | 15.7ms |

```julia
using Base: LinearFast, LinearSlow, tail
export multiplicativeinverse

unsigned_type(::Int8) = UInt8
```

Contributor

I wonder if these should just be methods of unsigned{T<:Signed}(::Type{T}) ?
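
A minimal sketch of what that suggestion could look like: dispatching on the type rather than on a value. It is written in current Julia syntax (`where` clauses) rather than the 0.4-era `f{T}(...)` form quoted above. The name `unsigned_counterpart` is hypothetical, chosen so that nothing in Base is extended, and this is not the code in this PR.

```julia
# Hypothetical helper (name is an assumption, not part of the PR):
# map each signed integer type to the unsigned type of the same width.
unsigned_counterpart(::Type{Int8})   = UInt8
unsigned_counterpart(::Type{Int16})  = UInt16
unsigned_counterpart(::Type{Int32})  = UInt32
unsigned_counterpart(::Type{Int64})  = UInt64
unsigned_counterpart(::Type{Int128}) = UInt128

# Unsigned types map to themselves.
unsigned_counterpart(::Type{T}) where {T<:Unsigned} = T

# Value-level convenience: forward to the type-level methods.
unsigned_counterpart(x::Integer) = unsigned_counterpart(typeof(x))
```

Keeping the mapping at the type level means it can be used in type computations (for example, when choosing a field type) without needing an instance, and the value-level method comes along for free.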

tkelman (Contributor) commented Mar 5, 2016

Is there an algorithmic reference for this technique? The comments here aren't especially useful at explaining how this works.

timholy (Member, Author) commented Mar 5, 2016

@tkelman review comments addressed.

I made several improvements in performance. The cost of constructing one of these objects is now equivalent to approximately 14 div operations, or about half of what it used to be. This is likely to still be quite noticeable for creating small ReshapedArrays, but I suspect this is the best we can do.
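
To illustrate why construction has a one-time cost while each later division is cheap, here is a minimal sketch of the general "multiply by a precomputed inverse" idea. It is not the implementation in this PR: it covers only `UInt32` numerators with divisors of at least 2, and the names `magic_u32` and `fastdiv_u32` are hypothetical. It assumes the magic constant `ceil(2^64 / d)`, which is sufficient for 32-bit inputs.

```julia
# One-time setup per divisor: compute ceil(2^64 / d) as a UInt64.
# This step uses a real hardware division, which is why construction is not free.
function magic_u32(d::Integer)
    2 <= d <= typemax(UInt32) ||
        throw(ArgumentError("need 2 <= d <= typemax(UInt32)"))
    return typemax(UInt64) ÷ UInt64(d) + UInt64(1)
end

# Per-division work: one widening multiply and one shift, no div instruction.
# The high 64 bits of magic * n equal floor(n / d) for every UInt32 n.
fastdiv_u32(n::UInt32, magic::UInt64) = ((UInt128(magic) * n) >> 64) % UInt32

# Usage: pay for the setup once, then reuse it for many numerators.
m7 = magic_u32(7)
quotients = [fastdiv_u32(UInt32(n), m7) for n in 0:20]
```

The restriction to `d >= 2` is deliberate in this sketch: for `d == 1` the constant would be `2^64`, which does not fit in a `UInt64`, so a complete implementation needs to treat that case separately.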

musm (Contributor) commented Mar 5, 2016

Perhaps not fully relevant but potentially useful for other tricks: http://www.jjj.de/fxt/fxtbook.pdf
