
max() could be faster/inlining ternary operations #3030

Closed · timholy opened this issue May 6, 2013 · 6 comments
Labels: performance (Must go faster)

@timholy (Member) commented May 6, 2013

Currently we call openlibm's fmax() for max(Float64,Float64), and use the version at the end of promotion.jl for integer types. But try these experiments:

# If you copy the RHS of these definitions in place of the call to
# mymax() below, you get a substantial performance boost
# compared to max()
mymax{T<:FloatingPoint}(x::T, y::T) = (x > y) ? x : (isnan(x) ? y : (isnan(y) ? x : y))
mymax(x::Integer, y::Integer) = (x > y) ? x : y

function compare_max{T<:Number}(n1::Vector{T}, n2::Vector{T})
    mx1 = zero(T)
    @time begin
    for i = 1:length(n1)
        mx1 += max(n1[i], n2[i])
    end
    end
    mx2 = zero(T)
    @time begin
    for i = 1:length(n1)
        x = n1[i]
        y = n2[i]
        mx2 += mymax(x, y)
    end
    end
    @assert mx1 == mx2
    @show mx1
end
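
To reproduce, a call like the following is enough (the vector lengths and rand calls here are my own illustration, not part of the original report):

n1 = rand(10^7); n2 = rand(10^7)                  # Float64 inputs
compare_max(n1, n2)

i1 = rand(1:100, 10^7); i2 = rand(1:100, 10^7)    # Int inputs
compare_max(i1, i2)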

On my machine,

  • for Float64, the manually-inlined version runs in about 80% of the time of the max() version
  • for Int, the manually-inlined version runs in about 50% of the time of the max() version

Since the function definitions are beaten by manual inlining, it seems that ternary operations are not fully inlined. This is especially obvious in the Int case, where the definition of mymax() is identical to the one in promotion.jl.
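
As an aside (not part of the original report): with the @code_llvm macro from later Julia versions, one can check directly whether the call gets inlined by inspecting the code generated for a small caller:

using InteractiveUtils                       # provides @code_llvm outside the REPL

mymax(x::Integer, y::Integer) = (x > y) ? x : y
callmax(x, y) = mymax(x, y) + 1              # throwaway caller, just to give the compiler a call site
@code_llvm callmax(1, 2)                     # if inlined, only icmp/select remain and there is no call to mymax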

Related to #2741.

@simonster (Member)

MATLAB is more than 20x faster here:

julia> A = randn(10000, 10000);

julia> @time max(A);
elapsed time: 0.549925797 seconds (187556 bytes allocated)

julia> @time max(A);
elapsed time: 0.551343924 seconds (64 bytes allocated)

>> A = randn(10000, 10000);
>> tic; max(max(A)); toc;
Elapsed time is 0.023501 seconds.

@JeffBezanson (Member)

That is probably because we are calling fmax.
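
For reference, the Float64 method bottoms out in a foreign call of roughly this shape (a paraphrased sketch, not the exact Base definition; the library name is an assumption):

# Calling openlibm's fmax through ccall; the foreign-call boundary is
# what keeps the compiler from inlining the comparison itself.
fmax_via_libm(x::Float64, y::Float64) =
    ccall((:fmax, "libopenlibm"), Float64, (Float64, Float64), x, y)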

@simonster (Member)

It's mostly from calling fmax, but not entirely:

julia> A = randn(10000, 10000);

julia> function mymax(A::Matrix{Float64})
         x = A[1]
         for i = 2:length(A)
           @inbounds y = A[i]
           x = (x > y) ? x : (isnan(x) ? y : (isnan(y) ? x : y))
         end
         x
       end;

julia> @time mymax(A);
elapsed time: 0.077274143 seconds (96100 bytes allocated)

julia> @time mymax(A);
elapsed time: 0.074994453 seconds (64 bytes allocated)

Much better, but still not great.

@simonster (Member)

Didn't see 13f2f05, but it's slightly slower than the version above, probably due to bounds checks.

@timholy (Member, Author) commented Aug 8, 2013

Yes, I've gradually learned to work around openlibm for really performance-critical code. As you nicely illustrate with your benchmarks, the gap is big enough to be important. In my own private code tree, I have a small but growing collection of simple macros (which automatically inline) that do these kinds of things. @vtjnash's work on #3796 should, when merged, make these unnecessary.
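
A minimal sketch of that kind of macro (my own illustration of the pattern, not code from that tree): the expansion is spliced directly into the caller, so there is no function-call boundary left to inline through.

# Hypothetical @fastmax macro using the integer-style comparison; a NaN-aware
# float version would splice in the longer ternary from the first comment.
# The temporaries avoid evaluating the argument expressions twice.
macro fastmax(x, y)
    quote
        a = $(esc(x))
        b = $(esc(y))
        a > b ? a : b
    end
end

# usage inside a hot loop:  s += @fastmax(n1[i], n2[i])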

@simonster, as far as the remaining performance gap goes: max is also easily farmed out to multiple threads. Did you run those benchmarks with -singleCompThread?

@simonster (Member)

Ah, you're right. With -singleCompThread,

>> tic; max(max(A)); toc
Elapsed time is 0.074805 seconds.

which is the same as above.
