max() could be faster/inlining ternary operations #3030
Comments
MATLAB is more than 20x faster here:

    julia> A = randn(10000, 10000);
    julia> @time max(A);
    elapsed time: 0.549925797 seconds (187556 bytes allocated)
    julia> @time max(A);
    elapsed time: 0.551343924 seconds (64 bytes allocated)

    >> A = randn(10000, 10000);
    >> tic; max(max(A)); toc;
    Elapsed time is 0.023501 seconds.
That is probably because we are calling |
It's mostly from calling

    julia> A = randn(10000, 10000);
    julia> function mymax(A::Matrix{Float64})
               x = A[1]
               for i = 2:length(A)
                   @inbounds y = A[i]
                   x = (x > y) ? x : (isnan(x) ? y : (isnan(y) ? x : y))
               end
               x
           end;
    julia> @time mymax(A);
    elapsed time: 0.077274143 seconds (96100 bytes allocated)
    julia> @time mymax(A);
    elapsed time: 0.074994453 seconds (64 bytes allocated)

Much better, but still not great.
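For anyone reading the nested ternary in `mymax` for the first time, here is a minimal sketch that pulls it out into a two-argument helper so the NaN handling is easier to see. The name `nanskip_max` is made up for illustration and is not part of Base. The behaviour matches the C `fmax` convention: if exactly one argument is NaN, the other one is returned.

```julia
# Hypothetical helper; the ternary chain is the one used inside mymax above.
nanskip_max(x, y) = (x > y) ? x : (isnan(x) ? y : (isnan(y) ? x : y))

nanskip_max(1.0, 2.0)   # ordinary case: the larger value, 2.0
nanskip_max(NaN, 2.0)   # x is NaN, so y (2.0) is returned
nanskip_max(2.0, NaN)   # y is NaN, so x (2.0) is returned
nanskip_max(NaN, NaN)   # both NaN, so the result is NaN
```

The common case (`x > y`) is decided by a single comparison; the `isnan` checks only run when that comparison is false.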
Didn't see 13f2f05, but it's slightly slower than the above, probably due to bounds checks |
Yes, I've gradually learned to work around openlibm for really performance-critical code. As you nicely illustrate with your benchmarks, the gap is big enough to be important. In my own private code tree, I have a small but growing collection of simple macros (which automatically inline) that do these types of things. @vtjnash's work on #3796 should, when merged, make these unnecessary. @simonster, as far as the remaining performance gap goes:
Ah, you're right. With
which is the same as above. |
Currently we call openlibm's `fmax()` for `max(Float64,Float64)`, and use the version at the end of `promotion.jl` for integer types. But try these experiments:

On my machine,

- `Float64`: the manually-inlined version runs in about 80% of the time of the `max()` version
- `Int`: the manually-inlined version runs in about 50% of the time of the `max()` version

Since the function definitions are beaten by manual inlining, it seems that ternary operations are not fully inlined. This is especially obvious for the case of `Int`, where the definition of `mymax()` is identical to that in `promotion.jl`.

Related to #2741.
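A minimal sketch of the kind of comparison described above, for the `Int` case. It is not the original benchmark. The names `mymax2`, `max_by_call` and `max_by_hand` are made up, the two-argument ternary is an assumed stand-in and is not copied from `promotion.jl`, and the array size is arbitrary. It assumes a Julia version where `@elapsed` and `rand(1:n, len)` are available.

```julia
# Two-argument max written as a ternary (illustrative stand-in).
mymax2(x, y) = (x > y) ? x : y

# Reduction that goes through the function call.
function max_by_call(A::Vector{Int})
    m = A[1]
    for i = 2:length(A)
        @inbounds m = mymax2(m, A[i])
    end
    m
end

# Same reduction with the ternary written out by hand in the loop body.
function max_by_hand(A::Vector{Int})
    m = A[1]
    for i = 2:length(A)
        @inbounds y = A[i]
        m = (m > y) ? m : y
    end
    m
end

A = rand(1:10^6, 10^6)
max_by_call(A); max_by_hand(A)      # warm up so compilation is not timed
t_call = @elapsed max_by_call(A)
t_hand = @elapsed max_by_hand(A)
println("call: ", t_call, " s, by hand: ", t_hand, " s")
```

If the compiler fully inlines `mymax2`, the two timings should be close. A consistent gap in favour of `max_by_hand` would point to the inlining problem this issue is about. Timings will vary with machine and Julia version.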