`mapreduce` can be much slower than the equivalent `for`-loop. A small (real-life) example:
```julia
function dsum(A::Matrix)
    z = zero(A[1,1])
    n = Base.LinAlg.checksquare(A)
    B = Vector{typeof(z)}(n)
    @inbounds for j in 1:n
        B[j] = mapreduce(k -> A[j,k]*A[k,j], +, z, 1:j)
    end
    B
end

function dfor(A::Matrix)
    z = zero(A[1,1])
    n = Base.LinAlg.checksquare(A)
    B = Vector{typeof(z)}(n)
    @inbounds for j in 1:n
        d = z
        for k in 1:j
            d += A[j,k]*A[k,j]
        end
        B[j] = d
    end
    B
end
```
```julia
using BenchmarkTools
A = randn(127,127)
time(median(@benchmark dsum(A))) / time(median(@benchmark dfor(A)))
```
gives me a performance ratio of about x50 on Julia 0.5 (juliabox.com). I think this could be because the `for`-loop can be automatically SIMD-vectorized and the `mapreduce` isn't? When `A = randn(N,N)` with `N = 16`, the gap is around x75, and for `N = 10000` the gap is around x25. Replacing the array access `A[j,k]` with `A[rand(1:size(A,1)),rand(1:size(A,2))]` destroys the performance of both, but the ratio becomes x1.
- Is SIMD the reason why one is x50 faster? (A quick way to check is sketched after this list.)
- Should this be described in Performance Tips? `mapreduce` underlies `sum`, so this could be a popular trap that isn't currently mentioned.
- Would this be a useful benchmark on nanosoldier?
- Could the performance gap be smaller?
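One way to probe the SIMD question, as a sketch of my own rather than part of the original report: `dfor_simd` below is a hypothetical variant of `dfor` whose inner reduction is explicitly annotated with `@simd`, allowing the compiler to reorder the floating-point additions and vectorize them.

```julia
# Hypothetical variant of dfor with the inner reduction explicitly marked @simd.
function dfor_simd(A::Matrix)
    z = zero(A[1,1])
    n = Base.LinAlg.checksquare(A)
    B = Vector{typeof(z)}(n)
    @inbounds for j in 1:n
        d = z
        @simd for k in 1:j
            d += A[j,k]*A[k,j]
        end
        B[j] = d
    end
    B
end
```

Comparing `@benchmark dfor(A)` against `@benchmark dfor_simd(A)`, and inspecting `@code_llvm dfor(A)` for vector instructions like `<4 x double>`, should show whether the plain loop is in fact being auto-vectorized.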
(Benchmarking `mapreduce` versus `for`-loops without array access, I still see a x2 performance gap, e.g. `mapreduce(identity, +, 0, i for i in 1:n)` versus the equivalent integer-summing `for` loop. It looks like this gap used to be smaller? Worth another benchmark in CI?)
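For reference, a minimal version of that no-array-access comparison could look like the following (my own sketch, using the Julia 0.5 `mapreduce(f, op, v0, itr)` signature; the helper names and the choice of `n` are arbitrary):

```julia
using BenchmarkTools

# mapreduce over a generator, as described above
mr(n) = mapreduce(identity, +, 0, (i for i in 1:n))

# the equivalent hand-written integer-summing loop
function loopsum(n)
    s = 0
    for i in 1:n
        s += i
    end
    s
end

n = 10^4
time(median(@benchmark mr($n))) / time(median(@benchmark loopsum($n)))
```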
```julia
# dsum rewritten so that the mapreduce call sits behind a separate helper function (_help).
function dsum(A::Matrix)
    z = zero(A[1,1])
    n = Base.LinAlg.checksquare(A)
    B = Vector{typeof(z)}(n)
    @inbounds for j::Int in 1:n
        B[j] = _help(A, j, z)
    end
    B
end

_help(A, j, z) = mapreduce(k -> A[j,k]*A[k,j], +, z, 1:j)
```