Could @inbounds communicate its (removed) invariants to LLVM? #39340
Comments
Since BTW, interestingly, my Windows machine gave a different result.
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
julia> @btime test(xs, eachcol);
33.650 ms (5 allocations: 112 bytes)
julia> @btime test(xs, myeachcol);
30.425 ms (5 allocations: 112 bytes) |
Doh, I should have seen that in the generated code. I'll rename the issue to be about the
Interesting! As an additional datapoint I just tested on a mid-2014 Macbook and macOS and see the same performance split with |
Debian on WSL2 on the same Windows machine mentioned above showed the same trend as your Macbook.
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
julia> @btime test(xs, eachcol);
28.689 ms (5 allocations: 112 bytes)
julia> @btime test(xs, myeachcol);
36.112 ms (5 allocations: 112 bytes) |
Why was this issue closed? I re-ran the measurements from the original post and the results still hold on Julia 1.6 on macOS:
julia> myeachcol(A::AbstractVecOrMat) = ((@inbounds view(A, :, i)) for i in axes(A, 2));
julia> test(xs, eachcol) = sum(sum(x) for x in eachcol(xs));
julia> xs = rand(2, 10^7);
julia> using BenchmarkTools
julia> test(xs, eachcol); test(xs, myeachcol); # warmup
julia>
julia> @time test(xs, eachcol);
0.036498 seconds (5 allocations: 112 bytes)
julia> @time test(xs, myeachcol);
0.039457 seconds (5 allocations: 112 bytes)
julia> @btime test(xs, eachcol);
27.893 ms (5 allocations: 112 bytes)
julia> @btime test(xs, myeachcol);
32.159 ms (5 allocations: 112 bytes) |
So, you're effectively creating a nested for loop:
for i in axes(A, 2)
    v = #= maybe @inbounds =# view(A, :, i)
    inner_result = 0.0
    for j in eachindex(v)
        inner_result += v[j]
    end
    result += inner_result
end
It's not terribly surprising that a boundscheck on the
julia> xs = rand(2, 10^7);
julia> function f(A)
           result = 0.0
           for i in axes(A, 2)
               v = view(A, :, i)
               inner_result = 0.0
               for j in eachindex(v)
                   inner_result += v[j]
               end
               result += inner_result
           end
           return result
       end
f (generic function with 2 methods)
julia> function g(A)
           result = 0.0
           for i in axes(A, 2)
               v = @inbounds view(A, :, i)
               inner_result = 0.0
               for j in eachindex(v)
                   inner_result += v[j]
               end
               result += inner_result
           end
           return result
       end
g (generic function with 2 methods)
julia> @btime f($xs)
13.590 ms (0 allocations: 0 bytes)
9.999553205385104e6
julia> @btime g($xs)
18.410 ms (0 allocations: 0 bytes)
9.999553205385104e6 |
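One way to probe the explanation above is to also mark the inner indexing as @inbounds, so that neither the view construction nor the element access carries a check, and then look at the optimized IR. The sketch below is not from the thread: h and mentions_boundserror are hypothetical names, and no timing or IR result is claimed for them.

```julia
using InteractiveUtils  # standard library; provides code_llvm

# Hypothetical variant of g: @inbounds on the view AND on the inner indexing.
function h(A)
    result = 0.0
    for i in axes(A, 2)
        v = @inbounds view(A, :, i)
        inner_result = 0.0
        for j in eachindex(v)
            inner_result += @inbounds v[j]
        end
        result += inner_result
    end
    return result
end

# Hypothetical helper: does the optimized LLVM IR of fn(::Matrix{Float64})
# still mention a bounds-error throw anywhere?
function mentions_boundserror(fn)
    io = IOBuffer()
    code_llvm(io, fn, Tuple{Matrix{Float64}})
    return occursin("boundserror", String(take!(io)))
end
```

Calling mentions_boundserror on f, g, and h and benchmarking h alongside them would show whether the remaining checks in g sit on the inner loop; the outcome may differ between Julia and LLVM versions.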
Interesting, thanks for the explanation. It makes sense to me now that This issue was opened because someone noticed the slowdown and was surprised, and #26261 contains this quote from someone else who was also surprised:
Should this issue stay open to track documentation of the existing behavior? |
It seemed to me that @maleadt's comment indicated that this issue was fixed, so I closed it. |
@mcabbott and I were investigating the performance effect of @inbounds annotations inside eachcol and found some puzzling behavior with a simple test function.
Using the following definitions
I measured the performance of test using eachcol and myeachcol both using @time and @btime:
To my surprise the function with @inbounds annotation is slower than the Base function without it.
The generated assembly (output of @code_native) is the same on my mac for the two functions modulo line number comments [edit: this part is not relevant, see @kimikage's comment below; the actual processing is done in mapfold_impl]:
Here's a more complete set of statistics reported by @benchmark:
The Base eachcol definition in Julia 1.5.3 in abstractarraymath.jl:479 is:
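For reference, the definitions quoted in the Julia 1.6 re-run comment in this thread can be collected into one self-contained script. myeachcol and test are copied from that comment. eachcol_noinbounds is a hypothetical name, and its body is only an assumption about the shape of the Base definition (the same generator without @inbounds), not a quote from abstractarraymath.jl.

```julia
# Quoted in the thread: a generator whose views are created under @inbounds.
myeachcol(A::AbstractVecOrMat) = ((@inbounds view(A, :, i)) for i in axes(A, 2))

# Hypothetical stand-in (assumption): the same generator with the bounds
# check on view creation left in place.
eachcol_noinbounds(A::AbstractVecOrMat) = (view(A, :, i) for i in axes(A, 2))

# Quoted in the thread: the second argument is the column iterator to use
# (the parameter name shadows Base.eachcol inside the function body).
test(xs, eachcol) = sum(sum(x) for x in eachcol(xs))

xs = rand(2, 10^4)

# All three iterators visit the same columns, so the sums should agree.
@assert test(xs, myeachcol) ≈ test(xs, eachcol_noinbounds)
@assert test(xs, myeachcol) ≈ test(xs, eachcol)
```

The script checks only that the variants compute the same value; the timings in the thread came from @time and @btime on a 2 by 10^7 matrix.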