The natural way of iterating backwards (with stepsize negative one) has bad performance #26770
I think we'll be able to fix this after #25261.
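For context, assuming #25261 refers to the iteration-protocol rework (replacing `start`/`next`/`done` with a single `iterate` function), here is a rough sketch (my illustration, not from the thread) of what a `for` loop lowers to under that protocol:

```julia
# `iterate` returns either `nothing` or an (element, state) tuple; a `for`
# loop is roughly equivalent to this hand-written driver.
function manual_iterate(f, itr)
    y = iterate(itr)
    while y !== nothing
        (el, state) = y
        f(el)
        y = iterate(itr, state)
    end
    return nothing
end

manual_iterate(println, 5:-1:1)   # prints 5, 4, 3, 2, 1
```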
Looking forward to it!
Yes, the overhead comes from creating the `StepRange`.
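As a small illustration (mine, not from the thread) of why constructing a `StepRange` is not free: the constructor normalizes `stop` down to the last element actually reachable from `start` with the given `step`:

```julia
julia> r = StepRange(1, 2, 8)    # the literal 1:2:8 builds the same object
1:2:7

julia> last(r), r.stop           # stop was normalized from 8 to 7 at construction time
(7, 7)
```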
So this can be fixed now? Ref #26770 (comment)
Started writing an issue but realized it was a dup of this one. Here it is: the following two functions should do the same thing.

```julia
function copy_second(x, y)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in 1:2:size(y, 1)
            x[k, j] = y[i, j]
            k += 1
        end
    end
end

function copy_second_while(x, y)
    @inbounds for j in 1:size(y, 2)
        i = 1; k = 1
        while i <= size(y, 1)
            x[k, j] = y[i, j]
            i += 2
            k += 1
        end
    end
end
```

Yet:

```julia
y = rand(8, 10^5)
x = rand(4, 10^5)
using BenchmarkTools
@btime copy_second(x, y)
# 1.157 ms (0 allocations: 0 bytes)
@btime copy_second_while(x, y)
# 356.281 μs (0 allocations: 0 bytes)
```

The first function has calls to a helper which seems intentionally outlined to make the constructor itself inlined. Would be nice to solve this.
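If range construction is indeed the expensive part, one way to test that (a sketch of mine, not from the thread) is to hoist the range out of the loop over columns, so it is built once per call rather than once per column:

```julia
# Hypothetical variant: the StepRange is constructed a single time.
function copy_second_hoisted(x, y)
    r = 1:2:size(y, 1)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in r
            x[k, j] = y[i, j]
            k += 1
        end
    end
end
```

If construction cost dominates, `@btime copy_second_hoisted(x, y)` should land close to the `while`-loop timing.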
Can we make the creation of the range cheaper? Something like this:

```julia
struct PositiveStepRange{T,S}
    start::T
    step::S
    stop::T
end

Base.length(r::PositiveStepRange) = length(StepRange(r.start, r.step, r.stop))

# Termination is a single `<=` comparison, so this only handles positive steps.
@inline function Base.iterate(r::PositiveStepRange)
    r.start <= r.stop || return nothing
    return (r.start, r.start)
end

@inline function Base.iterate(r::PositiveStepRange, i)
    next = i + r.step
    next <= r.stop || return nothing
    return (next, next)
end

using Test
@testset for i in 1:10
    @test collect(PositiveStepRange(1, i, 5)) == 1:i:5
end

function copy_second(x, y)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in 1:2:size(y, 1)
            x[k, j] = y[i, j]
            k += 1
        end
    end
end

function copy_second_while(x, y)
    @inbounds for j in 1:size(y, 2)
        i = 1; k = 1
        while i <= size(y, 1)
            x[k, j] = y[i, j]
            i += 2
            k += 1
        end
    end
end

function copy_second_pos(x, y)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in PositiveStepRange(1, 2, size(y, 1))
            x[k, j] = y[i, j]
            k += 1
        end
    end
end

using BenchmarkTools
y = rand(8, 10^5)
x = rand(4, 10^5)
@btime copy_second(x, y)
# 755.079 μs (0 allocations: 0 bytes)
@btime copy_second_while(x, y)
# 362.013 μs (0 allocations: 0 bytes)
@btime copy_second_pos(x, y)
# 358.420 μs (0 allocations: 0 bytes)
```

Of course, this does not support a negative `step`.
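To make that limitation concrete (my example, assuming the `PositiveStepRange` definitions above are loaded): with a negative step the `start <= stop` check in `iterate` fails immediately, so a backwards range comes out empty:

```julia
julia> iterate(PositiveStepRange(5, -1, 1))   # intended to mimic 5:-1:1
# returns `nothing`, so a `for` loop over it never executes its body

julia> first(5:-1:1)                          # the real StepRange does have elements
5
```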
Is the crux that we want …?
Something like

```julia
julia> struct FastStepRange{T,S} <: OrdinalRange{T,S}
           start::T
           step::S
           stop::T
       end

julia> @inline function Base.iterate(r::FastStepRange)
           (r.stop - r.start)*r.step >= 0 || return nothing
           return (r.start, r.start)
       end

julia> @inline function Base.iterate(r::FastStepRange, i)
           next = i + r.step
           (r.stop - next)*r.step >= 0 || return nothing
           return (next, next)
       end
```

might work. Math is typically faster than branching, though of course branch prediction can change that. The hard part is the set of "corner cases": what do you do with …?
@KristofferC I'm not sure if I understand your comment. @timholy How about this:

```julia
struct FastStepRange{T,S} # <: OrdinalRange{T,S}
    start::T
    step::S
    stop::T
end

Base.length(r::FastStepRange) = length(StepRange(r.start, r.step, r.stop))

# A single iterate method: `i === nothing` marks the first iteration,
# and `sign(step)` makes the termination test work in both directions.
@inline function Base.iterate(r::FastStepRange, i = nothing)
    next = i === nothing ? r.start : i + r.step
    (next - r.stop) * sign(r.step) > 0 && return nothing
    return (next, next)
end

using Test
@testset for i in 1:10
    @test collect(FastStepRange(1, i, 5)) == 1:i:5
    @test collect(FastStepRange(1, -i, 5)) == 1:-i:5
    @test collect(FastStepRange(-1, -i, -5)) == -1:-i:-5
end

function copy_second(x, y)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in 1:2:size(y, 1)
            x[k, j] = y[i, j]
            k += 1
        end
    end
end

function copy_second_while(x, y)
    @inbounds for j in 1:size(y, 2)
        i = 1; k = 1
        while i <= size(y, 1)
            x[k, j] = y[i, j]
            i += 2
            k += 1
        end
    end
end

function copy_second_fast(x, y)
    @inbounds for j in 1:size(y, 2)
        k = 1
        for i in FastStepRange(1, 2, size(y, 1))
            x[k, j] = y[i, j]
            k += 1
        end
    end
end

@inline function sum_iter(itr)
    acc = 0
    for i in itr
        acc += i
    end
    return acc
end

@inline sum_fast(n = 10) = sum_iter(FastStepRange(1, 1, n))

using BenchmarkTools
y = rand(8, 10^5)
x = rand(4, 10^5)
@btime copy_second(x, y)
# 746.105 μs (0 allocations: 0 bytes)
@btime copy_second_while(x, y)
# 361.389 μs (0 allocations: 0 bytes)
@btime copy_second_fast(x, y)
# 392.118 μs (0 allocations: 0 bytes)
@btime sum_iter($(1:1:2^15))
# 8.230 μs (0 allocations: 0 bytes)
@btime sum_iter($(FastStepRange(1, 1, 2^15)))
# 8.234 μs (0 allocations: 0 bytes)
@btime sum_fast($(2^15))
# 1.964 ns (0 allocations: 0 bytes)
@btime sum_iter($(1:2^15))
# 2.213 ns (0 allocations: 0 bytes)
```

Julia's compiler is really great and it looks like it can eliminate the loop, computing the result in O(1).
Actually this was wrong.
Yeah, sorry, I used the wrong term with O(1). |
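Regarding the walked-back O(1) claim, a quick way to check it (my sketch, assuming the definitions from the comment above are loaded) is to time the same call at two sizes: if the loop were truly replaced by a closed-form sum, doubling `n` would not change the runtime.

```julia
using BenchmarkTools

@btime sum_iter($(FastStepRange(1, 1, 2^15)))
@btime sum_iter($(FastStepRange(1, 1, 2^16)))   # roughly 2x the time if the loop is still there
@btime sum_fast($(2^15))
@btime sum_fast($(2^16))
```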
@KristofferC I think this can be closed.
It would be nice if the natural way of iterating backwards were as performant as doing it the ugly way.
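For concreteness, a minimal sketch of the two styles being compared, assuming "the natural way" means iterating over an `n:-1:1` range and "the ugly way" means a hand-written `while` loop:

```julia
# Natural: iterate backwards with a step of -1.
function sum_backwards(v)
    s = zero(eltype(v))
    for i in length(v):-1:1
        s += v[i]
    end
    return s
end

# Ugly but (at the time of this issue) faster: a manual while loop.
function sum_backwards_while(v)
    s = zero(eltype(v))
    i = length(v)
    while i >= 1
        s += v[i]
        i -= 1
    end
    return s
end
```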