Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slow fallback array copyto! in Julia 1.7.0 #43287

Closed
johnnychen94 opened this issue Dec 1, 2021 · 3 comments · Fixed by #43347
Closed

slow fallback array copyto! in Julia 1.7.0 #43287

johnnychen94 opened this issue Dec 1, 2021 · 3 comments · Fixed by #43347
Assignees
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@johnnychen94
Copy link
Member

It looks like a potential type inference issue

using BenchmarkTools
X = rand(ComplexF32, 64, 64);

dst = reinterpret(reshape, Float32, X);
src = copy(dst);

@btime copyto!($dst, $src);
# 1.6.4: 42.120 μs (0 allocations: 0 bytes)
# 1.7.0: 800.737 μs (32768 allocations: 1.12 MiB)
# 1.8.0-DEV.1090: 829.629 μs (32768 allocations: 1.12 MiB)

Profiling the code shows that it's hitting the fallback dual-iterator implementation version and a slow Tuple iterator

julia/base/abstractarray.jl

Lines 1041 to 1047 in d16f480

# Dual-iterator implementation
ret = iterate(iterdest)
@inbounds for a in src
idx, state = ret
dest[idx] = a
ret = iterate(iterdest, state)
end

indexed_iterate(t::Tuple, i::Int, state=1) = (@inline; (getfield(t, i), i+1))

image

Downstream JuliaImages issues: JuliaImages/Images.jl#992 JuliaImages/ImageCore.jl#174 (cc: @etibarg @timholy)

@johnnychen94 johnnychen94 added the performance Must go faster label Dec 1, 2021
@KristofferC KristofferC added the regression Regression in behavior compared to a previous version label Dec 1, 2021
@N5N3
Copy link
Member

N5N3 commented Dec 2, 2021

@code_warntype gives no warning.
But @code_typed shows that:

9 ── %70  = %new(Base.SCartesianIndex2{2}, 1, %65)::Base.SCartesianIndex2{2}%71  = Core.tuple(1, %65, %66)::Tuple{Int64, Int64, Int64}%72  = Core.tuple(%70, %71)::Tuple{Base.SCartesianIndex2{2}, Tuple{Int64, Int64, Int64}}
└───        goto #10
10%74  = φ (#9 => %70)::Union{Base.SCartesianIndex2{2}, Tuple{Int64, Int64, Int64}}%75  = φ (#9 => %71)::Union{Base.SCartesianIndex2{2}, Tuple{Int64, Int64, Int64}}%76  = φ (#8 => %68, #9 => %72)::Union{Nothing, Tuple{Base.SCartesianIndex2{2}, Tuple{Int64, Int64, Int64}}}%77  = Base.bitcast(UInt64, 1)::UInt64%78  = Base.sub_int(%77, 0x0000000000000001)::UInt64%79  = Base.arraylen(src)::Int64%80  = Base.sle_int(0, %79)::Bool%81  = Base.bitcast(UInt64, %79)::UInt64%82  = Base.ult_int(%78, %81)::Bool%83  = Base.and_int(%80, %82)::Bool
└───        goto #12 if not %83

Looks like we messes up at %74 and %75

@timholy
Copy link
Member

timholy commented Dec 5, 2021

git bisect blames 1f21f2d

@aviatesk aviatesk self-assigned this Dec 5, 2021
aviatesk added a commit that referenced this issue Dec 6, 2021
…prop' callsite

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
@aviatesk
Copy link
Member

aviatesk commented Dec 6, 2021

Thanks for the bisect. Will be fixed by #43347.

aviatesk added a commit that referenced this issue Dec 14, 2021
…prop' callsite

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
aviatesk added a commit that referenced this issue Jan 5, 2022
…prop' callsite

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
aviatesk added a commit that referenced this issue Jan 5, 2022
…prop' callsite (#43347)

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
aviatesk added a commit that referenced this issue Jan 5, 2022
…prop' callsite (#43347)

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
aviatesk added a commit that referenced this issue Jan 5, 2022
…prop' callsite (#43347)

Makes full use of constant-propagation, by addressing this 
[TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
aviatesk added a commit that referenced this issue Jan 5, 2022
…prop' callsite (#43347)

Makes full use of constant-propagation, by addressing this 
[TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from #43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes #43287
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Feb 22, 2022
…prop' callsite (JuliaLang#43347)

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from JuliaLang#43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes JuliaLang#43287
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022
…prop' callsite (JuliaLang#43347)

Makes full use of constant-propagation, by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is a performance improvement from JuliaLang#43287:
```julia
ulia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @Btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit
```

fixes JuliaLang#43287
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants