Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenation is slower than it could be #21673

Closed
jebej opened this issue May 2, 2017 · 11 comments
Closed

Concatenation is slower than it could be #21673

jebej opened this issue May 2, 2017 · 11 comments
Labels
arrays [a, r, r, a, y, s] performance Must go faster

Comments

@jebej
Copy link
Contributor

jebej commented May 2, 2017

This is an issue on 0.5.1 and 0.6:

__ this was not an inference issue, the case below was updated to reflect that __

function test1(N::Integer)
    return [1;1:N]
end
function test2(N::Integer)
    A = Vector{Int}(undef, N+1)
    A[1] = 1
    A[2:end] = 1:N
    return A
end
using BenchmarkTools
test1(20) == test2(20)
@benchmark test1(20)
@benchmark test2(20)
julia> test1(20) == test2(20)
true

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  2.25 KiB
  allocs estimate:  49
  --------------
  minimum time:     19.282 μs (0.00% GC)
  median time:      19.905 μs (0.00% GC)
  mean time:        20.430 μs (0.00% GC)
  maximum time:     98.278 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     63.902 ns (0.00% GC)
  median time:      70.512 ns (0.00% GC)
  mean time:        76.032 ns (5.29% GC)
  maximum time:     571.020 ns (72.77% GC)
  --------------
  samples:          10000
  evals/sample:     988
@jebej
Copy link
Contributor Author

jebej commented May 2, 2017

This bug is making speye very slow.

@ararslan ararslan added arrays [a, r, r, a, y, s] compiler:inference Type inference labels May 2, 2017
@TotalVerb
Copy link
Contributor

This doesn't look like an inference problem.

@jebej
Copy link
Contributor Author

jebej commented May 6, 2017

Shouldn't the compiler know A is a vector of ints?

@TotalVerb
Copy link
Contributor

There's no longer an A by the time inference runs; it's optimized out. All of the intermediate expressions have been inferred.

@StefanKarpinski
Copy link
Member

To clarify what @TotalVerb is saying – the fact that A has type Any in the type warn output is a red herring (misleading clue) since A doesn't actually appear anywhere in the code, which is the only reason it doesn't get a type annotation – what is the type of a variable that doesn't exist?

I would note that in either function, calling collect is quite unhelpful here – it would be far better to let the values be generated on-demand. The second version seems to manage to optimize away the creation of collect(1:N) entirely. Unfortunately, removing the collect from test1 doesn't fix the performance issue either.

@KristofferC
Copy link
Member

Ref #21281, #20801

@jebej jebej changed the title Inference issue with array concatenation Concatenation issue May 8, 2017
@jebej
Copy link
Contributor Author

jebej commented May 8, 2017

I see, thanks for the explanation, I've updated the OP.

@jebej
Copy link
Contributor Author

jebej commented Sep 16, 2017

Any idea what the issue is here? This is still a problem on release 0.6

@mbauman mbauman changed the title Concatenation issue Concatenation is slower than it could be Sep 20, 2019
@mbauman mbauman added performance Must go faster and removed compiler:inference Type inference labels Sep 20, 2019
@mbauman
Copy link
Member

mbauman commented Sep 20, 2019

Somewhat better on 1.4-dev:

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  17
  --------------
  minimum time:     1.113 μs (0.00% GC)
  median time:      1.408 μs (0.00% GC)
  mean time:        1.464 μs (4.21% GC)
  maximum time:     317.081 μs (99.30% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     49.606 ns (0.00% GC)
  median time:      54.791 ns (0.00% GC)
  mean time:        59.367 ns (4.12% GC)
  maximum time:     676.165 ns (88.80% GC)
  --------------
  samples:          10000
  evals/sample:     987

So we're only ~20x slower than ideal, instead of ~300x. Progress! (I updated the OP with the undef syntax for 1.x).

@jebej
Copy link
Contributor Author

jebej commented Oct 4, 2020

For the example in the original issue, where we are concatenating a range with a number of the same type, the issue could be fixed by changing the signature of the method below to Union{AbstractRange{T},T}.

function vcat(rs::AbstractRange{T}...) where T

I can make the PR if that's acceptable, though it would be nice to fix the more general case as well.

The issue seems to be with the general __cat method in abstractarrays.jl which is slow because it needs to handles concatenation of any types in any dimension.

Note that there is a _typed_vcat method in abstract_array.jl that seems like it would work for this case, as long as the assignment a[pos:p1] = Vk was replaced by another for loop (as in the range case):

function _typed_vcat(::Type{T}, V::AbstractVecOrTuple{AbstractVector}) where T

We would also need to widen the signature of both vcat and typed_vcat in the call chain so that the _typed_vcat method would get called.

PS: Note that this _typed_vcat method appears like it would work just as well for ranges, so why is there a specialization in range.jl? Basically it seems like we could have a single vcat method for any know-length 1D container.

timholy added a commit that referenced this issue Jan 19, 2021
The `cat` pipeline has long had poor inferrability.
Together with #39292 and #39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of #21673 benchmark
very similarly.
@jebej
Copy link
Contributor Author

jebej commented Jan 19, 2021

There is a regressions on 1.6 compared to 1.5, so hopefully we can backport @timholy's PRs. On 1.5, the slowdown is ~26x, whereas on 1.6 beta1, I get a 74x slowdown:

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  1.56 KiB
  allocs estimate:  34
  --------------
  minimum time:     2.544 μs (0.00% GC)
  median time:      2.633 μs (0.00% GC)
  mean time:        2.918 μs (1.40% GC)
  maximum time:     146.111 μs (96.37% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     33.133 ns (0.00% GC)
  median time:      35.743 ns (0.00% GC)
  mean time:        41.742 ns (3.74% GC)
  maximum time:     560.241 ns (60.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

KristofferC pushed a commit that referenced this issue Jan 20, 2021
The `cat` pipeline has long had poor inferrability.
Together with #39292 and #39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of #21673 benchmark
very similarly.

(cherry picked from commit 78d55e2)
KristofferC pushed a commit that referenced this issue Feb 1, 2021
The `cat` pipeline has long had poor inferrability.
Together with #39292 and #39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of #21673 benchmark
very similarly.

(cherry picked from commit 78d55e2)
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
The `cat` pipeline has long had poor inferrability.
Together with JuliaLang#39292 and JuliaLang#39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of JuliaLang#21673 benchmark
very similarly.
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
The `cat` pipeline has long had poor inferrability.
Together with JuliaLang#39292 and JuliaLang#39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of JuliaLang#21673 benchmark
very similarly.
staticfloat pushed a commit that referenced this issue Dec 23, 2022
The `cat` pipeline has long had poor inferrability.
Together with #39292 and #39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of #21673 benchmark
very similarly.

(cherry picked from commit 78d55e2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants