slow SubMatrix copy #3098

bsxfan · 2013-05-14T10:45:50Z

For comparison to copy!, define:

loopcopy!(D,S) = (for i=1:length(D);D[i]=S[i];end;D)

Create regular and strided source and destination matrices:

S = randn(1000,1000); 
Ssub = sub(randn(2000,1000),1:2:2000,1:1000);
D = zeros(1000,1000); 
Dsub = sub(zeros(2000,1000),1:2:2000,1:1000);

Time loopcopy! vs copy!. Timings below are for 32-bit windows, but behaviour on 64-bit linux is similar:

julia> sum([@elapsed copy!(D,S) for i=1:100])
0.452816014
julia> sum([@elapsed loopcopy!(D,S) for i=1:100])
0.4669809420000002

OK, very close, but for SubMatrix source:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
1.8858308039999998
julia> sum([@elapsed copy!(D,Ssub) for i=1:100])
10.769275100000002

copy! is much slower. Similarly:

julia> sum([@elapsed loopcopy!(Dsub,Ssub) for i=1:100])
3.675649855999999
julia> sum([@elapsed copy!(Dsub,Ssub) for i=1:100])
12.968587926999998

Counterexample:

julia> sum([@elapsed loopcopy!(Dsub,S) for i=1:100])
2.3974653309999994
julia> sum([@elapsed copy!(Dsub,S) for i=1:100])
2.401044135

The text was updated successfully, but these errors were encountered:

timholy · 2013-05-14T13:47:31Z

In the case of arrays, on my 64-bit linux (x86_64) I get slightly better performance from copy! than loopcopy! (0.25s vs 0.31s). But by-and-large I replicate your findings.

The problem seems to come from abstractarray.jl:copy!'s use of the construct for x in src. If I replace that with for j = 1:length(src)... then they work out the same.

I'll also note that one can do vastly better with subarrays:

parent(S::SubArray) = S.parent
parent(A::Array) = A

function fastcopy!{T}(pdest::Matrix{T}, sdest::NTuple{2,Int}, psrc::Matrix{T}, ssrc::NTuple{2,Int}, sz::NTuple{2,Int})
    m = sz[1]
    sdest1 = sdest[1]
    ssrc1 = ssrc[1]
    for j = 0:sz[2]-1
        kdest = j*sdest[2]+1
        ksrc = j*ssrc[2]+1
        for i = 0:m-1
            idest = kdest+i*sdest1
            isrc = ksrc+i*ssrc1
            pdest[idest] = psrc[isrc]
        end
    end
end

function fastcopy!{T}(dest::StridedMatrix{T}, src::StridedMatrix{T})
    @assert size(dest) == size(src)
    sdest = strides(dest)
    ssrc = strides(src)
    pdest = parent(dest)
    psrc = parent(src)
    fastcopy!(pdest, sdest, psrc, ssrc, size(dest))
    dest
end

Test:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
3.1146607700000004

julia> sum([@elapsed fastcopy!(D,Ssub) for i=1:100])
0.4422959499999998

Images.jl uses some of these tricks, although I haven't yet profiled to see if they are working as expected.
https://github.com/timholy/Images.jl/blob/master/src/algorithms.jl#L117

timholy · 2013-05-14T14:00:47Z

(Basically, subarrays need the kind of love that I gave array.jl long ago. A variant of make_arrayind_loop_nest might need to be written to handle two arrays.)

ViralBShah · 2013-05-14T14:32:21Z

On a slightly orthogonal but not completely unrelated note, there has been the proposal of making array indexing return views. That has benefits of greatly reducing copying and GC, and also removes the need to use subarrays in many cases. Now that we can do stuff like pointer_to_array(pointer(a, m), dims), it should be pretty easy to get high performance array views.

timholy · 2013-05-14T14:40:38Z

Definitely related. I even started working on something like that within the first few days of the merging of immutable types, but got bit by challenges stemming from the need for better inlining. I haven't looked recently.

toivoh · 2013-05-14T18:38:04Z

@ViralBShah: I thought that those views would be subarrays themselves? In my view, producing array views will increase the need to be able to deal with subarrays/strided arrays.

Also, it should be straightforward to get machinery such as what I propose for fast bsxfun in #3100 to create loops like @timholy's fastcopy above, on demand.

bsxfan · 2013-05-15T14:49:24Z

Thanks @timholy for your nice fastcopy!() example. That clears up some mysteries for me. But yes, you are right, SubArray could definitely benefit from some more attention. I've just logged two more issues involving subarray. See #3114 and JuliaLang/LinearAlgebra.jl#14.

poulson · 2014-01-27T19:28:44Z

Have there been any updates on the creation of submatrix views? I am experimenting with using Julia in a linear algebra course that I'm teaching and would find it very helpful for providing high-performance code samples for factorizations.

timholy · 2014-01-27T19:46:54Z

Gimmee a couple more days, and a faster version of copy! should land. I've just been swamped with other things.

Meanwhile @lindahua has submitted the latest proposal for a faster ArrayView (#5556), which I also haven't had time to look at yet but which sounds promising.

ViralBShah · 2014-11-23T20:30:59Z

Can we close this with the new SubArrays, or should we wait for the updates to getindex and such?

timholy · 2014-11-23T20:41:00Z

We've actually had a fast copy! for months (shortly after Base.Cartesian landed).

toivoh mentioned this issue May 15, 2013

Staged bsxfun and other broadcast operations #3100

Merged

timholy closed this as completed Nov 23, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slow SubMatrix copy #3098

slow SubMatrix copy #3098

bsxfan commented May 14, 2013

timholy commented May 14, 2013

timholy commented May 14, 2013

ViralBShah commented May 14, 2013

timholy commented May 14, 2013

toivoh commented May 14, 2013

bsxfan commented May 15, 2013

poulson commented Jan 27, 2014

timholy commented Jan 27, 2014

ViralBShah commented Nov 23, 2014

timholy commented Nov 23, 2014

slow SubMatrix copy #3098

slow SubMatrix copy #3098

Comments

bsxfan commented May 14, 2013

timholy commented May 14, 2013

timholy commented May 14, 2013

ViralBShah commented May 14, 2013

timholy commented May 14, 2013

toivoh commented May 14, 2013

bsxfan commented May 15, 2013

poulson commented Jan 27, 2014

timholy commented Jan 27, 2014

ViralBShah commented Nov 23, 2014

timholy commented Nov 23, 2014