Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-working SubArray #3496

Closed
lindahua opened this issue Jun 22, 2013 · 12 comments
Closed

Re-working SubArray #3496

lindahua opened this issue Jun 22, 2013 · 12 comments
Labels
performance Must go faster

Comments

@lindahua
Copy link
Contributor

I created this issue to move the discussion in #3224 over here.


I wrote a benchmark script on Gist to compare the performances of different approaches:

Here is what I get:

Benchmark with long columns
with_ref!:  slow-down = 11.1970x     # use a[:,i]
with_sub!:  slow-down = 114.4806x    # use sub(a, 1:m, i)
with_sub1!: slow-down = 1.1974x      # use pointer_to_array
with_sub2!: slow-down = 2.5286x      # use immutable sub-vector

Benchmark with short columns
with_ref!:  slow-down = 261.4301x
with_sub!:  slow-down = 1256.6697x
with_sub1!: slow-down = 38.8170x
with_sub2!: slow-down = 19.0293x

We can observe:

  • The current version of sub, which creates SubArray is way too slow -- the overhead and indexing cost just dominate.
  • Using a[:, i] which makes copies also kills performance
  • Both immutable sub-vector and using pointer_to_array to create a shared-memory view work much better. That said, both lead to significant overhead when the columns are short.
  • Overall, the performance of creating a shared-memory array using pointer is better -- after all, you simply get the performance of builtin array. The limitation is that this applies only to contiguous blocks.

I may work on this as the first step towards better performance of Julia. But we need some discussion to settle on a approach first.

A problem with immutable is that methods defined on them do not get properly inlined, resulting in performance hit.

@lindahua
Copy link
Contributor Author

Here is a tentative proposal,

import Base.getindex

abstract AbstractSubArray{T,N,A<:AbstractArray} <: AbstractArray{T,N}

# for contiguous block

immutable ContiguousSubArray{T,N,A<:Array} <: AbstractSubArray{T,N,A}
    parent::A
    dims::NTuple{N,Int}
    offset::Int
end

getindex(a::ContiguousSubArray, i::Int) = a.parent[a.offset + i]
getindex(a::ContiguousSubArray, i::Int, j::Int) = a.parent[a.offset + i + (j - 1) * a.dims[1]]

# for strided block

immutable StridedSubVector{T,A<:Array} <: AbstractSubArray{T,1,A}
    parent::A
    dim1::Int
    offset1::Int
    stride1::Int
end

getindex(a::StridedSubVector, i::Int) = a.parent[a.offset1 + (i - 1) * a.stride1]

immutable StridedSubMatrix{T,A<:Array} <: AbstractSubArray{T,2,A}
    parent::A
    dim1::Int
    dim2::Int
    offset1::Int
    offset2::Int
    stride1::Int
    stride2::Int
end

immutable MultiDimStridedSubArray{T,N,A<:Array} <: AbstractSubArray{T,N,A}
    parent::A
    dims::NTuple{N,Int}
    offsets::NTuple{N,Int}
    strides::NTuple{N,Int}
end

# more generic case where the parent is not an instance of Array

immutable GenericSubArray{T,N,A<:AbstractArray,I<:(RangeIndex...,)} <: AbstractSubArray{T,N,A}
    parent::A
    shape::NTuple{N,Int}
    dims::I    
    strides::NTuple{N,Int}
end

# this allows to define functions that work consistently over arrays with contiguous memory
typealias ContiguousArray{T,N} Union(Array{T,N}, ContiguousSubArray{T})
typealias StridedArray{T,N} Union(Array{T,N}, SubArray{T,N})


# specialized methods to create sub-array
# we need a bunch of these & probably bound checking

function sub{T}(a::Array{T}, ::Colon, j::Int)
    m = size(a, 1)
    ContiguousSubArray{T,1,Array{T}}(a, (m,), m * (j-1))
end

function sub{T}(a::Array{T}, rgn::Range1{Int}, j::Int)
    m = size(a, 1)
    ContiguousSubArray{T,1,Array{T}}(a, (length(rgn),), (rgn[1]-1) + m * (j-1)) 
end

function sub{T}(a::Array{T}, rgn::Range{Int}, j::Int)
    m = size(a, 1)
    StridedSubVector{T,Array{T}}(a, length(rgn), (rgn[1]-1) + m * (j-1), step(rgn))
end

Of course, this is just a small part. But, hopefully, you get the basic idea. If the compiler can get those getindex methods inlined, this can be just as efficient as handcrafted loops. Otherwise, I expect there is about 2x - 3x slow-down, which is still much better than the current SubArray.

If we agree on this general direction, I may then make a pull request and work on it soon.

cc: @ViralBShah @StefanKarpinski @JeffBezanson @timholy

@timholy
Copy link
Member

timholy commented Jun 22, 2013

I haven't looked this over in detail except to note that this version is at least memory-safe, even if it forces construction. In the long run this may be resolvable, but what I like about this is that it gives us a path forward that won't have to be rewritten in the long term.

@lindahua
Copy link
Contributor Author

An improved version that achieves both good indexing performance & safety:

immutable SubVec{T, A<:Array} <: AbstractArray{T, 1}
    a::A
    ptr::Ptr{T}
    len::Int

    SubVec(a::A, offset::Int, len::Int) = new(a, pointer(a, offset+1), len)
end

getindex(s::SubVec, i::Int) = unsafe_load(s.ptr, i)  
setindex!(s::SubVec, v, i::Int) = unsafe_store!(s.ptr, v, i)
length(s::SubVec) = s.len

function sub{T}(a::Array{T}, ::Colon, j::Int)
    m = size(a, 1)
    SubVec{T,Array{T}}(a, (j - 1) * m, m)
end

Benchmark shows that this works as fast as hand-crafted loops. This version incorporates both the parent and an offseted pointer in the immutable:

  • Incorporating the parent makes memory-safe
  • Using the pointer in indexing gets getindex and setindex! inlined

@timholy
Copy link
Member

timholy commented Jun 22, 2013

Very cool! I do think it's probably worth having a dedicated 1d type, since getting good performance is so much easier and it's a very common case.

@lindahua
Copy link
Contributor Author

Not only 1D vectors, any sub-array that is contiguous can in principle get such performance.

However, I just find another issue -- any attempt to do bound checking of getindex and setindex! just kills performance, leading to about 4x - 5x slow down.

@timholy
Copy link
Member

timholy commented Jun 22, 2013

For regular arrays, Jeff does this in the compiler (arrayref, see the check in cgutils.cpp:370). Presumably we'd want this, too.

@timholy
Copy link
Member

timholy commented Jun 22, 2013

Presumably, it's an inlining issue? (I usually check this by the profiler---you can tell by the nesting depth---but if you can read the disassembly it's more direct.)

@ViralBShah
Copy link
Member

I like this approach, and it is much better thought out than what I was thinking of. Let's certainly replace the sub-array implementation with a better one.

On getindex(Array, I, J), I am not sure how to proceed. I mainly worry about returning a view (in cases where the region is contiguous, or at least strided), vs. returning a new array in cases where I and J are arbitrary vectors. In order to keep things consistent, and the communication from the implementer to the user simple, it may be best to let getindex keep returning a copy. SubArray already returns views and we can make that higher performance with the ideas in here.

@lindahua Perhaps you should also try disabling the bounds check in the yet to be merged #3268

@timholy
Copy link
Member

timholy commented Jun 23, 2013

We can also implement a non-strided view. That would solve your consistency concern.

@lindahua
Copy link
Contributor Author

Even when I and J are arbitrary things (e.g. integer vector), we can still make a view by maintaining references to I and J.

So we will have a variety of subarrays, contiguous, strided, and more generic one that simply maintain references to given indexers.

Let's first focus on rewriting sub function to make it always return a view of some kind and just keep the copying behavior of a[I, J] for now.

@ViralBShah
Copy link
Member

The thing about non-strided views is that it will have extra indexing overhead, and will be slower. The default can always be views, and the user has to make copies explicitly in some cases.

I agree that we should make sub work fast to start with.

@timholy
Copy link
Member

timholy commented Jan 3, 2015

I suspect this can be closed now.

@timholy timholy closed this as completed Jan 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants