ArrayViews (A systematic array view framework) #5556

lindahua · 2014-01-26T18:24:39Z

Efficient array views have been a long standing issue in Julia. Recently, I decided to tackle this problem again, and thus made this Pull Request.

This PR is actually migrated from ArrayViews.jl, which I originally created to explore this.

An important feature of this approach is that it has two view types ContiguousView and StridedView. Each strided view is associated with a static number called contiguous rank, which can be used to determine (statically) the contiguousness of a subview thereof.

A contiguous view (with more compact representation and faster indexing) will be returned whenever the contiguousness of a view can be determined statically, thus substantially improving indexing efficiency in a lot of common situations.

Here is a brief summary of my benchmark results obtained so far:

Indexing over contiguous views are considerably faster than subarrays. For example, traversing view(a, :, i) is about 2x - 3x faster than traversing sub(a, :, i). The speed of traversing view(a, :, i) is comparable to traversing an ordinary array.
Linear indexing over contiguous views are remarkably faster than linear indexing over subarrays (up to 50x - 100x faster) for 2D, 3D and multidimensional cases.
Since view(a, :, i), view(a, i:j), view(a, i:j, k), view(:, i:j, k), view(:,:,k) and many more cases are statically detected as contiguous (thanks to the systematical approach), many practical use cases can benefit from the performance improvement over contiguous views.
For non-contiguous views, indexing performance is similar to subarrays. (For such cases, the codes for indexing are similar).
Constructor of views are much more light-weighted. A specific benchmark on view construction shows that for array views, it takes about 0.15 sec (on average) to construct 1 million views, while it takes 1.5 - 1.9 sec to construct the same number of sub-arrays. The overhead of view construction is substantially reduced, and therefore use cases that require constructing a lot of small subviews can also benefit from this.
Type stability is carefully maintained.

lindahua · 2014-01-26T19:52:03Z

Please review this. To me, this is ready to merge.

The rationale of the design is here: https://github.com/lindahua/ArrayViews.jl

malmaud · 2014-01-27T19:43:20Z

It looks like I can't take a view of a view using linear indexing. Not sure if that's intentional, but it surprised me:

x=randn(100,10)
y=view(x,3:5,3:5)
view(y,1:2)
ERROR: no method roffset(StridedView{Float64,2,1,Array{Float64,2}}, Range1{Int64})
 in view at arrayviews.jl:568

lindahua · 2014-01-27T22:24:49Z

@malmaud: view(view(x, 3:5, 3:5), a:b) is in a irregular pattern (the distance between adjacent elements vary)

lindahua · 2014-01-27T22:34:57Z

To be more specific, v1 = view(x, i1:i2, j1:j2) looks like following:

* * * *
* * * *
* * * *
. . . .

If you take view(v1, 1:6) it will look like

* * * . * *

Such a pattern cannot be represented as a strided view.

On the other hand, you can perfectly do view(view(x, 1:2:n), 1:3:m), because such a composition remains a strided view.

malmaud · 2014-01-27T23:39:54Z

Quite correct, thanks for the explanation. Perhaps a better error message would be in order. Great work, btw.

On Jan 27, 2014, at 5:35 PM, Dahua Lin notifications@github.com wrote:

To be more specific, v1 = view(x, i1:i2, j1:j2) looks like following:

. . . .
If you take view(v1, 1:6) it will look like

* * . * *
Such a pattern cannot be represented as a strided view.

—
Reply to this email directly or view it on GitHub.

malmaud · 2014-01-28T00:00:00Z

I noticed that out-of-bounds views result in #undefs instead of throwing an error. Is that by design?

julia> x=randn(2,2)
2x2 Array{Float64,2}:
 -1.76904  1.32337
  0.2108   1.11225

julia> view(x,:,1:3)
2x3 ContiguousView{Float64,2,Array{Float64,2}}:
 -1.76904  1.32337  #undef
  0.2108   1.11225  #undef

lindahua · 2014-01-28T00:16:34Z

I simply follow the behavior of sub that is currently in Julia base:

julia> x = randn(2,2)
2x2 Array{Float64,2}:
  0.678922  0.953143
 -0.112102  0.174408

julia> sub(x, :, 1:3)
2x3 SubArray{Float64,2,Array{Float64,2},(Range1{Int64},Range1{Int64})}:
  0.678922  0.953143  #undef
 -0.112102  0.174408  #undef

I am open to implement different behaviors if people agree that other behaviors are more desirable.

lindahua · 2014-01-28T00:20:12Z

Whereas I am using a quite different internal implementation & representation, I try to follow what sub does in terms of interface & semantics.

timholy · 2014-01-28T00:41:45Z

#4044

lindahua · 2014-01-28T00:44:40Z

I want to note that despite the large size of this PR, it is actually non-breaking. The function view and view types can perfectly coexist with sub and subarrays.

In my mind, the path forward:

People may play with this PR through the ArrayViews package. They are nearly the same.
Merge this PR when it has been properly reviewed, and encourage people to start using view instead of sub. Most codes should still be working simply replacing sub with view.
Deprecate sub and subarrays.

cc: @StefanKarpinski @ViralBShah @timholy

timholy · 2014-01-28T23:18:32Z

I've looked at this briefly. I think it's a clever approach: come up with a type that works efficiently for special cases (that are common in practice) by placing more smarts in the constructor. In many practical applications, this is clearly an effective approach.

Am I right in understanding that M only plays a role when it is equal to N?

Interestingly, your "2D View with contiguous rank 0" case could have efficient linear indexing: because the number of rows is even, this example is equivalent to a single linear array of memory where you keep every other element. So there may be ways to extend this line of reasoning even further, albeit only in special cases (e.g., if that array had an odd number of rows, it couldn't be efficiently linearly-indexed).

As far as eventually deprecating SubArray goes, there's at least one major problem: currently this only works for views of Arrays. I have real-world uses for SubArrays of Images of InterpolatingGrids of Arrays, so the composition of types and the nesting of parents can get pretty complicated. The advantages of your approach obviously can't be guaranteed if you don't narrowly specify the behavior of the parent object, so I completely understand why you restrict it to views of Arrays. But we do need a mechanism for making views of arbitrary AbstractArrays. (This is why I'm so focused on moving to Cartesian indexing for everything, it's really the only approach I can see that yields efficient performance in such complex cases.)

lindahua · 2014-01-28T23:54:33Z

@timholy:

M plays an important role in view composition and can help to determine whether a subview of a view remains contiguous. Consider the following example, if v is a view of contiguous rank M = 1 (with N = 2), then v(:, i) is a contiguous view, and if v is a view with M = 0, then v(:, i) is not contiguous.
The 2D view case with contiguous rank 0 do have an efficient linear indexing and one can actually obtain a linear view over it. But it is very difficult to do it in a type stable way.
composition of strided views over an array is still a strided view over an array -- we only need to re-compute offsets and strides. So this can be done very efficiently.

For more complicated cases that you mention (where the source is no longer a strided array), I agree SubArray is still useful (or maybe we may call it GenericView that holds input indexers instead of strides).

tknopp · 2014-02-24T13:51:03Z

Is this going to be 0.3 stuff? If yes this might better merged sooner than later to get more testing. It would be great if getindex could be changed to use the view functions so that all code path can benefit from this work.

lindahua · 2014-02-24T14:02:42Z

I think this is ready for review.

porterjamesj · 2014-02-24T19:07:40Z

+1, I would love to see this merged and have getindex use it.

JeffBezanson · 2014-02-24T19:34:53Z

This is really awesome. We had decided not to make array views the default in 0.3, but this PR is starting to make me change my mind.

porterjamesj · 2014-02-24T19:45:27Z

I think if it all possible it would be good to get it in. As @johnmyleswhite pointed out in another issue (I think it was the 0.3 umbrella issue), it's among the biggest breaking changes currently planned.

tknopp · 2014-02-24T20:06:51Z

@JeffBezanson: I can absolutely understand if you want to freeze the features at some point to get 0.3 out. But IMHO this is a really important design change that should happen rather sooner than later. + the code is of high quality code.

I have ported some code from Numpy yesterday and it was 5 times slower because getindex doing copies when accessing matrix columns. Clever devectorization brought me back on track but this is more a workaround.

timholy · 2014-02-24T20:12:32Z

@lindahua, it's been a while since I looked at this. I remember that column-slices are contiguous and therefore fast, but what's the status of row-slices?

JeffBezanson · 2014-02-24T20:15:24Z

@tknopp Yes, it is clear that this implementation is very high quality, which makes me seriously consider adding it to 0.3. @StefanKarpinski @ViralBShah

JeffBezanson · 2014-02-24T20:29:01Z

@tknopp Could you try your code with view using this branch and see what performance you get?

johnmyleswhite · 2014-02-24T21:08:37Z

+1 for merging this sooner than later

lindahua · 2014-02-24T21:20:50Z

@timholy The code try to detect most contiguous cases whenever it is possible to do so statically. For cases like view(a, i, :), it is generally not contiguous, and the function will return a strided view of size (1, size(a, 2)). For such views, cartesian indexing is still quite efficient. It would be considerably less efficient to do linear indexing on non-contiguous strided views.

However, it is completely possible and not too difficult to introduce a specific RowView type for the special case view(a, i, :) to support efficient linear indexing. I am a little bit hesitant as it encourages of row-wise operations that are not very cache-friendliness.

ViralBShah · 2014-05-05T10:20:52Z

Copy on write is certainly an interesting idea to consider.

StefanKarpinski · 2014-05-05T23:39:55Z

I actually think that view semantics are often quite useful as a way to modify the original array via a view.

On May 5, 2014, at 6:20 AM, "Viral B. Shah" notifications@github.com wrote:

Copy on write is certainly an interesting idea to consider.

—
Reply to this email directly or view it on GitHub.

JeffBezanson · 2014-05-06T00:40:28Z

Copy on write is problematic since it means an array's data pointer can
change unpredictably and need to be reloaded.
On May 5, 2014 7:39 PM, "Stefan Karpinski" notifications@github.com wrote:

I actually think that view semantics are often quite useful as a way to
modify the original array via a view.

On May 5, 2014, at 6:20 AM, "Viral B. Shah" notifications@github.com
wrote:

Copy on write is certainly an interesting idea to consider.

—
Reply to this email directly or view it on GitHub.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5556#issuecomment-42253727
.

lindahua · 2014-05-07T22:29:05Z

I agree with @JeffBezanson. IMHO, copy-on-write introduces more problems than it solves.

simonster · 2014-05-08T03:35:48Z

Maybe we could implement "copy on maybe write" semantics by determining if a given function could change an array's data pointer (i.e., if it might call arrayset or a function that calls arrayset), and if so, making the copy before any such functions are called, so that the data pointer is guaranteed to be constant for subsequent accesses? Or is this crazy talk?

ihnorton · 2014-05-08T03:38:36Z

"here's my pointer, so copy me maybe"?

jiahao · 2014-05-08T03:45:39Z

I get dereference.

jakebolewski · 2014-05-08T03:48:07Z

@carlyraejepsen should weigh in

johnmyleswhite · 2014-05-08T05:11:47Z

I get dereference.

I nominate this as Jiahao's best pun yet.

StefanKarpinski · 2014-05-08T07:21:14Z

@simonster – with dynamic dispatch, it's amazingly hard to know if arrayset will get called or not. Even worse, whether the compiler thinks it will or not could depend on subtle changes to type inference. Personally, I think it's crazy for something as major as copying an array to happen or not based on something that's implicit and almost impossible to predict. That's my feeling about copy-on-write in general, not just your proposal.

simonster · 2014-05-08T14:42:36Z

I guess it is difficult to determine which functions could touch which array because of aliasing: multiple arguments to the function could be or contain the same array. But in a potentially non-existent world where it could be implemented without slowing down array operations, I don't think copy-on-write would be such a bad idea. Subtle changes to type inference can already lead to very large changes in performance.

lindahua · 2014-05-12T13:40:15Z

I still think that allowing users to explicitly articulate whether they want to make a copy or just creating a shallow view is the clearest way ahead.

In a lot of practical applications, I just want a view to write to that part of array.

tknopp · 2014-05-12T15:10:34Z

In my mind it would be quite natural if array indexing would return views. If we currently think of the following Julia code:

julia> a=zeros(4)
julia> a[1:3] = 2
julia> a
10-element Array{Float64,1}
 2.0
 2.0
 2.0
 0.0

it already behaves like a view when the array is an lvalue. This is the only sensible thing in that situation but still it does something different then when used as an rvalue currently.

mcg1969 · 2014-08-13T21:24:48Z

This is great stuff. I've been working on a custom implementation of a strided indexer for my abstract array class, and this kind of thing might render it obsolete. (And I like that.)

One thing I ask, though: what about the various functions that can (at least, sometimes) manipulate views in an efficient manner? I'm specifically thinking of transpose/ctranspose, rot180, rotr90, rotl90, diag, reverse, flipdim, vec, slicedim, reshape. It seems to me that they should be implemented.

I'm happy to share my work if it will help facilitate. I've actually implemented this as a sort of mutlidimensional range class: that is, it's an AbstractArray{Int} of indices whose elements are computed, not stored. The goal is to preserve this compact, computed representation for any manipulation of the array that allows it. This lets me play with it in the REPL, and once I'm happy with the results, I can turn it into the indexer for my strided array class.

quinnj · 2014-11-22T05:13:27Z

Close since #8501 was merged?

lindahua · 2014-11-22T05:50:39Z

Yes, this can be closed

lindahua added 4 commits January 26, 2014 09:53

Initial migration from ArrayViews

022861f

complete migration of sources from ArrayViews

2b6c192

add tests for array view types

2ace5c7

add tests for subviews

5941521

lindahua mentioned this pull request Jan 26, 2014

RFC: A systematic array view framework #5513

Closed

timholy mentioned this pull request Jan 27, 2014

slow SubMatrix copy #3098

Closed

Merge branch 'master' into dh/arrayviews

09928fb

mschauer mentioned this pull request May 8, 2014

Do we want fixed-size arrays? #5857

Closed

nalimilan mentioned this pull request May 27, 2014

Slicing an array creates a copy instead of a reference #6988

Closed

lindahua added this to the 0.4 milestone Jul 30, 2014

jiahao mentioned this pull request Aug 6, 2014

Inconsistency in assignment: Numbers vs. Arrays #7861

Closed

timholy mentioned this pull request Aug 10, 2014

Towards array nirvana #7941

Closed

15 tasks

quinnj mentioned this pull request Aug 21, 2014

range indexing should produce a subarray, not a copy #3701

Closed

andreasnoack mentioned this pull request Aug 28, 2014

Transfer ArrayViews from package to Base #8176

Closed

timholy mentioned this pull request Sep 4, 2014

WIP: add views founded on cartesian indexing #8235

Closed

jiahao force-pushed the master branch 3 times, most recently from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06

jiahao force-pushed the master branch from cdde4df to 7fdc860 Compare October 28, 2014 04:20

MikeInnes force-pushed the master branch from 5c60996 to b1c3df3 Compare November 14, 2014 17:07

lindahua closed this Nov 22, 2014

ArrayViews (A systematic array view framework) #5556

ArrayViews (A systematic array view framework) #5556

Conversation

lindahua commented Jan 26, 2014

lindahua commented Jan 26, 2014

malmaud commented Jan 27, 2014

lindahua commented Jan 27, 2014

lindahua commented Jan 27, 2014

malmaud commented Jan 27, 2014

malmaud commented Jan 28, 2014

lindahua commented Jan 28, 2014

lindahua commented Jan 28, 2014

timholy commented Jan 28, 2014

lindahua commented Jan 28, 2014

timholy commented Jan 28, 2014

lindahua commented Jan 28, 2014

tknopp commented Feb 24, 2014

lindahua commented Feb 24, 2014

porterjamesj commented Feb 24, 2014

JeffBezanson commented Feb 24, 2014

porterjamesj commented Feb 24, 2014

tknopp commented Feb 24, 2014

timholy commented Feb 24, 2014

JeffBezanson commented Feb 24, 2014

JeffBezanson commented Feb 24, 2014

johnmyleswhite commented Feb 24, 2014

lindahua commented Feb 24, 2014

ViralBShah commented May 5, 2014

StefanKarpinski commented May 5, 2014

JeffBezanson commented May 6, 2014

lindahua commented May 7, 2014

simonster commented May 8, 2014

ihnorton commented May 8, 2014

jiahao commented May 8, 2014

jakebolewski commented May 8, 2014

johnmyleswhite commented May 8, 2014

StefanKarpinski commented May 8, 2014

simonster commented May 8, 2014

lindahua commented May 12, 2014

tknopp commented May 12, 2014

mcg1969 commented Aug 13, 2014

quinnj commented Nov 22, 2014

lindahua commented Nov 22, 2014