Should reshaping a SubArray produce another SubArray? #9874
If that's how we want them to work, a rudimentary version is basically all ready-to-go, because our new SubArrays support a
julia> A = reshape(1:16, 4, 4)
4x4 Array{Int64,2}:
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
julia> B = sub(A, 2:4, 3:4)
3x2 SubArray{Int64,2,Array{Int64,2},(UnitRange{Int64},UnitRange{Int64}),1}:
10 14
11 15
12 16
julia> b = sub(A, [10,11,12,14,15,16])
6-element SubArray{Int64,1,Array{Int64,2},(Array{Int64,1},),0}:
10
11
12
14
15
16
julia> b[4] = 0
0
julia> A
4x4 Array{Int64,2}:
1 5 9 13
2 6 10 0
3 7 11 15
4 8 12 16
One big downside, though, is that you have to allocate a full-length vector just to express the "unwrapped" 2d indexing. It seems possible one could do better with some kind of smart multidimensional Range object, but I haven't given this any serious design thought yet. |
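For readers on a current Julia release, the same write-through behaviour can be reproduced with `view`, which replaced `sub`. This is a sketch assuming Julia 1.x; `collect` is needed there because `reshape(1:16, 4, 4)` on a range is no longer a mutable `Array`.

```julia
# Assumes Julia 1.x, where `view` replaces the `sub` used in this thread.
A = collect(reshape(1:16, 4, 4))   # mutable 4x4 Array

# A vector of linear indices selects the same six elements as A[2:4, 3:4].
idx = [10, 11, 12, 14, 15, 16]     # this index vector is the allocation noted above
b = view(A, idx)

b[4] = 0                           # writes through to A[14], i.e. A[2, 4]
```

The index vector `idx` has one entry per selected element, which is the cost the comment above points out.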
That is a really helpful start, @timholy. But often we might want to reshape upwards by adding dimensions. So we want to go from a 1D structure to a 2D structure. That seems to require a call to |
Actually, I'm pretty confused by the |
You can currently only do that in a very limited way:
Notice I was deliberately omitting 13, which you can't do with the range. |
But adding dimensions could be implemented more generally with smart multidimensional range objects. |
Thanks for clarifying the index point. When you say multidimensional range objects, are you thinking of something like the punned meaning of |
Ideally, in terms of my example above we'd have an indexing object
The problem is, I don't really know how to realistically make such objects work without calling
Going the opposite direction in a performant manner is much easier. Suppose you have a vector and you want to reshape it into a matrix:
a = 1:15
B = sub(a, Index(1:5,1:3)) # this would become the implementation of `reshape(a, 5, 3)`
Then
More details at http://docs.julialang.org/en/latest/devdocs/subarrays/, especially the first two sections. |
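One way to see why the vector-to-matrix direction is cheap: a column-major (i, j) subscript maps to a linear index with one multiply and one add. A small sketch, assuming Julia 1.x; `to_linear` is a hypothetical helper for illustration, not part of Base and not the proposed `Index` object.

```julia
a = 1:15
nrows, ncols = 5, 3

# Hypothetical helper: column-major subscripts -> linear index into `a`.
to_linear(i, j) = i + (j - 1) * nrows

# Element (2, 3) of the 5x3 reshaping is element 12 of the vector.
k = to_linear(2, 3)
```

No division is involved here, in contrast to converting a linear index back into subscripts.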
The reshape docs say "an implementation for a particular type of array may choose whether the data is copied or shared." To me this implies that We could also possibly have a |
I think this makes
For context on how these issues are coming up, I've been working with some folks that are new to Julia and are accustomed to building their entire infrastructure around higher-order tensors. They routinely take slices (possibly with subsequent reshaping) of a linear packing of many distinct arrays. The linear packing is used because they send the entire packed data store into a black-box optimization suite that operates on vectors, but they simultaneously want to articulate the objective function they're optimizing in terms of many different objects -- all of which point, via SubArray-like objects, into the memory store of the grand linear packing. So they need to make sure that every operation they might perform can be done without memory copying. |
I agree with John. I'd rather people choose to take an explicit action (like copying, or calling a different function that copies) when they need performance, rather than forcing them to reason about whether the data is going to be copied or not. The fact that in Julia arrays sometimes share data is already relatively complex to handle (though incredibly powerful of course). |
I'm in agreement on this too – "Schrödinger's copy" is no good. |
I'd like to keep tabs on this discussion but have nothing to add. And I don't know how to do so in github without posting a comment. Sighs. Sorry for the noise. |
@milktrader, there's a notifications button on the right-hand side that will let you do that. It's not very conspicuous, but it's near the bottom of the right-hand side panel. |
Relevant discussion of faster ways to do |
@johnmyleswhite ah, after all these years. Thanks! |
Oh wait, I guess I was subscribed. I suppose I was looking for the participating flag. No way to do that that I can see. (goes back to his desk) |
I posted a rough draft of a solution in this gist. The basic idea is that we create a new view type, a
julia> using Reshape
julia> A = myreshape(1:15, (3, 5))
3x5 Reshape.ReshapeArray{Int64,2,UnitRange{Int64},(Reshape.IndexMD{1,2},)}:
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
julia> AA = copy(A)
3x5 Array{Int64,2}:
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
julia> B = myreshape(AA, (15,))
15-element Reshape.ReshapeArray{Int64,1,Array{Int64,2},(Reshape.IndexMD{2,1},)}:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
julia> BB = myreshape(A, (15,))
15-element Reshape.ReshapeArray{Int64,1,UnitRange{Int64},(Colon,)}:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
julia> C = reshape(1:16, (2,4,2))
2x4x2 Array{Int64,3}:
[:, :, 1] =
1 3 5 7
2 4 6 8
[:, :, 2] =
9 11 13 15
10 12 14 16
julia> CR = myreshape(C, (4,4)) # note this doesn't split along the dimensions of C
4x4 Reshape.ReshapeArray{Int64,2,Array{Int64,3},(Reshape.IndexMD{3,2},)}:
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
julia> V = sub(CR, 2:3,2:4)
2x3 SubArray{Int64,2,Reshape.ReshapeArray{Int64,2,Array{Int64,3},(Reshape.IndexMD{3,2},)},(UnitRange{Int64},UnitRange{Int64}),0}:
6 10 14
7 11 15
julia> myreshape(V, (3,2))
3x2 Reshape.ReshapeArray{Int64,2,SubArray{Int64,2,Reshape.ReshapeArray{Int64,2,Array{Int64,3},(Reshape.IndexMD{3,2},)},(UnitRange{Int64},UnitRange{Int64}),0},(Reshape.IndexMD{2,2},)}:
6 11
7 14
10 15
As the last example illustrates, the problem is that we'll get a
The advantage of this approach is that we can guarantee that |
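To make the general principle concrete, here is a minimal sketch of a reshaping view written for Julia 1.x. It is not the gist's `ReshapeArray`: the name `ReshapeView` and its fields are hypothetical, and it simply forwards each linear index to the parent's cartesian index, with no attempt at optimization.

```julia
# Hypothetical minimal reshaping view (Julia 1.x syntax); not the gist's implementation.
struct ReshapeView{T,N,P<:AbstractArray{T}} <: AbstractArray{T,N}
    parent::P
    dims::NTuple{N,Int}
    function ReshapeView(parent::P, dims::NTuple{N,Int}) where {T,N,P<:AbstractArray{T}}
        prod(dims) == length(parent) ||
            throw(DimensionMismatch("new dimensions must preserve the number of elements"))
        new{T,N,P}(parent, dims)
    end
end

Base.size(R::ReshapeView) = R.dims
Base.IndexStyle(::Type{<:ReshapeView}) = IndexLinear()

# Map the view's linear index to the parent's cartesian index; nothing is copied.
Base.getindex(R::ReshapeView, i::Int) = R.parent[CartesianIndices(R.parent)[i]]
Base.setindex!(R::ReshapeView, v, i::Int) = (R.parent[CartesianIndices(R.parent)[i]] = v; R)

# Usage: reshape a non-contiguous slice and write through to the original array.
A = collect(reshape(1:16, 4, 4))
V = view(A, 2:3, 2:4)          # 2x3 slice: [6 10 14; 7 11 15]
R = ReshapeView(V, (3, 2))     # 3x2 view of the slice: [6 11; 7 14; 10 15]
R[2, 2] = 0                    # updates A[2, 4]
```

Because the wrapper always returns a view, the caller decides when to pay for a copy by calling `copy` explicitly.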
I guess the main point is: what are people's thoughts on this design? EDIT: I've done nothing to optimize it yet, and there are surely still bugs, but at least the general principle should be clear. |
This certainly resolves one of my main concerns about Arrays. Thanks for working on this! |
It indeed sounds better to always return a view, possibly at the cost of performance in very complex cases. People should always be able to call |
@timholy - I think you're not getting many comments here because it seems so sensible. I think this is a great direction. I think it'd be even better if it were performant enough to handle dense array reshapes, too (which is currently handled in the C array implementation). This way there'd be user-visible information about the view-like nature of the reshaped array. Once we move to using views, an open question is if |
Thanks to all who commented. I'll continue with this direction and try to get something merged; I may not get to it for a couple of days, though. |
Good question. In fact, if |
Couldn't the magic numbers for speeding up |
I've thought about that. I suppose the issue will come down to how long they take to compute, and I haven't yet played with that code at all. There are folks who have tried using SubArrays to extract the columns of a 2xN matrix, and even with our new-and-improved SubArrays it doesn't work terribly well. (Presumably once tuples can be elided, it will be much better.) |
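For reference, converting a linear index back into subscripts takes one integer division per dimension, which is where precomputed constants could help. A sketch of the 2-D case, assuming Julia 1.x; `to_subscripts` is a hypothetical helper, not a Base function.

```julia
# Hypothetical helper: linear index -> (row, column) for a column-major
# matrix with `nrows` rows. The `divrem` is the per-index cost in question.
function to_subscripts(k::Int, nrows::Int)
    q, r = divrem(k - 1, nrows)   # q = column - 1, r = row - 1
    return (r + 1, q + 1)
end

ij = to_subscripts(12, 5)         # position of element 12 in a 5x3 layout
```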
That's indeed important to consider. Is that construction overhead that you are talking about in the example of the columns? Maybe a lightweight On a more general note, it is probably also not worth the effort trying to write the most general code that can provide reasonable efficiency for all possible combinations of indexing and slicing, reshaping and permuting for all possible parent array types, when people in the end just want great efficiency for strided views over an In that respect, I am not quite sure whether |
Well, the |
A lot of it comes down to having to save a
immutable ColVector{T,M<:AbstractMatrix} <: AbstractVector{T}
parent::M
col::Int
end
ColVector(A::AbstractMatrix, col::Integer) = ColVector{eltype(A), typeof(A)}(A, Int(col))
getindex(v::ColVector, i::Int) = v.parent[i,v.col]
size(v::ColVector) = (size(v.parent, 1),)
size(v::ColVector, d) = d == 1 ? size(v.parent, 1) : 1
Even this has its negatives; I think @mlubin noticed that even storing a reference to the parent array means this can't be elided, and so he was storing a pointer instead (which means you need to manually maintain a reference). |
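The same idea in Julia 1.x syntax, for anyone trying it on a current release: `struct` replaces `immutable`, and the methods are qualified with `Base.`. The `setindex!` method and the usage lines are additions for illustration, not part of the code above.

```julia
# Julia 1.x port of the ColVector sketch; setindex! is an added assumption.
struct ColVector{T,M<:AbstractMatrix{T}} <: AbstractVector{T}
    parent::M
    col::Int
end
ColVector(A::AbstractMatrix, col::Integer) = ColVector{eltype(A),typeof(A)}(A, Int(col))

Base.size(v::ColVector) = (size(v.parent, 1),)
Base.IndexStyle(::Type{<:ColVector}) = IndexLinear()
Base.getindex(v::ColVector, i::Int) = v.parent[i, v.col]
Base.setindex!(v::ColVector, x, i::Int) = (v.parent[i, v.col] = x; v)

# Usage: a view of the second column that writes through to the matrix.
A = [1 2; 3 4; 5 6]
c = ColVector(A, 2)
c[3] = 0
```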
This latter issue is a problem for lots of other things, including Is there any hope we might improve on this at some point? |
Dup of #4211? |
Possibly? It looks very similar and is possibly subsumed by #4211. |
Closing as dup. |
Right now, calling `reshape` on a SubArray allocates memory, which means that reshaping a slice causes you to lose your view into the original data. See the example below for the kinds of situations you might find yourself in.