Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate dispatch for dropping trailing 1s (fixes for non-1 based arrays) #20573

Merged
merged 2 commits into from
Feb 14, 2017

Conversation

timholy
Copy link
Member

@timholy timholy commented Feb 11, 2017

I know this is under consideration for deprecation, but until we pull the trigger...

CC @mbauman

@timholy timholy added the bugfix This change fixes an existing bug label Feb 11, 2017
_to_subscript_indices(A, J, Jrem)
end
function _to_subscript_indices(A::AbstractArray, J::Tuple, Jrem::Tuple{})
sz = _remaining_size(J, indices(A)) # compute trailing size (overlapping the final index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd bet you'll have to inline this.

@mbauman
Copy link
Member

mbauman commented Feb 11, 2017

Interesting. Sure, this seems sensible. But I think it's worth considering that the root cause here isn't tailing singleton dimensions… it's partial linear indexing. Do you really need separate functions here? Or will this just work if you s/size/indices/?

@timholy
Copy link
Member Author

timholy commented Feb 11, 2017

That was the first thing I tried. Triggers this.

@mbauman
Copy link
Member

mbauman commented Feb 11, 2017

Why do we have that method? Is it still needed?

@timholy
Copy link
Member Author

timholy commented Feb 11, 2017

I think it is. We support linear indexing on arrays with non-1 indices, but that's potentially really confusing for vectors. Is it an ind or is it a sub?

@mbauman
Copy link
Member

mbauman commented Feb 11, 2017

Yeah, I see your point. In this context it really feels like it should just work — I have one index and I need one index. Why not give it to me? But since you've made the interpretation of one index vastly different in multidimensional offset arrays, I can see how this is problematic. I still don't like it, but my objections are unrelated here. Perhaps at some point OffsetArrays could move to using a LinearIndex type.

@nanosoldier runbenchmarks(ALL, vs = ":master")

@mbauman
Copy link
Member

mbauman commented Feb 11, 2017

Or could you perhaps use a non-OneTo linearization strategy for all offset arrays? That seems more sensible to me, and then vectors won't have drastically different behavior. This is actually how partial linear indexing currently behaves with TestHelpers.OAs (albeit deprecated):

julia> A = OffsetArray(reshape(1:4*3*2,4,3,2), (-2,-3,-4))
TestHelpers.OAs.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}} with indices -1:2×-2:0×-3:-2:
[:, :, -3] =
 1  5   9


[:, :, -2] =
 13  17  21


julia> A[-1, -1]
5

julia> A[-1, 0]
9

julia> A[-1, 1]
WARNING: Partial linear indexing is deprecated. Use `reshape(A, Val{2})` to make the dimensionality of the array match the number of indices.

13

@timholy
Copy link
Member Author

timholy commented Feb 12, 2017

Isn't that a bug? Seems A[-1, 1] should work like this:

julia> A[-1, ind2sub(Base.tail(indices(A)), 1)...]
1

The bounds-checking on 0.5 seems correct, but there the PLI is borked. Gack.

julia> A[-1, -1]
ERROR: BoundsError: attempt to access OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}} with indices -1:2×-2:0×-3:-2 at index [-1,-1]                                                 
 in throw_boundserror(::OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}}, ::Tuple{Int64,Int64}) at ./abstractarray.jl:363                                                            
 in checkbounds at ./abstractarray.jl:292 [inlined]                                                                                                                                                                        
 in _getindex at ./abstractarray.jl:793 [inlined]                                                                                                                                                                          
 in getindex(::OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}}, ::Int64, ::Int64) at ./abstractarray.jl:760

julia> A[-1, 0]
ERROR: BoundsError: attempt to access OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}} with indices -1:2×-2:0×-3:-2 at index [-1,0]
 in throw_boundserror(::OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}}, ::Tuple{Int64,Int64}) at ./abstractarray.jl:363
 in checkbounds at ./abstractarray.jl:292 [inlined]
 in _getindex at ./abstractarray.jl:793 [inlined]
 in getindex(::OffsetArrays.OffsetArray{Int64,3,Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}}, ::Int64, ::Int64) at ./abstractarray.jl:760

julia> A[-1, 1]
13

Truly, I'd be happy to see the backside of PLI. This is really about trailing 1s.

@mbauman
Copy link
Member

mbauman commented Feb 12, 2017

Even worse, the behaviors are different for LinearFast and LinearSlow — the latter currently errors on master. On this branch LinearSlow uses OneTo linearization whereas LinearFast uses offset linearization for partial indexing.

I know it's a huge change in meaning, but I really think life will be simpler if linear indices are offset by the first dimension they index into. They'd just linearly span multiple dimensions when they reach beyond the final index.

@timholy
Copy link
Member Author

timholy commented Feb 12, 2017

I hadn't paid enough attention to #20079. That's a reasonable strategy. But wow quite a behavior change for this case.

I'd say let's just make this #20079-compliant and call that good enough. OffsetArrays are probably still not very widely used, especially with partial linear indexing.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy
Copy link
Member Author

timholy commented Feb 12, 2017

I think we have to change the definition of general linear indexing as well, so that it starts at first(indices(A,1)). Oh boy.

@timholy
Copy link
Member Author

timholy commented Feb 12, 2017

I do have one conceptual concern with this. #20079 makes sense from the standpoint of "continuing the numbering scheme" from the point where you stopped supplying indices. However, a different perspective on linear indexing is that it has little to do with "coordinates" or "location" (that's what indexing with the full tuple is about), and more to do with "offsets." From that perspective, #20079 seems to make less sense, particularly for cases where you index with a single integer.

@mbauman
Copy link
Member

mbauman commented Feb 12, 2017

My intention with #20079 is that after the deprecation period, indexing with fewer than N dimensions will be defined as A[i, j, map(first, indices(A)[3:end])...]. Yes, partial indexing has been and is currently confused for OffsetArrays, but I still think that #20079 is the correct way forward. When you index with a single integer, it's bounds are checked against linearindices(A) — this is a reasonable optimization to make even in the non-offset case. Could you expand on your concerns here? I think I'd just specialize Base.checkbounds_linear_indices or maybe even trailingsize within OffsetArrays for 0.6 to completely disable partial indexing when any of the remaining indices aren't OneTos. After 0.6, OffsetArrays will be fully compatible with partial indexing again.

I think it's worth noting that I tried to use ind2sub and sub2ind as consistently as possible in #19958. In fact, I had been thinking it'd be nice to eventually expand (and perhaps rename) those functions to completely subsume _to_subscript_indices and _to_linear_index. The extra dispatch cases that I added were places where I was working around offset array test failures (or in one case, avoiding a bounds check branch). It was annoying to work around, and it's precisely why I suggested changing the interpretation of a linear index (yes, including the one-index case).

@timholy
Copy link
Member Author

timholy commented Feb 12, 2017

Perhaps easiest is to show you where I've been going with this (see the "WIP" commit); this is a case where following my nose leads me to make changes that simplify the code (a very good sign, and one that should be taken seriously), but the downsides are that these are breaking changes and make me a little uneasy on one conceptual point.

The key changes are in abstractarray.jl; they are very few in number but profound in their consequences. The bottom line is that

A = OffsetArray(rand(3,3), -1:1, 0:2)

used to be linearly-indexed as A[1], ..., A[9] and now it's moving towards A[-1], ..., A[7]. This occurred basically because of the notion that ind2sub should be able to take just inds and ind, and work the same way no matter whether you've peeled off leading dimensions or not. Prior to #20079, the answer was that ind always started with 1, but the direction it's heading in now is to start with first(inds[1]).

My concern is whether this really makes sense. I think it is justifiable---you're "extending the first index to the rest of the array," and like I said above it also probably makes the code simpler overall. (In particular, the potential confusion over 1-d arrays goes away.) But I think a different notion---that a linear index is simply a different beast from other indices and that there's no good reason to "privilege" the first range in the indices-tuple--- has approximately equal merit. Suppose we had a LinearIndex index type: where would you start the numbering? When I developed this stuff for Julia 0.5, I went with the answer "1" (a highly-subjective preference) and then tried to make i::Int equivalent to LinearIndex(i).

Basically, I brought this up because I felt this PR was diverging farther and farther from the original intent, so before changing too many rules I wanted to see whether you think this direction is as inevitable as I do. My intention is to complete the #20079 transition, unless good arguments arise to the contrary, but I'm a little queasy about it.

@mbauman
Copy link
Member

mbauman commented Feb 12, 2017

Yes, I think removing the discrepancy between the one-index linear indexing of OffsetVectors and OffsetArrays is a very worthwhile thing to do. But I appreciate that it's a huge change, so that decision is up to you.

And, yes, I think that we should continue with the deprecation of partial linear indexing, moving towards a definition where we fill the missing dimensions with the first index of each. Given that partial indexing of offset arrays has never worked in a consistent manner, I think it's totally justifiable to make it an error for OffsetArrays until we fully embrace the new semantics and get it working right.

@mbauman
Copy link
Member

mbauman commented Feb 13, 2017

I've been mulling over the meaning of a linear index a bit more. I agree that there's a sense of loss that the linear index i no longer refers to the ith consecutive element in the array. But at the same time, I think that there's a greater value in making the indexing behaviors more consistent with normal arrays… and in this case I'm pretty sure that means that it simply reaches beyond its bounds contiguously into the higher dimensions.

One place where we may gain insight is the operations A[:,:] and A[:] for an OffsetMatrix A, since the latter behavior may change depending upon the choice here. Perhaps you know this code too well to have an unbiased opinion… but given that A[:,:] returns an OffsetMatrix with the same indices as A, what would you expect A[:] to return? Personally, I think of A[:] as selecting all elements and unrolling the higher dimensions into a vector… so I think I might expect an OffsetVector, stacking all the higher dimensions onto the first one, and using the offset of the first dimension (but I'm now terribly biased by the implementation, too).

@timholy
Copy link
Member Author

timholy commented Feb 13, 2017

I'm going to put this temporarily on ice and see whether it's really so hard to deprecate PLI entirely while still keeping the trailing 1s behavior we have now. My main concern is that I have other deadlines too so it may not go quickly.

@mbauman
Copy link
Member

mbauman commented Feb 13, 2017

To be honest, offset partial indexing is so broken that I don't think we don't need to worry about PLI deprecation warnings for offset arrays. Nobody has complained on either 0.5 or master… I consider that to be a just-as-effective deprecation canary. Let's just try to get something sensible for 0.6.

I think a band-aid with your first commit here and a hacked-in error to prevent any and all partial indexing would be just fine.

@timholy
Copy link
Member Author

timholy commented Feb 13, 2017

@nanosoldier runbenchmarks(ALL, vs = ":master")

@mbauman, here's the least-intrusive version that doesn't change the meaning of "true" linear indices.

Copy link
Member

@mbauman mbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy
Copy link
Member Author

timholy commented Feb 14, 2017

I can't replicate a single one of those apparent regressions, so I'm merging. Thanks for your help, @mbauman!

@timholy timholy merged commit 0d439eb into master Feb 14, 2017
@timholy timholy deleted the teh/remaining_size branch February 14, 2017 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants