Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SubArray bounds checking (or not): bug or feature? #4044

Closed
timholy opened this issue Aug 13, 2013 · 23 comments
Closed

SubArray bounds checking (or not): bug or feature? #4044

timholy opened this issue Aug 13, 2013 · 23 comments
Labels
bug Indicates an unexpected problem or unintended behavior needs decision A decision on this change is needed
Milestone

Comments

@timholy
Copy link
Member

timholy commented Aug 13, 2013

SubArrays currently do not check bounds upon construction:

julia> A = randn(4,8)
4x8 Float64 Array:
 -0.661697  0.150336   -0.979774   0.350783   1.97253   -0.255409   -0.464847   -0.785631
 -0.389022  0.0357723  -2.96341    0.285687   0.446166  -0.0916746  -0.64864     0.476115
 -0.972825  0.037842    0.456445   0.752897  -0.733183   0.320432   -1.00636     2.2454  
  0.964869  0.467261   -0.564566  -2.38089    1.70834    0.0517955  -0.0813409   0.302723

julia> As = sub(A, 1:3, 0:3)
3x4 SubArray of 4x8 Float64 Array:
 #undef  -0.661697  0.150336   -0.979774
 #undef  -0.389022  0.0357723  -2.96341 
 #undef  -0.972825  0.037842    0.456445

julia> As[1,1]
ERROR: BoundsError()
 in getindex at subarray.jl:172

julia> As[1,2]
-0.6616974979624464

This lack of bounds-checking turns out to be incredibly useful for certain applications, and I'm starting to write more and more code that takes advantage of it. However, I also recognize that this may be unintended behavior.

Hence, I'd like to request that we make a decision: is this a bug or a feature? I see three paths:

  • It's a feature; keep it as it is. There are a couple of lines of code that will need fixing. Specifically, in a few places A.indexes[i][j] needs to become first(A.indexes[i]) + (j-1)*step(A.indexes[i]), which is mathematically equivalent except for bounds-checking.
  • It should be an optional feature (introduce an extra parameter into the type definition)
  • It's a bug. We will check bounds on construction, but then we also need an AbstractStridedArray type from which "private" variants can descend.

Left to my own devices I would choose the first. But I'll be happy to implement whichever of these achieves some reasonable facsimile of consensus.

@timholy
Copy link
Member Author

timholy commented Aug 13, 2013

I should also clarify that the issue of bounds-checking upon construction is separate from the issue of bounds-checking on getindex and setindex!. I think we definitely need bounds-checking on the latter, and currently that's not implemented either. Using the same array as in my demo above:

julia> As[4,2]
0.9648692808934075

That does not depend on the fact that As was created with out-of-bounds indexes:

julia> As = sub(A, 1:3, 1:3)
3x3 SubArray of 4x8 Float64 Array:
 -0.661697  0.150336   -0.979774
 -0.389022  0.0357723  -2.96341 
 -0.972825  0.037842    0.456445

julia> As[4,2]
0.4672608417658797

In other words, getindex currently behaves exactly as I implemented for get in #3814. While I still think that's the correct behavior for get, it's definitely not the right behavior for getindex.

So, to summarize: my recommendation is for no bounds-checking on construction, but have bounds-checking on getindex/setindex!. Whatever we choose to do, there's stuff that needs fixing.

@timholy
Copy link
Member Author

timholy commented Aug 13, 2013

Ah, I've realized that we can't currently add getindex/setindex! bounds-checking to SubArray without a major performance penalty.
#3503 (comment)

@JeffBezanson
Copy link
Member

The idea of SubArrays that overlap regions outside an array is kind of neat.

@stevengj
Copy link
Member

I see this as a feature, not a bug. It is still "safe" in the sense of throwing an exception of the bounds of the original array are exceeded.

@kmsquire
Copy link
Member

Can you give an example of where this behavior is useful?

@timholy
Copy link
Member Author

timholy commented Aug 13, 2013

Here's an example from what I'm doing right now (this isn't quite accurate, but explaining the precise analysis would take a lot of time...): I have some movies, and I want to track movement of objects during the course of the movie. The movie is a big array, and I track an object by drawing a box around it (a SubArray). What happens when the object moves near the edge of the frame, so that my centered box would stick out beyond the edge? If the SubArray barfs on construction, then I find that I needed a fair bit more code to handle special cases. If it doesn't barf on construction, then I'm finding that it reduces the number of places in the code that I have to explicitly think about boundaries.

So, there's a real-world application showing where it's useful. That said, a person could rationally argue that such facilities don't belong in SubArray. If that's how this discussion works out, I can easily go off in my own direction with my own custom types (it may happen anyway, of course). There are pluses and minuses, for both me and Julia, either way.

@kmsquire
Copy link
Member

Thanks for the example. How would/do you handle the #undefs in the sub-array? Display on screen seems to work fine, but any access to them throws a BoundsError.

@stevengj
Copy link
Member

Yes, in general this is useful for defining a SubArray for an interior region that is updated, and the surrounding array elements to be "ghost cells" or "halo cells", which is a common technique to implement boundary conditions in PDEs.

@timholy
Copy link
Member Author

timholy commented Aug 14, 2013

Yep, our get command for arrays does a version of this.

The other nice property about not being picky at construction time is that shifts become commutative. Here's a 1d example:

A = rand(15)
rng = 1:4
As = sub(A, rng)

Now:

As = sub(As, rng+1)
As = sub(As, rng-1)

yields no shift, as does

As = sub(As, rng-1)
As = sub(As, rng+1)

However, this second one would give an error if we were picky at construction time.

@timholy
Copy link
Member Author

timholy commented Aug 14, 2013

OK, it sounds like this is a feature. I'll add a brief note pointing to this issue in the subarray.jl code, and fix the sub(A::SubArray, ...) constructor to be consistent with this behavior.

@timholy timholy closed this as completed Aug 14, 2013
@ViralBShah
Copy link
Member

I wonder if the code in examples/plife.jl can benefit from this. I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

timholy added a commit that referenced this issue Aug 14, 2013
@timholy
Copy link
Member Author

timholy commented Aug 14, 2013

I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

It's now intentional and well-defined in the sense that I've set up tests that will fail if this behavior changes.

@JeffBezanson
Copy link
Member

I believe this feature means that a loop from 1:length(subarray) where @inbounds is used is incorrect. That could be a serious problem. We might have to put @boundscheck true in every SubArray indexing method if we want to keep this behavior.

@timholy
Copy link
Member Author

timholy commented Feb 24, 2014

I panicked about this a month or two ago, and then convinced myself that it isn't a problem (but that there's another, much smaller, problem). As far as I can tell---and I could easily be wrong---@inbounds doesn't cross layers. That is, @inbounds S[i,j] will avoid the check on whether i and j lie within 1:size(S,1), 1:size(S,2), but does not turn off bounds checking on S.parent[S.indexes[1][i], S.indexes[2][j]].

One might complain about not being able to turn off bounds checking "all the way down" (and I have practical cases of abstraction nesting 4 deep, e.g., a SubArray of an Image of an InterpolatingGrid of an Array), but I suspect it is the lesser of two evils.

Or, is layer-crossing of @inbounds something that will become a concern with better inlining?

@timholy
Copy link
Member Author

timholy commented Feb 24, 2014

(My example assumed that SubArrays index their parent in a cartesian manner, which they don't currently. But someday they will.)

@JeffBezanson
Copy link
Member

Incorrect --- there is no check for i and j within 1:size(S,1) and 1:size(S,2), because we didn't write one and the compiler only checks bounds for primitives. @inbounds is sloppy, and applies to every inlined array access within. It's really only safe when applied very carefully.

The bug here is real:

julia> A = [1:3];

julia> s = sub(A,1:4)
4-element SubArray{Int64,1,Array{Int64,1},(Range1{Int64},)}:
   1   
   2   
   3   
 #undef

julia> foo(s) = @inbounds s[4]

julia> foo(s)

# no error

@timholy
Copy link
Member Author

timholy commented Feb 24, 2014

because we didn't write one and the compiler only checks bounds for primitives

Yes, I knew that, I was imagining you brought this up because you were in the middle of trying to fix that problem 😄. Now I see you were visiting @lindahua's ArrayView PR.

But I somehow misdiagnosed the sloppiness problem---I am almost sure I had explicitly tested this very scenario, though perhaps I failed to wrap it inside a function. So yes, this is a big problem.

The issue, as you say, is that there isn't a good way to do this unless the type encodes whether it's safe to be sloppy. Given how @inbounds works, a type that does not guarantee in-bounds at construction time basically has to turn off inlining on getindex and setindex!,

I see two ways forward:

  • (easy) Give this feature up for Julia's default array view type(s), and do bounds-checking upon construction. Anyone who needs this has to roll their own, and insert manual bounds-checking. (We desperately need more inlining #3796.)
  • (hard) Let types declare whether it's safe for @inbounds to propagate across inlining steps.

The easy one seems inevitable, and fairly urgent. Sigh.

@timholy
Copy link
Member Author

timholy commented Feb 25, 2014

Just had to test it:

julia> @inbounds s[4]
ERROR: BoundsError()
 in getindex at subarray.jl:193

No wonder I thought this wasn't a problem. But of course your analysis is the correct one.

@timholy timholy reopened this Feb 26, 2014
@timholy
Copy link
Member Author

timholy commented Feb 26, 2014

I don't fully understand how codegen/cgutils bounds-checking works. Is there any way to have it depend on type? I'm wondering if we can expand the array hierarchy, so that somewhere (under DenseArray) we have a PrecheckedArray and an MustCheckArray (those are stupid names, but hopefully you get the gist). Bounds-checking could not be disabled for accesses in an MustCheckArray. We could change SubArray so that it does bounds-checking upon construction, and move it into the PrecheckedArray type. Users who want the functionality discussed above would derive their type from MustCheckArray.

The alternative, I presume, is to write bounds-checking into getindex for the type. That's how it currently works for Ranges. There are well-known negatives (currently) to that approach, but certainly it's better than the alternative of having unsafe types lying around. And long-term will it be as good as the intrinsics?

@lindahua
Copy link
Contributor

lindahua commented Jun 4, 2014

Is this still 0.3 blocking? Maybe, we should tackle this together with the array view stuff in the 0.4 cycle?

@stevengj
Copy link
Member

stevengj commented Jun 4, 2014

I don't think it should block 0.3.

@JeffBezanson
Copy link
Member

Ok, let's leave this for the array view change.

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.3 Jun 4, 2014
timholy added a commit that referenced this issue Jan 4, 2015
…ction

This may change someday (#4044), but for now it's best to not check.
@timholy
Copy link
Member Author

timholy commented Feb 1, 2015

BTW, here's my personal plan for fixing this in a way that lets us have our cake and eat it too:

  • Add yet another parameter to SubArray, call it BoundsChecked, with values true and false. This is to indicate whether bounds were checked upon construction. One could have more complicated settings for controlling bounds-checking at both construction and usage, but once RFC: Safer, extensible ﹫inbounds #8227 gets resolved I think we can do the latter via @inbounds.
  • Give users a way to control that setting upon construction, likely through Val.
  • Make it so that @inbounds is a no-op on a SubArray with BoundsChecked==false (meaning, if you don't check upon construction, you can't disable upon usage).

I think this plan will let us still do the fun things at the top of this issue, would furthermore enable fun stuff like this:

a = rand(5)
b = sub(a, 2:4)  # provides "padding"
c = similar(b)
for i = 1:length(b)
    @inbounds c[i] = b[i-1] - 2*b[i] + b[i+1]
end

and yet resolve the concern that @JeffBezanson raised.

But all this awaits #8227.

timholy added a commit that referenced this issue Feb 23, 2015
In places where one wants to live dangerously, one can call
Base.sub_unchecked/Base.slice_unchecked. So in a sense this gives the
benefits of #4044.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

6 participants