`findnext` and degenerate multidimensional arrays #36938

goretkin · 2020-08-06T10:04:01Z

This behavior strikes me as inconsistent:

julia> x1 = zeros(Bool, (Base.OneTo(2), Base.OneTo(0)))
2×0 Array{Bool,2}

julia> x2 = zeros(Bool, (Base.OneTo(0), Base.OneTo(2)))
0×2 Array{Bool,2}

julia> first(keys(x1))
CartesianIndex(1, 1)

julia> first(keys(x2))
CartesianIndex(1, 1)

julia> last(keys(x1))
CartesianIndex(2, 0)

julia> last(keys(x2))
CartesianIndex(0, 2)

julia> findnext(x1, first(keys(x1)))

julia> findnext(x2, first(keys(x2)))
ERROR: BoundsError: attempt to access 0×2 Array{Bool,2} at index [1, 1]
Stacktrace:
 [1] getindex at ./array.jl:810 [inlined]
 [2] getindex at ./multidimensional.jl:557 [inlined]
 [3] findnext(::Array{Bool,2}, ::CartesianIndex{2}) at ./array.jl:1728
 [4] top-level scope at REPL[32]:1

I had expected both calls to findnext to return nothing.

Honing in, the inconsistency arises because of

julia> CartesianIndex(2, 0) > CartesianIndex(1, 1)
false

julia> CartesianIndex(0, 2) > CartesianIndex(1, 1)
true

and so the check at

julia/base/array.jl

Line 1726 in 4c1e3c0

i > l && return nothing

passes through.

I believe the intention in findnext is to compare CartesianIndexs in a way that agrees with the corresponding comparison on the LinearIndex into the array. For example:

julia> Base._sub2ind(x1, Tuple(first(keys(x1)))...)
1

julia> Base._sub2ind(x1, Tuple(last(keys(x1)))...)
0

julia> Base._sub2ind(x2, Tuple(first(keys(x2)))...)
1

julia> Base._sub2ind(x2, Tuple(last(keys(x2)))...)
0

(I tried on Julia 1.4 and Julia 1.5)

The text was updated successfully, but these errors were encountered:

nalimilan · 2020-08-06T12:34:28Z

Good catch. I guess the reason for the current behavior is to have first(keys(x)):last(keys(x)) == keys(x). But it's not great that first(keys(x)) < last(keys(x)) even when x is empty.

A simple fix would be to use checkbounds(Bool, A, i) instead. Would you make a PR to do that (with a test)?

More fundamentally I'm not sure why findnext doesn't throw an error when an out of bounds index is passed. Maybe something to revisit for 2.0.

goretkin · 2020-08-06T15:15:58Z

Would you make a PR to do that (with a test)?

Sure thing.

More fundamentally I'm not sure why findnext doesn't throw an error when an out of bounds index is passed.

I don't have a justification handy (@mbauman comes to mind, so maybe they do), but I suppose this choice is made to be consistent with first on empty keys, exactly so that findnext can work in the empty case where there is no in-bounds index.

One guiding desire is
findfirst(x) == findnext(x, first(keys(x))

which is true right now by definition, on AbstractArray, but not in spirit:

julia> invoke(findfirst, Tuple{Any}, fill(true, (0, 2)))

julia> invoke(findfirst, Tuple{Any}, fill(true, (2, 0)))

julia> findfirst(fill(true, (2, 0)))

julia> findfirst(fill(true, (0, 2)))
ERROR: BoundsError: attempt to access 0×2 Array{Bool,2} at index [1, 1]

The choice of what first and last does on these empty collections in peculiar to begin with. For example:

julia> first(keys(zeros(0, 2)))
CartesianIndex(1, 1)

julia> first(collect(keys(zeros(0, 2))))
ERROR: BoundsError: attempt to access 0×2 Array{CartesianIndex{2},2} at index [1]

which I bet is chosen to align with:

julia> first(LinearIndices(fill(0, (0, 2))))
1

julia> firstindex(fill(0, (0, 2)))
1

It's peculiar that first(empty_thing) and last(empty_thing) return anything at all, such that you can't distinguish between

julia> first(keys(fill(0, (1, 2))))
CartesianIndex(1, 1)

julia> first(keys(fill(0, (0, 2))))
CartesianIndex(1, 1)

Here's one explanation: #34697 (comment)

goretkin · 2020-08-07T03:14:33Z

@nalimilan I am not sure if checkbounds is correct here.

Current behavior:

julia> findnext(fill(false, 2, 2), CartesianIndex(-1, -1))
ERROR: BoundsError: attempt to access 2×2 Matrix{Bool} at index [-1, -1]

If I understood your suggestion with checkbounds, it would instead return nothing.

nalimilan · 2020-08-07T08:26:25Z

Mind the Bool in checkbounds(Bool, A, i).

goretkin · 2020-08-07T16:06:55Z

Perhaps you understood that BoundsError to be coming from the corrected code. But it's from the implementation on master.

Let me know if this clarifies it

function _findnext(A, start)
    l = last(keys(A))
    i = oftype(l, start)
    # i > l && return nothing       # original line
    checkbounds(Bool, A, i) || return nothing
    while true
        A[i] && return i
        i == l && break
        # nextind(A, l) can throw/overflow
        i = nextind(A, i)
    end
    return nothing
end

_findfirst(A) = _findnext(A, first(keys(A)))

#
findfirst(fill(false, 0, 2))    # broken
findfirst(fill(false, 2, 0))    # good

_findfirst(fill(false, 0, 2))   # the change fixed this, good 👍
_findfirst(fill(false, 2, 0))   # this was working before, and still is.

findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0))  # does not. newly broken. 👎

mbauman · 2020-08-07T17:45:14Z

Instead of:

i > l && return nothing

what about simply doing:

(i > l || isempty(A)) && return nothing

(Edit: oh, I see, this will similarly not throw BoundsErrors like you want)

I think it's just fine — and a feature — that firstindex might return an index out of bounds. Same goes for lastindex.

I agree it feels unsatisfactory that first(keys(x1)) > last(keys(x1)) but first(keys(x2)) < last(keys(x2)) when they're both empty arrays... but I'm afraid there's not much we can do there.

mbauman · 2020-08-07T18:08:20Z

Thinking about this more, I don't think we want to throw bounds errors from findnext. Again, go with the linear indexing version — there i > lastindex(A) is a 100% valid bounds check. The reason for the BoundsError is exactly the same cause as the initial bug.

goretkin · 2020-08-07T18:32:55Z

I wrote

findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0))  # does not. newly broken. 👎

But I actually wasn't sure whether it should or should not throw BoundsError. I was just observing that the existing code checks the bounds only in one direction (i > l), and that checkbounds checks both directions (ub > i > lb).

Would it be non-breaking to make findnext(fill(false, 2, 2), CartesianIndex(0, 0)) return nothing?

vtjnash · 2020-08-07T18:41:53Z

cross-ref #36768

goretkin · 2020-08-07T19:46:31Z

Probably others already know this, so consider it a note-to-self.

I had suspected that there might be something wrong with the ordering on CartesianIndex. Not that there would be room to change it in 1.0, but I wanted to know if there was any inconsistency. To be sure, I did the following exercise:

import OffsetArrays # for `fill` to work on `UnitRange`

"""
Clumsy implementation based on definition in https://github.com/JuliaLang/julia/pull/16054 :

`isless(I1, I2)` returns true if `I1` would be visited before `I2` during `CartesianRange` iteration
"""
function isless2(I1::CartesianIndex, I2::CartesianIndex)
    ranges = map(x->UnitRange(x...), minmax.(Tuple(I1), Tuple(I2)))
    A = fill(nothing, ranges)
    for i = CartesianIndices(A)
        i === I1 && return true
        i === I2 && return false
    end
    error()
end

@show isless(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless(CartesianIndex(1, 1), CartesianIndex(2, 0))

@show isless2(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless2(CartesianIndex(1, 1), CartesianIndex(2, 0))

So indeed the implementation of isless(::CartesianIndex, ::CartesianIndex) is true to its meaning.

goretkin · 2020-08-07T19:48:04Z

Again, go with the linear indexing version — there i > lastindex(A) is a 100% valid bounds check

Could you elaborate on this? I think I missed some context.

mbauman · 2020-08-07T20:56:27Z

My point is simply that for integer indices, there aren't values between firstindex and lastindex that are out of bounds in the way that there are for CartesianIndex.

goretkin · 2020-08-07T23:28:10Z

Thanks for clarifying. I am not sure I've understood what (or whether) we've converged on in terms of desired behavior. Note that there seems to be an issue with the bounds checking that is unrelated to degenerate array dimensions.

I wrote some tests here: https://github.com/JuliaLang/julia/pull/36967/files#diff-f5dc554ce02d9b8c6323d7a7e8077ccaR594

@nalimilan @mbauman do those tests encode the behavior we want?

Thinking about this more, I don't think we want to throw bounds errors from findnext

The tests show situations where findnext and co do currently throw BoundsError. They seem desireable to me, and also perhaps necessary to ensure compatibility within 1.0

nalimilan · 2020-08-08T09:34:47Z

OK so we currently have an asymmetry in the integer bounds check (we throw an error when it's too low, but return nothing when it's too high). I guess this behavior makes sense if you want to loop over the result until you get nothing, without having to check bounds manually again.

So regarding your tests, I'm not sure we really want findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing, as CartesianIndex(3, 1) isn't an index you can get by calling nextind(fill(true, (2, 2)), i) if you start with a value of i which is valid. I'd only keep tests that ensure that we return nothing on first(keys(x)) and on nextindex(last(keys(x)), x).

Regarding the implementation, maybe we should just do (i > l || (isempty(A) && i >= first(A)) && return nothing?

goretkin · 2020-08-08T10:59:09Z

I'm not sure we really want findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing

You mentioned the too-low-vs-too-high asymmetry which is showing up because the fix for the issue should either intentionally preserve the current behavior, or intentionally improve it.

Another asymmetry that currently exists, that I'd like to get rid of, is that these do different things:

        @test findnext(fill(true, (2, 2)), CartesianIndex(1, 3)) === nothing
        @test findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing    # 👎
        @test findnext(fill(true, (2, 2)), CartesianIndex(3, 3)) === nothing

Foremost, I believe they should either all throw or all evaluate to nothing. Is that something we can agree on?

If I understood correctly, CartesianIndex(1,3) isn't an index you would get from nextind(fill(true, (2, 2)), i) with a valid i, but I can't tell whether you feel it should keep returning nothing or if it should also throw BoundsError.

nalimilan · 2020-08-08T12:37:12Z

If I understood correctly, CartesianIndex(1,3) isn't an index you would get from nextind(fill(true, (2, 2)), i) with a valid i, but I can't tell whether you feel it should keep returning nothing or if it should also throw BoundsError.

nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(3,3) so you're going to pass that index if you do this:

x = fill(true, (2, 2))
i = first(keys(x))
while (v = findnext(x, i)) !== nothing
    i = nextind(x, i)
    ...
end

So returning nothing is somewhat useful in that case. And anyway we can't start throwing an error before 2.0 as it would break code.

OTC, CartesianIndex(3, 1) and ~~CartesianIndex(3, 1)~~CartesianIndex(3, 3) cannot be obtained by calling nextind(x, i) so there's no reason to stop throwing an error there.

goretkin · 2020-08-08T17:17:34Z

You probably meant

nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(1, 3)

and

OTC, CartesianIndex(3, 1) and CartesianIndex(3, 3)

That helps me understand better what's going on. Thanks for explaining.

nalimilan added arrays [a, r, r, a, y, s] search & find The find* family of functions labels Aug 6, 2020

This was referenced Aug 7, 2020

Reduce code duplication in find* functions #36965

Merged

[WIP] find* bounds checking #36967

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`findnext` and degenerate multidimensional arrays #36938

`findnext` and degenerate multidimensional arrays #36938

goretkin commented Aug 6, 2020

nalimilan commented Aug 6, 2020 •

edited by mbauman

Loading

goretkin commented Aug 6, 2020 •

edited

Loading

goretkin commented Aug 7, 2020

nalimilan commented Aug 7, 2020

goretkin commented Aug 7, 2020

mbauman commented Aug 7, 2020 •

edited

Loading

mbauman commented Aug 7, 2020

goretkin commented Aug 7, 2020

vtjnash commented Aug 7, 2020

goretkin commented Aug 7, 2020 •

edited

Loading

goretkin commented Aug 7, 2020

mbauman commented Aug 7, 2020

goretkin commented Aug 7, 2020 •

edited

Loading

nalimilan commented Aug 8, 2020

goretkin commented Aug 8, 2020

nalimilan commented Aug 8, 2020 •

edited

Loading

goretkin commented Aug 8, 2020 •

edited

Loading

findnext and degenerate multidimensional arrays #36938

findnext and degenerate multidimensional arrays #36938

Comments

goretkin commented Aug 6, 2020

nalimilan commented Aug 6, 2020 • edited by mbauman Loading

goretkin commented Aug 6, 2020 • edited Loading

goretkin commented Aug 7, 2020

nalimilan commented Aug 7, 2020

goretkin commented Aug 7, 2020

mbauman commented Aug 7, 2020 • edited Loading

mbauman commented Aug 7, 2020

goretkin commented Aug 7, 2020

vtjnash commented Aug 7, 2020

goretkin commented Aug 7, 2020 • edited Loading

goretkin commented Aug 7, 2020

mbauman commented Aug 7, 2020

goretkin commented Aug 7, 2020 • edited Loading

nalimilan commented Aug 8, 2020

goretkin commented Aug 8, 2020

nalimilan commented Aug 8, 2020 • edited Loading

goretkin commented Aug 8, 2020 • edited Loading

`findnext` and degenerate multidimensional arrays #36938

`findnext` and degenerate multidimensional arrays #36938

nalimilan commented Aug 6, 2020 •

edited by mbauman

Loading

goretkin commented Aug 6, 2020 •

edited

Loading

mbauman commented Aug 7, 2020 •

edited

Loading

goretkin commented Aug 7, 2020 •

edited

Loading

goretkin commented Aug 7, 2020 •

edited

Loading

nalimilan commented Aug 8, 2020 •

edited

Loading

goretkin commented Aug 8, 2020 •

edited

Loading