Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findnext and degenerate multidimensional arrays #36938

Open
goretkin opened this issue Aug 6, 2020 · 17 comments · May be fixed by #36967
Open

findnext and degenerate multidimensional arrays #36938

goretkin opened this issue Aug 6, 2020 · 17 comments · May be fixed by #36967
Labels
arrays [a, r, r, a, y, s] search & find The find* family of functions

Comments

@goretkin
Copy link
Contributor

goretkin commented Aug 6, 2020

This behavior strikes me as inconsistent:

julia> x1 = zeros(Bool, (Base.OneTo(2), Base.OneTo(0)))
2×0 Array{Bool,2}

julia> x2 = zeros(Bool, (Base.OneTo(0), Base.OneTo(2)))
0×2 Array{Bool,2}

julia> first(keys(x1))
CartesianIndex(1, 1)

julia> first(keys(x2))
CartesianIndex(1, 1)

julia> last(keys(x1))
CartesianIndex(2, 0)

julia> last(keys(x2))
CartesianIndex(0, 2)

julia> findnext(x1, first(keys(x1)))

julia> findnext(x2, first(keys(x2)))
ERROR: BoundsError: attempt to access 0×2 Array{Bool,2} at index [1, 1]
Stacktrace:
 [1] getindex at ./array.jl:810 [inlined]
 [2] getindex at ./multidimensional.jl:557 [inlined]
 [3] findnext(::Array{Bool,2}, ::CartesianIndex{2}) at ./array.jl:1728
 [4] top-level scope at REPL[32]:1

I had expected both calls to findnext to return nothing.

Honing in, the inconsistency arises because of

julia> CartesianIndex(2, 0) > CartesianIndex(1, 1)
false

julia> CartesianIndex(0, 2) > CartesianIndex(1, 1)
true

and so the check at

i > l && return nothing
passes through.

I believe the intention in findnext is to compare CartesianIndexs in a way that agrees with the corresponding comparison on the LinearIndex into the array. For example:

julia> Base._sub2ind(x1, Tuple(first(keys(x1)))...)
1

julia> Base._sub2ind(x1, Tuple(last(keys(x1)))...)
0

julia> Base._sub2ind(x2, Tuple(first(keys(x2)))...)
1

julia> Base._sub2ind(x2, Tuple(last(keys(x2)))...)
0

(I tried on Julia 1.4 and Julia 1.5)

@nalimilan nalimilan added arrays [a, r, r, a, y, s] search & find The find* family of functions labels Aug 6, 2020
@nalimilan
Copy link
Member

nalimilan commented Aug 6, 2020

Good catch. I guess the reason for the current behavior is to have first(keys(x)):last(keys(x)) == keys(x). But it's not great that first(keys(x)) < last(keys(x)) even when x is empty.

A simple fix would be to use checkbounds(Bool, A, i) instead. Would you make a PR to do that (with a test)?

More fundamentally I'm not sure why findnext doesn't throw an error when an out of bounds index is passed. Maybe something to revisit for 2.0.

@goretkin
Copy link
Contributor Author

goretkin commented Aug 6, 2020

Would you make a PR to do that (with a test)?

Sure thing.

More fundamentally I'm not sure why findnext doesn't throw an error when an out of bounds index is passed.

I don't have a justification handy (@mbauman comes to mind, so maybe they do), but I suppose this choice is made to be consistent with first on empty keys, exactly so that findnext can work in the empty case where there is no in-bounds index.

One guiding desire is
findfirst(x) == findnext(x, first(keys(x))

which is true right now by definition, on AbstractArray, but not in spirit:

julia> invoke(findfirst, Tuple{Any}, fill(true, (0, 2)))

julia> invoke(findfirst, Tuple{Any}, fill(true, (2, 0)))

julia> findfirst(fill(true, (2, 0)))

julia> findfirst(fill(true, (0, 2)))
ERROR: BoundsError: attempt to access 0×2 Array{Bool,2} at index [1, 1]

The choice of what first and last does on these empty collections in peculiar to begin with. For example:

julia> first(keys(zeros(0, 2)))
CartesianIndex(1, 1)

julia> first(collect(keys(zeros(0, 2))))
ERROR: BoundsError: attempt to access 0×2 Array{CartesianIndex{2},2} at index [1]

which I bet is chosen to align with:

julia> first(LinearIndices(fill(0, (0, 2))))
1

julia> firstindex(fill(0, (0, 2)))
1

It's peculiar that first(empty_thing) and last(empty_thing) return anything at all, such that you can't distinguish between

julia> first(keys(fill(0, (1, 2))))
CartesianIndex(1, 1)

julia> first(keys(fill(0, (0, 2))))
CartesianIndex(1, 1)

Here's one explanation: #34697 (comment)

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

@nalimilan I am not sure if checkbounds is correct here.

Current behavior:

julia> findnext(fill(false, 2, 2), CartesianIndex(-1, -1))
ERROR: BoundsError: attempt to access 2×2 Matrix{Bool} at index [-1, -1]

If I understood your suggestion with checkbounds, it would instead return nothing.

@nalimilan
Copy link
Member

Mind the Bool in checkbounds(Bool, A, i).

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

Perhaps you understood that BoundsError to be coming from the corrected code. But it's from the implementation on master.

Let me know if this clarifies it

function _findnext(A, start)
    l = last(keys(A))
    i = oftype(l, start)
    # i > l && return nothing       # original line
    checkbounds(Bool, A, i) || return nothing
    while true
        A[i] && return i
        i == l && break
        # nextind(A, l) can throw/overflow
        i = nextind(A, i)
    end
    return nothing
end

_findfirst(A) = _findnext(A, first(keys(A)))

#
findfirst(fill(false, 0, 2))    # broken
findfirst(fill(false, 2, 0))    # good

_findfirst(fill(false, 0, 2))   # the change fixed this, good 👍
_findfirst(fill(false, 2, 0))   # this was working before, and still is.

findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0))  # does not. newly broken. 👎

@mbauman
Copy link
Member

mbauman commented Aug 7, 2020

Instead of:

i > l && return nothing

what about simply doing:

(i > l || isempty(A)) && return nothing

(Edit: oh, I see, this will similarly not throw BoundsErrors like you want)

I think it's just fine — and a feature — that firstindex might return an index out of bounds. Same goes for lastindex.

I agree it feels unsatisfactory that first(keys(x1)) > last(keys(x1)) but first(keys(x2)) < last(keys(x2)) when they're both empty arrays... but I'm afraid there's not much we can do there.

@mbauman
Copy link
Member

mbauman commented Aug 7, 2020

Thinking about this more, I don't think we want to throw bounds errors from findnext. Again, go with the linear indexing version — there i > lastindex(A) is a 100% valid bounds check. The reason for the BoundsError is exactly the same cause as the initial bug.

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

I wrote

findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0))  # does not. newly broken. 👎

But I actually wasn't sure whether it should or should not throw BoundsError. I was just observing that the existing code checks the bounds only in one direction (i > l), and that checkbounds checks both directions (ub > i > lb).

Would it be non-breaking to make findnext(fill(false, 2, 2), CartesianIndex(0, 0)) return nothing?

@vtjnash
Copy link
Member

vtjnash commented Aug 7, 2020

cross-ref #36768

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

Probably others already know this, so consider it a note-to-self.

I had suspected that there might be something wrong with the ordering on CartesianIndex. Not that there would be room to change it in 1.0, but I wanted to know if there was any inconsistency. To be sure, I did the following exercise:

import OffsetArrays # for `fill` to work on `UnitRange`

"""
Clumsy implementation based on definition in https://github.com/JuliaLang/julia/pull/16054 :

`isless(I1, I2)` returns true if `I1` would be visited before `I2` during `CartesianRange` iteration
"""
function isless2(I1::CartesianIndex, I2::CartesianIndex)
    ranges = map(x->UnitRange(x...), minmax.(Tuple(I1), Tuple(I2)))
    A = fill(nothing, ranges)
    for i = CartesianIndices(A)
        i === I1 && return true
        i === I2 && return false
    end
    error()
end

@show isless(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless(CartesianIndex(1, 1), CartesianIndex(2, 0))

@show isless2(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless2(CartesianIndex(1, 1), CartesianIndex(2, 0))

So indeed the implementation of isless(::CartesianIndex, ::CartesianIndex) is true to its meaning.

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

Again, go with the linear indexing version — there i > lastindex(A) is a 100% valid bounds check

Could you elaborate on this? I think I missed some context.

@mbauman
Copy link
Member

mbauman commented Aug 7, 2020

My point is simply that for integer indices, there aren't values between firstindex and lastindex that are out of bounds in the way that there are for CartesianIndex.

@goretkin
Copy link
Contributor Author

goretkin commented Aug 7, 2020

Thanks for clarifying. I am not sure I've understood what (or whether) we've converged on in terms of desired behavior. Note that there seems to be an issue with the bounds checking that is unrelated to degenerate array dimensions.

I wrote some tests here: https://github.com/JuliaLang/julia/pull/36967/files#diff-f5dc554ce02d9b8c6323d7a7e8077ccaR594

@nalimilan @mbauman do those tests encode the behavior we want?

Thinking about this more, I don't think we want to throw bounds errors from findnext

The tests show situations where findnext and co do currently throw BoundsError. They seem desireable to me, and also perhaps necessary to ensure compatibility within 1.0

@nalimilan
Copy link
Member

OK so we currently have an asymmetry in the integer bounds check (we throw an error when it's too low, but return nothing when it's too high). I guess this behavior makes sense if you want to loop over the result until you get nothing, without having to check bounds manually again.

So regarding your tests, I'm not sure we really want findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing, as CartesianIndex(3, 1) isn't an index you can get by calling nextind(fill(true, (2, 2)), i) if you start with a value of i which is valid. I'd only keep tests that ensure that we return nothing on first(keys(x)) and on nextindex(last(keys(x)), x).

Regarding the implementation, maybe we should just do (i > l || (isempty(A) && i >= first(A)) && return nothing?

@goretkin
Copy link
Contributor Author

goretkin commented Aug 8, 2020

I'm not sure we really want findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing

You mentioned the too-low-vs-too-high asymmetry which is showing up because the fix for the issue should either intentionally preserve the current behavior, or intentionally improve it.

Another asymmetry that currently exists, that I'd like to get rid of, is that these do different things:

        @test findnext(fill(true, (2, 2)), CartesianIndex(1, 3)) === nothing
        @test findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing    # 👎
        @test findnext(fill(true, (2, 2)), CartesianIndex(3, 3)) === nothing

Foremost, I believe they should either all throw or all evaluate to nothing. Is that something we can agree on?

If I understood correctly, CartesianIndex(1,3) isn't an index you would get from nextind(fill(true, (2, 2)), i) with a valid i, but I can't tell whether you feel it should keep returning nothing or if it should also throw BoundsError.

@nalimilan
Copy link
Member

nalimilan commented Aug 8, 2020

If I understood correctly, CartesianIndex(1,3) isn't an index you would get from nextind(fill(true, (2, 2)), i) with a valid i, but I can't tell whether you feel it should keep returning nothing or if it should also throw BoundsError.

nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(3,3) so you're going to pass that index if you do this:

x = fill(true, (2, 2))
i = first(keys(x))
while (v = findnext(x, i)) !== nothing
    i = nextind(x, i)
    ...
end

So returning nothing is somewhat useful in that case. And anyway we can't start throwing an error before 2.0 as it would break code.

OTC, CartesianIndex(3, 1) and CartesianIndex(3, 1)CartesianIndex(3, 3) cannot be obtained by calling nextind(x, i) so there's no reason to stop throwing an error there.

@goretkin
Copy link
Contributor Author

goretkin commented Aug 8, 2020

You probably meant

nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(1, 3)

and

OTC, CartesianIndex(3, 1) and CartesianIndex(3, 3)

That helps me understand better what's going on. Thanks for explaining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] search & find The find* family of functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants