Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keepat! and deleteat!: inconsistencies and possible bugs #42065

Closed
bkamins opened this issue Aug 31, 2021 · 11 comments
Closed

keepat! and deleteat!: inconsistencies and possible bugs #42065

bkamins opened this issue Aug 31, 2021 · 11 comments
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Aug 31, 2021

This was tested under 1.7.0-beta4.

I start with what worries me with keepat!.

The first issue is:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> keepat!(x, [true, false, false])
ERROR: ArgumentError: invalid index: true of type Bool

while the same as allowed for deleteat!:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> deleteat!(x, [true, false, false])
2-element Vector{Int64}:
 2
 3

The second issue is that keepat! will corrupt a non-Vector or BitVector vector:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> y = @view x[2:3]
2-element view(::Vector{Int64}, 2:3) with eltype Int64:
 2
 3

julia> keepat!(y, 2)
ERROR: MethodError: no method matching deleteat!(::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, ::UnitRange{Int64})
Closest candidates are:
  deleteat!(::Vector, ::UnitRange{<:Integer}) at array.jl:1397
  deleteat!(::Vector, ::AbstractVector) at array.jl:1433
  deleteat!(::Vector, ::Any) at array.jl:1432
  ...
Stacktrace:
 [1] keepat!(a::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, inds::Int64)
   @ Base .\abstractarray.jl:3079
 [2] top-level scope
   @ REPL[19]:1

julia> y
2-element view(::Vector{Int64}, 2:3) with eltype Int64:
 3
 3

julia> x
3-element Vector{Int64}:
 1
 3
 3

Maybe better define it only for Vector and BitVector and assume that if some custom vector implements deleteat! it must also implement keepat!?

Now to deleteat! issues.

The first is:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> keepat!(x, true) # I think this is a correct behavior
ERROR: ArgumentError: invalid index: true of type Bool

julia> deleteat!(x, true) # and this is incorrect, unfortunately it is breaking
2-element Vector{Int64}:
 2
 3

The second issue is:

julia> x = falses(4)
4-element BitVector:
 0
 0
 0
 0

julia> y = collect(x)
4-element Vector{Bool}:
 0
 0
 0
 0

julia> deleteat!(x, [true, false, false, false]) # this is incorrect
ERROR: ArgumentError: indices must be unique and sorted

julia> deleteat!(y, [true, false, false, false]) # this is correct
3-element Vector{Bool}:
 0
 0
 0

I can implement the changes when there is a decision what behaviors in these four cases I have listed are expected. Thank you!

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 31, 2021

julia> keepat!(x, [true, false, false])
ERROR: ArgumentError: invalid index: true of type Bool

I think the behavior of using AbstractVector{Bool} as a mask for x from deleteat! could be ported to keepat! as well. Feels like a natural extension and it's basically the only difference between the docstring for the two.

Seems like there are in general a bunch of specializations for deleteat! that don't exist at all for keepat!, maybe there's more unexpected inconsistency/difference in behavior:

julia> methods(deleteat!)                                                    
# 8 methods for generic function "deleteat!":                                
[1] deleteat!(a::Vector, i::Integer) in Base at array.jl:1395                
[2] deleteat!(a::Vector, r::UnitRange{<:Integer}) in Base at array.jl:1397   
[3] deleteat!(a::Vector, inds::AbstractVector{Bool}) in Base at array.jl:1473
[4] deleteat!(a::Vector, inds::AbstractVector) in Base at array.jl:1433      
[5] deleteat!(a::Vector, inds) in Base at array.jl:1432                      
[6] deleteat!(B::BitVector, i::Integer) in Base at bitarray.jl:949           
[7] deleteat!(B::BitVector, r::UnitRange{Int64}) in Base at bitarray.jl:957  
[8] deleteat!(B::BitVector, inds) in Base at bitarray.jl:981                 
                                                                             
julia> methods(keepat!)                                                      
# 1 method for generic function "keepat!":                                   
[1] keepat!(a::AbstractVector, inds) in Base at abstractarray.jl:2977        

Since it seems they're supposed to be complementary, I think we should mimic the behavior of deleteat! in keepat!, even if it may not make sense at first glance i.e. the deleteat!([1,2,3], true) case. Booleans are numbers in all other cases, so it'd be weird to me to single them out here.

@bkamins
Copy link
Member Author

bkamins commented Aug 31, 2021

it's basically the only difference between the docstring for the two

Yes, but I assume that users would expect keepat! and deleteat! to be consistent.

Seems like there are in general a bunch of specializations

Yes, but apart from Bool handling they are only performance related so I did not comment on them (indeed e.g. for ranges to be kept we could just use memmove for vectors).

Booleans are numbers in all other cases, so it'd be weird to me to single them out here.

Booleans are numbers EXCEPT for indexing:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> x[true]
ERROR: ArgumentError: invalid index: true of type Bool

and I assume that users would want deleteat! and keepat! work consistently with indexing.

In particular the invariant I imagine is that x[y] and keepat!(x, y) are exactly the same, except that keepat! is in-place.

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2021

@ararslan - I think it would be good that issue gets considered by someone before 1.7 release as keepat! is a new function so it would be good not to change its design details later. Thank you!

@rashidrafeek
Copy link
Contributor

julia> keepat!(x, [true, false, false])
ERROR: ArgumentError: invalid index: true of type Bool

Doesn't #39528 solve this?

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2021

Yes - if that PR is merged it should be resolved. Then there is the other issue of corrupting the vectors unsupported by deleteat!.

@KristofferC
Copy link
Member

Then there is the other issue of corrupting the vectors unsupported by deleteat!.

The state of input arguments to mutating functions is kind of undefined if the function throws an exception. Just take something like:

julia> a = [1, 2, "three", 4]
4-element Vector{Any}:
 1
 2
  "three"
 4

julia> filter!(x -> x+1 == 3, a)
ERROR: MethodError: no method matching +(::String, ::Int64)
...

julia> a
4-element Vector{Any}:
 2
  "three"
  "three"
 4

Having to ensure that the original array can be returned feels like it would require a lot of prechecking / defensive copying.

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2021

The problem is that the user needs to know that keepat! calls deleteat! internally, which is not obvious. So I would recommend to at least mention it in a docstring then.

@ararslan ararslan added this to the 1.7 milestone Sep 1, 2021
@ararslan
Copy link
Member

ararslan commented Sep 1, 2021

@ararslan - I think it would be good that issue gets considered by someone before 1.7 release as keepat! is a new function so it would be good not to change its design details later. Thank you!

Sure, added to the 1.7 milestone. Even if someone sees that and removes the milestone, it means they'll have looked at this. 😄

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2021

Thank you!

Given the discussion my feeling is that:

  • for keepat!:
    • indexing with Bool vector will be handled in add keepat! for in-place logical filtering #39528 (hopefully merged for 1.7 release?)
    • the inconsistency of signature with deleteat! is not a super crucial thing as @KristofferC commented (indeed there are many examples that render a corrupted object after ! function); still maybe some comment in the docstring could be considered
  • for deleteat!:
    • deleteat!(x, true) works but keepat!(x, true) errors; probably fixing deleteat! here has to wait till 2.0 release, as changing this would be breaking;
    • deleteat!(falses(4), [true, false, false, false]) is clearly a bug; if we want 1.7 to be LTS it should be fixed I think;

@JeffBezanson
Copy link
Member

  • the inconsistency of signature with deleteat! is not a super crucial thing as @KristofferC commented (indeed there are many examples that render a corrupted object after ! function)

In general I agree with this, but since deleteat! is only defined for Vector and BitVector I think it would be ok to constrain keepat! that way as well.

@JeffBezanson
Copy link
Member

Fixed by #42144 and #42351.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants