As part of #10593, I have tried to find a way to unify `searchsorted`, `searchsortedfirst` and `searchsortedlast` with the rest of the `find*` functions. In the Search & Find Julep, I suggested replacing them with `find*` methods dispatching on a `Sorted`/`SortedVector` wrapper, which would indicate that the input vector is sorted (see also JuliaCollections/DataStructures.jl#290 for a similar but more powerful type).
For `searchsorted` itself, this plan would work quite well. It could be replaced with `find(equalto(needle), Sorted(haystack))`, which would call a specialization of `find` returning a range instead of a `Vector`, while still following the generic `AbstractArray` API.
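A minimal sketch of the proposed wrapper, written for Julia 1.x, where `findall` is the name that `find` ended up with and the curried `isequal(x)` plays the role of `equalto(x)`. `Sorted` and `IsEqualCompatible` are hypothetical names; only `searchsorted`, `issorted`, `Base.Fix2` and the internal helper `Base.Order.ord` (which turns `lt`/`by`/`rev` keywords into an ordering) come from Base.

```julia
using Base.Order: Ordering, ForwardOrdering, ReverseOrdering, ord

# Hypothetical wrapper (not in Base): promises that `data` is sorted by `order`.
struct Sorted{T,V<:AbstractVector{T},O<:Ordering} <: AbstractVector{T}
    data::V
    order::O
    function Sorted(data::V, order::O) where {T,V<:AbstractVector{T},O<:Ordering}
        issorted(data, order) || throw(ArgumentError("input is not sorted by the given order"))
        return new{T,V,O}(data, order)
    end
end

# Keyword constructor using the same keywords as `sort` and `searchsorted`.
Sorted(data::AbstractVector; lt=isless, by=identity, rev::Bool=false) =
    Sorted(data, ord(lt, by, rev))

# Minimal AbstractArray interface: everything is forwarded to the wrapped vector.
Base.size(s::Sorted) = size(s.data)
Base.getindex(s::Sorted, i::Int) = s.data[i]
Base.IndexStyle(::Type{<:Sorted}) = IndexLinear()

# Orders whose notion of "equal" coincides with `isequal` (plain `isless`, possibly reversed).
const IsEqualCompatible = Union{ForwardOrdering,ReverseOrdering{ForwardOrdering}}

# Matches are contiguous in a sorted vector, so binary search can return a range.
Base.findall(p::Base.Fix2{typeof(isequal)}, s::Sorted{<:Any,<:Any,<:IsEqualCompatible}) =
    searchsorted(s.data, p.x, s.order)
```

For example, `findall(isequal(2), Sorted([1, 2, 2, 2, 5, 7]))` returns the range `2:4`, the same indices a linear `findall` would collect into a `Vector`. The method is restricted to `isless`-based orders on purpose: with a custom `by` or `lt`, `searchsorted` returns the elements that are equivalent under the order, which need not be `isequal` to the needle.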
The situation is more complex for `searchsortedfirst` and `searchsortedlast`. I'll detail the problems for the former; they apply equally to the latter. `searchsortedfirst` returns the index of "the first value in `a` greater than or equal to `x`, according to the specified order". With the introduction of the `Sorted` wrapper, the specified order is the one passed when constructing the wrapper, which indicates how the underlying vector is sorted. So far, so good. But what kind of predicate can we use to reflect this order?
It would be natural to replace `searchsortedfirst(haystack, needle)` with `findfirst(greaterthan(needle), Sorted(haystack))`, but then `greaterthan` could not have any meaning on its own: it would depend on the order used by `haystack`, and "greater" could even mean "lower" if `rev=true`.
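The problem with a context-free `greaterthan` can be seen with plain Julia 1.x functions. This sketch uses `y -> y >= x` as a stand-in for the hypothetical `greaterthan(x)` and compares it with an order-aware predicate built from `Base.Order.lt`; `first_geq` and `first_at_or_after` are hypothetical helper names, and both return `lastindex(v) + 1` on failure to match the `searchsortedfirst` convention.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

# Context-free "greater than or equal to x": knows nothing about how `v` is sorted.
first_geq(v::AbstractVector, x) = something(findfirst(y -> y >= x, v), lastindex(v) + 1)

# Order-aware version: the first element that does not sort strictly before `x` under `o`.
# This is the condition searchsortedfirst checks, and it cannot be stated without `o`.
first_at_or_after(v::AbstractVector, x, o::Ordering) =
    something(findfirst(y -> !lt(o, y, x), v), lastindex(v) + 1)

v_fwd = [1, 2, 3, 4, 5]   # sorted by the default order
v_rev = [5, 4, 3, 2, 1]   # sorted with rev=true
```

Under the default order both helpers agree with `searchsortedfirst(v_fwd, 3)` and return `3`. On `v_rev`, however, `first_geq(v_rev, 3)` stops at index `1` (since `5 >= 3`), whereas `searchsortedfirst(v_rev, 3; rev=true)` returns `3`, because under the reversed order "at or after 3" means "less than or equal to 3".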
We could instead use `findfirst(sortedafter(needle), Sorted(haystack))`, which would better reflect that the predicate has no intrinsic meaning. But then we lose the generality of the syntax: it cannot be used on an unsorted vector to find the first value greater than `x`, which defeats the purpose of unifying the functions.
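A sketch of this variant under stated assumptions: `SortedAtOrAfter` and `sortedatorafter` are hypothetical (using the more accurate "at or after" name), and `Sorted` is a stripped-down stand-in for the proposed wrapper, redeclared so the block runs on its own in a fresh Julia 1.x session. The predicate is a plain struct rather than a function, so it only has meaning through the `findfirst` method for `Sorted`.

```julia
using Base.Order: Ordering, ord

# Stripped-down stand-in for the proposed wrapper: the data plus the order it is sorted by.
struct Sorted{V<:AbstractVector,O<:Ordering}
    data::V
    order::O
end
Sorted(data::AbstractVector; lt=isless, by=identity, rev::Bool=false) =
    Sorted(data, ord(lt, by, rev))

# Hypothetical predicate: it stores only the needle. "At or after" has no meaning
# until an order is supplied, so this is deliberately not a callable function.
struct SortedAtOrAfter{T}
    x::T
end
sortedatorafter(x) = SortedAtOrAfter(x)

# The order comes from the wrapper; the search itself is delegated to Base.
# Note: this keeps searchsortedfirst's `lastindex + 1` result when nothing matches.
Base.findfirst(p::SortedAtOrAfter, s::Sorted) = searchsortedfirst(s.data, p.x, s.order)
```

Here `findfirst(sortedatorafter(3), Sorted([1, 3, 3, 5, 8]))` returns `2`, but `findfirst(sortedatorafter(3), [1, 3, 3, 5, 8])` raises a `MethodError`: the same call cannot be used on an unsorted vector, which is the loss of generality described above.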
Then there are other subtleties, but these are less serious:
- `greaterthan` should actually be `greaterthanorequalto`, and `sortedafter` should be `sortedatorafter`, to accurately reflect the behavior of the function.
- `searchsortedfirst` has an internal variant which allows passing start and end indices delimiting the range to be searched, but `findfirst` only accepts a start index. This feature is used by the sparse matrix code. It could be replaced with array views, or we could simply provide a method that is more flexible than the other `findfirst` methods.
- When no match is found, `searchsortedfirst(x, needle)` returns `length(x)+1`, while `findfirst` currently returns `0` and may well return `nothing` in the future (Find Julep: issue with sentinel values, Juleps#47). This doesn't sound like an issue, as the caller can easily replace `0` or `nothing` with `length(x)+1` if needed, without a significant performance cost.
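The last two points can be sketched with public Julia 1.x API only, where `findfirst` returns `nothing` on failure. `searchsortedfirst_window` and `firstindex_or_end` are hypothetical helper names: the first emulates the internal start/end variant by searching a `view` and shifting the result back, and the second maps `nothing` to the `length(x)+1` convention with `something`.

```julia
# Search only v[lo:hi]: the view is indexed from 1, so shift the result by lo - 1.
# Returns hi + 1 when every element of the window sorts before x.
# Keyword arguments (by, lt, rev, order) are passed through to searchsortedfirst.
searchsortedfirst_window(v::AbstractVector, x, lo::Integer, hi::Integer; kws...) =
    lo - 1 + searchsortedfirst(view(v, lo:hi), x; kws...)

# Replace findfirst's `nothing` sentinel with searchsortedfirst's "one past the end".
firstindex_or_end(pred, v::AbstractVector) = something(findfirst(pred, v), lastindex(v) + 1)
```

For `v = [1, 2, 4, 4, 7, 9, 12]`, `searchsortedfirst_window(v, 4, 2, 6)` gives `3`, and a window whose elements all sort before the needle gives `hi + 1`, mirroring the out-of-range convention. The cost of the view is a constant offset per call, which is in line with the remark that the sentinel conversion is cheap.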
I would appreciate any help, especially from people who use these functions (I don't). If fixing this in time for 0.7 is too complex, we could unexport these functions (which are needed for sparse matrices) and move them, e.g., to SortingAlgorithms.