What to do with searchsorted* functions #24883

nalimilan · 2017-12-02T15:19:09Z

As part of #10593, I have tried to find a way to unify searchsorted, searchsortedfirst and searchsortedlast with the rest of the find* functions. In the Search & Find Julep, I suggested replacing them with find* methods dispatching on a Sorted/SortedVector wrapper, which would indicate that the input vector is sorted (see also JuliaCollections/DataStructures.jl#290 for a similar but more powerful type).

For searchsorted itself, this plan would work quite well. It could be replaced with find(equalto(needle), Sorted(haystack)), which would call a specialization of find that returns a range instead of a Vector, while still following the generic AbstractArray API.
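To make the idea concrete, here is a minimal sketch of such a wrapper. Everything in it is an assumption for illustration: the Sorted type is hypothetical, findmatches is a made-up stand-in for find (so that no Base method is overloaded), and isequal(needle) plays the role of equalto(needle), which is how the curried predicate is spelled in Julia 1.0 and later.

```julia
using Base.Order: Ordering, Forward, Reverse

# Hypothetical wrapper: records that `data` is already sorted according to `order`.
struct Sorted{T,V<:AbstractVector{T},O<:Ordering} <: AbstractVector{T}
    data::V
    order::O
end
Sorted(v::AbstractVector; rev::Bool=false) = Sorted(v, rev ? Reverse : Forward)

# Minimal AbstractArray interface, forwarding to the wrapped vector.
Base.size(s::Sorted) = size(s.data)
Base.getindex(s::Sorted, i::Int) = s.data[i]

# Hypothetical stand-in for `find`: the generic method scans linearly...
findmatches(pred, v::AbstractVector) = findall(pred, v)
# ...while the method for `Sorted` uses binary search and returns a range.
findmatches(p::Base.Fix2{typeof(isequal)}, s::Sorted) =
    searchsorted(s.data, p.x; order=s.order)

haystack = [1, 2, 2, 2, 5]
findmatches(isequal(2), haystack)          # [2, 3, 4], found by scanning
findmatches(isequal(2), Sorted(haystack))  # 2:4, found by binary search
```

The call site is identical in both cases; only the wrapper decides whether the sorted fast path is taken. When the needle is absent, the Sorted method returns an empty range located at the insertion point, just as searchsorted does.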

The situation is more complex for searchsortedfirst and searchsortedlast. I'll detail the problems for the former; they apply equally to the latter. searchsortedfirst returns the index of "the first value in a greater than or equal to x, according to the specified order" (where a is the sorted vector and x the value searched for). With the introduction of the Sorted wrapper, the specified order is the one passed when constructing the wrapper, which indicates how the underlying vector is sorted. So far, so good. But what kind of predicate can we use to reflect this order?

- It would be natural to replace searchsortedfirst(haystack, needle) with findfirst(greaterthan(needle), Sorted(haystack)), but then greaterthan cannot have any meaning on its own: it would depend on the order used by haystack, and "greater" could even mean "lower" if rev=true.
- We could use findfirst(sortedafter(needle), Sorted(haystack)), which would better reflect that the predicate has no intrinsic meaning. But then we lose the generality of the syntax: it cannot be used on an unsorted vector to find the first value greater than x, which defeats the idea of unifying functions.
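The dependence on the order can be seen with the existing functions alone. The short example below uses only searchsortedfirst and findfirst from Base; the vectors and the needle are arbitrary values chosen for illustration.

```julia
ascending  = [1, 3, 5, 7, 9]
descending = reverse(ascending)   # [9, 7, 5, 3, 1]
needle = 4

# Forward order: "greater than or equal to" has its usual meaning.
searchsortedfirst(ascending, needle)             # 3
findfirst(x -> x >= needle, ascending)           # 3

# Reverse order: the same function now finds the first value <= needle.
searchsortedfirst(descending, needle; rev=true)  # 4
findfirst(x -> x <= needle, descending)          # 4
findfirst(x -> x >= needle, descending)          # 1, a different answer
```

A predicate named greaterthan would therefore have to behave like x >= needle in the first case and like x <= needle in the second, which is why it cannot carry a meaning independent of the wrapper's order.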

Then there are other subtleties, but these are less serious:

- greaterthan should actually be greaterthanorequalto, and sortedafter should be sortedatorafter, to accurately reflect the behavior of the function.
- searchsortedfirst has an internal variant that allows passing a start and an end index indicating the range to be searched, but findfirst only accepts a start index. This feature is used by the sparse matrix code. It could be replaced with array views, or we could just have a method which is more flexible than the other findfirst methods.
- searchsortedfirst returns length(haystack)+1 when no match is found, while findfirst currently returns 0, and may well return nothing in the future (see "Find Julep: issue with sentinel values", Juleps#47). This doesn't sound like an issue, as the caller can easily replace 0 or nothing with length(haystack)+1 without a significant performance cost if needed.
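The last two points can be sketched with functions that exist in Base today. Both helper names below are hypothetical, and the example assumes a 1-based vector and a Julia version (1.0 or later) in which findfirst returns nothing when no element matches.

```julia
# Hypothetical helper: map the "not found" result of findfirst to length + 1,
# the value searchsortedfirst uses when no element qualifies.
first_at_or_after(haystack, needle) =
    something(findfirst(x -> x >= needle, haystack), length(haystack) + 1)

# Hypothetical helper: restrict the search to haystack[lo:hi] with a view,
# then shift the result back to an index into haystack.
function searchsortedfirst_between(haystack, needle, lo, hi)
    return searchsortedfirst(view(haystack, lo:hi), needle) + lo - 1
end

haystack = [1, 3, 5, 7, 9, 11]
first_at_or_after(haystack, 20)                # 7
searchsortedfirst(haystack, 20)                # 7
searchsortedfirst_between(haystack, 7, 3, 5)   # 4
searchsortedfirst_between(haystack, 20, 3, 5)  # 6, i.e. hi + 1
```

The view avoids copying the searched range, so the only extra work compared with an explicit start and end index is the index shift.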

I would appreciate any help, especially from people who use these functions (I don't). If fixing this in time for 0.7 is too complex, we could unexport these functions (which are needed for sparse matrices), and move them e.g. to SortingAlgorithms.


nalimilan · 2017-12-16T16:02:34Z

#25133 moves these functions to the SortedSearch stdlib module, so that we can consider improving the API later.

nalimilan · 2018-01-24T09:09:46Z

Looks like the general opinion is that we should keep these functions as they are.

nalimilan added the "search & find" label (The find* family of functions) Dec 2, 2017

This was referenced Dec 3, 2017

Range first and last can be misleading #22354

Closed

Unifying search & find functions #10593

Closed

nalimilan mentioned this issue Dec 11, 2017

Clean up search and find API #24673

Merged

This was referenced Dec 31, 2017

Move searchsorted* functions to SortedSearch stdlib module #25133

Closed

Rename searchsorted* functions to findsorted* #25414

Closed

nalimilan closed this as completed Jan 24, 2018

vtjnash mentioned this issue Sep 15, 2020

Feature Request: in sorted array #37442

Closed
