Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: in sorted array #37442

Closed
remi-garcia opened this issue Sep 7, 2020 · 7 comments · Fixed by #37490
Closed

Feature Request: in sorted array #37442

remi-garcia opened this issue Sep 7, 2020 · 7 comments · Fixed by #37490

Comments

@remi-garcia
Copy link
Contributor

Their is a "find" function in sorted arrays: searchsorted:

julia/base/sort.jl

Lines 212 to 229 in eb2a801

function searchsorted(v::AbstractVector, x, ilo::T, ihi::T, o::Ordering)::UnitRange{keytype(v)} where T<:Integer
u = T(1)
lo = ilo - u
hi = ihi + u
@inbounds while lo < hi - u
m = midpoint(lo, hi)
if lt(o, v[m], x)
lo = m
elseif lt(o, x, v[m])
hi = m
else
a = searchsortedfirst(v, x, max(lo,ilo), m, o)
b = searchsortedlast(v, x, m, min(hi,ihi), o)
return a : b
end
end
return (lo + 1) : (hi - 1)
end

Is that a possible option to add insorted that does the same as in but knowing that the collection is sorted?

@goretkin
Copy link
Contributor

goretkin commented Sep 9, 2020

Without giving it much thought, this seems like it might be better (more orthogonally) achieved with a wrapper type to annotate that an AbstractVector is sorted.

struct Sorted{T}
    _::T
end

function Base.in(x, v::Sorted)
    r = searchsorted(a._, x)
    return length(r) > 0
end

Then you can even use infix in:

x in Sorted(vector)

@remi-garcia
Copy link
Contributor Author

I think I would prefer an independent function to be consistent with the functions searchsorted's. Otherwise the user side would only have findfirst instead of findfirst and searchsortedfirst with findfirst(x, v::Sorted) = searchsortedfirst(x, v._).

I'm working on it but I have a few issues with insorted for AbstractRange.

@goretkin
Copy link
Contributor

I think I would prefer an independent function to be consistent with the functions searchsorted's

I understand the desire to be consistent. In my opinion, the situation with searchsortedfirst and findfirst is already something that could be improved, so I do not take it for granted that it's worth being consistent with it.

There are three ideas: search (or find), sorted/unsorted, first/last/all. So really it's find, with two axes of variation: whether the collection is sorted, and whether you want first, last, or all.

If I'm not mistaken
searchsortedfirst(args...) == first(searchsorted(args...))
and
searchsortedlast(args...) == last(searchsorted(args...))

(here is a monte carlo proof:

using Test

for _ = 1:10000
  a = cumsum(rand(0:1, 10))
  q = rand(1:10)

  r = searchsorted(a, q)
  fr = searchsortedfirst(a, q)
  lr = searchsortedlast(a, q)

  @test first(r) === fr
  @test last(r) === lr
end

)
So there is a way to spell searchsortedfirst that doesn't mash all of the ideas into one name, possibly at the cost of some performance, though I think a clever design might be able to eliminate that gap.

findall(s::Sorted) could be defined to be searchsorted(s._), and similarly for firstfirst and findlast

@remi-garcia
Copy link
Contributor Author

I think I would prefer an independent function to be consistent with the functions searchsorted's

I understand the desire to be consistent. In my opinion, the situation with searchsortedfirst and findfirst is already something that could be improved, so I do not take it for granted that it's worth being consistent with it.

There are three ideas: search (or find), sorted/unsorted, first/last/all. So really it's find, with two axes of variation: whether the collection is sorted, and whether you want first, last, or all.

Imho the solution should be found in a better naming and not a new structure, that would had an extra layer that should be avoided. What does sort!(coll::Vector{Int})?

However, searchsortedfirst() != findfirst() if the element is missing from the collection. The first returns an index while the later returns nothing. I would probably be for adding a findsortedfirst that returns the same as findfirst but efficiently in sorted AbstractArray.

@vtjnash
Copy link
Member

vtjnash commented Sep 15, 2020

Duplicate of #24883

@vtjnash vtjnash marked this as a duplicate of #24883 Sep 15, 2020
@goretkin
Copy link
Contributor

I can see that my comments have duplicated things that I now see have been written before, but I don't see the feature request as a duplicate.

@remi-garcia
Copy link
Contributor Author

Sorry if I misunderstood something, to me the feature request was not a duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants