Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new search type CountWithinRange? #16

Closed
kahaaga opened this issue Aug 29, 2022 · 6 comments
Closed

Add a new search type CountWithinRange? #16

kahaaga opened this issue Aug 29, 2022 · 6 comments

Comments

@kahaaga
Copy link
Contributor

kahaaga commented Aug 29, 2022

I've encountered a use case where I need to construct a tree search structure, and for a list of query points, I need to find how many points are within a certain radius r of each of those points (see this comment).

A way of doing this is to use WithinRange(r) with bulkisearch. However, this is inefficient, because it allocates the index vectors (thus, allocations grow with the number of input points).

NearestNeighbors.jl recently added the inrangecount function, which simply counts the number of neighbors without allocating neither indices nor distances. In my specific application, using inrangecount instead of bulkisearch reduces runtime by half and allocations by many orders of magnitude (depending on size of input data).

It would be nice to have a generic search structure representing this use case in Neighborhood.jl too. Perhaps something like the following?

Base.@kwdef struct CountWithinRange <: SearchType
    r
    self_count::bool = false
end

where r is the radius and self_count flag determines whether the query point itself should be counted or not.

@kahaaga kahaaga changed the title Add a new search type CountWithinRange Add a new search type CountWithinRange? Aug 29, 2022
@Datseris
Copy link
Member

Hi, yes sounds good, but I'd vote to remove the self_count stuff; a user can literally do x + 1 if they want to include the self count, and this seems like a much larger complexity in the source code with extra fields and stuff.

@Datseris
Copy link
Member

However, I'm not sure if this struct is the way to go, because then what would bulksearch return? Wouldn't it make more sense to just implement here the inrangecount function, which means just re-exporting it?

@kahaaga
Copy link
Contributor Author

kahaaga commented Aug 31, 2022

However, I'm not sure if this struct is the way to go, because then what would bulksearch return? Wouldn't it make more sense to just implement here the inrangecount function, which means just re-exporting it?

Good point. think re-exporting inrangecount may work better. I suggest

function inrangecount(tree, query_pts, r)
    [inrangecount(tree, pᵢ, r) for pᵢ in pts]
end

where tree is a pre-computed tree, query_pts[i], where i = 1, 2, ..., N, is a point for which we want to know how many points lie within a radius of r from it (given the metric that was used to construct the tree).

If you're okay with that @Datseris, I'll submit a PR (also including an in-place version where the count vector is pre-allocated).

@Datseris
Copy link
Member

Datseris commented Sep 1, 2022

sounds good, but if there is no performant version for getting in a vector of points already in NearestNeighbors.jl, I'd say that inrangecount(tree, pᵢ, r) is the only thing necessary to export. Thankfully a user can write this list comprehension just as easily.

@kahaaga
Copy link
Contributor Author

kahaaga commented Oct 22, 2022

Solved by #17.

@kahaaga kahaaga closed this as completed Oct 22, 2022
@kahaaga
Copy link
Contributor Author

kahaaga commented Oct 22, 2022

Is there a reason why this was given version 0.2.4? In the list of tagged releases, 0.2.2 was the previous one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants