-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce sample filtering to IVFPQ index search #1513
Introduce sample filtering to IVFPQ index search #1513
Conversation
…Please feel free to use it as a foundation for the future change, if appropriate.
Pull requests from external contributors require approval from a |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! The change looks good and neat. Though I think a few things require further discussion.
cpp/include/raft/neighbors/detail/ivf_pq_compute_similarity-inl.cuh
Outdated
Show resolved
Hide resolved
cpp/include/raft/neighbors/detail/ivf_pq_compute_similarity-inl.cuh
Outdated
Show resolved
Hide resolved
cpp/include/raft/neighbors/detail/ivf_pq_compute_similarity-inl.cuh
Outdated
Show resolved
Hide resolved
cpp/include/raft/neighbors/detail/ivf_pq_compute_similarity-inl.cuh
Outdated
Show resolved
Hide resolved
@achirkin I've made the changes to the PR, please feel free to take a look whenever you have time. Thanks! |
Thanks, @alexanderguzhva. In the meanwhile, could you please elaborate a bit on the need for the |
Well, the use case I'm interested in is when every query uses its own set of allowed samples. This is the requirement from the interested party. |
@achirkin other folks have also requested that we support the ability to pre-filter (before or during k-selection, as opposed to afterwards) based on a given set of indices. |
/ok to test |
Linking FAISS PR containing design that this is enabling: facebookresearch/faiss#2848 |
Anything needs to be done on my end so far? Thanks |
/ok to test |
/ok to test |
/ok to test |
@cjnolet , do you have any handy script for running a style check locally? Thanks! |
@alexanderguzhva we have pre-commit set up on the repository so it should do all the clang-formatting for you upon committing to git. Here's the instructions to install / configure it: https://docs.rapids.ai/api/raft/nightly/contributing/#python-pre-commit-hooks. |
@cjnolet Thanks a lot! |
/ok to test |
@@ -712,16 +734,22 @@ inline auto get_max_batch_size(uint32_t k, | |||
} | |||
|
|||
/** See raft::spatial::knn::ivf_pq::search docs */ | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be some unmerged bits here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'll fix that, thanks!
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've compared the prim benchmarks of this PR (with NoneSampleFilter
) against the current 23.06. There seem to be no performance hit whatsoever. The ncu stats, such as the number of registers stay the same as well, despite having more arguments in the signature. That's an LGTM from the performance side of things!
/ok to test |
/merge |
Thanks again for contributing this feature @alexanderguzhva! |
…anup (#1541) The PR does the following: * Introduces `ivf_flat::search_with_filtering()` call in the same way the filtering was introduced to ivf_pq in #1513 * Moves `sample_filter.cuh` from `raft/neighbor/detail` to `raft/neighbor` * Moves `NoneSampleFilter` from `raft::neighbor::ivf_pq::detail` namespace to `raft::neighbor::filtering` namespace * Renames `NoneSampleFilter` to `NoneIvfSampleFilter` and template argument `SampleFilterT` to `IvfSampleFilterT` * Adds a missing `resource::get_workspace_resource(handle)` in `ivf_flat-inl.cuh` in a `search_with_filtering()` call (which was copied from `search()` call with the same problem) * Adds more comments in `ivf_pq-inl.h` * Some code cleanup in `ivf_pq-inl.h` Authors: - Alexander Guzhva (https://github.com/alexanderguzhva) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: #1541
A prototype that introduces a per-sample filtering for IVFPQ search. Please feel free to use it as a foundation for the future change, if appropriate, because the code is functional, but is not super clean-and-neat.
The diff introduces a template parameter called
SampleFilterT
. An instance is expectedSampleFilterT()
constructible (which was mostly needed to define a default behavior in the form ofSampleFilterT sample_filter=SampleFilterT()
, see below)inline __device__ bool operator(...)
that returnstrue
is a given sample is valid for being used against a given query in IVFPQ searchThe default filter (that I set as a default one in certain facilities in the form of
typename SampleFilterT = NoneSampleFilter
in order not to modify way to many files) allows all samples to be used:Here
__host__
is needed for a CPU-based testing only.Also, I've provided an implementation of
BitMaskSampleFilter
that allows to filter samples based on a bit mask, as an example. The implementation was tested in the semi-production environment.All the filter-related files were added to
cpp/include/raft/neighbors/detail/sample_filter.cuh
.I did not change the default
ivf_pq_search()
method remains unchanged, but one moreivf_pq_search_with_filtering()
method with an additional template argumentSampleFilterT
and one more input parameter was introduced.All the current instantiations use
NoneSampleFilter
only.I've used
SampleFilterT sample_filter
parameters passing instead ofconst SampleFilterT sample_filter
in the function calls in order to be able to add some debugging facilities to a filter and with the hope that the compiler is smart enough to understand the de-facto constness if needed.The filter does not take a computed distance score into account by design, thus the current implementation cannot have a distance threshold. This can be easily changed, if appropriate.
It is still questionable to me whether this filtering needs to be injected right inside the search kernel instead of doing post-processing, please let me know if you have any thoughts on the topic.
I'm happy to address the comments.
Thanks.