Introduce sample filtering to IVFPQ index search #1513

Merged: 12 commits into rapidsai:branch-23.06, May 19, 2023

alexanderguzhva
Contributor

A prototype that introduces per-sample filtering for IVFPQ search. Please feel free to use it as a foundation for a future change, if appropriate: the code is functional, but not super clean and neat.

The diff introduces a template parameter called SampleFilterT. An instance is expected

  • to be default-constructible as SampleFilterT() (this was mostly needed to define the default behavior in the form of SampleFilterT sample_filter = SampleFilterT(), see below)
  • to provide an inline __device__ bool operator()(...) that returns true if a given sample is valid for use against a given query in IVFPQ search
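
The two requirements above can be checked at compile time. The following is a minimal host-only sketch, assuming C++17; `EvenSampleFilter` and `is_valid_sample_filter_v` are hypothetical names introduced for illustration and are not part of the diff.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical filter used only for this illustration:
// it accepts every sample with an even index inside its inverted list.
struct EvenSampleFilter {
  bool operator()(const uint32_t /*query_ix*/,
                  const uint32_t /*cluster_ix*/,
                  const uint32_t sample_ix) const
  {
    return sample_ix % 2 == 0;
  }
};

// Hypothetical helper trait (an assumption of this sketch, not a RAFT API):
// true when the type is default-constructible and callable on a const
// instance with (query_ix, cluster_ix, sample_ix), returning something
// convertible to bool.
template <typename SampleFilterT>
constexpr bool is_valid_sample_filter_v =
  std::is_default_constructible_v<SampleFilterT> &&
  std::is_invocable_r_v<bool, const SampleFilterT&, uint32_t, uint32_t, uint32_t>;

static_assert(is_valid_sample_filter_v<EvenSampleFilter>, "filter must satisfy both requirements");
static_assert(!is_valid_sample_filter_v<int>, "a plain int is not a sample filter");
```

The check is done against a const instance because the default filter in this PR declares its call operator as const.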

The default filter (which I set as the default in certain facilities in the form of typename SampleFilterT = NoneSampleFilter, in order not to modify way too many files) allows all samples to be used:

struct NoneSampleFilter {
  inline __device__ __host__ bool operator()(
    // query index
    const uint32_t query_ix,
    // the current inverted list index
    const uint32_t cluster_ix,
    // the index of the current sample inside the current inverted list
    const uint32_t sample_ix
  ) const {
    return true;
  }
};

Here __host__ is needed for CPU-based testing only.
Also, as an example, I've provided an implementation of BitMaskSampleFilter that filters samples based on a bit mask. The implementation was tested in a semi-production environment.
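
For readers who want a feel for what a per-query bit-mask filter could look like, here is a rough sketch. It is not the BitMaskSampleFilter from the diff: the struct name, the member names (`mask`, `words_per_query`, `list_ids`), the mask layout, and the `SKETCH_HOST_DEVICE` macro are all assumptions made for illustration.

```cpp
#include <cstdint>

// Expand to the CUDA qualifiers under nvcc and to nothing otherwise,
// so the sketch also compiles as plain C++ for host-side testing.
#ifdef __CUDACC__
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical per-query bit-mask filter (layout is an assumption).
template <typename IdxT>
struct BitMaskFilterSketch {
  // One row of `words_per_query` 64-bit words per query; bit `id` of row `q`
  // is set when the sample with global ID `id` is allowed for query `q`.
  const uint64_t* mask     = nullptr;
  uint32_t words_per_query = 0;
  // list_ids[c][s] is the global ID of sample `s` in inverted list `c`.
  const IdxT* const* list_ids = nullptr;

  SKETCH_HOST_DEVICE bool operator()(const uint32_t query_ix,
                                     const uint32_t cluster_ix,
                                     const uint32_t sample_ix) const
  {
    // Translate the (list, position) pair into a global sample ID.
    const uint64_t id = static_cast<uint64_t>(list_ids[cluster_ix][sample_ix]);
    // Pick the 64-bit word holding that ID within this query's row.
    const uint64_t word =
      mask[static_cast<uint64_t>(query_ix) * words_per_query + id / 64];
    // Test the bit that corresponds to the ID.
    return ((word >> (id % 64)) & 1u) != 0;
  }
};
```

Because every query has its own row in the mask, the sketch needs all three arguments of the call operator: `query_ix` selects the row, while `cluster_ix` and `sample_ix` locate the sample.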

All the filter-related files were added to cpp/include/raft/neighbors/detail/sample_filter.cuh.

The default ivf_pq_search() method remains unchanged, but one more method, ivf_pq_search_with_filtering(), was introduced; it takes an additional template argument SampleFilterT and one more input parameter.

template <typename T, typename IdxT, typename SampleFilterT>
void search_with_filtering(raft::device_resources const& handle,
                           const raft::neighbors::ivf_pq::search_params& params,
                           const index<IdxT>& idx,
                           const T* queries,
                           uint32_t n_queries,
                           uint32_t k,
                           IdxT* neighbors,
                           float* distances,
                           rmm::mr::device_memory_resource* mr = nullptr,
                           SampleFilterT sample_filter         = SampleFilterT());

All the current instantiations use NoneSampleFilter only.
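
To illustrate how a routine with a defaulted filter parameter consults the filter, here is a toy host-side stand-in. It is only a sketch: it scans one-dimensional "inverted lists" on the CPU instead of running the IVFPQ kernel, and `Hit` and `nearest_with_filtering_sketch` are hypothetical names, not functions from RAFT.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Same contract as the default filter in this PR: accept every sample.
struct NoneSampleFilter {
  bool operator()(const uint32_t, const uint32_t, const uint32_t) const { return true; }
};

// Best match found for one query; the defaults mean "nothing found".
struct Hit {
  uint32_t cluster_ix = std::numeric_limits<uint32_t>::max();
  uint32_t sample_ix  = std::numeric_limits<uint32_t>::max();
  float distance      = std::numeric_limits<float>::infinity();
};

// Toy stand-in for the search: lists[c][s] is a 1-D sample, and the
// distance to a query is the absolute difference. The filter is taken
// by value and defaults to a default-constructed instance.
template <typename SampleFilterT = NoneSampleFilter>
std::vector<Hit> nearest_with_filtering_sketch(const std::vector<std::vector<float>>& lists,
                                               const std::vector<float>& queries,
                                               SampleFilterT sample_filter = SampleFilterT())
{
  std::vector<Hit> best(queries.size());
  for (uint32_t q = 0; q < queries.size(); ++q) {
    for (uint32_t c = 0; c < lists.size(); ++c) {
      for (uint32_t s = 0; s < lists[c].size(); ++s) {
        // Rejected samples never reach the distance comparison.
        if (!sample_filter(q, c, s)) { continue; }
        const float d = std::fabs(lists[c][s] - queries[q]);
        if (d < best[q].distance) { best[q] = Hit{c, s, d}; }
      }
    }
  }
  return best;
}
```

Calling the function with two arguments falls back to NoneSampleFilter, which mirrors how existing call sites can stay untouched; passing any callable with the three-index signature as the third argument enables filtering.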

I've passed the filter as SampleFilterT sample_filter rather than const SampleFilterT sample_filter in the function calls, in order to be able to add some debugging facilities to a filter, and with the hope that the compiler is smart enough to recognize the de facto constness where needed.

The filter does not take a computed distance score into account by design, thus the current implementation cannot have a distance threshold. This can be easily changed, if appropriate.

It is still questionable to me whether this filtering needs to be injected right inside the search kernel instead of doing post-processing, please let me know if you have any thoughts on the topic.
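
One consideration for this question is that the two orderings are not equivalent: filtering after k-selection can return fewer than k results, while filtering before (or during) selection still fills all k slots when enough allowed samples exist. A small host-side sketch of the difference follows; all names in it are hypothetical and it does not model the actual kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

struct Scored {
  uint32_t id;
  float distance;
};

// Keep the k entries with the smallest distance.
inline std::vector<Scored> top_k(std::vector<Scored> v, std::size_t k)
{
  std::sort(v.begin(), v.end(), [](const Scored& a, const Scored& b) {
    return a.distance < b.distance;
  });
  if (v.size() > k) { v.erase(v.begin() + static_cast<std::ptrdiff_t>(k), v.end()); }
  return v;
}

// Keep only the entries whose ID the predicate allows.
template <typename Pred>
std::vector<Scored> keep_allowed(const std::vector<Scored>& v, Pred allowed)
{
  std::vector<Scored> out;
  std::copy_if(v.begin(), v.end(), std::back_inserter(out), [&](const Scored& s) {
    return allowed(s.id);
  });
  return out;
}

// Filter first, then select: comparable to filtering inside the search.
template <typename Pred>
std::vector<Scored> pre_filter_top_k(const std::vector<Scored>& v, std::size_t k, Pred allowed)
{
  return top_k(keep_allowed(v, allowed), k);
}

// Select first, then filter: post-processing of an unfiltered result.
template <typename Pred>
std::vector<Scored> post_filter_top_k(const std::vector<Scored>& v, std::size_t k, Pred allowed)
{
  return keep_allowed(top_k(v, k), allowed);
}
```

With five samples at increasing distances, only odd IDs allowed, and k = 2, the pre-filtering variant returns IDs 1 and 3, whereas the post-filtering variant returns only ID 1, because ID 0 occupied one of the two slots before being discarded.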

I'm happy to address the comments.

Thanks.

…Please feel free to use it as a foundation for the future change, if appropriate.
@alexanderguzhva alexanderguzhva requested a review from a team as a code owner May 14, 2023 15:23
@rapids-bot

rapids-bot bot commented May 14, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@github-actions github-actions bot added the cpp label May 14, 2023
@cjnolet
Member

cjnolet commented May 14, 2023

/ok to test

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 14, 2023
Contributor

@achirkin achirkin left a comment

Thanks for the PR! The change looks good and neat. Though I think a few things require further discussion.

@alexanderguzhva
Contributor Author

@achirkin I've made the changes to the PR, please feel free to take a look whenever you have time. Thanks!

@achirkin
Contributor

Thanks, @alexanderguzhva. In the meanwhile, could you please elaborate a bit on the need for the query_ix parameter in the filtering? My concern is that the extra kernel parameter (query offset) increases the register pressure even if we don't do any filtering.

@alexanderguzhva
Contributor Author

> Thanks, @alexanderguzhva. In the meanwhile, could you please elaborate a bit on the need for the query_ix parameter in the filtering? My concern is that the extra kernel parameter (query offset) increases the register pressure even if we don't do any filtering.

Well, the use case I'm interested in is when every query uses its own set of allowed samples. This is the requirement from the interested party.

@cjnolet
Member

cjnolet commented May 16, 2023

> Thanks, @alexanderguzhva. In the meanwhile, could you please elaborate a bit on the need for the query_ix parameter in the filtering?

@achirkin other folks have also requested that we support the ability to pre-filter (before or during k-selection, as opposed to afterwards) based on a given set of indices.

@cjnolet
Member

cjnolet commented May 16, 2023

/ok to test

@cjnolet
Member

cjnolet commented May 17, 2023

Linking FAISS PR containing design that this is enabling: facebookresearch/faiss#2848

@alexanderguzhva
Contributor Author

Does anything need to be done on my end so far? Thanks

@cjnolet
Member

cjnolet commented May 18, 2023

/ok to test

@cjnolet
Member

cjnolet commented May 18, 2023

/ok to test

@cjnolet
Member

cjnolet commented May 18, 2023

/ok to test

@alexanderguzhva
Contributor Author

/ok to test

@cjnolet, do you have a handy script for running a style check locally? Thanks!

@cjnolet
Member

cjnolet commented May 18, 2023

@alexanderguzhva we have pre-commit set up on the repository, so it should do all the clang-formatting for you upon committing to git. Here are the instructions to install and configure it: https://docs.rapids.ai/api/raft/nightly/contributing/#python-pre-commit-hooks.

@alexanderguzhva
Contributor Author

@cjnolet Thanks a lot!

@cjnolet
Member

cjnolet commented May 18, 2023

/ok to test

@@ -712,16 +734,22 @@ inline auto get_max_batch_size(uint32_t k,
}

/** See raft::spatial::knn::ivf_pq::search docs */
<<<<<<< HEAD
Member

I think there might be some unmerged bits here.

Contributor Author

Yep, I'll fix that, thanks!

@cjnolet
Member

cjnolet commented May 18, 2023

/ok to test

Contributor

@achirkin achirkin left a comment

I've compared the prim benchmarks of this PR (with NoneSampleFilter) against the current 23.06. There seems to be no performance hit whatsoever. The ncu stats, such as the number of registers, stay the same as well, despite having more arguments in the signature. That's an LGTM from the performance side of things!

cpp/include/raft/neighbors/ivf_pq-inl.cuh
@cjnolet
Member

cjnolet commented May 19, 2023

/ok to test

@cjnolet
Member

cjnolet commented May 19, 2023

/merge

@rapids-bot rapids-bot bot merged commit cdf107b into rapidsai:branch-23.06 May 19, 2023
@cjnolet
Member

cjnolet commented May 19, 2023

Thanks again for contributing this feature @alexanderguzhva!

rapids-bot bot pushed a commit that referenced this pull request Jun 5, 2023
…anup (#1541)

The PR does the following:
* Introduces `ivf_flat::search_with_filtering()` call in the same way the filtering was introduced to ivf_pq in #1513 
* Moves `sample_filter.cuh` from `raft/neighbors/detail` to `raft/neighbors`
* Moves `NoneSampleFilter` from `raft::neighbors::ivf_pq::detail` namespace to `raft::neighbors::filtering` namespace
* Renames `NoneSampleFilter` to `NoneIvfSampleFilter` and template argument `SampleFilterT` to `IvfSampleFilterT`
* Adds a missing `resource::get_workspace_resource(handle)` in `ivf_flat-inl.cuh` in a `search_with_filtering()` call (which was copied from `search()` call with the same problem)
* Adds more comments in `ivf_pq-inl.cuh`
* Some code cleanup in `ivf_pq-inl.cuh`

Authors:
  - Alexander Guzhva (https://github.com/alexanderguzhva)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1541
Labels
cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

3 participants