Removing filter templates in ivfpq for binary size reduction #1211
};

_RAFT_HOST_DEVICE inline none_sample_filter::none_sample_filter() = default;
_RAFT_HOST_DEVICE inline none_sample_filter::~none_sample_filter() = default;
Is there a reason to define them as default here and not in the struct?
From the previous CI run, I could see that the build was failing because it wouldn't let me declare + default with the __device__ annotation. It seems like nvcc ignores __host__ __device__ on default functions when they are defaulted at their first declaration within the struct.
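The declare-then-default pattern described above can be sketched in plain C++ as follows. SKETCH_HOST_DEVICE is a hypothetical stand-in for _RAFT_HOST_DEVICE, and the body of operator() is an assumption made only so the snippet is self-contained; it is not the actual cuVS definition.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for _RAFT_HOST_DEVICE: expands to the CUDA
// annotations under nvcc and to nothing with a host-only compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

struct none_sample_filter {
  // Declared, but not defaulted, at the first declaration inside the struct.
  SKETCH_HOST_DEVICE none_sample_filter();
  SKETCH_HOST_DEVICE ~none_sample_filter();

  // Assumed behaviour for the sketch: a "none" filter accepts every sample.
  SKETCH_HOST_DEVICE bool operator()(unsigned, unsigned) const { return true; }
};

// Defaulted at the out-of-class definition, so the annotation is not on a
// function that is defaulted on its first declaration.
SKETCH_HOST_DEVICE inline none_sample_filter::none_sample_filter()  = default;
SKETCH_HOST_DEVICE inline none_sample_filter::~none_sample_filter() = default;

// Side effect: these special members are now user-provided, so the type is
// no longer trivially default constructible.
static_assert(!std::is_trivially_default_constructible<none_sample_filter>::value,
              "out-of-class defaulting makes the constructor user-provided");

inline void demo_none_sample_filter()
{
  none_sample_filter f;
  assert(f(0u, 1u));
}
```

The trade-off is that the struct stops being a trivial type, which matters wherever it is stored in a union or expected to be trivially constructible.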
achirkin
left a comment
Hi @jinsolp, thanks for working on this!
I have a request regarding the synchronization behavior of the filter object. I believe we should strive to avoid the sync at all costs where possible to gradually improve the small-batch and dynamic-batch performance across our algorithms.
Thanks @achirkin @lowener for your feedback! In the binary size sync today, @divyegala suggested using an enum and creating the filter inside the kernel. This is a much less complicated approach, and it shows similar perf to what we already have in this PR (launching separate kernels to track the lifetime of kernels). The latest commit introduces this approach.
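A minimal host-side sketch of the enum approach, under stated assumptions: every name below (filter_type, filter_args, bitset_allows, sample_allowed) is hypothetical and not the cuVS API, and the "set bit means the sample is kept" convention is assumed for illustration. The idea is that the kernel receives plain arguments plus a tag and picks the filter at run time, so it no longer needs a template parameter per filter type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the cuVS API.
enum class filter_type : std::uint8_t { none, bitset };

// Plain arguments that could be passed to a kernel by value.
struct filter_args {
  filter_type type;
  const std::uint32_t* bits;  // only read when type == filter_type::bitset
};

// Assumed convention for the sketch: a set bit means the sample is kept.
inline bool bitset_allows(const std::uint32_t* bits, std::uint32_t sample_ix)
{
  return ((bits[sample_ix / 32u] >> (sample_ix % 32u)) & 1u) != 0u;
}

// One non-templated entry point; the filter kind is chosen at run time.
inline bool sample_allowed(const filter_args& args, std::uint32_t sample_ix)
{
  switch (args.type) {
    case filter_type::bitset: return bitset_allows(args.bits, sample_ix);
    case filter_type::none: return true;
  }
  return true;  // not reached for valid tags
}

inline void demo_sample_allowed()
{
  const std::uint32_t bits[1] = {0x5u};  // bits 0 and 2 set
  const filter_args with_bitset{filter_type::bitset, bits};
  const filter_args without_filter{filter_type::none, nullptr};
  assert(sample_allowed(with_bitset, 0u));
  assert(!sample_allowed(with_bitset, 1u));
  assert(sample_allowed(without_filter, 1u));
}
```

The cost of this design is a branch per filter call in place of a compile-time instantiation per filter type, which is the trade the PR makes for binary size.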
@jinsolp thanks for the changes! IMO the diff looks much cleaner now.
@jinsolp thanks, would you mind benchmarking this for somewhat larger values of
  const uint32_t* index_list, \
  float* query_kths, \
- IvfSampleFilterT sample_filter, \
+ const filtering::base_filter& sample_filter, \
@robertmaynard is this going to impact the update you just made for CUDA 13?
@achirkin This is the result of using different k's. Some interesting observations
cpp/src/neighbors/sample_filter.cuh
Outdated
_RAFT_HOST_DEVICE ivf_filter_dev_args_variant() {}
_RAFT_HOST_DEVICE ~ivf_filter_dev_args_variant() {}
Can we get away with no explicitly defined constructor/destructor? I have a feeling this looks more complicated than it should :)
+1
The compiler complains if I remove the explicit constructor/destructor:
error: the default constructor of "cuvs::neighbors::filtering::ivf_filter_dev::ivf_filter_dev_args_variant" cannot be referenced -- it is a deleted function
It looks like this is because bitset_filter_args_t is not trivially constructible. Is there a way to work around this?
What line is this error occurring on?
Happens when I get rid of the explicit constructor/destructor for the union (these two lines)
_RAFT_HOST_DEVICE ivf_filter_dev_args_variant() {}
_RAFT_HOST_DEVICE ~ivf_filter_dev_args_variant() {}
Can you report what line the compiler says the error occurs on? Must be somewhere during construction of this struct?
Lines 159 and 164 of sample_filter.cuh, which are the ivf_filter_dev constructors.
sample_filter.cuh(159): error: the default constructor of "cuvs::neighbors::filtering::ivf_filter_dev::ivf_filter_dev_args_variant" cannot be referenced -- it is a deleted function
{
^
sample_filter.cuh(164): error: the default constructor of "cuvs::neighbors::filtering::ivf_filter_dev::ivf_filter_dev_args_variant" cannot be referenced -- it is a deleted function
{
^
Artem's comment below is relevant to this. If you simplify the construction it should be possible.
cpp/src/neighbors/sample_filter.cuh
Outdated
_RAFT_HOST_DEVICE ivf_filter_dev(none_filter_args_t args = {}) : tag_(FilterType::None)
{
  new (&args_.none_filter_args) none_filter_args_t(args);
Can't we just initialize with args_(args), the same way you initialize tag_ above (after you remove the explicit constructor/destructor)?
@achirkin @divyegala
Changed to use the initializer list. We still need constructors in the union: bitset_filter_args_t is not trivially constructible, so the union's implicit default constructor is deleted. We don't need the destructor anymore, though.
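The resolution above can be sketched as a tagged union whose members are set from the member initializer list. The types below are hypothetical stand-ins: bitset_args is a hand-written struct with a user-provided default constructor, used here in place of the real bitset_filter_args_t so the snippet stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for none_filter_args_t and bitset_filter_args_t.
struct none_args {};
struct bitset_args {
  const std::uint32_t* bits;
  std::int64_t n_bits;
  // User-provided, hence non-trivial, default constructor.
  bitset_args() : bits(nullptr), n_bits(0) {}
  bitset_args(const std::uint32_t* b, std::int64_t n) : bits(b), n_bits(n) {}
};

enum class filter_type { none, bitset };

// Because bitset_args has a non-trivial default constructor, a union holding
// it with no constructors of its own gets the "deleted function" error quoted
// in this thread. One constructor per alternative avoids that. No destructor
// is written because both alternatives are trivially destructible.
union args_variant {
  none_args none;
  bitset_args bitset;
  args_variant(none_args a) : none(a) {}
  args_variant(bitset_args a) : bitset(a) {}
};

struct filter_dev {
  filter_type tag;
  args_variant args;
  // Tag and arguments are both set in the initializer list; no placement new.
  filter_dev(none_args a = {}) : tag(filter_type::none), args(a) {}
  filter_dev(bitset_args a) : tag(filter_type::bitset), args(a) {}
};

inline void demo_filter_dev()
{
  const std::uint32_t bits[1] = {0x1u};
  filter_dev without_filter;
  filter_dev with_bitset{bitset_args{bits, 32}};
  assert(without_filter.tag == filter_type::none);
  assert(with_bitset.tag == filter_type::bitset);
  assert(with_bitset.args.bitset.n_bits == 32);
}
```

In the real code each constructor would also carry the _RAFT_HOST_DEVICE annotation so the object can be built inside the kernel.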
divyegala
left a comment
Looks good to me overall. Can you try to remove all explicit constructors and destructors that are specified as default?
cpp/src/neighbors/sample_filter.cuh
Outdated
_RAFT_HOST_DEVICE ivf_filter_dev_args_variant() {}
_RAFT_HOST_DEVICE ~ivf_filter_dev_args_variant() {}
+1
…o bs/filter-ivfpq
struct none_filter_args_t {};
using bitset_filter_args_t =
  std::tuple<const int64_t* const*, cuvs::core::bitset_view<uint32_t, int64_t>>;
Can this be tuple<const int64_t* const*, filtering::bitset_filter<uint32_t, int64_t>> to construct the bitset_filter object just once instead of every call to operator()?
Unfortunately, we run into problems in each of the following scenarios if we construct the filter only once:
- outside the compute_similarity_kernel: we have to launch separate kernels for the filter construction, which complicates things
- inside the compute_similarity_kernel: we don't want to dynamically dispatch inside the kernel, so it is difficult to share common code.
lowener
left a comment
LGTM
divyegala
left a comment
LGTM, thanks!
/merge
Summary
This PR removes filters from the templates in ivfpq. The change is applied only to compute_similarity_kernel in this PR, but it can be applied elsewhere (e.g. ivfflat_interleaved_scan), which will be a follow-up PR to this one.
Perf check
Ran the default setting with the bitset filter 20 times to check whether this slows anything down.
All numbers are averaged across the 20 runs. Profiling was done with nsys.
compute_similarity kernel runtime
I think the extra time comes from the switch statements inside ivf_to_sample_filter_dev's operator(), but the overhead is small.
Binary size reduction
From CI reports: 1107.27 MB -> 1040.44 MB (66.83 MB smaller, about 6%)