Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat] CAGRA filtering with BFKNN when sparsity matching threshold #378
base: branch-24.12
Are you sure you want to change the base?
[Feat] CAGRA filtering with BFKNN when sparsity matching threshold #378
Changes from 12 commits
0faf889
f3388f0
f14be71
062ca87
a9fd8d8
8e27b74
5378827
651387f
757c222
018879f
bddae7f
caab88b
bac646d
f4c1922
0dc10a2
a73ba1f
2552d8d
ef734d4
0036127
2876506
b5dcc02
5c9c5de
d190b9d
9aa1bb1
a0fba17
e29d74d
4d0fc8e
1bcba66
0cafa23
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please make it possible to disable/enable this feature by the user (see #252 (comment) for the reasoning):
threshold_to_bf
as CAGRA search parameter and set it to 1.0 by default therethreshold_to_bf >= 1.0
then disable further checks and proceed with CAGRA search immediately (i.e. no need to run the sparsity check).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most users with a filter are going to be specifying the filter in batch, and will know the sparsity of the filter. I suggest instead of turning this feature off by default, we allow the user specified filter to know its own nnz unless updated.
Turning this off by default undermines the fundamental benefits of this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most users are not specifying a filter, and when they do, it's expected the filter is going to be heavy. This should not impact all users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here if the 2d bitmap isn't able to be converted to a 1d bitet without losing information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the number of positive bits in the bitmap also needed to compute this? But then we compute
n_elements
below. We should cache off then_elements
. We should consider making the bitmap immutable. That way the sparsity and the number of positive elements can be safely cached. This isn't something users are going to be updating, like ever.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion! I found a way to allocate the queries in the workspace while keeping the reuse of
copy_with_padding
, which could help with clean code. If I missed something, please feel free to point it out.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to stick to raft primitives and eventually remove
copy_with_padding
, because the latter is not a commonly used primitive and thus one requires to go read its code to understand what does it do exactly.But I don't mind to keep it here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull this out into a separate function that can be invoked here please? This
search_main
function is gettig pretty massive.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done