-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KNN] Add comment and remove duplicate code #13594
base: main
Are you sure you want to change the base?
Conversation
+1 to share as much logic as possible (including
IMO we should not tie Maybe some common KNN utility classes / functions instead? |
I think common utility makes sense. I'll move both createFilterWeights and createBitSet to a utility class. |
There were some recent commits I need to rebase first as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, thanks!
/** Common utilities for KNN queries. */ | ||
class KnnQueryUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Should we name it something like Ann*
(Approximate Near Neighbor *) to fit with *VectorSimilarityQuery
too? (Knn*
sounds specific to Knn*VectorQuery
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. I'll change it to AnnQueryUtils
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Thanks Kaival for reviewing and approving. Could someone from Lucene committers help review and merge this PR if it looks good? |
import org.apache.lucene.util.Bits; | ||
|
||
/** Common utilities for KNN queries. */ | ||
class KnnQueryUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class KnnQueryUtils { | |
final class KnnQueryUtils { |
import org.apache.lucene.util.Bits; | ||
|
||
/** Common utilities for ANN queries. */ | ||
class AnnQueryUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be a final
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can we name it KnnQueryUtils
since all the queries are Knn
queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it to a final class.
It's also used by VectorSimilarityQuery which is not exactly KNN (k-nearest) though.
protected final int k; | ||
|
||
/** the filter to executed before KNN search */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the filter is executed before search to gather the relevant bit set, we should be clear that when the filter is applied is up to the underlying knn index.
@@ -116,6 +115,7 @@ private TopDocs searchLeaf( | |||
return results; | |||
} | |||
|
|||
// Execute the filter if any and perform KNN search at each segment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment doesn't make sense to me. If we really want to add java docs to private methods, lets fix the wording. Maybe something like
"Perform kNN search for the provided LeafReaderContext applying filterWeight as necessary"
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
@benwtrent sorry for the delay. I have addressed the comments. Can you please take a look? Thank you |
Description
weight
To discuss: The
createFilterWeight
might also merit to be shared with the vector similarity query. There is a discrepancy between the 2 classes: the top-k KNN query enhance the filter so that only documents having values on the KNN field will be matched, but the vector similarity query doesn't. I think they should share similar behavior as they are both vector search.Moreover, should we make their behavior more closely so that they can share/extend one another?
The top-k KNN query has the main logic inside rewriteQuery while vector-similarity query has the main logic in Weight.(I now understand that for top-k Query we need to get top-k across all segments and thus cannot just do it in createWeights, which only acts as segment-level matcher). Query timeout check is also not currently supported by vector similarity query.