-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeding HNSW Search #13634
Comments
@seanmacavaney I like this idea (I remember reading this paper a while back and getting excited about it). A couple of concerns I have are:
Looking forward to the PR :) |
Thanks! I just opened a draft PR (#13635). To answer your questions:
I've struggled a bit with this. The PR has an attempt, and I would totally appreciate feedback on it!
Yes, it's scored. The PR sets a limit of 10 seed documents. (In contrast, HNSW uses a single entry point based on the hierarchical search.) This could also be something configurable, though I wouldn't want to complicate the API too much.
Yep, I agree that falling back on the default behavior is reasonable. This is what the PR implements. |
Have we considered providing this as an alternative Query implementation, rather than complicating the existing one? |
@msokolov yeah, my suggested changes do that https://github.com/seanmacavaney/lucene/compare/seeds...benwtrent:lucene:seeds-refactor-idea?expand=1 basically, I think we can add an experimental interface & some experimental queries & collectors and by pass any significant changes to anything except for a small section in the HNSW searcher. |
This has now been merged! Huzzah! |
Thanks @benwtrent! |
Description
In some vector search cases, users may already know some documents that are likely related to a query. Let's support seeding HNSW's scoring stage with these documents, rather than using HNSW's hierarchical stage.
An example use case is hybrid search, where both a traditional and vector search are performed. The top results from the traditional search are likely reasonable seeds for the vector search. Even when not performing hybrid search, traditional matching can often be faster than traversing the hierarchy, which can be used to speed up the vector search process (up to 2x faster for the same effectiveness), as was demonstrated in this article (full disclosure: I'm an author of the article).
This enhancement proposes adding a
seed
query, alongside the existingfilter
query, to the KNN query classes. The results of this query will be fed intoHnswGraphSearcher
, and ultimately replace the graph entry points here. If the seed query fails (e.g., keywords do not match any documents), the approach will fall back onto the existing hierarchical search process.Pull request to follow.
The text was updated successfully, but these errors were encountered: