Seeding HNSW Search #13634

seanmacavaney · 2024-08-06T09:33:16Z

Description

In some vector search cases, users may already know some documents that are likely related to a query. Let's support seeding HNSW's scoring stage with these documents, rather than using HNSW's hierarchical stage.

An example use case is hybrid search, where both a traditional search and a vector search are performed. The top results from the traditional search are likely to be reasonable seeds for the vector search. Even outside hybrid search, traditional matching can often be faster than traversing the hierarchy, so it can be used to speed up the vector search process (up to 2x faster for the same effectiveness), as demonstrated in this article (full disclosure: I'm an author of the article).

This enhancement proposes adding a seed query, alongside the existing filter query, to the KNN query classes. The results of this query will be fed into HnswGraphSearcher, where they will ultimately replace the graph entry points here. If the seed query fails (e.g., the keywords do not match any documents), the approach will fall back to the existing hierarchical search process.
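To make the proposed flow concrete, here is a minimal sketch on a toy single-layer graph. It is not Lucene's implementation: the class `SeededSearchSketch`, its methods, and the array-based graph layout are all hypothetical, and the hierarchical descent is represented only by a precomputed `fallbackEntryPoint`. It shows the two ideas from the description: seeds replace the entry point when there are any, and the bottom-layer search otherwise runs as usual.

```java
import java.util.*;

// Hypothetical illustration only; none of these names exist in Lucene.
class SeededSearchSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Use the seeds as entry points if there are any; otherwise fall back
    // to the entry point that the hierarchical stage would have produced.
    static int[] entryPoints(int[] seeds, int fallbackEntryPoint) {
        return seeds.length > 0 ? seeds : new int[] {fallbackEntryPoint};
    }

    // Best-first search of the bottom layer starting from the given entry
    // points. Returns up to k node ids, nearest first.
    static List<Integer> search(double[][] vectors, int[][] neighbors,
                                double[] query, int[] entryPoints, int k) {
        Comparator<Integer> byDist =
            Comparator.comparingDouble(n -> dist(vectors[n], query));
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);          // nearest on top
        PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed());  // farthest on top
        Set<Integer> visited = new HashSet<>();
        for (int ep : entryPoints) {
            if (visited.add(ep)) {
                candidates.add(ep);
                results.add(ep);
                if (results.size() > k) results.poll();
            }
        }
        while (!candidates.isEmpty()) {
            int c = candidates.poll();
            // Stop once the nearest candidate is worse than the worst kept result.
            if (results.size() >= k
                && dist(vectors[c], query) > dist(vectors[results.peek()], query)) {
                break;
            }
            for (int n : neighbors[c]) {
                if (!visited.add(n)) continue;
                if (results.size() < k
                    || dist(vectors[n], query) < dist(vectors[results.peek()], query)) {
                    candidates.add(n);
                    results.add(n);
                    if (results.size() > k) results.poll();
                }
            }
        }
        List<Integer> out = new ArrayList<>(results);
        out.sort(byDist);
        return out;
    }
}
```

In this sketch a good seed means the search starts next to the answer and visits few nodes, while an empty seed list produces exactly the same call as an unseeded search.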

Pull request to follow.

benwtrent · 2024-08-06T11:22:03Z

@seanmacavaney I like this idea (I remember reading this paper a while back and getting excited about it).

A couple of concerns I have are:

- The API, this is always tricky to get correct
- The seed query, will it be scored? How will we ensure there is a limit (e.g. that somebody doesn't just pass a match all docs query).
- What should the behavior be when the seed query matches NO documents? I would assume the correct behavior here is to traverse the graph as normal.

Looking forward to the PR :)

seanmacavaney · 2024-08-06T11:45:45Z

Thanks! I just opened a draft PR (#13635). To answer your questions:

> The API, this is always tricky to get correct

I've struggled a bit with this. The PR has an attempt, and I would totally appreciate feedback on it!

> The seed query, will it be scored? How will we ensure there is a limit (e.g. that somebody doesn't just pass a match all docs query).

Yes, it's scored. The PR sets a limit of 10 seed documents. (In contrast, HNSW uses a single entry point based on the hierarchical search.) This could also be something configurable, though I wouldn't want to complicate the API too much.
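For illustration, capping a scored seed query at a fixed number of documents could look like the sketch below. This is an assumption-laden toy, not the code in the PR: the class `SeedSelectionSketch`, the method `topSeeds`, and the parallel-array input are hypothetical, and only the limit of 10 comes from this thread.

```java
import java.util.*;

// Hypothetical illustration only; not taken from the PR.
class SeedSelectionSketch {
    // The seed limit mentioned in this thread.
    static final int MAX_SEEDS = 10;

    // Returns the ids of the highest-scoring documents, best first,
    // keeping at most maxSeeds of them. docIds[i] has score scores[i].
    static int[] topSeeds(int[] docIds, float[] scores, int maxSeeds) {
        Integer[] order = new Integer[docIds.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort positions by descending score.
        Arrays.sort(order, (a, b) -> Float.compare(scores[b], scores[a]));
        int n = Math.min(maxSeeds, order.length);
        int[] seeds = new int[n];
        for (int i = 0; i < n; i++) {
            seeds[i] = docIds[order[i]];
        }
        return seeds;
    }
}
```

With this shape, a match-all seed query still contributes only `MAX_SEEDS` entry points, and a seed query with no matches yields an empty array.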

> What should the behavior be when the seed query matches NO documents? I would assume the correct behavior here is to traverse the graph as normal.

Yep, I agree that falling back on the default behavior is reasonable. This is what the PR implements.

msokolov · 2024-10-02T18:02:07Z

Have we considered providing this as an alternative Query implementation, rather than complicating the existing one?

benwtrent · 2024-10-02T18:06:16Z

@msokolov yeah, my suggested changes do that https://github.com/seanmacavaney/lucene/compare/seeds...benwtrent:lucene:seeds-refactor-idea?expand=1

Basically, I think we can add an experimental interface and some experimental queries and collectors, and bypass any significant changes to anything except a small section in the HNSW searcher.

benwtrent · 2025-01-20T13:41:34Z

This has now been merged! Huzzah!

seanmacavaney · 2025-01-20T13:43:00Z

Thanks @benwtrent!

seanmacavaney added the type:enhancement label Aug 6, 2024

seanmacavaney mentioned this issue Aug 6, 2024

Add AbstractKnnVectorQuery.seed for seeded HNSW #13635

Closed

cpoerschke mentioned this issue Aug 23, 2024

support Lucene's (proposed) HNSW search seeding feature apache/solr#2664

Draft

benwtrent mentioned this issue Dec 20, 2024

Add two new "Seeded" Knn queries for seeded vector search #14084

Merged

benwtrent closed this as completed Jan 20, 2025
