Support rowid in (...) constraints in vss_search() KNN queries #19

Open
asg017 opened this issue May 12, 2023 · 5 comments

@asg017
Owner

asg017 commented May 12, 2023

In KNN-style searches, we should support rowid in (...) constraints in queries like this:

select rowid, distance
from vss_articles
where vss_search(description_embeddings, :query_vector)
  and rowid in (1, 2, 3, ..., 100)
limit 25

Currently we ignore the "equals" constraint on rowid (SQLite hands rowid in (...) to the virtual table as an equality constraint), but if we captured that constraint and enabled sqlite3_vtab_in, we could read in all the rowids at once and use a Faiss IDSelector to pre-filter results.
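A minimal sketch of that pre-filtering step, in pure Python rather than Faiss: only vectors whose rowid appears in the captured rowid in (...) list are scored, so the limit is applied after filtering. The in-memory dict of vectors and the knn_prefiltered name are hypothetical stand-ins for the vss0 index and a Faiss IDSelector.

```python
import heapq
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_prefiltered(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    allowed_rowids: Iterable[int],
    k: int,
) -> List[Tuple[int, float]]:
    """Return up to k (rowid, distance) pairs, scoring only allowed rowids.

    Hypothetical stand-in for a Faiss search restricted by an IDSelector:
    rowids outside `allowed_rowids` are never scored, so they cannot
    crowd matching rows out of the top k.
    """
    allowed = set(allowed_rowids)
    candidates = (
        (rowid, l2_distance(vec, query))
        for rowid, vec in vectors.items()
        if rowid in allowed
    )
    # nsmallest keeps the k closest candidates, sorted by distance.
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    vectors = {1: [3.0, 4.0], 2: [6.0, 8.0], 3: [0.0, 1.0], 4: [0.0, 0.0]}
    # Rowids 3 and 4 are closer to the query but are not in the IN list.
    print(knn_prefiltered(vectors, [0.0, 0.0], allowed_rowids=[1, 2], k=25))
    # → [(1, 5.0), (2, 10.0)]
```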

This would be especially great when paired with subqueries:

with subset as (
  select rowid
  from articles
  where published_at between '2022-01-01' and '2023-01-01'
    and newsroom = 'NY Times'
)
select rowid, distance
from vss_articles
where vss_search(description_embeddings, :query_vector)
  and rowid in (select rowid from subset)
limit 25
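The pre-filter half of that query is ordinary SQL. The sketch below uses Python's built-in sqlite3 module to emulate the intended semantics on plain tables. vss0 is not loaded, so a hypothetical l2_distance SQL function (registered with create_function) plus a brute-force order by stands in for vss_search(). Table and column names follow the example above; the JSON-encoded embeddings and sample rows are made up for illustration.

```python
import json
import math
import sqlite3


def l2_distance(a_json: str, b_json: str) -> float:
    """Euclidean distance between two JSON-encoded float arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for vss_search(): a scalar distance function.
    conn.create_function("l2_distance", 2, l2_distance)
    conn.execute("create table articles(title text, published_at text, newsroom text)")
    # A plain table, not a vss0 virtual table; its rowids match articles.
    conn.execute("create table vss_articles(description_embeddings text)")
    rows = [
        (1, "a", "2022-03-01", "NY Times", [3.0, 4.0]),
        (2, "b", "2022-06-01", "NY Times", [6.0, 8.0]),
        (3, "c", "2022-06-01", "Other", [0.0, 1.0]),
        (4, "d", "2021-01-01", "NY Times", [0.0, 0.0]),
    ]
    for rowid, title, published_at, newsroom, vec in rows:
        conn.execute(
            "insert into articles(rowid, title, published_at, newsroom) values (?, ?, ?, ?)",
            (rowid, title, published_at, newsroom),
        )
        conn.execute(
            "insert into vss_articles(rowid, description_embeddings) values (?, ?)",
            (rowid, json.dumps(vec)),
        )
    return conn


# The CTE selects candidate rowids; only those rows are ranked, so the
# limit is applied after filtering.
PREFILTERED_QUERY = """
with subset as (
  select rowid
  from articles
  where published_at between '2022-01-01' and '2023-01-01'
    and newsroom = 'NY Times'
)
select rowid, l2_distance(description_embeddings, :query_vector) as distance
from vss_articles
where rowid in (select rowid from subset)
order by distance
limit :k
"""

if __name__ == "__main__":
    conn = build_demo_db()
    params = {"query_vector": json.dumps([0.0, 0.0]), "k": 25}
    print(conn.execute(PREFILTERED_QUERY, params).fetchall())
    # → [(1, 5.0), (2, 10.0)]
```

Rowids 3 and 4 have closer vectors but fail the newsroom and date predicates, so they never reach the ranking step.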

This would enable "pre-filtering" as described in this post. It would be an easy-to-implement, but probably slow, solution to the push-down filters described in #2.

@asg017
Owner Author

asg017 commented May 22, 2023

Use IDSelectorBatch.

need to figure out idxStr/idxNum rules
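Faiss's IDSelectorBatch wraps an explicit list of allowed ids so that a search only considers those vectors. As we understand it, the C++ class pairs a hash set with a small bitmap keyed on the low bits of each id, so most non-members are rejected without a hash lookup. The Python class below is a hypothetical illustration of that membership test, not the Faiss API; the bitmap-sizing rule is an assumption.

```python
from typing import Iterable, Set


class BatchIdSelector:
    """Hypothetical Python analogue of a batch id selector.

    Keeps the allowed ids in a set, plus a bitmap indexed by the low
    `nbits` bits of each id. A clear bit proves the id is not allowed,
    so most non-members skip the set lookup entirely.
    """

    def __init__(self, ids: Iterable[int]) -> None:
        self.ids: Set[int] = set(ids)
        # Assumption: make the bitmap a few bits wider than the id count
        # needs, so unrelated ids rarely share a bit.
        nbits = max(len(self.ids), 1).bit_length() + 5
        self.mask = (1 << nbits) - 1
        self.bloom = bytearray((1 << nbits) // 8)
        for i in self.ids:
            low = i & self.mask
            self.bloom[low >> 3] |= 1 << (low & 7)

    def is_member(self, i: int) -> bool:
        low = i & self.mask
        if not self.bloom[low >> 3] & (1 << (low & 7)):
            return False  # bit is clear: definitely not in the batch
        return i in self.ids  # bit is set: may be a collision, so confirm


if __name__ == "__main__":
    sel = BatchIdSelector([1, 2, 3, 100])
    print([rowid for rowid in range(1, 8) if sel.is_member(rowid)])  # → [1, 2, 3]
```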

@teowave

teowave commented Jul 18, 2023

I suppose then we can do pre-filtering with standard SQL and then feed the resulting rowids into the vss query. Nice.

Definitely nicer than getting 1000 "wide net" results from the vss query and then filtering.

That being said, in a document-search app I am working on, I fetch the top 50 results and then filter. It seems to work, although we can never be certain that we are not missing something important.
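A toy comparison of the two approaches, in pure Python with made-up data (all function names are hypothetical). When many close neighbours fail the filter, a post-filtered "top 50" can come back empty, while pre-filtering still returns the k closest matching rows.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[int, float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n(vectors: Dict[int, Vector], query: Vector, n: int) -> List[Hit]:
    """Unfiltered KNN: the n closest (rowid, distance) pairs."""
    scored = ((rowid, distance(v, query)) for rowid, v in vectors.items())
    return heapq.nsmallest(n, scored, key=lambda hit: hit[1])


def post_filter(vectors: Dict[int, Vector], query: Vector,
                keep: Callable[[int], bool], wide_net: int, k: int) -> List[Hit]:
    """Fetch `wide_net` neighbours first, then drop rows failing the filter."""
    return [hit for hit in top_n(vectors, query, wide_net) if keep(hit[0])][:k]


def pre_filter(vectors: Dict[int, Vector], query: Vector,
               keep: Callable[[int], bool], k: int) -> List[Hit]:
    """Apply the filter first, then rank only the surviving rows."""
    subset = {rowid: v for rowid, v in vectors.items() if keep(rowid)}
    return top_n(subset, query, k)


if __name__ == "__main__":
    # 100 rows on a line; only rowids above 60 pass the (made-up) filter.
    vectors = {i: [float(i), 0.0] for i in range(1, 101)}

    def keep(rowid: int) -> bool:
        return rowid > 60

    print(post_filter(vectors, [0.0, 0.0], keep, wide_net=50, k=5))  # → []
    print(pre_filter(vectors, [0.0, 0.0], keep, k=5))
    # → [(61, 61.0), (62, 62.0), (63, 63.0), (64, 64.0), (65, 65.0)]
```

Widening the net (a larger wide_net) reduces the chance of misses but only removes it once the net covers every row, which is the full scan pre-filtering avoids.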

@teowave

teowave commented Jul 18, 2023

> Use IDSelectorBatch.
>
> need to figure out idxStr/idxNum rules

I didn't get the meaning of this one. Can you please expand for us noobies?

@asg017
Owner Author

asg017 commented Jul 18, 2023

Those are mostly personal notes about how to implement this feature. IDSelectorBatch is a Faiss tool that'll make it easier to restrict a search to a large subset of vectors, and idxStr/idxNum (the values SQLite's xBestIndex uses to pass a query plan to xFilter) refer to some internal changes I need to make to the vss0 module in order to keep this compatible with older code.
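For readers new to SQLite virtual tables: xBestIndex chooses a plan and hands it to xFilter through the integer idxNum and the string idxStr, while each aConstraintUsage[i].argvIndex (1-based) decides where that constraint's right-hand value lands in xFilter's argv. The Python below simulates those two C callbacks to show one hypothetical way of recording, in idxStr, which constraints were consumed, so xFilter can tell a rowid list from a query vector. The one-letter codes and function names are invented and are not vss0's actual encoding. In C, the rowid IN values would be iterated with sqlite3_vtab_in_first/sqlite3_vtab_in_next rather than arriving as a plain list.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical plan codes; the real vss0 encoding may differ.
VSS_SEARCH = "S"  # vss_search(column, query_vector) match
ROWID_IN = "R"    # rowid = ? / rowid in (...)


def best_index(constraints: Sequence[Tuple[str, bool]]) -> Tuple[str, Dict[int, int]]:
    """Mimic xBestIndex: pick usable constraints and assign argv slots.

    `constraints` is a list of (kind, usable) pairs standing in for
    sqlite3_index_info.aConstraint. Returns idxStr plus a map from
    constraint index to 1-based argvIndex, as aConstraintUsage would hold.
    """
    idx_str: List[str] = []
    argv_index: Dict[int, int] = {}
    for i, (kind, usable) in enumerate(constraints):
        if usable and kind in (VSS_SEARCH, ROWID_IN):
            idx_str.append(kind)
            argv_index[i] = len(idx_str)  # argvIndex is 1-based
    return "".join(idx_str), argv_index


def filter_(idx_str: str, argv: Sequence[object]) -> Dict[str, object]:
    """Mimic xFilter: decode idxStr to learn what each argv value means."""
    plan: Dict[str, object] = {}
    for code, value in zip(idx_str, argv):
        if code == VSS_SEARCH:
            plan["query_vector"] = value
        elif code == ROWID_IN:
            plan["rowids"] = value
    return plan


if __name__ == "__main__":
    # rowid IN, an unrelated constraint the module ignores, then vss_search().
    idx_str, argv_index = best_index([(ROWID_IN, True), ("X", True), (VSS_SEARCH, True)])
    print(idx_str, argv_index)  # → RS {0: 1, 2: 2}
    print(filter_(idx_str, [[1, 2, 3], [0.1, 0.2]]))
    # → {'rowids': [1, 2, 3], 'query_vector': [0.1, 0.2]}
```

Encoding the consumed constraints in idxStr means older plans that only carry a vss_search() argument still decode the same way, which is one plausible route to the backward compatibility mentioned above.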

I'll probably work on this next, after the new v0.1.1 release this week!

@sutyum

sutyum commented Mar 22, 2024

Any update on this? @asg017

Development

No branches or pull requests

3 participants