Is it possible to (efficiently) query different subsets of the database vectors for different query vectors? #3580
tomleung1996 started this conversation in General
Replies: 1 comment
-
This looks like a filtered search problem. The answer depends on what the filtering criterion would be. If the dataset is clustered, then you can build one index per cluster.
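Below is a minimal Python/NumPy sketch of the one-index-per-cluster idea. It assumes that every database vector already has a cluster label (the `labels` array) and that the caller knows which cluster a query should search. Both are assumptions and not part of the reply above. Each per-cluster "index" is simply the member IDs plus their vectors, searched exhaustively by inner product, and stands in for a real vector index. The names `build_cluster_indexes` and `search_in_cluster` are hypothetical.

```python
import numpy as np


def build_cluster_indexes(db: np.ndarray, labels: np.ndarray) -> dict:
    """Group database vectors by cluster label.

    Each "index" is just the (ids, vectors) pair for one cluster; a real
    deployment would build a proper vector index per cluster instead.
    """
    indexes = {}
    for c in np.unique(labels):
        ids = np.flatnonzero(labels == c)  # original row ids in this cluster
        indexes[int(c)] = (ids, db[ids])
    return indexes


def search_in_cluster(indexes: dict, query: np.ndarray, cluster: int, k: int):
    """Exact inner-product top-k search restricted to a single cluster."""
    ids, vecs = indexes[cluster]
    scores = vecs @ query
    k = min(k, len(ids))
    # argpartition picks the k highest scores (unordered); then sort them descending
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return ids[top], scores[top]
```

For example, with `db = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])` and `labels = np.array([0, 0, 1, 1])`, searching cluster 1 with the query `[0.0, 1.0]` scores only rows 2 and 3. Because the search is confined to one cluster, its cost scales with the cluster size instead of the full database.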
-
I would like to calculate the similarities between the embedding of a paper and the embeddings of its cited references, and I have a lot of them (~30 million).
Using multiple GPUs, I was able to compute pairwise similarities between all papers, but I could only store the top-K most similar results for each paper. The problem is that a paper's cited references are not always among its most semantically similar papers, so they may be missing from the top-K lists. Even though I have computed all pairwise similarities, I cannot recover the ones I need.
Therefore, I am wondering whether it is possible to (efficiently) query a different subset of the database vectors for each query vector. Or is there a smarter way to achieve my goal?
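If the goal is only the similarity between each paper and the papers it cites, a search step may not be needed at all. Each citation pair can be scored directly. Here is a minimal NumPy sketch that assumes the embeddings are rows of a single array `emb` and that the citations are supplied as two aligned integer arrays, `citing` and `cited`, holding row indices. Those array names and the `citation_similarities` helper are hypothetical. The sketch computes cosine similarity. It is one possible approach and not the asker's method.

```python
import numpy as np


def _unit_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (guarding against all-zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def citation_similarities(emb: np.ndarray, citing: np.ndarray,
                          cited: np.ndarray, chunk: int = 1_000_000) -> np.ndarray:
    """Cosine similarity for every (citing[i], cited[i]) pair of row indices.

    Only the citation pairs are scored, so there is no search step and no
    top-K cut-off that could drop a cited reference.
    """
    assert len(citing) == len(cited), "citing and cited must be aligned"
    out = np.empty(len(citing), dtype=np.float32)
    # Work through the pairs in chunks so that at most `chunk` rows per side
    # are gathered from `emb` at a time.
    for start in range(0, len(citing), chunk):
        end = start + chunk
        a = _unit_rows(emb[citing[start:end]])
        b = _unit_rows(emb[cited[start:end]])
        # Row-wise dot product of the two gathered blocks
        out[start:end] = np.einsum("ij,ij->i", a, b)
    return out
```

The `chunk` default is an arbitrary placeholder to tune to available memory. Each pair is scored independently, so the pair list can also be split across several GPUs or machines. If the embeddings do not fit in RAM, `emb` can be a memory-mapped array (for example from `np.load(path, mmap_mode="r")`), because only the indexed rows are read.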