De-dup raw vectors? #15440
base: main
File Layout

Today, the … The … Proposing to change the … Correspondingly, the …

In case of duplicate vectors within a document, we can simply "point" to a pre-existing vector, without writing another copy on disk!

Earlier, the offset of the vector at ordinal … Now, we're storing an additional …
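The de-duplication idea can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the PR's implementation: the class `DedupVectorStore`, its methods, and the use of a plain list for the ordinal-to-slot mapping are all hypothetical, and the real on-disk encoding of the additional mapping is not reproduced here. It assumes fixed-size float vectors stored back to back, so that without de-dup the offset of ordinal `ord` is `ord * dimension * Float.BYTES`; with de-dup, repeated vectors within one document are written once and each ordinal's offset goes through an extra ordinal-to-slot lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT Lucene's API. Names and layout are illustrative assumptions.
public class DedupVectorStore {
  private final int dimension;
  // Unique vectors in write order; list index = "slot" in the simulated raw vector data.
  private final List<float[]> slots = new ArrayList<>();
  // Assumed extra mapping: vector ordinal -> slot holding that vector's values.
  private final List<Integer> ordToSlot = new ArrayList<>();
  // Vectors already written for the current document, keyed by content.
  private final Map<VectorKey, Integer> currentDocSlots = new HashMap<>();

  // Wraps a float[] so the map compares contents (bitwise, via Arrays.equals), not identity.
  private record VectorKey(float[] values) {
    @Override
    public boolean equals(Object o) {
      return o instanceof VectorKey other && Arrays.equals(values, other.values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }

  public DedupVectorStore(int dimension) {
    this.dimension = dimension;
  }

  // Duplicates are only detected within a single document, as in the proposal.
  public void startDocument() {
    currentDocSlots.clear();
  }

  // Returns the new ordinal; the values are stored only if not already written for this doc.
  public int addVector(float[] vector) {
    if (vector.length != dimension) {
      throw new IllegalArgumentException("expected dimension " + dimension + ", got " + vector.length);
    }
    VectorKey key = new VectorKey(vector.clone());
    Integer slot = currentDocSlots.get(key);
    if (slot == null) {
      slot = slots.size();
      slots.add(key.values());
      currentDocSlots.put(key, slot);
    }
    ordToSlot.add(slot);
    return ordToSlot.size() - 1;
  }

  // Without de-dup: every ordinal has its own copy, so the offset is computed directly.
  public long offsetWithoutDedup(int ord) {
    return (long) ord * dimension * Float.BYTES;
  }

  // With de-dup: the offset is computed from the slot found in the extra mapping.
  public long offsetWithDedup(int ord) {
    long slot = ordToSlot.get(ord);
    return slot * dimension * Float.BYTES;
  }

  public float[] vectorValue(int ord) {
    return slots.get(ordToSlot.get(ord)).clone();
  }

  public int numOrdinals() {
    return ordToSlot.size();
  }

  public int numStoredVectors() {
    return slots.size();
  }

  public static void main(String[] args) {
    DedupVectorStore store = new DedupVectorStore(3);
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    store.startDocument();
    store.addVector(a); // ord 0 -> slot 0
    store.addVector(b); // ord 1 -> slot 1
    store.addVector(a); // ord 2 -> slot 0 (duplicate within the same document)
    store.startDocument();
    store.addVector(a); // ord 3 -> slot 2 (new document, so it is stored again)
    System.out.println(store.numOrdinals() + " ordinals, " + store.numStoredVectors() + " stored");
    // prints "4 ordinals, 3 stored"
    System.out.println(store.offsetWithoutDedup(3) + " vs " + store.offsetWithDedup(3));
    // prints "36 vs 24"
  }
}
```

Under these assumptions the trade-off is an extra lookup per ordinal, plus storage for the mapping, in exchange for skipping a full copy (`dimension * 4` bytes for float vectors) of every vector repeated within a document.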
Notes
Benchmark

In order to index everything in a single segment, I had to:
Made use of the option added in mikemccand/luceneutil#468 (…)

Cohere vectors, 768d, …
This PR …

Note the reduction in …
Description
Closes #14758
Demonstrating the proposal to de-duplicate raw vectors in Lucene!
Note: Right now this is very crude, and only for demonstration purposes.