-
Notifications
You must be signed in to change notification settings - Fork 57
Adds multi vector query class #402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
Very nice. I do think it could be interesting if this vector class also handled the conversion to bytes (and thus any compression based on choice of dtype) I think we handle this within the query class today but some of that logic could be pulled into here as a true data container for vectors -- might be a cleaner and more testable pattern? @bsbodden @abrookins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add new query class to the sphinx docs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds multi-vector query functionality to the Redis Vector Library, enabling simultaneous search across multiple vector fields in a document with weighted score combination.
- Introduces new
MultiVectorQuery
class andVector
helper class for multi-vector search capabilities - Adds comprehensive test coverage for the new multi-vector query functionality including unit tests and integration tests
- Updates documentation to include the new Vector and MultiVectorQuery classes
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
redisvl/query/aggregate.py | Implements Vector class and MultiVectorQuery class with validation and query building logic |
redisvl/query/init.py | Exports new MultiVectorQuery and Vector classes |
tests/unit/test_aggregation_types.py | Adds unit tests for MultiVectorQuery and Vector classes with validation and initialization tests |
tests/integration/test_aggregation.py | Adds integration tests for multi-vector queries with various scenarios including filters, weights, and datatypes |
tests/conftest.py | Adds multi_vector_data fixture with sample data for testing |
docs/api/vector.rst | Adds documentation for the Vector class |
docs/api/query.rst | Adds documentation for the MultiVectorQuery class |
docs/api/index.md | Updates API documentation index to include vector documentation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@tylerhutcherson Yes, I think it makes sense to have fields own converting themselves to from redisvl.redis.utils import array_to_buffer
class Vector(BaseModel):
# ...
def as_buffer(self) -> bytes:
if isinstance(self.vector, (bytes, bytearray)):
return bytes(self.vector)
return array_to_buffer(self.vector, dtype=self.dtype) We should be able to touch that up later though, if we want to get in a first pass. Another thing I was wondering. This PR hard-codes the distance at 2.0 (all items?), but wouldn't it also work to expose that as a parameter on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @tylerhutcherson that making Vector
the owner of conversion logic makes sense and encapsulates that better.
I had a question (posted separately) about whether we'd want to expose distance threshold as a parameter on Vector
.
Neither are blocking, so I'd say if you want to keep the momentum and get this merged, and then consider those other improvements, I say 🫡 to that.
Seconded on the Copilot stuff, as well (just spelling, AFAIK).
No description provided.