Improve vectorizer kwargs and typing #291

tylerhutcherson · 2025-02-27T14:58:56Z

Changes Made

Expanded Type Support:
- Updated return type signatures across all vectorizers to reflect that they can return either lists of floats (List[float]) or binary buffers (bytes)
- Added special handling for Cohere's integer embedding types (List[int])
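A minimal sketch of what the widened return type means for callers. It assumes a toy `embed` function with an `as_buffer` flag that returns either a float list or a packed float32 buffer. The `EmbeddingResult` alias, `to_float32_buffer`, and the fake embedding values are hypothetical illustrations, not redisvl's actual code.

```python
import struct
from typing import List, Union

# Hypothetical alias for the widened return type: float lists, Cohere-style
# integer lists, or a packed binary buffer.
EmbeddingResult = Union[List[float], List[int], bytes]


def to_float32_buffer(vector: List[float]) -> bytes:
    """Pack a float vector into little-endian float32 bytes (4 bytes per value)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def embed(text: str, as_buffer: bool = False) -> EmbeddingResult:
    """Toy embed function (hypothetical); the 'embedding' is derived from the text."""
    vector = [float(len(text)), float(text.count(" ")), 0.5]
    if as_buffer:
        return to_float32_buffer(vector)
    return vector


print(embed("hello world"))                         # [11.0, 1.0, 0.5]
print(len(embed("hello world", as_buffer=True)))    # 12 (3 values x 4 bytes)
```

The same call site can therefore receive either a list or raw bytes, which is why the signatures had to widen.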
Standardized Interface:
- Uniform type annotations and docstrings across all vectorizer implementations
- A consistent default batch size (10) across all vectorizers, for more predictable behavior
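A sketch of how a shared default batch size plays out. It assumes a hypothetical `batchify` helper and a toy `embed_many` that simulates one provider request per batch. The names are illustrative and are not taken from the PR diff.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")

DEFAULT_BATCH_SIZE = 10  # the shared default described above


def batchify(items: List[T], batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def embed_many(texts: List[str], batch_size: int = DEFAULT_BATCH_SIZE) -> List[List[float]]:
    """Toy embed_many (hypothetical): one simulated provider call per batch."""
    embeddings: List[List[float]] = []
    for batch in batchify(texts, batch_size):
        # A real vectorizer would send `batch` to the provider in one request here.
        embeddings.extend([float(len(t))] for t in batch)
    return embeddings


print([len(b) for b in batchify(list(range(25)))])  # [10, 10, 5]
```

With the same default everywhere, 25 inputs always translate into three requests regardless of which provider is configured.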
Improved Provider-Specific Support:
- Enhanced kwargs forwarding to allow passing provider-specific parameters
- Better warnings for deprecated parameters (like Cohere's embedding_types)
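A sketch of kwargs forwarding. It assumes a stand-in `FakeProviderClient` in place of a real SDK; `ToyVectorizer` and the `truncate` option are illustrative only and are not the signature of any redisvl or provider method.

```python
from typing import Any, Dict, List


class FakeProviderClient:
    """Stand-in for a provider SDK client (hypothetical); records its last call."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def embed(self, texts: List[str], model: str, **params: Any) -> List[List[float]]:
        self.last_call = {"texts": texts, "model": model, **params}
        return [[0.0, 1.0] for _ in texts]


class ToyVectorizer:
    """Hypothetical vectorizer that forwards extra kwargs to the client."""

    def __init__(self, client: FakeProviderClient, model: str) -> None:
        self.client = client
        self.model = model

    def embed(self, text: str, **kwargs: Any) -> List[float]:
        # Provider-specific options pass through untouched.
        return self.client.embed(texts=[text], model=self.model, **kwargs)[0]


client = FakeProviderClient()
vectorizer = ToyVectorizer(client, model="example-model")
vectorizer.embed("hello", truncate="END")
print(client.last_call["truncate"])  # END
```

The trade-off in this sketch is that a misspelled kwarg is not caught by the vectorizer itself; the provider SDK has to reject it.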
Fixed Type Checking:
- Added targeted "# type: ignore" comments to resolve MyPy errors
- Made minimal changes to consumer code to handle the expanded return types
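One way consumer code can cope with the expanded return types without a blanket type ignore is to narrow the union with an isinstance check. This is a minimal sketch: the `EmbeddingResult` alias and `cosine_similarity` consumer are hypothetical, not code from the PR.

```python
import math
from typing import List, Union

EmbeddingResult = Union[List[float], List[int], bytes]  # hypothetical alias


def cosine_similarity(a: EmbeddingResult, b: EmbeddingResult) -> float:
    """Consumer that needs numeric lists; narrows the union instead of ignoring types."""
    if isinstance(a, bytes) or isinstance(b, bytes):
        raise TypeError("expected list embeddings, got a binary buffer")
    # Past this check, a type checker can treat a and b as lists of numbers.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```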

Motivation

These changes create a more consistent and flexible vectorizer interface that:

- Accurately represents what the methods can return
- Accommodates provider-specific features (like Cohere's integer embeddings)
- Provides clearer documentation for users
- Maintains backward compatibility

Future Improvements

For future consideration:

- Introduce helper methods (like embed_as_list()) that guarantee specific return types when needed
- Add more robust type conversion in consumer code that relies on specific types
- Develop a cleaner separation between the base vectorizer interface and provider-specific extensions
- Consider a more structured approach to provider-specific parameters
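A rough sketch of the embed_as_list() idea, assuming it is layered on top of an existing embed() that may return bytes. `ListEmbeddingMixin` and `ToyVectorizer` are hypothetical, and the assumption that buffers hold little-endian float32 values is this sketch's, not the PR's.

```python
import struct
from typing import Any, List, Union

EmbeddingResult = Union[List[float], List[int], bytes]  # hypothetical alias


class ListEmbeddingMixin:
    """Hypothetical mixin adding an embed_as_list() helper on top of embed()."""

    def embed(self, text: str, **kwargs: Any) -> EmbeddingResult:
        raise NotImplementedError

    def embed_as_list(self, text: str, **kwargs: Any) -> List[float]:
        result = self.embed(text, **kwargs)
        if isinstance(result, bytes):
            # Assumes buffers hold little-endian float32 values.
            if len(result) % 4:
                raise ValueError("buffer length is not a multiple of 4")
            return list(struct.unpack(f"<{len(result) // 4}f", result))
        return [float(x) for x in result]


class ToyVectorizer(ListEmbeddingMixin):
    def embed(self, text: str, as_buffer: bool = False, **kwargs: Any) -> EmbeddingResult:
        vector = [float(len(text)), 0.5]
        return struct.pack("<2f", *vector) if as_buffer else vector


v = ToyVectorizer()
print(v.embed_as_list("abc", as_buffer=True))  # [3.0, 0.5]
```

Callers that need floats get a single, precise return type, while embed() keeps its wider signature for backward compatibility.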

tylerhutcherson · 2025-02-28T14:53:47Z

Looking for feedback on the approach here, team. Let me know what you think. This doesn't break the interface or user experience, but it lets us use Cohere models for int8 embeddings. It also exposes some possible gaps in the broader current implementation that we should review in preparation for a 1.x.x release later this year.

@abrookins @rbs333 @bsbodden @justin-cechmanek
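To make the int8 motivation concrete: a float32 component takes 4 bytes and an int8 component takes 1, so an int8 vector is a quarter of the size at the same dimension. This is a minimal sketch using stdlib `struct` packing; the helper names and the 1024 dimension are hypothetical, and this is not necessarily how redisvl or Cohere serialize vectors.

```python
import struct
from typing import List


def int8_to_buffer(vector: List[int]) -> bytes:
    """Pack integer embeddings as signed 8-bit values (1 byte each)."""
    if any(not -128 <= v <= 127 for v in vector):
        raise ValueError("int8 values must be in [-128, 127]")
    return struct.pack(f"<{len(vector)}b", *vector)


def float32_to_buffer(vector: List[float]) -> bytes:
    """Pack float embeddings as little-endian float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


dims = 1024  # example dimension, chosen for illustration
print(len(float32_to_buffer([0.0] * dims)))  # 4096
print(len(int8_to_buffer([0] * dims)))       # 1024
```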

justin-cechmanek · 2025-02-28T18:45:04Z

I like the standardization, and better support for passing through kwargs. No critiques from me.

abrookins

Nice cleanup! 🔥

abrookins · 2025-03-04T18:43:57Z

redisvl/utils/vectorize/text/cohere.py

```diff
-            texts=[text], model=self.model, input_type=input_type
-        ).embeddings[0]
+        # Check if embedding_types was provided and warn user
+        if "embedding_types" in kwargs:
```
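The diff above checks for embedding_types in kwargs and warns the user. A minimal sketch of that pattern follows. The helper name, the warning text, and the choice to drop the deprecated value before the provider call are all assumptions; the snippet does not show what the real cohere.py does with the value afterwards.

```python
import warnings
from typing import Any, Dict


def pop_deprecated_embedding_types(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: warn about and drop a deprecated 'embedding_types' kwarg."""
    if "embedding_types" in kwargs:
        warnings.warn(
            "'embedding_types' is deprecated and is ignored in this sketch",
            DeprecationWarning,
            stacklevel=2,
        )
    # Return a copy so the caller's dict is not mutated.
    cleaned = dict(kwargs)
    cleaned.pop("embedding_types", None)
    return cleaned
```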


add kwargs support to all vectorizer embed methods

a354220

tylerhutcherson added the enhancement New feature or request label Feb 27, 2025

tylerhutcherson added 3 commits February 27, 2025 14:13

vectorizer typing changes

d51e4f3

bring back vectorizer tests

9a3eb08

reset default batch_size to 10 for all vectorizers

7f3a0fe

tylerhutcherson requested a review from abrookins February 27, 2025 19:21

tylerhutcherson changed the title from "Add kwargs support to all vectorizer embed methods" to "Improve vectorizer type support" Feb 27, 2025

fix test

56931ac

tylerhutcherson marked this pull request as ready for review February 27, 2025 20:47

tylerhutcherson requested review from justin-cechmanek and rbs333 February 28, 2025 14:52

justin-cechmanek approved these changes Feb 28, 2025


abrookins approved these changes Mar 4, 2025


tylerhutcherson changed the title from "Improve vectorizer type support" to "Improve vectorizer kwargs and typing" Mar 5, 2025

tylerhutcherson merged commit 38c0a60 into main Mar 5, 2025
36 checks passed

tylerhutcherson deleted the feat/RAAE-675-support-vectorizer-embed-kwargs branch March 5, 2025 01:42

4 participants
