Improve vectorizer kwargs and typing #291
Looking for feedback on the approach here, team; let me know what you think. This doesn't break the interface or user experience, but it lets us use the Cohere model's int8 embeddings. It may also expose some gaps in the broader current implementation that we should review in preparation for a 1.x.x release later this year.
I like the standardization and the better support for passing through kwargs. No critiques from me.
Nice cleanup! 🔥
texts=[text], model=self.model, input_type=input_type
).embeddings[0]
# Check if embedding_types was provided and warn user
if "embedding_types" in kwargs:
Helpful!
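The review excerpt checks whether a caller passed embedding_types through kwargs and warns. The sketch below shows one way such a guard could look. The helper name strip_embedding_types, the warning text, and the choice to drop the value rather than forward it are all assumptions for illustration, not the PR's actual code.

```python
import warnings
from typing import Any, Dict


def strip_embedding_types(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: warn about and drop a caller-supplied embedding_types.

    Assumes the vectorizer picks embedding_types itself (e.g. from its own
    dtype setting), so a user-provided value is ignored, not forwarded.
    """
    if "embedding_types" in kwargs:
        warnings.warn(
            "embedding_types is set by the vectorizer; the provided value will be ignored.",
            UserWarning,
            stacklevel=2,
        )
        # Return a copy so the caller's dict is left untouched
        return {k: v for k, v in kwargs.items() if k != "embedding_types"}
    return kwargs


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        cleaned = strip_embedding_types({"embedding_types": ["int8"], "truncate": "END"})
    print(cleaned)       # {'truncate': 'END'}
    print(len(caught))   # 1
```

Unrelated kwargs (here an arbitrary "truncate" entry) pass through unchanged, which keeps the passthrough behavior intact while flagging the one option the vectorizer wants to own.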
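Changes Made
- Expanded Type Support: embedding return types now cover float lists (List[float]) or binary buffers (bytes), as well as integer embeddings (List[int]).
- Standardized Interface:
- Improved Provider-Specific Support: e.g. Cohere's embedding_types parameter.
- Fixed Type Checking: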
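A minimal sketch of the widened return type and **kwargs passthrough described in the list above, using a toy vectorizer rather than the project's real classes. ToyVectorizer, fake_provider, the precision kwarg, and the int8/float32 packing rule are all hypothetical; only the three return types come from the PR description.

```python
import struct
from typing import Any, Callable, List, Union

# Widened return type: float vectors, integer (e.g. int8) vectors, or packed bytes
Embedding = Union[List[float], List[int], bytes]


def fake_provider(text: str, **kwargs: Any) -> Union[List[float], List[int]]:
    """Deterministic stand-in for a remote embedding call (hypothetical)."""
    if kwargs.get("precision") == "int8":
        # Integer-valued vector, clamped to the int8 range
        return [max(-128, min(127, len(text))), -1]
    return [float(len(text)), 0.5]


class ToyVectorizer:
    """Hypothetical vectorizer illustrating typed returns and **kwargs passthrough."""

    def __init__(self, provider: Callable[..., Union[List[float], List[int]]]) -> None:
        self.provider = provider

    def embed(self, text: str, as_buffer: bool = False, **kwargs: Any) -> Embedding:
        # Provider-specific options travel through **kwargs untouched
        vector = self.provider(text, **kwargs)
        if as_buffer:
            # Hypothetical packing rule: ints as int8 ("b"), floats as float32 ("f")
            fmt = "b" if all(isinstance(x, int) for x in vector) else "f"
            return struct.pack(f"<{len(vector)}{fmt}", *vector)
        return vector


if __name__ == "__main__":
    vectorizer = ToyVectorizer(fake_provider)
    print(vectorizer.embed("hello"))                         # [5.0, 0.5]
    print(vectorizer.embed("hello", precision="int8"))       # [5, -1]
    print(len(vectorizer.embed("hello", as_buffer=True)))    # 8
```

Because embed() never names provider options in its signature, a new provider setting only needs handling in the provider call, not in every vectorizer method.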
Motivation
These changes create a more consistent and flexible vectorizer interface that:
Future Improvements
For future consideration:
- Methods such as embed_as_list() that guarantee specific return types when needed
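The embed_as_list() idea could be sketched as a thin wrapper that normalizes whatever embed() returns into List[float]. This assumes byte buffers hold little-endian float32 values; as_float_list and EmbedAsListMixin are hypothetical names, and any real method added later may differ.

```python
import struct
from typing import Any, List, Union

Embedding = Union[List[float], List[int], bytes]


def as_float_list(embedding: Embedding) -> List[float]:
    """Normalize any supported embedding form into List[float].

    Assumes byte buffers hold little-endian float32 values; a buffer of packed
    int8 values would need a different format string.
    """
    if isinstance(embedding, (bytes, bytearray)):
        if len(embedding) % 4:
            raise ValueError("float32 buffer length must be a multiple of 4")
        count = len(embedding) // 4
        return list(struct.unpack(f"<{count}f", embedding))
    # Lists of ints or floats are converted element by element
    return [float(x) for x in embedding]


class EmbedAsListMixin:
    """Hypothetical mixin: adds embed_as_list() on top of a class's embed()."""

    def embed(self, text: str, **kwargs: Any) -> Embedding:
        raise NotImplementedError  # supplied by the concrete vectorizer

    def embed_as_list(self, text: str, **kwargs: Any) -> List[float]:
        # Guarantees a List[float] regardless of what embed() returned
        return as_float_list(self.embed(text, **kwargs))
```

A concrete vectorizer would subclass the mixin and implement embed(); callers that need plain Python lists call embed_as_list() and never have to branch on the return type themselves.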