-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor README and Vicinity class to support any serializable item type #56
Conversation
- Updated README.md to clarify that items can be strings or other serializable objects. - Modified the Vicinity class to accept a broader range of item types by changing type hints from `str` to `Any` in several methods. - Enhanced the insert and delete methods to handle non-string tokens appropriately, ensuring that items can be checked and managed regardless of their type.
… and evaluating backends
Codecov ReportAll modified and coverable lines are covered by tests ✅
|
- Simplified the logic for checking and appending tokens in the insert method, ensuring that duplicate tokens are properly managed.
- Updated the `items` fixture to return a mix of dictionaries and strings based on index parity. - Modified `test_vicinity_insert_duplicate` to use the updated `items` fixture for inserting items. - Adjusted `test_vicinity_delete_and_query` to reference items by their indices instead of hardcoded values. - Enhanced the Vicinity class to streamline token management, ensuring proper handling of duplicates and improving error messaging for token deletions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thanks! One improvement I see is that we can explicitly throw a ValueError
on trying to save if the items are not JSON serializable. e.g., by catching a JsonEncodeError
and giving a more informative warning.
Another improvement is the typing, but I'll do that in a follow-up because I actually don't know what the best way is 💀
Co-authored-by: Stephan Tulkens <stephantul@gmail.com>
…ling - Replaced the nested loop for checking duplicates with a single extend operation for tokens. - Improved efficiency by directly appending tokens to the items list, ensuring proper management of duplicates.
…ling - Replaced the nested loop for token matching with a more efficient list comprehension. - Enhanced error messaging to specify which tokens were not found in the vector space.
- Added a try-except block around the JSON serialization process to catch JSONEncodeError.
- Introduced a new pytest fixture `non_serializable_items` that generates a list of non-serializable objects for testing. - Added a test case `test_vicinity_save_and_load_non_serializable_items` to verify that saving a Vicinity instance with non-serializable items raises a JSONEncodeError. - Updated the Vicinity class documentation to specify that JSONEncodeError may be raised if items are not serializable.
I was playing around and realised the items don't necessarily need to be strings.
Update
str
toAny
in several methods.Question
Also, why do we use tokens as argument names for items in the insert and delete methods?
Example
Something like the following works now. Slower during insert and deletion but I would say the flexibility is worth it.