Add feature : add chromadb support as a vector database #60

Merged
merged 2 commits into from
Dec 22, 2024

powerli2002
Contributor

@powerli2002 powerli2002 commented Dec 14, 2024

This PR is independent of ES and is based on the latest main branch.

This PR partially addresses #57 (comment).

Abstract
Support for chroma as a vector database has been implemented in ModelCache.

Notes:
The chromadb implementation in this PR uses chromadb.PersistentClient to persist data locally. According to the official documentation, this client is not suitable for production environments. If it is changed to HttpClient or AsyncHttpClient, the chroma run --path /db_path command must be run beforehand to start the server, which may need to be mentioned in the README.
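
A minimal sketch of the client choice described in the note above, assuming the chromadb constructors PersistentClient(path=...) and HttpClient(host=..., port=...). The helper name make_chroma_client, its mode argument, and the default host and port are hypothetical and not part of this PR. HTTP mode assumes a server was already started with chroma run --path /db_path.

```python
def make_chroma_client(mode="persistent", path="./chromadb",
                       host="localhost", port=8000):
    """Return a chromadb client for local persistence or a running server.

    Hypothetical helper for illustration; not the code merged in this PR.
    """
    # Validate first so a typo fails fast, before chromadb is imported.
    if mode not in ("persistent", "http"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Imported lazily so this module loads even without chromadb installed.
    import chromadb

    if mode == "persistent":
        # Stores data on local disk under `path`.
        return chromadb.PersistentClient(path=path)

    # Connects to a server assumed to be listening on host:port.
    return chromadb.HttpClient(host=host, port=port)
```

Keeping the choice behind one function would let a config value switch between the two clients without touching the rest of the vector store code.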

Example for chromadb_config.ini

[chromadb]  
persist_directory=./chromadb
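
For illustration, a config file of this shape can be read with Python's standard configparser module. This is a sketch only: the function name load_persist_directory and the fallback default are assumptions, and the PR's actual loading code may differ.

```python
import configparser

# Same content as the example chromadb_config.ini above.
CONFIG_TEXT = """
[chromadb]
persist_directory=./chromadb
"""


def load_persist_directory(text, default="./chromadb"):
    """Return persist_directory from the [chromadb] section, or `default`."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # `fallback` covers both a missing section and a missing key.
    return parser.get("chromadb", "persist_directory", fallback=default)


persist_directory = load_persist_directory(CONFIG_TEXT)
```

To read from disk instead, parser.read("chromadb_config.ini") can replace the read_string call.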

I have found some inconsistencies in multicache, such as:

  • For the vector database, the Redis implementation stores different types of data in different indexes, while the Faiss implementation has no such logic.
  • Several method names are inconsistent with the corresponding vector database methods of the non-multimodal modelcache. For example, the rebuild method is def rebuild_idx(self, model): in the multimodal version but def rebuild_col(self, model): in the non-multimodal version.
  • In the multimodal Redis implementation, almost all methods take the parameter mm_type, but some callers do not pass it, such as in add, delete, etc. However, because the data still has to be stored, I chose to store the data in different collections.
  • Additionally, some packages may be missing from requirements.txt.
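
One way to picture the per-type collection choice, including the case where mm_type is not passed, is sketched below. Everything here is hypothetical: the naming scheme, the "text" default, and the helper names are assumptions for illustration, not the PR's code. The only chromadb call assumed is client.get_or_create_collection(name=...), and a small stand-in client is used so the sketch runs without a database.

```python
def collection_name(model, mm_type=None, default_type="text"):
    """Build a per-type collection name, e.g. 'demo_model_image'."""
    # Fall back to a default type when the caller omits mm_type.
    return f"{model}_{mm_type or default_type}"


def get_collection(client, model, mm_type=None):
    """Fetch (or create) the collection for one model and data type."""
    return client.get_or_create_collection(name=collection_name(model, mm_type))


class RecordingClient:
    """Stand-in for a chromadb client; it only records requested names."""

    def __init__(self):
        self.created = []

    def get_or_create_collection(self, name):
        self.created.append(name)
        return name


client = RecordingClient()
get_collection(client, "demo_model", "image")
get_collection(client, "demo_model")  # mm_type omitted, uses the default
```

With a real client, each distinct mm_type would map to its own collection, and calls that omit mm_type would all land in the default one.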

Given the lack of relevant documentation, my limited understanding of certain multicache features, and the points of confusion above, my multimodal implementation is for reference only. If there are any issues, feel free to contact me.

@peng3307165
Collaborator

Thank you for your continued work. We plan to send you a CodeFuse souvenir. You can also participate in activities related to CodeFuse open source in the future. Can we obtain your contact information? My email is: hongen.phe@antgroup.com

@powerli2002
Contributor Author

Thank you kindly. I have contacted you via my private Foxmail email. Please check your inbox at your convenience.

@peng3307165 peng3307165 merged commit 3dad91c into codefuse-ai:main Dec 22, 2024
@peng3307165
Collaborator

Thank you for your contribution to the ModelCache project! We've accepted your code. We truly appreciate your efforts and collaboration. Best wishes!
