Add feature: Support for Elasticsearch as a scalar database #59


Merged (4 commits), Dec 23, 2024

powerli2002
Contributor

This PR partially addresses #57 (comment), and I plan to complete the task in two steps. This PR focuses on implementing the ES (Elasticsearch) features.

Abstract: Support for Elasticsearch as a scalar database has been implemented in ModelCache.

Notes:

  1. ES Primary Key Issue

    • Elasticsearch uses auto-generated UUIDs as the _id primary key, whereas the vector databases in this project (e.g., Milvus) expect INT64 primary keys.
    • To address this, Snowflake IDs generated with the snowflake-id library are used in place of the UUIDs generated by ES.
    • A Snowflake ID combines a millisecond timestamp, a machine identifier (set to 1 in the code), and a sequence number that allows multiple IDs to be generated within the same millisecond on the same node; the 12-bit sequence allows up to 4096 IDs per ms per node. Snowflake IDs increase with time, and assigning each node a distinct machine identifier prevents ID collisions within the system.
    • Drawbacks:
      1. Strong dependence on the machine clock: if the clock is rolled back, duplicate IDs can be generated.
      2. Slower inserts than with ES-generated IDs.
    • Inserting 100 records, using Milvus as the vector database:
      • Using Snowflake IDs with ES: time used: 90.179
      • Not using Snowflake IDs with ES: time used: 65.817
      • MySQL: time used: 31.118

    If there is a better approach, please let me know and I will revise the implementation.

  2. timm Library Support

    • Due to limited documentation on timm support in the repository, I have made similar updates to the files under the modelcache_mm folder based on my understanding. If there are any issues, please contact me.
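To make the Snowflake approach from note 1 concrete, here is a minimal Python sketch. It is not the snowflake-id library's API and not the code merged in this PR; it assumes the classic Twitter layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) and raises on clock rollback, which is the failure mode listed under the drawbacks. The class `SnowflakeIdGenerator`, the helper `build_bulk_actions`, the index name `modelcache_demo`, and the `id` source field are all hypothetical names chosen for illustration.

```python
import threading
import time


class SnowflakeIdGenerator:
    """Snowflake-style 64-bit ID generator (illustrative sketch only).

    Assumed layout (classic Twitter Snowflake): 1 unused sign bit,
    41-bit millisecond timestamp since EPOCH_MS, 10-bit machine ID,
    12-bit per-millisecond sequence.
    """

    EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch; any fixed past instant works
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1  # 4095, i.e. 4096 IDs per ms

    def __init__(self, machine_id: int = 1) -> None:
        if not 0 <= machine_id <= self.MAX_MACHINE_ID:
            raise ValueError(f"machine_id must be in [0, {self.MAX_MACHINE_ID}]")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock rollback: fail loudly instead of risking duplicate IDs.
                raise RuntimeError(f"clock moved backwards by {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.SEQUENCE_MASK
                if self._sequence == 0:
                    # All 4096 sequence values used in this ms: busy-wait for the next ms.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self.machine_id << self.SEQUENCE_BITS)
                | self._sequence
            )


def build_bulk_actions(index: str, records: list, gen: SnowflakeIdGenerator) -> list:
    """Give each record one Snowflake ID, usable both as the INT64 primary key
    in Milvus and as the Elasticsearch _id (ES stores _id as a string)."""
    actions = []
    for record in records:
        doc_id = gen.next_id()
        actions.append({
            "_index": index,
            "_id": str(doc_id),
            "_source": {**record, "id": doc_id},  # hypothetical field name
        })
    return actions


if __name__ == "__main__":
    gen = SnowflakeIdGenerator(machine_id=1)  # machine ID 1, as in the PR
    for action in build_bulk_actions("modelcache_demo", [{"question": "hi"}], gen):
        print(action)
```

The action dicts follow the shape accepted by elasticsearch-py's `helpers.bulk(client, actions)`, so the same integer key can go to Milvus while its string form becomes the ES `_id`. Raising on rollback is only one policy; waiting until the clock catches up is a common alternative.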

@peng3307165
Collaborator

Thank you for your contribution! We have received your pull request and will review it shortly.
Best regards!

@peng3307165 merged commit cdba51f into codefuse-ai:main on Dec 23, 2024
@peng3307165
Collaborator

Thank you for your contribution to the ModelCache project! We've accepted your code.
Regarding your concerns about Elasticsearch performance, we will follow up on this matter and contact you if there are any updates.
Best regards!
