Add feature : Support for Elasticsearch as a scalar database #59

powerli2002 · 2024-12-09T14:52:32Z

This PR partially addresses #57 (comment). I plan to complete the task in two steps, and this PR focuses on implementing the Elasticsearch (ES) features.

Abstract: Support for Elasticsearch as a scalar database has been implemented in ModelCache.

Notes:

ES Primary Key Issue
- Elasticsearch auto-generates string UUIDs for the _id primary key, whereas the vector databases in this project (e.g., Milvus) expect INT64 IDs.
- To bridge this, the PR generates Snowflake IDs with the snowflake-id library and uses them instead of the UUIDs generated by ES.
- A Snowflake ID combines a millisecond timestamp, a machine identifier (set to 1 in the code), and a 12-bit sequence number, so a single node can issue up to 4096 IDs per millisecond. IDs increase monotonically with time, and as long as each node is configured with a distinct machine identifier, no two IDs in the system collide.
- Drawbacks:
  1. Strong dependence on the machine clock: if the clock is rolled back, duplicate IDs can be generated.
  2. Higher time consumption (see the timings below).
- Time to insert 100 records, with Milvus as the vector database:
  - ES with Snowflake IDs: 90.179
  - ES without Snowflake IDs: 65.817
  - MySQL (for comparison): 31.118
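Taking the three timings above as totals for the same 100-record workload (the unit is not stated), the Snowflake overhead works out as follows:

$$
\frac{90.179}{65.817} \approx 1.37, \qquad \frac{90.179 - 65.817}{100} = \frac{24.362}{100} \approx 0.244 \ \text{per record}, \qquad \frac{90.179}{31.118} \approx 2.90
$$

In this run, ES with Snowflake IDs took about 37% longer than ES with its own IDs, and about 2.9 times as long as MySQL.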
If there are better methods, please let me know, and I will make further modifications.
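For reference, the Snowflake scheme described in the notes above can be sketched as follows. This is a minimal illustration assuming the classic Twitter-style layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence); it is not the snowflake-id library's code, and the epoch, `SnowflakeSketch`, `decode`, `to_es_bulk_action`, and the index name are all hypothetical. It also assumes the generated ID is written to ES as the document _id; the action dict uses the `_index`/`_id`/`_source` format accepted by `elasticsearch.helpers.bulk`.

```python
import threading
import time

# Assumed bit layout (classic Twitter-style Snowflake, not taken from the PR):
# 1 unused sign bit | 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095, i.e. 4096 IDs per ms per node
MACHINE_SHIFT = SEQUENCE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS
MAX_ELAPSED_MS = (1 << 41) - 1

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
DEFAULT_EPOCH_MS = 1_577_836_800_000


class ClockMovedBackwardsError(RuntimeError):
    """Raised instead of risking duplicate IDs after a clock rollback."""


class SnowflakeSketch:
    """Hypothetical thread-safe Snowflake ID generator for one node."""

    def __init__(self, machine_id=1, epoch_ms=DEFAULT_EPOCH_MS, clock=None):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError(f"machine_id must be in [0, {MAX_MACHINE_ID}]")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        # Clock returns the current Unix time in ms; injectable for testing.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Drawback 1: a rolled-back clock could reissue old IDs, so refuse.
                raise ClockMovedBackwardsError(
                    f"clock moved backwards by {self._last_ms - now} ms")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used; busy-wait for the next ms.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            elapsed = now - self.epoch_ms
            if not 0 <= elapsed <= MAX_ELAPSED_MS:
                raise ValueError("timestamp outside the 41-bit range for this epoch")
            self._last_ms = now
            # Result is below 2**63, so it fits Milvus's signed INT64 field.
            return ((elapsed << TIMESTAMP_SHIFT)
                    | (self.machine_id << MACHINE_SHIFT)
                    | self._sequence)


def decode(snowflake_id, epoch_ms=DEFAULT_EPOCH_MS):
    """Split an ID back into its fields (same assumed layout)."""
    return {
        "timestamp_ms": (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms,
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE_ID,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }


def to_es_bulk_action(index, snowflake_id, source):
    """Build an action dict in the format accepted by elasticsearch.helpers.bulk.

    ES stores _id as a string, so the INT64 used by Milvus is recovered with int().
    """
    return {"_index": index, "_id": str(snowflake_id), "_source": source}


if __name__ == "__main__":
    gen = SnowflakeSketch(machine_id=1)
    new_id = gen.next_id()
    print(new_id, decode(new_id))
    # "modelcache_demo" is a hypothetical index name.
    print(to_es_bulk_action("modelcache_demo", new_id, {"question": "hello"}))
```

This sketch raises on clock rollback rather than waiting for the clock to catch up; either choice avoids the duplicate-ID risk listed under Drawbacks, at the cost of failed or delayed inserts while the clock is behind.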
timm Library Support
- Because the repository has limited documentation on timm support, I made analogous updates to the files under the modelcache_mm folder based on my understanding. If there are any issues, please contact me.

peng3307165 · 2024-12-12T03:39:37Z

Thank you for your contribution! We have received your pull request and will review it shortly.
Best regards!

peng3307165 · 2024-12-23T07:03:16Z

Thank you for your contribution to the ModelCache project! We've accepted your code.
Regarding your concerns about Elasticsearch performance, we will follow up on this and contact you if there are any updates.
Best regards!

powerli2002 added 2 commits December 9, 2024 22:09

Add feature : Using elasticsearch as a scalar database

2d692ae

Add feature : add timm support

682179b

powerli2002 and others added 2 commits December 22, 2024 10:00

update requirements.txt

8753cb3

Merge branch 'main' into add-es

7901e49

peng3307165 merged commit cdba51f into codefuse-ai:main Dec 23, 2024
