Whether running local models or remote models via APIs, OpenSearch's model registration / deployment process involves several async API calls, which are hard to manage.
OpenSearch's local model serving is built on a Java solution that does not seem stable (as of v2.15.0). I am running into issues/errors with large models.
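To illustrate why the async flow is awkward to manage, here is a minimal sketch of the register, poll, deploy, poll sequence. It is not a definitive client: `http_post` and `http_get` are hypothetical callables you would supply (for example, thin wrappers around an HTTP library), and the endpoint paths, task state names and response fields (`task_id`, `state`, `model_id`) are assumptions based on the ML Commons plugin that should be checked against the OpenSearch version in use.

```python
import time
from typing import Any, Callable, Dict, Optional

# Assumed terminal failure states for an ML Commons task; verify for your version.
FAILED_STATES = ("FAILED", "COMPLETED_WITH_ERROR", "CANCELLED")


def wait_for_task(
    http_get: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the task endpoint until the task completes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = http_get(f"/_plugins/_ml/tasks/{task_id}")
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state in FAILED_STATES:
            raise RuntimeError(f"task {task_id} ended in state {state}: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still in state {state}")
        sleep(interval_s)


def register_and_deploy(
    http_post: Callable[[str, Optional[Dict[str, Any]]], Dict[str, Any]],
    http_get: Callable[[str], Dict[str, Any]],
    register_body: Dict[str, Any],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Register a model, wait for it, deploy it, wait again; return the model id."""
    # Step 1: registration only returns a task id, not the model id.
    reg = http_post("/_plugins/_ml/models/_register", register_body)
    reg_task = wait_for_task(http_get, reg["task_id"], sleep=sleep)
    model_id = reg_task["model_id"]

    # Step 2: deployment is a second async task that must be polled as well.
    dep = http_post(f"/_plugins/_ml/models/{model_id}/_deploy", None)
    wait_for_task(http_get, dep["task_id"], sleep=sleep)
    return model_id
```

Every step that returns a task id needs its own polling, timeout and failure handling, which is the management overhead noted above.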
Performance considerations (based on my own rough, not very accurate tests):
A Node.js-based solution serving an ONNX-format model (gte-base-en-v1.5) produces embeddings for 3 short strings in around 40ms - 60ms.
Python solution (using SentenceTransformers): around 500ms
OpenSearch local models: around 140ms - 160ms
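For anyone who wants to reproduce this kind of comparison, a small timing harness along these lines could be used. This is a sketch, not the harness behind the numbers above: `embed` is a hypothetical callable that takes a list of strings and returns embeddings, and the warm-up and run counts are arbitrary choices.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def measure_latency_ms(
    embed: Callable[[Sequence[str]], Any],
    texts: Sequence[str],
    warmup: int = 3,
    runs: int = 20,
) -> Dict[str, float]:
    """Time repeated calls to `embed(texts)` and summarise latency in milliseconds."""
    # Warm-up calls are excluded so model loading / JIT effects do not skew results.
    for _ in range(warmup):
        embed(texts)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        embed(texts)
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Passing the same three short strings to each backend (Node.js service, SentenceTransformers, OpenSearch local model) through a wrapper with this signature would make the figures more directly comparable.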
It might be easier to accommodate a customised search/indexing design with our own solution.
Recent models come with a larger maximum sequence length (8k to 32k) and can offer similar or better performance across different tasks with query-side instructions only (i.e. no index-side instructions). See the Hugging Face MTEB leaderboard.
No need to have different vector fields for different tasks
For performance and deployment cost (especially RAM), we can use a single vector field to store the embedding of text that aggregates information from multiple fields
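The single-vector-field idea, combined with query-side-only instructions, could look roughly like the sketch below. The field names, the `name: value` layout and the instruction string are all hypothetical; the right instruction format depends on the embedding model chosen.

```python
from typing import Any, Mapping, Sequence


def build_index_text(doc: Mapping[str, Any], fields: Sequence[str]) -> str:
    """Aggregate selected fields of a document into one text to embed at index time."""
    parts = []
    for name in fields:
        value = doc.get(name)
        if value is None:
            continue
        # Flatten multi-valued fields (e.g. keyword lists) into one string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        text = str(value).strip()
        if text:
            parts.append(f"{name}: {text}")
    return "\n".join(parts)


def build_query_text(query: str, instruction: str = "") -> str:
    """Prefix the query with a task instruction; index-side text gets no instruction."""
    return f"{instruction}{query}" if instruction else query


# Usage with hypothetical field names and a hypothetical instruction prefix:
doc = {"title": "Rainfall data", "keywords": ["climate", "rain"], "description": None}
index_text = build_index_text(doc, ["title", "description", "keywords"])
query_text = build_query_text("rain statistics", "Find datasets relevant to: ")
```

Because only the query text changes per task, one stored embedding per document can serve several search tasks.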
Hybrid Search
Power existing search APIs with Hybrid Search (Combining Semantic / Vector Search and Full-text keyword-based search for better search results).
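One common way to combine the two result sets is to normalise each score list and take a weighted sum. The sketch below shows that approach under stated assumptions; it is not necessarily the method this issue will adopt, and `vector_weight` and the input score dictionaries (document id to raw score) are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    vector_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Merge keyword and vector hits into one ranking using a weighted sum."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        # A document missing from one result set contributes 0 for that side.
        doc_id: (1.0 - vector_weight) * kw.get(doc_id, 0.0)
        + vector_weight * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Raising `vector_weight` favours semantic matches, while lowering it favours exact keyword matches.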
This is the first step in LLM-powered search engine development. The motivation for this development is:
Acceptance Criteria
Technical Notes
Based on the recent research & evaluation: