Create new “How to tell if a REST embedder is compatible with Meilisearch” guide #3041

guimachiavelli · 2024-11-11T17:22:44Z

Recent customer feedback indicates users are struggling to move beyond the basic AI-powered search tutorial and implement hybrid search in their own projects. In particular, users who are already familiar with LLMs struggle to tell whether their specific embedder is compatible with Meilisearch.

We should write a guide that defines the minimum requirements and explains what Meilisearch expects when sending documents and receiving vectors.

What is important from an API perspective?
Are there significant differences between using a REST embedder with Meilisearch when compared to other applications?
API-specific questions aside, are there other details users must be aware of? For example, if Meilisearch tends to reach rate limits or timeouts more easily?
Pinging @dureuill for the above questions.

dureuill · 2024-11-12T10:59:08Z

What is important from an API perspective?

The remote embedder must offer an endpoint that accepts a JSON request and returns a JSON response.
The JSON request must provide a way to inject the text to embed.
The JSON response must contain the embedding matching the text to embed that was sent in the request.
Optionally, Meilisearch supports sending up to 10 texts to embed in a single request. In that case, the JSON response must contain exactly as many embeddings as the number of texts in the JSON request.
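One way to sanity-check an embedder against these requirements is to script both halves of the exchange. The sketch below is illustrative only: the {{text}} placeholder string, the inject_text and check_embeddings helpers, and the response shape in the example are all hypothetical, not Meilisearch's internal implementation. It injects a text into a JSON request template and verifies that a parsed JSON response holds exactly one numeric embedding per text sent.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical placeholder marking where the text to embed goes in a request template.
TEXT_PLACEHOLDER = "{{text}}"


def inject_text(template: Any, text: str) -> Any:
    """Return a copy of a JSON-like template with every placeholder value replaced by `text`."""
    if template == TEXT_PLACEHOLDER:
        return text
    if isinstance(template, dict):
        return {key: inject_text(value, text) for key, value in template.items()}
    if isinstance(template, list):
        return [inject_text(item, text) for item in template]
    return template  # numbers, booleans, other strings are left untouched


def check_embeddings(texts: list[str], response: Any,
                     extract: Callable[[Any], list[list[float]]]) -> list[list[float]]:
    """Check that a parsed JSON response contains exactly one embedding per text sent."""
    embeddings = extract(response)
    if len(embeddings) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(embeddings)}")
    dimensions = {len(embedding) for embedding in embeddings}
    if len(dimensions) != 1 or 0 in dimensions:
        raise ValueError("embeddings must be non-empty and share a single dimension")
    if not all(isinstance(x, (int, float)) for e in embeddings for x in e):
        raise ValueError("embeddings must contain only numbers")
    return embeddings
```

For a batched request, the extract callback is where you describe your provider's response layout, for example lambda r: [d["embedding"] for d in r["data"]] for a response shaped like {"data": [{"embedding": [...]}, ...]}.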

Are there significant differences between using a REST embedder with Meilisearch when compared to other applications?

I'm not sure I understand the question. Can you elaborate on which "other applications" you're thinking of?

API-specific questions aside, are there other details users must be aware of? For example, if Meilisearch tends to reach rate limits or timeouts more easily?

When indexing, Meilisearch will attempt to make up to 40 requests in parallel, which is enough to reach most rate limits.
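A quick back-of-the-envelope calculation shows why 40 parallel requests hit rate limits so easily. This sketch assumes each of the 40 slots sends its next request as soon as the previous one returns and that every request takes the same time; the 0.5 s latency below is a made-up figure, not a measurement.

```python
from __future__ import annotations


def peak_load_per_minute(parallel_requests: int, latency_s: float,
                         texts_per_request: int) -> tuple[float, float]:
    """Estimate (requests/minute, texts/minute) if every parallel slot stays busy."""
    requests_per_minute = parallel_requests * 60.0 / latency_s
    return requests_per_minute, requests_per_minute * texts_per_request


# 40 parallel requests, a hypothetical 0.5 s per request, 10 texts per batch.
requests, texts = peak_load_per_minute(40, 0.5, 10)
print(f"{requests:.0f} requests/min, {texts:.0f} texts/min")
```

Under these assumptions the load is 4800 requests per minute, well above the per-minute request quotas many hosted embedding APIs apply to an individual account.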

When rate limiting happens, the remote server should return HTTP 429.
On HTTP 429, on any HTTP 5xx, or if the connection to the remote server cannot be established, Meilisearch will retry the request, for a maximum of 10 tries. Retries are delayed using an exponential backoff strategy, with a delay ranging from 1-2 ms (first retry) to 1-2 minutes (from the 6th retry on).
Any other HTTP 4xx will trigger an immediate failure of the indexing operation.
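The retry policy above can be modeled client-side, for example to predict how long an indexing operation might stall against a rate-limited endpoint. This is a reconstruction, not Meilisearch's code: the base delay of 10^(n-1) ms for the n-th retry, the 60 s cap, the random jitter of up to one extra base delay, and reading "10 tries" as 10 retries after the first attempt are all assumptions chosen to reproduce the 1-2 ms and 1-2 minute ranges quoted in the thread.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Optional, Tuple

MAX_RETRIES = 10       # assumption: 10 retries after the initial attempt
DELAY_CAP_MS = 60_000  # assumption: caps the base delay so late retries wait 1-2 minutes


def should_retry(status: Optional[int]) -> bool:
    """Retry on connection failure (modeled as status None), HTTP 429, or any HTTP 5xx."""
    if status is None:
        return True
    return status == 429 or 500 <= status <= 599


def retry_delay_ms(retry: int, rng: random.Random) -> float:
    """Jittered delay before the `retry`-th retry (1-indexed), in milliseconds."""
    base = min(10 ** (retry - 1), DELAY_CAP_MS)  # 1 ms, 10 ms, 100 ms, ..., then capped
    return base + rng.uniform(0, base)           # lands in [base, 2 * base]


def send_with_retries(send: Callable[[], Tuple[Optional[int], object]],
                      rng: Optional[random.Random] = None,
                      sleep: Callable[[float], None] = time.sleep) -> object:
    """Call `send` until it returns a 2xx status, retrying only retryable failures."""
    rng = rng or random.Random()
    for retry in range(MAX_RETRIES + 1):
        if retry > 0:
            sleep(retry_delay_ms(retry, rng) / 1000)
        status, body = send()
        if status is not None and 200 <= status < 300:
            return body
        if not should_retry(status):
            # Any other status (e.g. a 4xx other than 429) fails immediately.
            raise RuntimeError(f"non-retryable HTTP status {status}")
    raise RuntimeError(f"giving up after {MAX_RETRIES} retries")
```

The practical takeaway for embedder authors: signal overload with 429 (or 5xx) rather than another 4xx, since any other 4xx aborts the indexing operation instead of triggering a retry.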

For the OpenAI embedder, Meilisearch supports retrying when the input is too long. For other embedders, the handling of inputs that are too long to embed is embedder-specific: Meilisearch does not implement any logic to accommodate this case, which will likely result in a failed indexing operation. Users should therefore make sure their inputs never exceed the embedder's limits. To help with this, Meilisearch can truncate the rendered document template to a fixed number of bytes (400 bytes by default).
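Truncating to a byte budget, rather than a character count, needs care with multi-byte UTF-8 text. The sketch below is a hypothetical helper, assuming truncation should cut back to the last complete character instead of emitting a broken one; it is not Meilisearch's exact truncation logic. Note also that embedding models usually express input limits in tokens, not bytes, so a byte budget is only a conservative approximation.

```python
def truncate_utf8(text: str, max_bytes: int = 400) -> str:
    """Truncate `text` so its UTF-8 encoding fits in `max_bytes`, never splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting valid UTF-8 can only leave an incomplete sequence at the very end;
    # errors="ignore" drops those trailing bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

For instance, with a 401-byte budget, a string of two-byte "é" characters keeps 200 characters (400 bytes) and drops the dangling half-character.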

Regarding authentication, the apiKey parameter is sent in the standard Authorization header using the "Bearer" scheme: if apiKey is foo, the Authorization header will be Bearer foo.

Non-standard authentication schemes can be accommodated with the REST embedder's headers parameter, which sends custom headers with each request.
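To check that an embedder will accept Meilisearch's credentials, it helps to reproduce the headers it will receive. In this sketch, build_headers is a hypothetical helper; the Content-Type value reflects that the request body is JSON, and the precedence of custom headers over the generated Authorization header is an assumption, not documented Meilisearch behavior.

```python
from __future__ import annotations


def build_headers(api_key: str | None = None,
                  custom_headers: dict[str, str] | None = None) -> dict[str, str]:
    """Approximate the headers sent to a REST embedder (hypothetical helper)."""
    headers: dict[str, str] = {"Content-Type": "application/json"}  # request body is JSON
    if api_key is not None:
        # apiKey is sent with the standard Bearer scheme.
        headers["Authorization"] = f"Bearer {api_key}"
    if custom_headers:
        # Assumption: custom headers are merged last, so they win on conflicts.
        headers.update(custom_headers)
    return headers
```

For a provider that expects its key in a non-standard header, for example api-key, you would leave apiKey unset and pass the key through the custom headers instead: build_headers(custom_headers={"api-key": "foo"}).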

guimachiavelli · 2024-11-12T16:08:40Z

Thanks, @dureuill!

Can you elaborate on which "other applications" you're thinking of?

What I mean is: if I have used, for example, Mistral with Algolia, will using Mistral with Meilisearch be pretty much the same? Are there any specific adaptations I need to make when using Mistral with Meilisearch besides complying with our API, such as an extra preprocessing step before submitting documents for vectorization?

dureuill · 2024-11-12T16:42:01Z

Alright, I see what you mean, but we would need to investigate before we can provide this kind of knowledge.

We also need to keep in mind that other applications evolve over time.
