Create new “How to tell if a REST embedder is compatible with Meilisearch” guide #3041

Open
guimachiavelli opened this issue Nov 11, 2024 · 3 comments

Comments

@guimachiavelli
Member

Recent customer feedback indicates users are struggling to move beyond the basic AI-powered search tutorial and implement hybrid search in their own projects. In particular, users who are already familiar with LLMs have trouble telling whether their embedder is compatible with Meilisearch.

We should write a guide defining the minimum requirements and what Meilisearch expects when sending documents and receiving vectors.

  • What is important from an API perspective?
  • Are there significant differences between using a REST embedder with Meilisearch when compared to other applications?
  • API-specific questions aside, are there other details users must be aware of? For example, if Meilisearch tends to reach rate limits or timeouts more easily?
  • pinging @dureuill for the above questions
@dureuill
Contributor

dureuill commented Nov 12, 2024

What is important from an API perspective?

  • The remote embedder must offer an endpoint that accepts a JSON request and returns a JSON response.
  • The JSON request must provide a way to inject the text to embed.
  • The JSON response must contain the embedding matching the text to embed that was sent in the request.
  • Optionally, Meilisearch supports sending up to 10 texts to embed in a single request. In that case, the JSON response must contain exactly as many embeddings as the number of texts in the JSON request.
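
As a rough illustration of the requirements above, here is a minimal sketch of the request/response contract from the embedder's point of view. The field names `inputs` and `embeddings` are hypothetical: each remote embedder defines its own JSON shape. The batch size of 10 and the "as many embeddings as texts" rule come from the list above.

```python
import json

# Batch size mentioned above; the "inputs"/"embeddings" field names below
# are hypothetical and will differ from one embedder to another.
MAX_TEXTS_PER_REQUEST = 10


def build_request(texts):
    """Serialize a batch of texts to embed as a JSON request body."""
    if not 1 <= len(texts) <= MAX_TEXTS_PER_REQUEST:
        raise ValueError("a request must carry between 1 and 10 texts")
    return json.dumps({"inputs": texts})


def parse_response(body, expected_count):
    """Extract embeddings from a JSON response and check the count matches."""
    embeddings = json.loads(body)["embeddings"]
    if len(embeddings) != expected_count:
        raise ValueError(
            f"expected {expected_count} embeddings, got {len(embeddings)}"
        )
    return embeddings
```

An embedder whose response cannot be matched one-to-one with the texts sent in the request would fail the check in `parse_response`.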

Are there significant differences between using a REST embedder with Meilisearch when compared to other applications?

I'm not sure I understand the question. Can you elaborate on which "other applications" you're thinking of?

API-specific questions aside, are there other details users must be aware of? For example, if Meilisearch tends to reach rate limits or timeouts more easily?

When indexing, Meilisearch will attempt to make up to 40 requests in parallel, which is enough to reach most rate limits.

  • When rate limiting happens, the remote server should return HTTP 429.
  • On HTTP 429, on any HTTP 5xx, or if the connection to the remote server cannot be established, Meilisearch will retry the request, for up to 10 tries. Retries use an exponential backoff strategy, with delays ranging from 1-2 ms (first retry) to 1-2 minutes (from the 6th retry on).
  • Any other HTTP 4xx will trigger an immediate failure of the indexing operation.
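
The retry rules above could be sketched roughly as follows. This is not Meilisearch's actual implementation: only the retryable conditions, the limit of 10 tries, and the two endpoint delays (1-2 ms and 1-2 minutes) come from the list above. The factor-of-ten growth between retries and the jitter formula are assumptions made for illustration.

```python
import random

MAX_TRIES = 10  # from the description above


def should_retry(status):
    """Return True for retryable outcomes.

    `status` is the HTTP status code, or None when the connection to the
    remote server could not be established.
    """
    if status is None:
        return True
    return status == 429 or 500 <= status <= 599


def retry_delay_seconds(retry_number, rng=random.random):
    """Delay before the given retry (1-based).

    Assumption: the base delay grows tenfold per retry from 1 ms and is
    capped at 60 s, then multiplied by a jitter factor between 1 and 2.
    Only the endpoints (1-2 ms, 1-2 minutes) are stated in the thread.
    """
    base = min(0.001 * 10 ** (retry_number - 1), 60.0)
    return base * (1.0 + rng())
```

Any status for which `should_retry` returns False (for example 400 or 404) corresponds to the immediate indexing failure described in the last bullet.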

For the OpenAI embedder, Meilisearch supports retrying when the input is too long. For other embedders, the behavior on inputs that are too long to embed is embedder-specific, and Meilisearch does not implement any logic to accommodate this case, so it will likely result in a failed indexing operation. The user should therefore take care to keep inputs within the embedder's limits. To help with this, Meilisearch provides a way to truncate the rendered document template to a fixed number of bytes (the default is 400 bytes).
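
To give an idea of what truncating to a fixed number of bytes means for non-ASCII text, here is a small sketch. The function name is hypothetical, and whether Meilisearch cuts on character boundaries in exactly this way is an assumption; only the 400-byte default comes from the paragraph above.

```python
def truncate_utf8(text, max_bytes=400):
    """Truncate text to at most max_bytes of UTF-8.

    Assumption: a multi-byte character cut in half at the limit is dropped
    entirely, so the result is always valid UTF-8.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" discards the incomplete trailing byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Note that a byte limit is not a token limit: the same 400 bytes can hold 400 ASCII characters but far fewer characters in scripts that need several bytes each.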

Regarding authentication, the apiKey parameter is injected into the standard Authorization header using the "Bearer" scheme. For example, with apiKey: foo, the Authorization header will be Bearer foo.

It is possible to accommodate non-standard authentication schemes with the custom headers parameter of the REST embedder.
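
The header behavior described above can be sketched like this. The `build_headers` helper is hypothetical, and letting custom headers override the default ones is an assumption for illustration, not a statement about Meilisearch internals.

```python
def build_headers(api_key=None, custom_headers=None):
    """Build the HTTP headers sent to a remote embedder (illustrative only)."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Standard scheme: apiKey "foo" becomes "Authorization: Bearer foo".
        headers["Authorization"] = f"Bearer {api_key}"
    if custom_headers:
        # Non-standard schemes (e.g. a provider-specific key header) go here.
        # Assumption: custom headers take precedence over the defaults.
        headers.update(custom_headers)
    return headers
```

For instance, `build_headers("foo")` yields an `Authorization` value of `Bearer foo`, while an embedder expecting a hypothetical `X-Api-Key` header would be served by `build_headers(custom_headers={"X-Api-Key": "foo"})`.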

@guimachiavelli
Member Author

Thanks, @dureuill!

Can you elaborate on which "other applications" you're thinking of?

What I mean is: if I have used e.g. Mistral with Algolia, will it be pretty much the same thing if I use Mistral with Meilisearch? Are there any specific adaptations I need to make when using Mistral with Meilisearch besides complying with our API? E.g. an extra preprocessing step prior to submitting documents for vectorization?

@dureuill
Contributor

dureuill commented Nov 12, 2024

Alright, I see what you mean, but we need to investigate to create this kind of knowledge.

We also need to keep in mind that other applications are evolving over time.
