Meilisearch v1.6 focuses on improving indexing performance. This new release also adds hybrid search and simplifies the process of generating embeddings for semantic search.
🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.
Some SDKs might not include all new features—consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).
With v1.6, you can configure Meilisearch so it automatically generates embeddings using either OpenAI or HuggingFace. If neither of these third-party options suits your application, you may provide your own embeddings manually:
- openAi: Meilisearch uses the OpenAI API to auto-embed your documents. You must supply an OpenAI API key to use this embedder
- huggingFace: Meilisearch automatically downloads the specified model from HuggingFace and generates embeddings locally. This will use your CPU and may impact indexing performance
- userProvided: Compute embeddings manually and supply document vectors to Meilisearch. You may be familiar with this approach if you have used vector search in a previous Meilisearch release. Read further for details on breaking changes for user-provided embeddings usage
Use the embedders index setting to configure embedders. You may set multiple embedders for an index. This example defines 3 embedders named default, image and translation:
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "default": {
        "source": "openAi",
        "apiKey": "<your-OpenAI-API-key>",
        "model": "text-embedding-ada-002",
        "documentTemplate": "A movie titled '\''{{doc.title}}'\'' whose description starts with {{doc.overview|truncatewords: 20}}"
      },
      "image": {
        "source": "userProvided",
        "dimensions": 512
      },
      "translation": {
        "source": "huggingFace",
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "documentTemplate": "A movie titled '\''{{doc.title}}'\'' whose description starts with {{doc.overview|truncatewords: 20}}"
      }
    }
  }'
- documentTemplate is a view of your document that will serve as the base for computing the embedding. This field is a JSON string in the Liquid format
- model is the model OpenAI or HuggingFace should use when generating document embeddings
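To make the documentTemplate idea concrete, the sketch below approximates in plain shell what the template from the example above might render for one movie. It is only an illustration: Meilisearch renders the template itself with Liquid, and render_movie_template is a hypothetical helper, not part of Meilisearch. It assumes a single-line overview and that truncatewords keeps the first 20 words and appends "..." only when words were dropped, as in standard Liquid.

```shell
# Hypothetical helper: approximates the rendered documentTemplate for one movie.
# $1 = title, $2 = overview (assumed to be a single line of text)
render_movie_template() {
  # Keep at most 20 words of the overview; append "..." only if words were dropped.
  short=$(printf '%s\n' "$2" | awk '{
    n = (NF < 20) ? NF : 20
    out = ""
    for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") $i
    if (NF > 20) out = out "..."
    print out
  }')
  printf "A movie titled '%s' whose description starts with %s\n" "$1" "$short"
}

render_movie_template 'Jaws' 'A giant shark terrorizes a small beach town'
```

The resulting sentence is roughly the text the embedder would receive for that document, which is why a short, descriptive template tends to be preferable to embedding every field.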
Refer to the documentation for more vector search usage instructions.
If you have used vector search between v1.3.0 and v1.5.0, API usage has changed with v1.6:
- When providing both the q and vector parameters for a single query, you must provide the hybrid parameter
- Defining a model in your embedder settings is now mandatory:
"embedders": {
"default": {
"source": "userProvided",
"dimensions": 512
}
}
- Vectors should be JSON objects instead of arrays:
"_vectors": { "image2text": [0.0, 0.1, …] } # ✅
"_vectors": [ [0.0, 0.1] ] # ❌
Done in #4226 by @dureuill, @irevoire, @Kerollmops and @ManyTheFish.
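As a rough sketch of the object-shaped _vectors field described above, the helper below prints a one-document payload keyed by embedder name. vector_document, the id primary key, and the two-value vector are illustrative assumptions; a real vector is expected to have as many values as the embedder's dimensions setting.

```shell
# Hypothetical helper: prints a JSON array holding one document with a
# user-provided vector.
# $1 = numeric id, $2 = embedder name, $3 = comma-separated vector values
vector_document() {
  printf '[{"id": %s, "_vectors": {"%s": [%s]}}]\n' "$1" "$2" "$3"
}

vector_document 1 image2text '0.0, 0.1'

# To send it to a running instance (assumes a movies index with a matching
# userProvided embedder):
# vector_document 1 image2text '0.0, 0.1' | curl \
#   -X POST 'http://localhost:7700/indexes/movies/documents' \
#   -H 'Content-Type: application/json' \
#   --data-binary @-
```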
This release introduces hybrid search functionality. Hybrid search allows users to mix keyword and semantic search at search time.
Use the hybrid search parameter to perform a hybrid search:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{
"q": "Plumbers and dinosaurs",
"hybrid": {
"semanticRatio": 0.9,
"embedder": "default"
}
}'
- embedder is the embedder you choose to perform the search among the ones you defined in your settings
- semanticRatio is a number between 0 and 1. The default value is 0.5. 1 corresponds to a full semantic search and 0 corresponds to keyword search
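To illustrate the two ends of the semanticRatio range, here is a small sketch that builds the search payload for a given ratio. hybrid_payload is a hypothetical helper, and it assumes the query text contains no characters that need JSON escaping.

```shell
# Hypothetical helper: prints a hybrid search payload.
# $1 = query text, $2 = semanticRatio (0 to 1), $3 = embedder name
hybrid_payload() {
  printf '{"q": "%s", "hybrid": {"semanticRatio": %s, "embedder": "%s"}}\n' "$1" "$2" "$3"
}

hybrid_payload 'Plumbers and dinosaurs' 0 default   # keyword search only
hybrid_payload 'Plumbers and dinosaurs' 1 default   # full semantic search

# Against a running instance, pipe a payload to the search route:
# hybrid_payload 'Plumbers and dinosaurs' 0.5 default | curl \
#   -X POST 'http://localhost:7700/indexes/movies/search' \
#   -H 'Content-Type: application/json' \
#   --data-binary @-
```

Comparing the results returned for 0 and 1 on the same query is a quick way to judge which ratio suits a given dataset.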
Tip
The new vector search functionality uses Arroy, a Rust library developed by the Meilisearch engine team. Check out @Kerollmops' blog post describing the whole process.
Done in #4226 by @dureuill, @irevoire, @Kerollmops and @ManyTheFish.
This version introduces significant indexing performance improvements. Meilisearch v1.6 has been optimized to:
- store and pre-compute less data than in previous versions
- re-index and delete only the necessary data when updating a document. For example, when you update one document field, Meilisearch will no longer re-index the whole document
On an e-commerce dataset of 2.5GB of documents, these changes led to more than a 50% time reduction when adding documents for the first time. When updating documents frequently and partially, the reduction in re-indexing time hovers between 50% and 75%.
Done in #4090 by @ManyTheFish, @dureuill and @Kerollmops.
Meilisearch now stores less internal data. This leads to smaller database disk sizes.
With a ~15MB dataset, the created database is between 40% and 50% smaller. Additionally, the database size has become more stable and will display more modest growth with new document additions.
You can now customize the accuracy of the proximity ranking rule.
Computing this ranking rule uses a significant amount of resources and may lead to increased indexing times. Lowering its precision may lead to significant performance gains. In a minority of use cases, lower proximity precision may also impact relevancy for queries using multiple search terms.
curl \
  -X PATCH 'http://localhost:7700/indexes/books/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "proximityPrecision": "byAttribute"
  }'
proximityPrecision accepts either byWord or byAttribute:
- byWord calculates the exact distance between words. This is the default setting.
- byAttribute only determines whether words are present in the same attribute. It is less accurate, but provides better performance.
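Since only two values are accepted, a script that changes this setting could validate its input before calling the API. The sketch below does that; proximity_payload is a hypothetical helper, and the commented curl call assumes the general index settings route used in the embedders example.

```shell
# Hypothetical helper: prints the settings payload, or fails on an unknown value.
# $1 = desired precision (byWord or byAttribute)
proximity_payload() {
  case "$1" in
    byWord|byAttribute)
      printf '{"proximityPrecision": "%s"}\n' "$1"
      ;;
    *)
      echo "unsupported proximityPrecision: $1" >&2
      return 1
      ;;
  esac
}

proximity_payload byWord   # payload restoring the default precision

# proximity_payload byWord | curl \
#   -X PATCH 'http://localhost:7700/indexes/books/settings' \
#   -H 'Content-Type: application/json' \
#   --data-binary @-
```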
Done in #4225 by @ManyTheFish.
Meilisearch may occasionally batch too many tasks together, which may lead to system instability. Relaunch Meilisearch with the --experimental-max-number-of-batched-tasks configuration option to address this issue:
./meilisearch --experimental-max-number-of-batched-tasks 100
You may also configure --experimental-max-number-of-batched-tasks as an environment variable or directly in the config file with MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS.
Done in #4249 by @Kerollmops
This release introduces a configurable webhook URL that will be called whenever Meilisearch finishes processing a task.
Relaunch Meilisearch using --task-webhook-url and --task-webhook-authorization-header to use the webhook:
./meilisearch \
  --task-webhook-url='https://example.com/example-webhook?foo=bar&number=8' \
  --task-webhook-authorization-header='Bearer aSampleAPISearchKey'
You may also define the webhook URL and header with environment variables or in the configuration file with MEILI_TASK_WEBHOOK_URL and MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER.
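For the environment-variable route, a launch wrapper might look like the sketch below. start_with_webhook and the MEILI_BIN override are hypothetical names introduced here for illustration; only the two MEILI_TASK_WEBHOOK_* variables come from this release.

```shell
# Hypothetical wrapper: starts Meilisearch with the webhook configured through
# environment variables instead of CLI flags.
# $1 = webhook URL, $2 = value of the authorization header
# MEILI_BIN (an assumption of this sketch) lets you point at another binary.
start_with_webhook() {
  MEILI_TASK_WEBHOOK_URL="$1" \
  MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER="$2" \
  "${MEILI_BIN:-./meilisearch}"
}

# Quote both values: the URL contains & and the header contains a space.
# start_with_webhook 'https://example.com/example-webhook?foo=bar&number=8' 'Bearer aSampleAPISearchKey'
```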
Done by @irevoire in #4238
- Fix document formatting performance during search (#4313) @ManyTheFish
- The dump tasks are now cancellable (#4208) @irevoire
- Fix: the payload size limit is now also applied to all routes, not only routes to add and update documents (#4231) @Karribalu
- Fix: typo tolerance is ineffective for attributes with similar content (related issue: #4256)
- Fix: the geosort is no longer ignored after the first bucket of a preceding sort ranking rule (#4226)
- Fix hang on /indexes and /stats routes (#4308) @dureuill
- Limit the number of values returned by the facet search based on maxValuePerFacet setting (#4311) @Kerollmops
- Dependencies upgrade
  - Updating CI dependencies
  - Update to heed 0.20 (#4223) @Kerollmops
  - Set rust toolchain to 1.71.1 in Dockerfile (#4261) @dureuill
  - Update mini-dashboard to v0.2.12 (#4277) @mdubus
- Documentation
  - Remove banner (#4191) @curquiza
- Misc
  - Extract the creation and last updated timestamp from v2 dumps (#4132) @vivek-26
  - Fix puffin in the index scheduler (#4234) @irevoire
  - Remove the actix-web dependency from (#4239) @Kerollmops
❤️ Thanks again to our external contributors: @Karribalu and @vivek-26