24 Jun 10:40

dureuill

298c7b0

v1.9.0-rc.4 🦎 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Bug fixes 🪲

Fix memory leak by @dureuill in #4710. This issue was also fixed in v1.8.3
Fix panic in hybrid search when removing all embedders from the DB by @irevoire in #4715

Improvements

Update mini-dashboard to 2.14 by @curquiza in #4712
Speed up facet distribution by @Kerollmops in #4713

❤️ Thanks to @sam-ulrich1 for reporting the panic in hybrid search in #4588

Full Changelog

Contributors

Kerollmops, irevoire, and 3 other contributors

19 Jun 16:10

dureuill

v1.8.3

7d69953

v1.8.3 🪼

Fixes 🪲

Fix memory leak by @dureuill in #4707

❤️ Thanks @irevoire for reproducing the issue and checking fixes

❤️ Thanks to Tater, Hannsr, Martin from the Discord thread, and @doutatsu on GH for the report and helping with the investigation

Contributors

doutatsu, irevoire, and dureuill

18 Jun 09:15

irevoire

v1.9.0-rc.3

e580d6b

v1.9.0-rc.3 🦎 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Bug fixes

Fix a Meilisearch freeze that could happen under heavy search loads by @dureuill in #4681. Note that this bug is already fixed in Meilisearch v1.8.2

Breaking changes

The _vectors field is not returned anymore when retrieving documents; you must use the retrieveVector parameter instead
When retrieving the _vectors field with the retrieveVector parameter, their embeddings are not returned "as-is"; they'll always be returned with the maximum precision
When specifying or retrieving vectors, the userProvided field has been removed in favor of a new regenerate field that better represents your intent. When set to true it means the embeddings will be regenerated on every change to the document (default behavior). If set to false the embeddings will never be updated by the engine.
Dumps with embeddings created from previous RCs cannot be imported into the new RC

Improvements

Speed up filter AND operations by @Kerollmops in #4682
Speed up the vector store and reduce the size of the database by @irevoire and @dureuill in #4649
Define your distinct attributes at search time by @Kerollmops in #4693

Misc

Fix ci tests by @ManyTheFish in #4685

Full Changelog: v1.9.0-rc.2...v1.9.0-rc.3

Contributors

Kerollmops, ManyTheFish, and 2 other contributors

10 Jun 10:18

dureuill

v1.8.2

6c6c473

v1.8.2 🪼

Fixes 🪲

Fix concurrency issue by @dureuill in #4681

Thanks to @savikko for first reporting the issue ❤️

Contributors

savikko and dureuill

10 Jun 08:10

ManyTheFish

v1.9.0-rc.2

cb765ad

v1.9.0-rc.2 🦎 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new frequency matching strategy and the ability to retrieve similar documents.

Speedup additional searchable Attributes by @Kerollmops in #4680

When adding new fields in the searchableAttributes setting, the engine will only index the additional attributes instead of recomputing all the searchable attributes.

Update Charabia v0.8.11 by @ManyTheFish in #4684

The words containing œ or æ will be retrieved using oe or ae, like Daemon <=> Dæmon.
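The equivalence described above can be illustrated with a small sketch. This is not Charabia's code; the `fold_ligatures` and `same_word` helpers are hypothetical names, and the real normalization pipeline is considerably more involved.

```python
# Illustrative only: a minimal ligature-folding step, not Charabia's implementation.
LIGATURES = {"œ": "oe", "æ": "ae", "Œ": "OE", "Æ": "AE"}


def fold_ligatures(text: str) -> str:
    """Replace œ/æ ligatures with their two-letter forms, then case-fold."""
    for ligature, expanded in LIGATURES.items():
        text = text.replace(ligature, expanded)
    return text.casefold()


def same_word(left: str, right: str) -> bool:
    """Return True when both spellings fold to the same string."""
    return fold_ligatures(left) == fold_ligatures(right)


print(same_word("Daemon", "Dæmon"))
```

With this kind of folding, a query typed with plain letters and a document containing the ligature end up comparing equal.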

Misc

Fix: Test CI failing when enabling/disabling some features #4629

Contributors

Kerollmops and ManyTheFish

05 Jun 08:53

dureuill

v1.9.0-rc.1

98e062a

v1.9.0-rc.1 🦎 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

New features and updates 🔥

Filter by score

To filter returned documents by their ranking score, a new rankingScoreThreshold parameter has been added to the search and similar routes.

When a rankingScoreThreshold is provided, the results of the search/similar request are modified in the following way:

No document whose _rankingScore is under the rankingScoreThreshold is returned
Any document encountered during the search that is under the threshold is removed from the set of candidates and won’t count towards the estimatedTotalHits, totalHits and the facet distribution.

Examples

request without score threshold:

POST /indexes/movies/search
{
  "q": "Badman dark returns 1",
  "showRankingScore": true,
  "limit": 5
}

results:

{
	"hits": [
	    {
	      "title": "Batman the dark knight returns: Part 1",
	      "id": "A",
	      "_rankingScore": 0.93430081300813
	    },
	    {
	      "title": "Batman the dark knight returns: Part 2",
	      "id": "B",
	      "_rankingScore": 0.6685627880184332
	    },
	    {
	      "title": "Badman",
	      "id": "E",
	      "_rankingScore": 0.25
	    },
	    {
	      "title": "Batman Returns",
	      "id": "C",
	      "_rankingScore": 0.11553030303030302
	    },
	    {
	      "title": "Batman",
	      "id": "D",
	      "_rankingScore": 0.11553030303030302
	    }
	],
	"query": "Badman dark returns 1",
	"processingTimeMs": 11,
	"limit": 5,
	"offset": 0,
	"estimatedTotalHits": 62
}

request with score threshold:

POST /indexes/movies/search
{
  "q": "Badman dark returns 1",
  "showRankingScore": true,
  "limit": 5,
  "rankingScoreThreshold": 0.2
}

results:

{
	"hits": [
	    {
	      "title": "Batman the dark knight returns: Part 1",
	      "id": "A",
	      "_rankingScore": 0.93430081300813
	    },
	    {
	      "title": "Batman the dark knight returns: Part 2",
	      "id": "B",
	      "_rankingScore": 0.6685627880184332
	    },
	    {
	      "title": "Badman",
	      "id": "E",
	      "_rankingScore": 0.25
	    }
	],
	"query": "Badman dark returns 1",
	"processingTimeMs": 11,
	"limit": 5,
	"offset": 0,
	"estimatedTotalHits": 3
}
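As a rough illustration of the rule, the filtering in the example above can be mimicked on the client side. This is only a sketch: `apply_score_threshold` is a hypothetical helper, and the engine applies the threshold during the search itself, which is why it also affects estimatedTotalHits and the facet distribution (a client-side filter cannot reproduce that).

```python
# Hits copied from the example response without a score threshold.
HITS = [
    {"id": "A", "_rankingScore": 0.93430081300813},
    {"id": "B", "_rankingScore": 0.6685627880184332},
    {"id": "E", "_rankingScore": 0.25},
    {"id": "C", "_rankingScore": 0.11553030303030302},
    {"id": "D", "_rankingScore": 0.11553030303030302},
]


def apply_score_threshold(hits, threshold):
    """Keep only the hits whose _rankingScore is not under the threshold."""
    return [hit for hit in hits if hit["_rankingScore"] >= threshold]


kept = apply_score_threshold(HITS, 0.2)
print([hit["id"] for hit in kept])
```

The three remaining documents match the hits of the second response above.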

Known limitations

⚠️ For performance reasons, if Meilisearch finds limit hits above the rankingScoreThreshold, then the ranking score of the remaining documents is not evaluated, and so they are not removed from the set of candidates, even if their ranking score would be below the threshold.

As a result, in this configuration the estimatedTotalHits, totalHits and the facet distribution may be overapproximations of their true values.

Done by @dureuill in #4666

Other improvements

Misc

Dependencies updates
- Update actix-web 4.5.1 -> 4.6.0 done by @dureuill in #4675

Contributors

dureuill

03 Jun 08:31

curquiza

v1.9.0-rc.0

d6bd88c

v1.9.0-rc.0 🦎 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

New features and updates 🔥

Hybrid search improvements

Since we're focusing on AI innovation, this version introduces multiple improvements and changes related to hybrid search.
More detailed changelog here.

Done by @dureuill and @irevoire in #4633 and #4649

⚠️ Breaking changes of hybrid search usage

Before v1.9, an empty array in _vectors.embedder was interpreted as a single embedding of dimension 0 when specifying embeddings in documents. In v1.9 it is interpreted as zero embeddings. The previous behavior was surprising and not useful.

Improvements

Meilisearch v1.9.0 improves performance when indexing and using hybrid search, avoiding useless operations and optimizing the important ones.

Get similar documents

To retrieve similar documents in your datasets, two new routes have been introduced:

POST /indexes/:indexUid/similar using parameters in the request body.
GET /indexes/:indexUid/similar, using query URL parameters.

POST /indexes/:indexUid/similar
{
  // Mandatory: the external id of the target document
  "id": "23",
  // Optional, defaults to 0: how many results to skip
  "offset": 0,
  // Optional, defaults to 20: how many results to display
  "limit": 2,
  // Optional, defaults to null: an additional filter for the returned documents
  "filter": "release_date > 1521763199",
  // Optional, defaults to the default embedder: name of the embedder to use
  // for computing recommendations.
  "embedder": "default",
  // Optional, defaults to null: same as the search query parameter of the same name
  "attributesToRetrieve": [],
  // Both optional, defaults to false: allow displaying the ranking score
  // (resp. detailed ranking score)
  "showRankingScore": false,
  "showRankingScoreDetails": false
}
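For readers calling the route from a script, here is a minimal sketch that builds the POST request with Python's standard library. The `build_similar_request` helper, the `movies` index and the local URL are assumptions for illustration; the request is only constructed, not sent.

```python
import json
import urllib.request


def build_similar_request(base_url, index_uid, document_id, **options):
    """Build a POST request for the similar route; options are body parameters."""
    body = {"id": document_id, **options}
    return urllib.request.Request(
        url=f"{base_url}/indexes/{index_uid}/similar",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical instance URL and index name.
request = build_similar_request(
    "http://localhost:7700",
    "movies",
    "23",
    limit=2,
    filter="release_date > 1521763199",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running instance; an API key header would also be needed if the instance is protected.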

Done by @dureuill in #4647

`frequency` matching strategy when searching

A frequency variant to the matchingStrategy search parameter has been added. This favors the least frequent query words when retrieving the documents.

curl \
 -X POST 'http://localhost:7700/indexes/movies/search' \
 -H 'Content-Type: application/json' \
 --data-binary '{
    "q": "chaval blanc",
    "matchingStrategy": "frequency"
 }'

Previous existing values for matchingStrategy are last and all (last is the default value).
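A client might want to validate the strategy before sending a search. The sketch below assumes the three values named above are the only accepted ones; `build_search_body` is a hypothetical helper, not part of any official SDK.

```python
import json

# Values taken from the text above: last (default), all, and the new frequency.
MATCHING_STRATEGIES = ("last", "all", "frequency")


def build_search_body(query, matching_strategy="last"):
    """Return the JSON search body, rejecting unknown matching strategies."""
    if matching_strategy not in MATCHING_STRATEGIES:
        raise ValueError(f"unknown matchingStrategy: {matching_strategy!r}")
    return json.dumps({"q": query, "matchingStrategy": matching_strategy})


print(build_search_body("chaval blanc", "frequency"))
```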

Done by @ManyTheFish in #4667

Improve indexing speed when updating/adding settings

Meilisearch now limits operations when importing settings by avoiding useless writing operations in its internal database and by reducing disk usage.

Done by @irevoire and @Kerollmops in #4646, #4656 and #4631

Other improvements

Prometheus experimental feature: Use HTTP path pattern instead of full path in metrics (#4619) @gh2k
⚠️ Remove exportPuffinReport experimental feature. Use logs routes and logs modes instead (#4655) @Kerollmops

Fixes 🐞

When no searchable attributes are declared, all the fields have the same importance instead of being randomly given more importance. More information here (#4631) @irevoire
Fix searchableAttributes behavior with nested fields when they were not explicitly defined. More information here (#4631) @irevoire
Fix security issue in dependency: bump Rustls to non-vulnerable versions (#4622) @Kerollmops

Misc

CIs and tests
- Add "precommands" to benchmark (#4624) @dureuill
- Allow to comment with the results of benchmark invocation (#4651) @dureuill
Documentation
- Update README.md (#4664) @tpayet
Misc
- Fix some typos in comments (#4568) @yudrywet
- Fix some typos in comments (#4582) @writegr

❤️ Thanks again to our external contributors:

Meilisearch: @gh2k, @writegr, @yudrywet.

Contributors

Kerollmops, ManyTheFish, and 6 other contributors

22 May 08:21

ManyTheFish

v1.8.1

ba75d23

v1.8.1 🪼

Fixes 🪲

Index the _geo fields when changing the setting while there are already documents in the DB by @irevoire and @ManyTheFish in #4642

Contributors

ManyTheFish and irevoire

06 May 07:30

curquiza

v1.8.0

c668043

v1.8.0 🪼

Meilisearch v1.8 introduces new changes and optimizations related to the Hybrid search with the addition of new models and embedders like REST embedders and the Ollama model. This version also focuses on stability by adding more security around the search requests. Finally, we introduce the negative operator to exclude specific terms from a search query.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 to 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Hybrid search

This release introduces a few changes to hybrid search: a new distribution embedder setting, support for two new embedder sources, and breaking changes to hybrid and semantic search ranking score.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @dureuill and @jakobklemm in #4456, #4537, #4509, #4548, #4549.

⚠️ Breaking changes: `_semanticScore`

To improve search response times and reduce bandwidth usage:

Meilisearch no longer returns the vector field in the search response
Meilisearch no longer returns the _semanticScore in the search response. Use _rankingScore in its place
Meilisearch no longer displays the query vector and its value when "showRankingScoreDetails": true

New embedders: Ollama and generic REST embedder

Ollama model

Ollama is a framework for building and running language models locally. Configure it by supplying an embedder object to the /settings endpoint:

"default": {
  "source": "ollama",
  "url": "http://localhost:11434/api/embeddings",  // optional, fetched from MEILI_OLLAMA_URL environment variable if missing
  "apiKey": "<foobarbaz>",  // optional
  "model": "nomic-embed-text",
  "documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
}

Generic REST embedder

Meilisearch now also supports any embedder with a RESTful interface. Configure it by supplying an embedder object to the /settings endpoint:

"default": {
  "source": "rest",
  "url": "http://localhost:12345/api/v1/embed", //Mandatory, full URL to the embedding endpoint
  "apiKey": "187HFLDH97CNHN", // Optional, passed as Bearer in the Authorization header
  "dimensions": 512, // Optional, inferred with a dummy request if missing
  "documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}",
  "inputField": ["data", "text"], // Optional, defaults to []
  "inputType": "text", // Optional, either "text" or "textArray", defaults to text
  "query": { // Optional, defaults to {}
    "model": "MODEL_NAME",
    "dimensions": 512
  },
  "pathToEmbeddings": ["data"], // Optional, defaults to []
  "embeddingObject": ["embedding"] // Optional, defaults to []
}

New embedder setting: `distribution`

Use distribution to apply an affine transformation to the _rankingScore of semantic search results. This can help to compare _rankingScores of semantic and keyword search results and improve result ranking.

"default": {
  "source": "huggingFace",
  "model": "MODEL_NAME",
  "distribution": {  // describes the natural distribution of results
    "mean": 0.7, // mean value
    "sigma": 0.3 // variance
  }
}
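To make "affine transformation" concrete, here is a sketch of one way such a shift could work. The release notes do not state the formula: the target mean of 0.5, the target sigma of 0.4, and the clamping to [0, 1] are all assumptions chosen for illustration, and `shift_score` is a hypothetical name.

```python
def shift_score(score, mean, sigma, target_mean=0.5, target_sigma=0.4):
    """Affinely map a score from (mean, sigma) onto (target_mean, target_sigma).

    The target values and the final clamping are illustrative assumptions.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    factor = target_sigma / sigma          # rescales the spread
    offset = target_mean - factor * mean   # recenters the mean
    shifted = factor * score + offset
    return min(1.0, max(0.0, shifted))


# With the settings above (mean 0.7, sigma 0.3), a raw score equal to the
# mean lands on the assumed target mean.
print(shift_score(0.7, mean=0.7, sigma=0.3))
```

The transformation keeps the ordering of semantic results and only changes where their scores sit on the 0 to 1 scale, which is what makes them easier to compare with keyword scores.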

Other hybrid search improvements

Hide the API key in settings and task queue (#4533) @dureuill
Return keyword search results even in case of a failure of the embedding when performing hybrid searches (#4548) @dureuill
For hybrid or semantic search requests, add a semanticHitCount field at the top of the search response indicating the number of hits originating from the semantic search (#4548) @dureuill

New feature: Negative keywords

Search queries can now contain a negative keyword to exclude terms from the search. Use the - operator in front of a word or a phrase to make sure no document that contains those words is shown in the results:

curl \
  -X POST 'http://localhost:7700/indexes/places/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "-escape room" }'

-escape returns any document that does not contain escape
-escape room returns documents containing room but not escape
-"on demand" returns any document that does not contain "on demand"
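The three cases above can be read as splitting a query into required and excluded terms. The sketch below illustrates that reading only; it is not the engine's tokenizer, and `split_negative_terms` is a hypothetical helper.

```python
import re

# An optional leading "-", followed by a quoted phrase or a bare word.
_TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def split_negative_terms(query):
    """Return (positive_terms, negative_terms) for a query string."""
    positive, negative = [], []
    for minus, phrase, word in _TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # skip empty quoted phrases such as ""
        (negative if minus else positive).append(term)
    return positive, negative


print(split_negative_terms("-escape room"))
print(split_negative_terms('-"on demand"'))
```

A hyphen inside a word, as in "well-known", is not treated as the operator here because the pattern only looks for it at the start of a token.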

Done by @Kerollmops in #4535.

Search robustness updates

Search cutoff

To avoid crashes and performance issues, Meilisearch now interrupts search requests that take more than 1500ms to complete.

Use the /settings endpoint to customize this value:

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "searchCutoffMs": 150
  }'

The default value of the searchCutoffMs setting is null and corresponds to a 1500ms timeout.
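The relationship between the setting and the timeout actually applied can be written down in a few lines. This is a sketch of the rule stated above, with hypothetical helper names; it does not talk to a Meilisearch instance.

```python
import json

# Timeout that applies when searchCutoffMs is null, per the text above.
DEFAULT_SEARCH_CUTOFF_MS = 1500


def effective_cutoff_ms(search_cutoff_ms):
    """Return the timeout in milliseconds implied by a searchCutoffMs value."""
    if search_cutoff_ms is None:
        return DEFAULT_SEARCH_CUTOFF_MS
    return search_cutoff_ms


def build_cutoff_settings(search_cutoff_ms):
    """Return the JSON body for a settings update; None serializes to null."""
    return json.dumps({"searchCutoffMs": search_cutoff_ms})


print(effective_cutoff_ms(None), effective_cutoff_ms(150))
print(build_cutoff_settings(None))
```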

Done by @irevoire in #4466.

Concurrent search request limits

This release introduces a limit for concurrent search requests to prevent Meilisearch from consuming an unbounded amount of RAM and crashing.

The default number of requests in the queue is 1000. Relaunch your self-hosted instance with --experimental-search-queue-size to change this limit:

./meilisearch --experimental-search-queue-size 100

👉 This limit does NOT impact the search performance. It only affects the number of enqueued search requests to prevent security issues.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @irevoire in #4536

Other improvements

Increase indexing speed when updating settings (#4504) @ManyTheFish
Update search logs: do not display hits in the search output for DEBUG log level (#4580) @irevoire
The sortFacetValuesBy setting now impacts the /facet-search route (#4476) @Kerollmops
Prometheus experimental feature: add status code label to the HTTP request counter (#4373) @rohankmr414
Tokenizer improvements by bumping charabia to 0.8.8 (#4511) @6543
- Support markdown formatted code blocks
- Improve Korean segmentation to correctly use the context ID registered in the dictionary
- Add \t as recognized separator
- Make the pinyin-normalization optional - this can be reactivated by enabling the chinese-normalization-pinyin feature

Fixes 🐞

Fix crash when putting empty separator (#4574) @ManyTheFish
Stop crashing when panic occurs in thread pool (#4593) @Kerollmops
Always show facet numbers in alpha order in the facet distribution (#4581) @Kerollmops
Prometheus experimental feature: fix the HTTP request duration histogram bucket boundaries to follow the OpenTelemetry spec (#4530) @rohankmr414
Hybrid search experimental feature: fix an error on Windows when generating embeddings (#4549) @dureuill

Misc

Dependency updates
- Bump mio from 0.8.9 to 0.8.11 (#4457)
- Upgrade rustls to 0.21.10 and ring to 0.17 (#4400) @hack3ric
CIs and tests
- Add automation to create openAPI issues (#4520) @curquiza
- Add tests to check when the field limit is reached (#4463) @irevoire
- Allow running benchmarks without sending results to the dashboard (#4475) @dureuill
- Create automation when creating GitHub milestones to create update-version issue (#4416) @curquiza
- Fix reason param when benches are triggered from a comment (#4483) @dureuill
Documentation
- Fix milli link in contributing doc (#4499) @mohsen-alizadeh
- Fix some typos in comments (#4546) @redistay
- Remove repetitive words in Benchmark docs (#4526) @availhang
- Remove repetitive words in code-base comments (#4491) @shuangcui
- Update sprint_issue.md (#4516) @curquiza
- Add documentation for benchmarks (#4477) @dureuill
- Fix typos (#4542) @brunoocasali
Misc
- Update cargo version (#4474) @curquiza
- Remove useless analytics (#4578) @irevoire
- Fix milli/Cargo.toml for usage as dependency via git (#4547) @Toromyx

❤️ Thanks again to our external contributors:

Meilisearch: @availhang, @hack3ric, @jakobklemm, @mohsen-alizadeh, @redistay, @rohankmr414, @shuangcui, @Toromyx, @6543.
Charabia: @Gusted, @mosuka, @6543

Contributors

mosuka, Kerollmops, and 15 other contributors

29 Apr 08:39

ManyTheFish

v1.8.0-rc.2

ebca29f

v1.8.0-rc.2 🪼 Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

What's Changed

Remove useless analytics by @irevoire in #4578
Stop crashing when panic occurs in thread pool by @Kerollmops in #4593
Fix embedders api by @ManyTheFish in #4600
Fix embeddings settings update by @ManyTheFish in #4597

Contributors

Kerollmops, ManyTheFish, and irevoire

Releases: meilisearch/meilisearch
