v1.8.0 release changelogs

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Meilisearch v1.8 introduces changes and optimizations to hybrid search, including new embedder sources such as the generic REST embedder and Ollama. This version also focuses on stability by adding safeguards around search requests. Finally, it introduces the negative operator to exclude specific terms from a search query.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and updates 🔥

Hybrid search

This release introduces a few changes to hybrid search: a new distribution embedder setting, support for two new embedder sources, and breaking changes to hybrid and semantic search ranking scores.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @dureuill and @jakobklemm in #4456, #4537, #4509, #4548, #4549.

⚠️ Breaking changes: _semanticScore

To reduce search response times and bandwidth usage:

  • Meilisearch no longer returns the vector field in the search response
  • Meilisearch no longer returns the _semanticScore in the search response. Use _rankingScore in its place
  • Meilisearch no longer displays the query vector and its value when "showRankingScoreDetails": true

New embedders: Ollama and generic REST embedder

Ollama model

Ollama is a framework for building and running language models locally. Configure it by supplying an embedder object to the /settings endpoint:

"default": {
  "source": "ollama",
  "url": "http://localhost:11434/api/embeddings",  // optional, fetched from MEILI_OLLAMA_URL environment variable if missing
  "apiKey": "<foobarbaz>",  // optional
  "model": "nomic-embed-text",
  "documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
}
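
The fragment above shows only the embedder object itself. As a minimal sketch, assuming the object is nested under an `embedders` key in the index settings payload, the full request body could be assembled as follows. The embedder name `default` comes from the example above; the helper name `embedder_settings` is hypothetical and the document template is shortened for readability. JSON has no comment syntax, so the `//` annotations have to be dropped before sending.

```python
import json


def embedder_settings(name: str, embedder: dict) -> str:
    """Wrap one embedder object in a settings payload (hypothetical helper)."""
    # Assumption: embedders live under the "embedders" key of the settings object.
    return json.dumps({"embedders": {name: embedder}}, indent=2)


ollama = {
    "source": "ollama",
    "url": "http://localhost:11434/api/embeddings",
    "model": "nomic-embed-text",
    "documentTemplate": "A document titled '{{doc.title}}'",
}

if __name__ == "__main__":
    # Candidate body for a PATCH request to /indexes/<index>/settings
    print(embedder_settings("default", ollama))
```

The optional apiKey field is left out here; add it to the dictionary if your Ollama endpoint requires one.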

Generic REST embedder

Meilisearch now also supports any embedder with a RESTful interface. Configure it by supplying an embedder object to the /settings endpoint:

"default": {
  "source": "rest",
  "url": "http://localhost:12345/api/v1/embed", // Mandatory, full URL to the embedding endpoint
  "apiKey": "187HFLDH97CNHN", // Optional, passed as Bearer in the Authorization header
  "dimensions": 512, // Optional, inferred with a dummy request if missing
  "documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}",
  "inputField": ["data", "text"], // Optional, defaults to []
  "inputType": "text", // Optional, either "text" or "textArray", defaults to text
  "query": { // Optional, defaults to {}
    "model": "MODEL_NAME",
    "dimensions": 512
  },
  "pathToEmbeddings": ["data"], // Optional, defaults to []
  "embeddingObject": ["embedding"] // Optional, defaults to []
}
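
The path-style options are easier to follow with a concrete request and response. The sketch below is one possible reading of inputField, query, pathToEmbeddings, and embeddingObject for "inputType": "text", assuming each path is a list of object keys walked from the root of the JSON body. The function names are hypothetical and the sample response shape is invented to match the configuration above; this is an illustration of the idea, not Meilisearch's implementation.

```python
import copy


def build_request_body(query: dict, input_field: list[str], text: str):
    """Inject `text` into a copy of `query` at the key path `input_field`."""
    if not input_field:
        # Assumption: with an empty path, the input itself is the whole body.
        return text
    body = copy.deepcopy(query)
    node = body
    for key in input_field[:-1]:
        node = node.setdefault(key, {})
    node[input_field[-1]] = text
    return body


def extract_embedding(response: dict, path_to_embeddings: list[str],
                      embedding_object: list[str]):
    """Walk both key paths in order and return the value found at the end."""
    node = response
    for key in path_to_embeddings + embedding_object:
        node = node[key]
    return node


body = build_request_body(
    {"model": "MODEL_NAME", "dimensions": 512}, ["data", "text"], "hello"
)
# body == {"model": "MODEL_NAME", "dimensions": 512, "data": {"text": "hello"}}

vector = extract_embedding(
    {"data": {"embedding": [0.1, 0.2]}}, ["data"], ["embedding"]
)
# vector == [0.1, 0.2]
```

Under this reading, query supplies the fixed part of the request body, inputField says where the text to embed goes, and the two response paths say where the returned vector lives.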

New embedder setting: distribution

Use distribution to apply an affine transformation to the _rankingScore of semantic search results. This can help to compare _rankingScores of semantic and keyword search results and improve result ranking.

"default": {
  "source": "huggingFace",
  "model": "MODEL_NAME",
  "distribution": {  // describes the natural distribution of results
    "mean": 0.7, // mean value
    "sigma": 0.3 // standard deviation (spread of the scores around the mean)
  }
}
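
An affine transformation rescales a score and then shifts it. As a minimal sketch, assuming mean and sigma describe the embedder's native score distribution and that scores are re-centred onto a common target distribution: the target values 0.5 and 0.4, the clamping to the 0 to 1 range, and the function name `shift_score` are illustrative assumptions, not values stated in these notes.

```python
def shift_score(score: float, mean: float, sigma: float,
                target_mean: float = 0.5, target_sigma: float = 0.4) -> float:
    """Affine map sending (mean, sigma) onto (target_mean, target_sigma)."""
    factor = target_sigma / sigma           # rescale the spread
    offset = target_mean - factor * mean    # then move the centre
    shifted = factor * score + offset
    return min(1.0, max(0.0, shifted))      # keep the result a valid score
```

With the settings above and these assumed targets, a raw score of 0.7 (the configured mean) maps to 0.5 and a raw score of 1.0 maps to 0.9, so a model whose scores cluster near the top of the range is spread out before being compared with keyword results.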

Other hybrid search improvements

  • Hide the API key in settings and task queue (#4533) @dureuill
  • Return keyword search results even in case of a failure of the embedding when performing hybrid searches (#4548) @dureuill
  • For hybrid or semantic search requests, add a semanticHitCount field at the top of the search response indicating the number of hits originating from the semantic search (#4548) @dureuill

New feature: Negative keywords

Search queries can now contain negative keywords to exclude terms from the results. Put the - operator in front of a word or a phrase to exclude every document containing it:

curl \
  -X POST 'http://localhost:7700/indexes/places/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "-escape room" }'

  • -escape returns any document that does not contain escape
  • -escape room returns documents containing room but not escape
  • -"on demand" returns any document that does not contain "on demand"
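
The three cases above can be reproduced with a toy matcher. This is only an illustration of the exclusion semantics, not how Meilisearch evaluates queries: it treats every positive term as required and ignores typo tolerance, ranking, and matching strategies. The function names are hypothetical.

```python
import re

# An optional leading "-", followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]+)"|(\S+))')


def parse_query(q: str):
    """Split a query into (required, excluded) lowercase terms or phrases."""
    required, excluded = [], []
    for negated, phrase, word in TOKEN.findall(q):
        term = (phrase or word).lower()
        (excluded if negated else required).append(term)
    return required, excluded


def matches(doc: str, q: str) -> bool:
    """Return True if `doc` has every required term and no excluded one."""
    text = doc.lower()
    words = set(re.findall(r"\w+", text))

    def present(term: str) -> bool:
        # Phrases are matched as substrings, single words as whole words.
        return term in text if " " in term else term in words

    required, excluded = parse_query(q)
    return all(present(t) for t in required) and not any(present(t) for t in excluded)
```

For instance, `matches("Room for rent", "-escape room")` is true, while `matches("Escape room downtown", "-escape room")` and `matches("Video on demand", '-"on demand"')` are both false.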

Done by @Kerollmops in #4535.

Search robustness updates

Search cutoff

To avoid crashes and performance issues, Meilisearch now interrupts search requests that take more than 1500ms to complete.

Use the /settings endpoint to customize this value:

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "searchCutoffMs": 150
  }'

The default value of the searchCutoffMs setting is null and corresponds to a 1500ms timeout.
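
The same request can be prepared from Python's standard library. The sketch below only builds the request objects and does not send them; it assumes, as with other index settings, that sending null restores the default. The helper name `cutoff_request` is hypothetical, and the URL and index name are taken from the curl example above.

```python
import json
import urllib.request
from typing import Optional


def cutoff_request(base_url: str, index_uid: str,
                   cutoff_ms: Optional[int]) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets searchCutoffMs."""
    body = json.dumps({"searchCutoffMs": cutoff_ms}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/indexes/{index_uid}/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


custom = cutoff_request("http://localhost:7700", "movies", 150)
# Assumption: a null value resets the setting to its default.
reset = cutoff_request("http://localhost:7700", "movies", None)
# urllib.request.urlopen(custom) would send it to a running instance.
```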

Done by @irevoire in #4466.

Concurrent search request limits

This release introduces a limit for concurrent search requests to prevent Meilisearch from consuming an unbounded amount of RAM and crashing.

By default, the queue holds up to 1000 search requests. Relaunch your self-hosted instance with --experimental-search-queue-size to change this limit:

./meilisearch --experimental-search-queue-size 100

👉 This limit does NOT impact the search performance. It only affects the number of enqueued search requests to prevent security issues.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @irevoire in #4536.

Other improvements

  • Increase indexing speed when updating settings (#4504) @ManyTheFish
  • Update search logs: do not display hits in the search output for DEBUG log level (#4580) @irevoire
  • The sortFacetValuesBy setting now impacts the /facet-search route (#4476) @Kerollmops
  • Prometheus experimental feature: add status code label to the HTTP request counter (#4373) @rohankmr414
  • Tokenizer improvements by bumping charabia to 0.8.8 (#4511) @6543
    • Support markdown formatted code blocks
    • Improve Korean segmentation to correctly use the context ID registered in the dictionary
    • Add \t as recognized separator
    • Make the pinyin-normalization optional - this can be reactivated by enabling the chinese-normalization-pinyin feature

Fixes 🐞

  • Fix crash when putting empty separator (#4574) @ManyTheFish
  • Stop crashing when panic occurs in thread pool (#4593) @Kerollmops
  • Always show facet numbers in alpha order in the facet distribution (#4581) @Kerollmops
  • Prometheus experimental feature: fix the HTTP request duration histogram bucket boundaries to follow the OpenTelemetry spec (#4530) @rohankmr414
  • Hybrid search experimental feature: fix an error on Windows when generating embeddings (#4549) @dureuill

Misc

  • Dependency updates
    • Bump mio from 0.8.9 to 0.8.11 (#4457)
    • Upgrade rustls to 0.21.10 and ring to 0.17 (#4400) @hack3ric
  • CIs and tests
    • Add automation to create openAPI issues (#4520) @curquiza
    • Add tests to check when the field limit is reached (#4463) @irevoire
    • Allow running benchmarks without sending results to the dashboard (#4475) @dureuill
    • Create automation when creating GitHub milestones to create update-version issue (#4416) @curquiza
    • Fix reason param when benches are triggered from a comment (#4483) @dureuill
  • Documentation
    • Fix milli link in contributing doc (#4499) @mohsen-alizadeh
    • Fix some typos in comments (#4546) @redistay
    • Remove repetitive words in Benchmark docs (#4526) @availhang
    • Remove repetitive words in code-base comments (#4491) @shuangcui
    • Update sprint_issue.md (#4516) @curquiza
    • Add documentation for benchmarks (#4477) @dureuill
    • Fix typos (#4542) @brunoocasali
  • Misc
    • Update cargo version (#4474) @curquiza
    • Remove useless analytics (#4578) @irevoire
    • Fix milli/Cargo.toml for usage as dependency via git (#4547) @Toromyx

❤️ Thanks again to our external contributors:

  • Meilisearch: @availhang, @hack3ric, @jakobklemm, @mohsen-alizadeh, @redistay, @rohankmr414, @shuangcui, @Toromyx, @6543.
  • Charabia: @Gusted, @mosuka, @6543