diff --git a/open-api.yaml b/open-api.yaml index 8b3b16b9..db881de7 100644 --- a/open-api.yaml +++ b/open-api.yaml @@ -117,6 +117,157 @@ components: - start - length description: Starting position and length in bytes of the matched term in the returned value + order: + type: integer + description: The order that this ranking rule was applied + customRankingRuleDetails: + type: object + properties: + order: + $ref: '#/components/schemas/order' + value: + type: + - string + - number + - point + description: The value that was used for sorting this document + distance: + type: number + description: The distance between the target point and the geoPoint in the document + required: + - order + - value + description: Custom rule in the form of either `attribute:direction` or `_geoPoint(lat, lng):direction`. + score: + type: number + description: | + The relevancy score of a document according to a ranking rule and relative to a search query. Higher is better. + + `1.0` indicates a perfect match, `0.0` no match at all (Meilisearch should not return documents that don't match the query). + rankingScoreDetails: + type: object + properties: + words: + type: object + properties: + order: + $ref: '#/components/schemas/order' + matchingWords: + type: integer + description: the number of words from the query found + maxMatchingWords: + type: integer + score: + $ref: '#/components/schemas/score' + required: + - order + - matchingWords + - maxMatchingWords + - score + typo: + type: object + properties: + order: + $ref: '#/components/schemas/order' + typoCount: + type: integer + description: The number of typos to correct in the query to match that document. + maxTypoCount: + type: integer + description: The maximum number of typos that can be corrected in the query to match a document. 
+ score: + $ref: '#/components/schemas/score' + required: + - order + - typoCount + - maxTypoCount + - score + proximity: + type: object + properties: + order: + $ref: '#/components/schemas/order' + score: + $ref: '#/components/schemas/score' + required: + - order + - score + attribute: + type: object + properties: + order: + $ref: '#/components/schemas/order' + attribute_ranking_order_score: + type: number + description: | + Score computed depending on the first attribute each word of the query appears in. + + The first attribute in the `searchableAttributes` list yields the highest score, the last attribute the lowest. + query_word_distance_score: + type: number + description: | + Score computed depending on the position at which each word of the query appears in the attributes. + + Words appearing in an attribute at the same position as in the query yield the highest score. The greater the distance to the position + in the query, the lower the score. + score: + $ref: '#/components/schemas/score' + required: + - order + - attribute_ranking_order_score + - query_word_distance_score + - score + exactness: + type: object + properties: + order: + $ref: '#/components/schemas/order' + matchType: + type: string + description: | + One of `exactMatch`, `matchesStart` or `noExactMatch`. + - `exactMatch`: the document contains an attribute that exactly matches the query. + - `matchesStart`: the document contains an attribute that exactly starts with the query. + - `noExactMatch`: any other document. + score: + $ref: '#/components/schemas/score' + required: + - order + - matchType + - score + additionalProperties: + $ref: '#/components/schemas/customRankingRuleDetails' + description: (EXPERIMENTAL) The ranking score per ranking rule. 
+ examples: + With sort: + words: + order: 0 + matchingWords: 7 + maxMatchingWords: 7 + score: 1.0 + "typo": + "order": 1 + "typoCount": 0 + "maxTypoCount": 0 + "score": 1.0 + "proximity": + "order": 2 + "score": 1.0 + "attribute": + "order": 3 + "attribute_ranking_order_score": 1.0 + "query_word_distance_score": 1.0 + "score": 1.0 + "title:asc": + "order": 4 + "value": "batman: the dark knight returns, part 1" + "release_date:desc": + "order": 5 + "value": 1345507200.0 + "exactness": + "order": 6 + "matchType": "exactMatch" + "score": 1.0 hit: type: object additionalProperties: true @@ -168,6 +319,15 @@ components: properties: '': $ref: '#/components/schemas/matchesPosition' + _rankingScore: + type: number + description: Only present if showRankingScore = `true`. The ranking score of that document. + _rankingScoreDetails: + type: object + description: (EXPERIMENTAL) Only present if showRankingScoreDetails = `true`. The ranking score of each ranking rule for that document. + properties: + '': + $ref: '#/components/schemas/rankingScoreDetails' attribute: type: - string @@ -712,6 +872,14 @@ components: type: boolean description: Defines whether an `_matchesPosition` object that contains information about the matches should be returned or not. default: false + showRankingScore: + type: boolean + description: Defines whether a `_rankingScore` number representing the relevancy score of that document should be returned or not. + default: false + showRankingScoreDetails: + type: boolean + description: (EXPERIMENTAL) Defines whether a `_rankingScoreDetails` object containing information about the score of that document for each ranking rule should be returned or not. + default: false matchingStrategy: type: string description: Defines which strategy to use to match the query terms within the documents as search results. Two different strategies are available, `last` and `all`. By default, the `last` strategy is chosen. 
@@ -762,6 +930,7 @@ components: attributesToHighlight: - overview showMatchesPosition: true + showRankingScore: true wordsMatchingStrategy: all error: title: error diff --git a/text/0034-telemetry-policies.md b/text/0034-telemetry-policies.md index 92ec5e04..0cf7db10 100644 --- a/text/0034-telemetry-policies.md +++ b/text/0034-telemetry-policies.md @@ -191,6 +191,8 @@ The collected data is sent to [Segment](https://segment.com/). Segment is a plat | `clear_all` | `true` if `DELETE /indexes/:indexUid/documents` endpoint was used in this batch, otherwise `false` | false | `Documents Deleted` | | vector_store | Whether the [vector store](./0193-experimental-features.md#vector-store) feature is enabled. | `true` | `Experimental features Updated` | | score_details | Whether the [score details](./0193-experimental-features.md#score-details) feature is enabled. | `true` | `Experimental features Updated` | +| scoring.show_ranking_score | Was `showRankingScore` used in the aggregated event? If yes, `true`, otherwise `false` | `false` | `Documents Searched POST`, `Documents Searched GET`, `Documents Searched by Multi-Search POST` | +| scoring.show_ranking_score_details | Was `showRankingScoreDetails` used in the aggregated event? If yes, `true`, otherwise `false` | `false` | `Documents Searched POST`, `Documents Searched GET`, `Documents Searched by Multi-Search POST` | ---- @@ -283,9 +285,11 @@ This property allows us to gather essential information to better understand on | formatting.max_attributes_to_crop | The maximum number of attributes to crop encountered among all requests in the aggregated event. | `100` | | formatting.crop_length | Does `cropLength` has been used in the aggregated event? If yes, `true` otherwise `false` | `false` | | formatting.crop_marker | Does `cropMarker` has been used in the aggregated event? If yes, `true` otherwise `false` | `false` | -| formatting.show_matches_position | Does `showMatchesPosition` has been used in the aggregated event? 
If yes, `true` otherwise `false` | `false` | +| formatting.show_matches_position | Was `showMatchesPosition` used in the aggregated event? If yes, `true` otherwise `false` | `false` | | facets.avg_facets_number | The average number of facets among all the requests containing the `facets` parameter in the aggregated event. `"facets": []` equals to `0` while not sending `facets` does not influence the average in the aggregated event. | `10` | | matching_strategy.most_used_strategy | Most used word matching strategy among all search requests in the aggregated event. `last` / `all` | `last` | +| scoring.show_ranking_score | Was `showRankingScore` used in the aggregated event? If yes, `true`, otherwise `false` | `false` | +| scoring.show_ranking_score_details | Was `showRankingScoreDetails` used in the aggregated event? If yes, `true`, otherwise `false` | `false` | --- @@ -320,6 +324,8 @@ This property allows us to gather essential information to better understand on | formatting.show_matches_position | Does `showMatchesPosition` has been used in the aggregated event? If yes, `true` otherwise `false` | `false` | | facets.avg_facets_number | The average number of facets among all the requests containing the `facets` parameter in the aggregated event. `"facets": []` equals to `0` while not sending `facets` does not influence the average in the aggregated event. | `10` | | matching_strategy.most_used_strategy | Most used word matching strategy among all search requests in the aggregated event. `last` / `all` | `last` | +| scoring.show_ranking_score | Was `showRankingScore` used in the aggregated event? If yes, `true`, otherwise `false` | `false` | +| scoring.show_ranking_score_details | Was `showRankingScoreDetails` used in the aggregated event? 
If yes, `true`, otherwise `false` | `false` | --- diff --git a/text/0118-search-api.md b/text/0118-search-api.md index 0b3ca2c5..09edd146 100644 --- a/text/0118-search-api.md +++ b/text/0118-search-api.md @@ -32,25 +32,27 @@ If a master key is used to secure a Meilisearch instance, the auth layer returns ### 3.1. Search Payload Parameters -| Field | Type | Required | -|-------------------------------------------------------|--------------------------|----------| -| [`q`](#311-q) | String | False | -| [`filter`](#312-filter) | Array of String - String | False | -| [`sort`](#313-sort) | Array of String - String | False | -| [`facets`](#314-facets) | Array of String - String | False | -| [`limit`](#315-limit) | Integer | False | -| [`offset`](#316-offset) | Integer | False | -| [`page`](#317-page) | Integer | False | -| [`hitsPerPage`](#318-hitsperpage) | Integer | False | -| [`attributesToRetrieve`](#319-attributestoretrieve) | Array of String - String | False | -| [`attributesToHighlight`](#3110-attributestohighlight)| Array of String - String | False | -| [`highlightPreTag`](#3111-highlightpretag) | String | False | -| [`highlightPostTag`](#3112-highlightposttag) | String | False | -| [`attributesToCrop`](#3113-attributestocrop) | Array of String - String | False | -| [`cropLength`](#3114-croplength) | Integer | False | -| [`cropMarker`](#3115-cropmarker) | String | False | -| [`showMatchesPosition`](#3116-showmatchesposition) | Boolean | False | -| [`matchingStrategy`](#3117-matchingStrategy) | String | False | +| Field | Type | Required | +|---------------------------------------------------------------|--------------------------|----------| +| [`q`](#311-q) | String | False | +| [`filter`](#312-filter) | Array of String - String | False | +| [`sort`](#313-sort) | Array of String - String | False | +| [`facets`](#314-facets) | Array of String - String | False | +| [`limit`](#315-limit) | Integer | False | +| [`offset`](#316-offset) | Integer | False | +| 
[`page`](#317-page) | Integer | False | +| [`hitsPerPage`](#318-hitsperpage) | Integer | False | +| [`attributesToRetrieve`](#319-attributestoretrieve) | Array of String - String | False | +| [`attributesToHighlight`](#3110-attributestohighlight) | Array of String - String | False | +| [`highlightPreTag`](#3111-highlightpretag) | String | False | +| [`highlightPostTag`](#3112-highlightposttag) | String | False | +| [`attributesToCrop`](#3113-attributestocrop) | Array of String - String | False | +| [`cropLength`](#3114-croplength) | Integer | False | +| [`cropMarker`](#3115-cropmarker) | String | False | +| [`showMatchesPosition`](#3116-showmatchesposition) | Boolean | False | +| [`showRankingScore`](#3117-showrankingscore) | Boolean | False | +| [`showRankingScoreDetails`](#3118-showrankingscoredetails) | Boolean | False | +| [`matchingStrategy`](#3119-matchingstrategy) | String | False | #### 3.1.1. `q` @@ -533,7 +535,7 @@ The first page has a value of `1`, the second `2`, etc... When `0` is provided a When providing `page` or `hitsPerPage` in the query parameters, the `page selection` system is enabled, which makes it possible to navigate through the search results pages. See explanation on the [`page selection`](#3181-navigating-search-results-by-page-selection). -If in addition to either `page` and/or `hitsPerPage`, `limit` and/or `offset` are provided as well, `limit` and `offset` are ignored. See [explaination](#3181-navigating-search-results-by-page-selection). +If in addition to either `page` and/or `hitsPerPage`, `limit` and/or `offset` are provided as well, `limit` and `offset` are ignored. See [explanation](#3181-navigating-search-results-by-page-selection). - 🔴 Sending a value with a different type than `Integer` for `page` returns an [invalid_search_page](0061-error-format-and-definitions.md#invalid_search_page) error. 
@@ -892,7 +894,32 @@ It's useful when more control is needed than offered by the built-in highlightin - 🔴 Sending a value with a different type than `Boolean` or `null` for `showMatchesPosition` returns an [invalid_search_show_matches_position](0061-error-format-and-definitions.md#invalid_search_show_matches_position) error. -#### 3.1.17. `matchingStrategy` +#### 3.1.17. `showRankingScore` + +- Type: Boolean +- Required: False +- Default: `false` + +Adds a [`_rankingScore`](#32114-rankingscore) number to each document in the search response, representing the relevancy score of a document according to the applied ranking rules and relative to a search query. Higher is better. + +`1.0` indicates a perfect match, `0.0` no match at all (Meilisearch should not return documents that don't match the query). + +- 🔴 Sending a value with a different type than `Boolean` or `null` for `showRankingScore` returns an [invalid_search_show_ranking_score](0061-error-format-and-definitions.md#invalid_search_show_ranking_score) error. + +#### 3.1.18. `showRankingScoreDetails` + +(EXPERIMENTAL) + +- Type: Boolean +- Required: False +- Default: `false` + +Adds a [`_rankingScoreDetails`](#32115-rankingscoredetails) object to each document in the search response, containing information about the score of that document for each applied ranking rule. + +- 🔴 Sending a value with a different type than `Boolean` or `null` for `showRankingScoreDetails` returns an [invalid_search_show_ranking_score_details](0061-error-format-and-definitions.md#invalid_search_show_ranking_score_details) error. +- 🔴 Using that field while the [`score details`](./0193-experimental-features.md#score-details) experimental feature has not been enabled returns a [feature_not_enabled](0061-error-format-and-definitions.md#feature_not_enabled) error. + +#### 3.1.19. `matchingStrategy` - Type: String - Required: False @@ -940,15 +967,17 @@ Results of the search query as an array of documents. 
> The search parameters `attributesToRetrieve` influence the returned payload for a hit. See [3.1.7. `attributesToRetrieve`](#319-attributestoretrieve) section. -A search result can contain special properties. See [3.2.1.1. `hit` Special Properties](#3211-hits-special-properties) section. +A search result can contain special properties. See [3.2.1.1. `hit` Special Properties](#3211-hit-special-properties) section. ##### 3.2.1.1. `hit` Special Properties -| Field | Type | Required | -|----------------------------------------------|---------|----------| -| [`_geoDistance`](#32111-geodistance) | Integer | False | -| [`_formatted`](#32112-formatted) | Object | False | -| [`_matchesPosition`](#32113-matchesposition) | Object | False | +| Field | Type | Required | +|------------------------------------------------------|---------|----------| +| [`_geoDistance`](#32111-geodistance) | Integer | False | +| [`_formatted`](#32112-formatted) | Object | False | +| [`_matchesPosition`](#32113-matchesposition) | Object | False | +| [`_rankingScore`](#32114-rankingscore) | Number | False | +| [`_rankingScoreDetails`](#32115-rankingscoredetails) | Object | False | ###### 3.2.1.1.1. `_geoDistance` @@ -1155,6 +1184,28 @@ The beginning of a matching term within a field is indicated by `start`, and its > See [3.1.14. `showMatchesPosition`](#3116-showmatchesposition) section. +###### 3.2.1.1.4. `_rankingScore` + +- Type: Number +- Required: False + +The relevancy score of a document relative to the search query. Higher is better. + +`1.0` indicates a perfect match, `0.0` no match at all (Meilisearch should not return documents that don't match the query). + +> See [Ranking Score](./0195-ranking-score.md#31-ranking-score) for details. + +###### 3.2.1.1.5. `_rankingScoreDetails` + +- Type: Object +- Required: False + +(EXPERIMENTAL) The ranking score of a document per each ranking rule and relative to the search query. 
+ +This object features one field for each applied ranking rule, whose value is an object with at least the field `order` indicating in which order this ranking rule has been applied. + +> See [Ranking Score details](./0195-ranking-score.md#32-ranking-score-details) for details. + #### 3.2.2. `limit` - Type: Integer diff --git a/text/0195-ranking-score.md b/text/0195-ranking-score.md new file mode 100644 index 00000000..b734c4e4 --- /dev/null +++ b/text/0195-ranking-score.md @@ -0,0 +1,183 @@ +# Ranking Score + +## 1. Summary + +Adds two kinds of scores to documents returned by a [search query](./0118-search-api.md). + +## 2. Motivation + +When configuring the Meilisearch relevancy according to their needs, users cannot know why one document has been favored over another. + +Showing how documents are ranked according to Meilisearch’s ranking rules unlocks: + +- Further customization of the developer workflow, such as fine-tuning settings and improving relevancy. +- Returning a unified list of results for multi-index search queries +- Sharding +- Debugging and helping users better understand how ranking works + +## 3. Functional Specification + +### 3.1. Ranking score + +A ranking score is a number attached to each document returned by a search when the [`showRankingScore`](./0118-search-api.md#3117-showrankingscore) flag is set to true in the search query. + +#### 3.1.1. Scale and interpretation + +The ranking score is a number between 0.0 and 1.0. A higher score signifies better relevancy, with 1.0 representing a perfect match, and 0.0 indicating that the document does not match the query (Meilisearch should not return documents that do not match the query). + +That number rates the relevancy of the document with respect to the specified search query and the current settings of the index. 
+ +The score of a document follows its relevancy in the sense of Meilisearch, in that the first few ranking rules have a much higher influence on the score than the next rules. This is consistent with the way that later ranking rules are only used to break ties with earlier ranking rules, when ranking documents. + +#### 3.1.2. Score independence + +The score of a document is independent of what other documents are contained in the index but is influenced by the settings of the index. The table below details all the settings that can influence the score. Unlisted settings do not influence the ranking score. + +| Setting name | Influences if | Rationale | +|--------------|---------------|-----------| +|`searchableAttributes`|The `attribute` ranking rule is used|The `attribute` ranking rule rates the document depending on the attribute in which the query terms show up. The order is determined by `searchableAttributes`| +|`rankingRules`|Always|The score is computed by combining the subscores of the ranking rules, each weighted according to its order.| +|`stopWords`|Always|Stop words influence the `words` ranking rule, which is almost always used| +|`synonyms`|Always|Synonyms influence the `words` ranking rule, which is almost always used| +|`typoTolerance`|The `typo` ranking rule is used|Used to compute the maximum number of typos for a query| + + +Additionally, the following can impact score independence: + +- If the `attribute` ranking rule is used, but `searchableAttributes` has not been specified, then the score is dependent on all the fields that appear in documents and their precise order, as determined by Meilisearch. +- The score is dependent on the search query. 
+ +Depending on the use case, it can be meaningful to compare scores coming from indexes with settings that are different: + +- When comparing two scores produced on two indexes with different settings, possibly on a distinct search query, one is comparing the relevancy of each of the scored documents to their respective search query. This is good to present the most relevant documents first when working with heterogeneous indexes, without taking into account which document best suits one single query. +- On the other hand, to find what document best suits one single query against two homogeneous indexes, one must be careful to make sure that the indexes have the settings above set to the same value. + +#### 3.1.3. The sort ranking rules do not impact the score + +Custom `sort` and `geosort` ranking rules modify the ranking of documents such that they are returned sorted by the value of the target field, rather than by their relevancy to the search query. + +As such, these ranking rules have no impact on the score. As a corollary of this, if a `sort` ranking rule is not the last ranking rule, then it is possible to see documents returned with ranking scores that are not monotonically decreasing. + +Similarly, re-ranking documents by their ranking score will ignore any `sort` ranking rule. + +If you need to factor sort ranking rules into your score, then use the [ranking score details](#32-ranking-score-details). + +### 3.2. Ranking score details + +(EXPERIMENTAL) The ranking score details are represented as an object attached to each document returned by a search when the [`showRankingScoreDetails`](./0118-search-api.md#3118-showrankingscoredetails) flag is set to true in the search query. + +The ranking score details are experimental and require enabling the corresponding [experimental feature](./0193-experimental-features.md#score-details). + +#### 3.2.1. 
General shape + +The fields of the object have for key the identifier of the various ranking rules that were applied, and for value an object with at least the following field: + +- `order`: the numerical order in which the ranking rule was applied. Starts at 0. Consecutive numbers denote ranking rules consecutively applied. + +Additionally, all ranking rules except the `sort` and `geosort` ranking rules have the following field: + +- `score`: the relevancy score of the document relative to this search query, for this ranking rule. A number between 1.0 and 0.0, with 1.0 meaning a perfect match to the query according to the ranking rule, and 0.0 no match. + +#### 3.2.2. Ranking-rule-specific fields + +Each ranking rule exposes specific fields meant to provide semantic information about how the ranking rule was applied to the document. + +The table below details these rule-specific fields. + +​ +| Ranking rule | Field description | +| :--------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `words` | | +| `typo` | | +| `proximity` | No rule-specific field | +| `attribute` | | +| `exactness` | | + +#### 3.2.3. 
Sort ranking rules + +`sort` and `geosort` ranking rules appear as fields in the score details, but with the following differences: + +- Their key has the following format: `{:attribute-sorted-on}:{:sort-direction}`, with the `:attribute-sorted-on` the name of the attribute that is being sorted on, and the `:sort-direction` either `asc` if the sort is in ascending order, or `desc` if the sort is in descending order. For the `geosort` ranking rule, it is similarly `_geoPoint({:lat}, {:lng}):{:sort-direction}`, with the `:lat` and `:lng` being respectively the latitude and longitude of the point that serves as base to sort by distance. +- They don't have a `score` field, but instead they have a `value` field, representing the value used to sort the document. It is typically the value of the sorted attribute for the document, but can sometimes be a subvalue (case where the value is an array of values). +- For the `geosort`, there is an additional `distance` field representing the distance between the target point and the point used in the document to sort the document. + +#### 3.2.4. Example + +The following is an example of a `_rankingScoreDetails` object returned for a document matching a search query. + + +```json +"_rankingScoreDetails": { + "words": { + "order": 0, + "matchingWords": 1, + "maxMatchingWords": 1, + "score": 1 + }, + "typo": { + "order": 1, + "typoCount": 0, + "maxTypoCount": 1, + "score": 1 + }, + "proximity": { + "order": 2, + "score": 1 + }, + "attribute": { + "order": 3, + "attribute_ranking_order_score": 0.8333333333333334, + "query_word_distance_score": 1, + "score": 0.8333333333333334 + }, + "exactness": { + "order": 4, + "matchType": "exactMatch", + "score": 1 + }, + "release_date:asc": { + "order": 5, + "value": 1165881600 + } +} +``` + +## 4. Technical Details + +### 4.1. Ranking score calculation + +The ranking score calculation in this section is given for informative purposes and is not normative. 
+ +The implementation computes the [ranking score](#31-ranking-score) from each ranking rule (excluding `sort` and `geosort`) with two pieces of data per ranking rule. For the `k`th applied ranking rule: + +- The maximum rank `max_rank_k` that a document can score with the rule, [independently from the other documents in the index](#312-score-independence) +- The rank `rank_k` of that document for that rule, with the highest rank being equal to the maximum rank, and the lowest rank being equal to 1. + +The score is given by the following formula, assuming `n` ranking rules denoted from `0` to `n-1`: + +``` +score = sum(i in 0..(n-1), (rank_i - 1) / product(j in 0..=i, max_rank_j)) + (rank_(n-1) / product(i in 0..n, max_rank_i)) +``` + +The intuition behind this formula is that every document falls in a range for each rule, between `rank_i / max_rank_i` and `(rank_i - 1) / max_rank_i`, and the next ranking rule makes it possible to refine where the document is in this range, with the last ranking rule providing the exact score. + + +### 4.2. Hidden ranking rules + +If the [`displayedAttributes`](./0123-displayed-attributes-setting-api.md) list is defined, then attributes that are not part of that list, but are used in `sort` ranking rules are **hidden**. + +Instead of seeing `{:attribute-sorted-on}:{:sort-direction}` as described in [the relevant section](#323-sort-ranking-rules), the name of that field is replaced with ``, with `{:number}` a number that serves to uniquely distinguish between such hidden rules. + +Note: that number is not guaranteed to start at 0 nor to be consecutive. The only guarantee is that no hidden ranking rule will have the same number. + +Furthermore, the `value` that was used to sort the document is also hidden and replaced by `""`. + +### 4.3. Disabled optimization + +The engine optimizes search by skipping the application of ranking rules when there's only one remaining document (no tie to break). 
+ +To compute an accurate score, however, all ranking rules must be applied, so this optimization is disabled as soon as a score is requested in the search request. When no scores are requested, the optimization is active. + +## 5. Future Possibilities + +- Extend the [multi-search API](./0192-multi-search-api.md) to rerank documents according to their score, providing federated search.