Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Settings API - Customize the hard limits for pagination and faceting #157

Merged
merged 9 commits into from
Jul 7, 2022

Conversation

gmourier
Copy link
Member

@gmourier gmourier commented Jun 8, 2022

🤖 API Diff


Summary

This pull request introduces two new settings API to customize the current hard limits that are blocking users to adjust the search engine behavior to their custom needs.

pagination settings API

Before v0.27, we didn't have a limit on the number of reachable documents for a query. A handful of users complained that this made their database scrappable or subject to DDoS attack, so we decided to introduce a limit of 1000 for v0.27, non-customizable.

Our hypothesis (Some users might need to customize it at a superior range) proved to be true, some users need to customize this limit.

The main argument is that some users need to make the search results scrappable by search engines for their SEO.

Some users also simply want to go further in the pagination despite the relevancy of the first pages.

No one has yet asked to customize it to a lower value, but we imagine this is possible, especially for those who want to reduce the “scrappability” of their dataset.

faceting settings API

Since v0.21, there is no more limit on the number of values returned per facet. This was the case before v0.21 but some users found the default value too low, so it was decided to remove this limit.

The fact that it is unlimited today poses performance problems for users who have many possible values for a facet.

Rather than blocking one or the other, we make this value customizable according to their needs.

Important

Increasing the default values of these properties can degrade performance and more easily expose an instance to DDoS attacks or scrapping by a malicious user.

The default values should be a good compromise between performance and usage for most users.


Changes

API Endpoints

  • Introduce a pagination settings API resource. GET/PATCH/DELETE - /indexes/:indexName/settings/pagination
    • maxTotalHits property - Type: Integer - Default to 1000
  • Introduce a faceting settings API resource. GET/PATCH/DELETE - /indexes/:indexName/settings/faceting
    • maxValuesPerFacet property - Type: Integer - Default to 100
  • pagination and faceting objects are accessible from the global settings endpoints /indexes/:indexName/settings

Telemetry Datapoints

  • pagination.max_total_hits and faceting.max_values_per_facet are as a property of the Settings Updated event.
  • Pagination Updated event is introduced with the max_total_hits property.
  • Faceting Updatedevent is introduced with the max_values_per_facet property

Out Of Scope

n/a


Attention To Reviewers

Introduced fields like maxTotalHits and maxValuesPerFacet are open to suggestion.


Misc

  • Update OpenAPI specification file
  • Update telemetry datapoints

@github-actions
Copy link

github-actions bot commented Jun 8, 2022

🤖 API change detected:

Added (6)

  • DELETE /indexes/{indexUid}/settings/faceting
  • DELETE /indexes/{indexUid}/settings/pagination
  • GET /indexes/{indexUid}/settings/faceting
  • GET /indexes/{indexUid}/settings/pagination
  • PATCH /indexes/{indexUid}/settings/faceting
  • PATCH /indexes/{indexUid}/settings/pagination

Modified (2)

  • GET /indexes/{indexUid}/settings
    • Response modified: 200
      • Body attributes added: pagination, faceting
  • PATCH /indexes/{indexUid}/settings
    • Body modified
      • Body attributes added: pagination, faceting

View documentation diff

Powered by Bump

@gmourier gmourier changed the title WIP Settings API - Customize the hard limits for pagination and faceting Jun 8, 2022
@gmourier gmourier added the Ready For Review Feature specification must be reviewed. label Jun 8, 2022
text/0034-telemetry-policies.md Outdated Show resolved Hide resolved
Copy link
Member

@brunoocasali brunoocasali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmourier give a look at the PATCH/PUT definition here I think it is not compatible with this one #152 😄 .

open-api.yaml Show resolved Hide resolved
text/0034-telemetry-policies.md Show resolved Hide resolved
text/0034-telemetry-policies.md Show resolved Hide resolved
bors bot added a commit to meilisearch/milli that referenced this pull request Jun 9, 2022
550: Add the two new pagination and faceting settings r=ManyTheFish a=Kerollmops

This PR adds two new settings in the database, those settings are described [in this spec](meilisearch/specifications#157).

Co-authored-by: Kerollmops <clement@meilisearch.com>
@bidoubiwa
Copy link
Contributor

limited_to

What do you think of max_total_hits or total_hit_limit.

It goes in line with estimatedTotalHits

maxValuesPerFacet

Makes me think we can customize this value for every facet. Like genres: 20, color: 100

If this is not the case, what about something more generic like: facet_values_limit or max_facet_values

?

@dichotommy
Copy link
Contributor

dichotommy commented Jun 9, 2022

I have doubts about the name pagination.

Usually when we speak about pagination, we speak about how results are sent to and displayed on the front end—particularly when the total number of results is broken down into smaller pages.

This setting on the other hand affects only the total number of results returned by the engine. So it doesn't have anything to do with pagination as far as I can tell. Won't it create confusion?

Edit: Just wanted to say I totally understand if the pagination setting is being created because we plan to add additional pagination-related settings in the future, but I'm still not sure that this particular setting (search response maximum) should belong to that category.

@dichotommy
Copy link
Contributor

limited_to
What do you think of max_total_hits or total_hit_limit.

Strong agree with max_total_hits 👍🏻

maxValuesPerFacet
Makes me think we can customize this value for every facet. Like genres: 20, color: 100

For me, the name maxValuesPerFacet is really clear. When I read maxValuesPerFacet: x, I think that the engine will show only the x most common facet values for each attribute defined in filterableAttributes.

bors bot added a commit to meilisearch/meilisearch that referenced this pull request Jun 9, 2022
2494: Introduce the new faceting and pagination settings r=ManyTheFish a=Kerollmops

This PR introduces two new settings following the newly created spec meilisearch/specifications#157:
 - The `faceting.max_values_per_facet` one describes the maximum number of values (each with a count) associated with a value in a facet distribution query.
 - The `pagination.limited_to` one describes the maximum number of documents that a search query can ever return.

Co-authored-by: Kerollmops <clement@meilisearch.com>
@gmourier
Copy link
Member Author

Thanks for your comments on the names! Do you have a better name than pagination knowing that the pain points about this limit have been expressed for this use case and that we will probably add more parameters about it? Unfortunately, I don't have a better suggestion to make.

@dichotommy
Copy link
Contributor

Hi @gmourier , if you know that you will add more pagination-related settings in the future and wish to keep this one within that category, then I completely understand and support the naming decision.

On the other hand, I think this setting can be useful and interesting to people besides those who need it for pagination. One example that you've already brought up is security, to prevent scraping of the database. There may be other cases we haven't considered where users want to enforce a base limit on all search queries without using the parameter. My concern would be that users could miss this setting if it's in an object called pagination.

I think this setting would be easier to find as an int at the root level of the settings resource with a name like resultsLimit, maxResults, maxHits or similar. Just my two cents.

@bidoubiwa
Copy link
Contributor

bidoubiwa commented Jun 15, 2022

Poll time!

Limit the number of hits on which we can paginate

Little context, currently it would be inside a pagination object. For example:

{
   pagination: {
      limitedTo: 100
   }
}

This might change depending on another poll.

  • limitedTo 👍
  • maxTotalHits 🎉
  • totalHitLimit 🚀
  • other 👀

See previous comments for arguments.

@bidoubiwa
Copy link
Contributor

bidoubiwa commented Jun 15, 2022

Maximum values returned by each facet

Little context, the setting is inside a faceting object. For example:

{
   faceting: {
      maxValuesPerFacet: 100
   }
}
  • maxValuesPerFacet 👍
  • facetValuesLimit 🚀
  • maxFacetValues 🎉
  • other 👀

See previous comments for arguments.

@bidoubiwa
Copy link
Contributor

bidoubiwa commented Jun 15, 2022

Pagination object for nested limit on hits

Currently, the limitedTo representing the maximum number of search results on which a user can paginate.

Example:

{
   // other settings
   pagination: {
      limitedTo: 100
   }
}

The question is, do we want to call it pagination do we want to put the setting at the root:

  • Keep pagination as the root object containing limitedTo (name can very based on this poll) 👍
  • Put limitedTo at the root of the settings 🎉
  • other 👀

See previous comments for arguments.

gmourier and others added 5 commits July 7, 2022 17:56
@gmourier gmourier force-pushed the add-settings-to-customize-default-limits branch from fb8b067 to a1cac76 Compare July 7, 2022 15:59
@gmourier gmourier merged commit 63429e0 into develop Jul 7, 2022
@gmourier gmourier deleted the add-settings-to-customize-default-limits branch July 7, 2022 16:00
gmourier added a commit that referenced this pull request Jul 7, 2022
…ng` (#157)

* Introduces specification files

* Update files name

* branch telemetry

* Update open-api.yml

* Update text/0034-telemetry-policies.md

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* update open-api.yml

* Update text/157-faceting-setting-api.md

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* Rename limitedTo to maxTotalHits

* Specify order of returned facet

Co-authored-by: Clément Renault <renault.cle@gmail.com>
gmourier added a commit that referenced this pull request Jul 11, 2022
* Bump open-api.yml to v0.28

* Telemetry - Add `x-meilisearch-client` query parameter (#145)

* Introduce the x-meilisearch-client query parameter

* Update text/0034-telemetry-policies.md

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>

* Update text/0034-telemetry-policies.md

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>

* GeoSearch — Support string type for `_geo` `lat` and `lng` fields (#83)

* Update specification to support string type for _geo lat and lng fields

* mention types mixing for _geo object

* Tasks API - Rename `uid` to `taskUid` in the `202 - Accepted` Summarized Task Response (#144)

* Rename 202 uid to taskUid

* Update text/0060-tasks-api.md

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Tasks API - Seek/Keyset based pagination (#115)

* Move cursor based pagination spec to tasks API spec

* remove pagination as a future capability

* Clarify boundaries for limit query parameter

* Update OpenApi specification

* Remove limit field boundaries

* Apply suggestions from code review

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* Update open-api.yml and removes cursor term mentions

* Update text/0060-tasks-api.md

Co-authored-by: Tommy <68053732+dichotommy@users.noreply.github.com>

* Remove  route mention in seek-keyset pagination section

Co-authored-by: Clément Renault <renault.cle@gmail.com>
Co-authored-by: Tommy <68053732+dichotommy@users.noreply.github.com>

* Tasks API - Filter tasks list by `type`/`status`/`indexUid` (#116)

* move filtering tasks by status/type parameter to task api spec

* Update specification

* Add details about case-sensitivy + rework error message

* Introducing naming changes plus make the specification a source of truth instead of a changelog

* Remove a future possibility being introduced

* misc - replace createIndex to the right type and add the missing type field to the 202 Response resource

* Dumps API - Make dump creation an asynchronous task (#139)

* wip

* Make a dump creation a visible asynchronous task

* Add precisions

* Update open-api.yml

* Add ommited type field for summarized task response

* Add future possibilities

* Apply suggestions from code review

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Precise that indexUid can be null

* Precise priorization of dumpCreation task over other task types

* Keep taskUid for 202 response

* remove dumps.get API key action

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Search API - Remove/Rename confusing fields (#135)

* Rename nbHits, remove exhaustive* boolean fields

* Rename approximativeNbHits to estimatedTotalHits

* Update open-api.yaml

* Apply naming changes for facet distribution and showing matches position

* Add a telemetry for facet distribution usage

* API Guideline - Return list of API resources under a `results` array (#138)

* Place list of documents under a results array on /documents

* Add results array for indexes object list

* Add the future of indexes pagination

* Update open-api.yml

* Fix typo

* Apply suggestions from code review

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Add offset/limit pagination for indexes and API keys

* Try to add multipe refs to a response object

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Remove name field (#140)

* Documents API - `displayedAttributes` should not impact the documents API / Rename `attributesToRetrieve` to `fields` (#143)

* Specifies that displayedAttributes setting does not impact the GET documents endpoint

* Rename attributeToRetrieve to fields on /documents

* Add a future possibily to rejectt a field from a document in the given response

* Precise behavior details about fields query parameter

* Add fields query parameter on GET /indexes/{index}/documents/{docId}

* API Keys - Determinist API Keys + Security changes (#148)

* Add an uid to make API Keys determinists, plus add a non-unique human readable name field to ease reading information

* Describe errors for uid and name fields

* Apply suggestions from code review

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>

* misc: add precisions

* Reorganize route descriptions

* Update error_code when API Key already exists for a given uid

* Apply suggestions from code review

Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>

* Add new keys actions, remove master-key changes, introduce a new error for immutable field and update tenant token

* Update open-api spec

* Update immutable_field error message

* Apply suggestions from code review

Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>

* Mention that the Default Admin API Key can manage keys

* Specify that the JWT Tenant Token must be enrypted with the API Key value

* Update the spec regarding the description of the Admin API Key to be up-to-date

* Add uid_or_key url param to update and delete a key

* Update text/0085-api-keys.md

Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>

* Update text/0085-api-keys.md

Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>

* Geosearch - Enhance lat/lng format error messages (#149)

* Update the geosearch error message

* misc: organiser error message in the right specification

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>

* Introduces HTTP Verbs changesto be compliant regarding a Rest API (#152)

* Telemetry - Replace `x-meilisearch-client` query parameter by `X-Meilisearch-Client` header (#150)

* Removes x-meilisearch-client, replace it by a header

* Remove capslock

* fix typo (#151)

* Mention telemetry.meilisearch.com (#153)

* update ranking rules error message (#154)

* Misc — Update dump versions compatibility table (#156)

* Update dump table

* Update text/0105-dumps-api.md

* Settings API - Customize the hard limits for `pagination` and `faceting` (#157)

* Introduces specification files

* Update files name

* branch telemetry

* Update open-api.yml

* Update text/0034-telemetry-policies.md

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* update open-api.yml

* Update text/157-faceting-setting-api.md

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* Rename limitedTo to maxTotalHits

* Specify order of returned facet

Co-authored-by: Clément Renault <renault.cle@gmail.com>

* Add dumpCreation task type to OpenAPI.yml

* Tasks filtering params to be in query instead of path on OpenAPI spec

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Clément Renault <renault.cle@gmail.com>
Co-authored-by: Tommy <68053732+dichotommy@users.noreply.github.com>
Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ad hoc <postma.marin@protonmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
This was referenced Jul 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
OpenAPI Update OpenAPI specification. Q3:2022 Ready For Review Feature specification must be reviewed. Telemetry Update the telemetry collect. v0.28
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants