Expensive queries are causing unnecessary load and delays on Elasticsearch #93770

rudolf · 2021-03-05T13:51:00Z

Until #89915 (v7.12.0), saved objects didn't support paging through large result sets. Now that we have search_after support, plugins that previously paged through "all" results by setting size: 10000 should be refactored to use search_after instead.

The problem with searching in large batches of 10000 is that each request blocks the Elasticsearch thread pool for a long time, which negatively impacts the performance of other search queries. Since Kibana started using system indices for the saved objects index in 7.11, this has had a much bigger impact because these searches share a thread pool with the security index. Paging with smaller batches means faster responses per request, allowing the thread pool to interleave Kibana searches with other requests.
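
To illustrate how smaller batches with search_after still reach every document, here is a minimal, self-contained sketch. It pages over an in-memory array rather than Elasticsearch; `Doc`, `searchPage`, and `collectAllIds` are hypothetical names invented for this illustration and are not Kibana or Elasticsearch APIs. It assumes the sort values are unique and ascending, standing in for a sort that includes a tie-breaker.

```typescript
// Hypothetical in-memory stand-in for a sorted index (not a real API).
interface Doc {
  id: string;
  sort: number; // sort value that doubles as the search_after cursor
}

// Returns up to `size` docs whose sort value is strictly greater than `searchAfter`.
function searchPage(index: Doc[], size: number, searchAfter?: number): Doc[] {
  const start =
    searchAfter === undefined ? 0 : index.findIndex((doc) => doc.sort > searchAfter);
  return start === -1 ? [] : index.slice(start, start + size);
}

// Pages through every doc in batches of `perPage`, using the last hit's
// sort value as the cursor for the next request.
function collectAllIds(index: Doc[], perPage: number): { ids: string[]; requests: number } {
  const ids: string[] = [];
  let requests = 0;
  let cursor: number | undefined;
  while (true) {
    const page = searchPage(index, perPage, cursor);
    requests += 1;
    if (page.length === 0) break;
    ids.push(...page.map((doc) => doc.id));
    cursor = page[page.length - 1].sort;
    if (page.length < perPage) break; // a short page means we reached the end
  }
  return { ids, requests };
}

// 2500 docs fetched as three small requests instead of one large one.
const index: Doc[] = Array.from({ length: 2500 }, (_, i) => ({ id: `doc-${i}`, sort: i }));
const result = collectAllIds(index, 1000);
```

Each request here returns at most 1000 documents, so in a real cluster other searches could be scheduled between the batches.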

In addition to the performance impact on Elasticsearch, large searches also mean large response payloads, which block the Kibana thread for an extended amount of time. This causes spikes in the event loop delay, which impact the performance of all plugins.

Short term: fix all 10k searches against the saved object indices

The following is a list of plugins performing searches with perPage: 10000. Please audit each occurrence and mark the task as complete with a link to the PR once it has been resolved. These links are based on a quick search; if the linked code isn't searching against a saved objects index with size > 1000, please mark the item as done.

~~Blocked on #91175 because that will make it significantly easier for teams to address these issues.~~
Done. Here are docs on the new point-in-time finder.
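
As a rough sketch of the consumer pattern such a finder enables: iterate over pages, accumulate only what is needed, and always close the finder. The `Finder` interface below is a simplified local stand-in modelled on an async `find()` iterator plus `close()`; `FakeFinder` and `collectIds` are hypothetical and exist only so the example runs without Kibana. Please consult the linked docs for the real signatures.

```typescript
// Simplified, assumed shapes; not the actual Kibana types.
interface FindResponse {
  saved_objects: Array<{ id: string }>;
}

interface Finder {
  find(): AsyncGenerator<FindResponse>;
  close(): Promise<void>;
}

// Hypothetical in-memory finder that yields `total` objects in pages of `perPage`.
class FakeFinder implements Finder {
  closed = false;
  private readonly total: number;
  private readonly perPage: number;

  constructor(total: number, perPage: number) {
    this.total = total;
    this.perPage = perPage;
  }

  async *find(): AsyncGenerator<FindResponse> {
    for (let start = 0; start < this.total; start += this.perPage) {
      const end = Math.min(start + this.perPage, this.total);
      const saved_objects: Array<{ id: string }> = [];
      for (let i = start; i < end; i++) saved_objects.push({ id: `so-${i}` });
      yield { saved_objects };
    }
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}

// Streams pages from the finder and closes it even if iteration throws.
async function collectIds(finder: Finder): Promise<string[]> {
  const ids: string[] = [];
  try {
    for await (const response of finder.find()) {
      ids.push(...response.saved_objects.map((so) => so.id));
    }
  } finally {
    await finder.close();
  }
  return ids;
}
```

Closing in a `finally` block is a design choice in this sketch so that the point in time would be released even when a page fails.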

Medium term

- Introduce a soft limit that throws if Kibana searches with a size > 1000 in development mode
- Introduce better traceability
  - Stack traces to locate the exact line that created a request which exceeded the soft limit
  - Identify the plugin that initiated a request (see "Include the calling plugin ID in a request header on all Elasticsearch API calls", #77214)
- Replace the savedObjects:listingLimit advanced setting with a better UI pattern, since users sometimes set this to 10k (some more context in "Reassign ownership of plugins/saved_objects away from Core team", #46435)
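
The soft limit proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SOFT_LIMIT`, `assertPageSize`, and the `isDevMode` flag are hypothetical names, not existing Kibana code. Throwing an `Error` means the stack trace points at the caller that requested the oversized page, which is the traceability the list asks for.

```typescript
// Threshold taken from the proposal above (size > 1000); the name is hypothetical.
const SOFT_LIMIT = 1000;

// Throws in development mode when a search asks for more than SOFT_LIMIT hits.
// In production mode the request is allowed through unchanged.
function assertPageSize(perPage: number, isDevMode: boolean): void {
  if (isDevMode && perPage > SOFT_LIMIT) {
    throw new Error(
      `perPage ${perPage} exceeds the soft limit of ${SOFT_LIMIT}; page with search_after instead`
    );
  }
}

// Example: the oversized request is rejected only in development mode.
let rejectedInDev = false;
try {
  assertPageSize(10000, true);
} catch (e) {
  rejectedInDev = true;
}
```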

elasticmachine · 2021-03-05T13:51:02Z

Pinging @elastic/kibana-core (Team:Core)

lukeelmers · 2021-03-25T14:38:05Z

#91175 has been addressed, so teams should now be unblocked on moving forward with the short-term fixes outlined here.

smith · 2021-03-31T15:01:48Z

None of the APM items listed are querying saved object indices. Checked them off.

alexwizp · 2021-05-05T11:59:33Z

Changes related to the KibanaApp team are ready (#99031, #99023, #98914, #98903) but blocked by #99044.

…lasticsearch (#98914)

* [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* remove globalConfig
* fix tests
* fix finder.close
* cleanup code
* run queries concurrently
* add namespaces: ['*'],

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…elays on Elasticsearch (#99031)

* [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* fix CI
* fix typo
* fix namespaces issue
* fix tests

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…lasticsearch (#99023)

* [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* Update get_usage_collector.test.ts
* add namespaces: ['*']

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…s on Elasticsearch (#98903)

* [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of #93770
* remove extra cycles
* fix PR comments
* fix finder.close
* code cleanup
* add namespaces: ['*'],
* fix jest

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…lasticsearch (#98914) (#110446)

* [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* remove globalConfig
* fix tests
* fix finder.close
* cleanup code
* run queries concurrently
* add namespaces: ['*'],

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…lasticsearch (#99023) (#110448)

* [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* Update get_usage_collector.test.ts
* add namespaces: ['*']

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…elays on Elasticsearch (#99031) (#110447)

* [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770
* fix CI
* fix typo
* fix namespaces issue
* fix tests

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

…s on Elasticsearch (#98903) (#110457)

* [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of #93770
* remove extra cycles
* fix PR comments
* fix finder.close
* code cleanup
* add namespaces: ['*'],
* fix jest

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

afharo · 2022-01-27T16:15:13Z

All the items for @elastic/kibana-telemetry are listed in #96715. We'll try to prioritize them.

yctercero · 2022-01-27T19:12:34Z

Just FYI, security solution platform has this ticket we've been trying to get to of moving over to PIT - #103944

…int in Time) and restructuring of folders (#124912)

## Summary

Changes the usage collector telemetry within security solutions to use PIT (Point in Time), along with a few other bug fixes and restructuring.

* The main goal is to change the full queries for up to 10k items to instead use batches of 1k items at a time with PIT (Point in Time). See [this ticket](#93770) for more information and [here](#99031) for an example where they changed their code to use 1k batched items. I use PIT with the SO object API, searches, and then composite aggregations, which all support PIT. The PIT timeouts are all set to 5 minutes, and all the maximums of 10k, kept so as not to increase memory further, are still there. However, we should be able to increase the 10k limit at this point if we wanted the usage collector to count beyond 10k. The initial 10k was an elastic limitation that PIT now avoids.
* This also fixes a bug where the aggregations were only returning the top 10 items instead of the full 10k. That is changed to use [composite aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html).
* This restructures the folders to apply [reductionism](https://en.wikipedia.org/wiki/Reductionism#In_computer_science) as best we can. I could not do reductionism with the schema as the tooling does not allow it. But the rest is self-repeating in the way developers will hopefully expect it to be, which should also make it easier for developers to add new telemetry usage collector counters in the same fashion.
* This exchanges the hand-spun TypeScript types in favor of using the `caseComments`, the `Sanitized Alerts`, and the `ML job types` with Partial and other TypeScript tricks.
* This removes the [Cyclomatic Complexity](https://en.wikipedia.org/wiki/Cyclomatic_complexity) warnings coming from the linters by breaking down the functions into smaller units.
* This removes the "as casts", which can lead to subtle TypeScript problems, in all but 1 area.
* This pushes down the logger and uses the logger to report errors and some debug information.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

… from saved objects to exception lists (#125182)

## Summary

Exposes the functionality of

* search_after
* point in time (pit)

from saved objects to the exception lists. This _DOES NOT_ expose these to the REST API just yet. Rather, this exposes it at the API level to start with and changes code that had hard limits of 10k and other limited loops. I use batches of 1k at a time for this, as I thought that would be a decent batch size guess and I see other parts of the code changed to it. It's easy to change the 1k if we find we need to throttle back more as we get feedback from others.

See this PR where `PIT` and `search_after` were first introduced: #89915

See these 2 issues where we should be using more paging and PIT (Point in Time) with search_after: #93770 #103944

The new methods added to the `exception_list_client.ts` client class are:

* openPointInTime
* closePointInTime
* findExceptionListItemPointInTimeFinder
* findExceptionListPointInTimeFinder
* findExceptionListsItemPointInTimeFinder
* findValueListExceptionListItemsPointInTimeFinder

The areas of functionality that have been changed:

* Exception list exports
* Deletion of lists
* Getting exception list items when generating signals

Note that currently we use our own ways of looping over the saved objects, which you can see in the codebase, such as the older way below. It does work, but it had a limitation of 10k against saved objects and did not do point in time (PIT).

Older way example (deprecated):

```ts
let page = 1;
let ids: string[] = [];
let foundExceptionListItems = await findExceptionListItem({
  filter: undefined,
  listId,
  namespaceType,
  page,
  perPage: PER_PAGE,
  pit: undefined,
  savedObjectsClient,
  searchAfter: undefined,
  sortField: 'tie_breaker_id',
  sortOrder: 'desc',
});
while (foundExceptionListItems != null && foundExceptionListItems.data.length > 0) {
  ids = [
    ...ids,
    ...foundExceptionListItems.data.map((exceptionListItem) => exceptionListItem.id),
  ];
  page += 1;
  foundExceptionListItems = await findExceptionListItem({
    filter: undefined,
    listId,
    namespaceType,
    page,
    perPage: PER_PAGE,
    pit: undefined,
    savedObjectsClient,
    searchAfter: undefined,
    sortField: 'tie_breaker_id',
    sortOrder: 'desc',
  });
}
return ids;
```

But now that is replaced with this newer way using PIT:

```ts
// Stream the results from the Point In Time (PIT) finder into this array
let ids: string[] = [];
const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
  const responseIds = response.data.map((exceptionListItem) => exceptionListItem.id);
  ids = [...ids, ...responseIds];
};
await findExceptionListItemPointInTimeFinder({
  executeFunctionOnStream,
  filter: undefined,
  listId,
  maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
  namespaceType,
  perPage: 1_000,
  savedObjectsClient,
  sortField: 'tie_breaker_id',
  sortOrder: 'desc',
});
return ids;
```

We also have areas of code that have perPage listed at 10k, or a constant that represents 10k, which this removes in most areas (but not all areas):

```ts
const items = await client.findExceptionListsItem({
  listId: listIds,
  namespaceType: namespaceTypes,
  page: 1,
  pit: undefined,
  perPage: MAX_EXCEPTION_LIST_SIZE, // <--- Really bad to send in 10k per page at a time
  searchAfter: undefined,
  filter: [],
  sortOrder: undefined,
  sortField: undefined,
});
```

That is now:

```ts
// Stream the results from the Point In Time (PIT) finder into this array
let items: ExceptionListItemSchema[] = [];
const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
  items = [...items, ...response.data];
};
await client.findExceptionListsItemPointInTimeFinder({
  executeFunctionOnStream,
  listId: listIds,
  namespaceType: namespaceTypes,
  perPage: 1_000,
  filter: [],
  maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
  sortOrder: undefined,
  sortField: undefined,
});
```

Leftover areas will be handled in separate PRs because they are in other people's code ownership areas.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

afharo · 2022-07-05T11:01:56Z

All the @elastic/kibana-telemetry items are handled in #135689

rudolf added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Mar 5, 2021

rudolf changed the title ~~[wip] Expensive queries are causing unnecessary load and delays on Elasticsearch~~ Expensive queries are causing unnecessary load and delays on Elasticsearch Mar 9, 2021

lukeelmers mentioned this issue Mar 9, 2021

[core.savedObjects] Add helper for using find with pit and search_after. #92981

Merged

rudolf mentioned this issue Mar 10, 2021

Reassign ownership of plugins/saved_objects away from Core team #46435

Open

20 tasks

alexwizp added a commit to alexwizp/kibana that referenced this issue Apr 30, 2021

[Data Table] Expensive queries are causing unnecessary load and delay…

f6795f5

…s on Elasticsearch Part of elastic#93770

alexwizp mentioned this issue Apr 30, 2021

[Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch #98903

Merged

alexwizp added a commit to alexwizp/kibana that referenced this issue Apr 30, 2021

[TSVB] Expensive queries are causing unnecessary load and delays on E…

be14769

…lasticsearch Part of: elastic#93770

alexwizp mentioned this issue Apr 30, 2021

[TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch #98914

Merged

alexwizp added a commit to alexwizp/kibana that referenced this issue May 3, 2021

[Vega] Expensive queries are causing unnecessary load and delays on E…

1740605

…lasticsearch Part of: elastic#93770

alexwizp mentioned this issue May 3, 2021

[Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch #99023

Merged

alexwizp added a commit to alexwizp/kibana that referenced this issue May 3, 2021

[Visualizations] Expensive queries are causing unnecessary load and d…

57dcc8b

…elays on Elasticsearch Part of: elastic#93770

alexwizp mentioned this issue May 3, 2021

[Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch #99031

Merged

jportner mentioned this issue May 12, 2021

Sharing saved objects phase 3 #94383

Merged

Dosant mentioned this issue May 27, 2021

[Index Patterns] Use deprecation api for scripted fields #100781

Merged

2 tasks

majagrubic removed the Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. label Dec 16, 2021

mikecote mentioned this issue Dec 17, 2021

[Alerting] Expensive queries by the alerting framework are causing unnecessary load and delays on Elasticsearch #121563

Closed

afharo mentioned this issue Jan 14, 2022

[Meta][Telemetry] Reduce telemetry footprint #119466

Closed

19 tasks

yctercero mentioned this issue Jan 27, 2022

[Security Solution][Detections Engine] Update search_after to use PIT to work with tie breakers and issues in @timestamp fields and overrides #103944

Closed

afharo mentioned this issue Jan 31, 2022

[Osquery] Add telemetry for packs and saved queries #122501

Merged

kobelb added the needs-team Issues missing a team label label Jan 31, 2022

botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022

lukasolson mentioned this issue Jan 31, 2022

Refactor saved query request to not set page size of 10000 #124187

Merged

9 tasks

This was referenced Feb 8, 2022

[Security Solutions] Updates usage collector telemetry to use PIT (Point in Time) and restructuring of folders #124912

Merged

[Security Solutions] Exposes the search_after and point in time (pit) from saved objects to exception lists #125182

Merged

afharo mentioned this issue Mar 15, 2022

[Monitoring] telemetry fetchers use broken pagination logic #91654

Closed

afharo mentioned this issue Jul 5, 2022

[Usage Collection] Use PIT for collecting UI/Usage Counters & AppUsage #135689

Merged

1 task

afharo mentioned this issue Jul 8, 2022

[Telemetry] Instrument APM around snapshot telemetry generation #135922

Open

petrklapka added Feature:Search Querying infrastructure in Kibana Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. and removed Team:AppServicesSv labels Nov 23, 2022

rudolf mentioned this issue Oct 6, 2023

Saved objects point in time finder: maximum total response size circuit breaker #168244

Open

