-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expensive queries are causing unnecessary load and delays on Elasticsearch #93770
Comments
Pinging @elastic/kibana-core (Team:Core) |
#91175 has been addressed, so teams should now be unblocked on moving forward with the short-term fixes outlined here. |
None of the APM items listed are querying saved object indices. Checked them off. |
…s on Elasticsearch Part of elastic#93770
…lasticsearch Part of: elastic#93770
…lasticsearch Part of: elastic#93770
…elays on Elasticsearch Part of: elastic#93770
…lasticsearch (#98914) * [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770 * remove globalConfig * fix tests * fix finder.close * cleanup code * run queries concurrently * add namespaces: ['*'], Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…lasticsearch (elastic#98914) * [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: elastic#93770 * remove globalConfig * fix tests * fix finder.close * cleanup code * run queries concurrently * add namespaces: ['*'], Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…elays on Elasticsearch (elastic#99031) * [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: elastic#93770 * fix CI * fix typo * fix namespaces issue * fix tests Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…lasticsearch (elastic#99023) * [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: elastic#93770 * Update get_usage_collector.test.ts * add namespaces: ['*'] Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…s on Elasticsearch (#98903) * [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of #93770 * remove extra cycles * fix PR comments * fix finder.close * code cleanup * add namespaces: ['*'], * fix jest Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…s on Elasticsearch (elastic#98903) * [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of elastic#93770 * remove extra cycles * fix PR comments * fix finder.close * code cleanup * add namespaces: ['*'], * fix jest Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…lasticsearch (#98914) (#110446) * [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770 * remove globalConfig * fix tests * fix finder.close * cleanup code * run queries concurrently * add namespaces: ['*'], Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…lasticsearch (#99023) (#110448) * [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770 * Update get_usage_collector.test.ts * add namespaces: ['*'] Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…elays on Elasticsearch (#99031) (#110447) * [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770 * fix CI * fix typo * fix namespaces issue * fix tests Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…s on Elasticsearch (#98903) (#110457) * [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of #93770 * remove extra cycles * fix PR comments * fix finder.close * code cleanup * add namespaces: ['*'], * fix jest Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
All the items for @elastic/kibana-telemetry are listed in #96715. We'll try to prioritize them. |
Just FYI, security solution platform has this ticket we've been trying to get to of moving over to PIT - #103944 |
…int in Time) and restructuring of folders (#124912) ## Summary Changes the usage collector telemetry within security solutions to use PIT (Point in Time) and a few other bug fixes and restructuring. * The main goal is to change the full queries for up to 10k items to be instead using 1k batched items at a time and PIT (Point in Time). See [this ticket](#93770) for more information and [here](#99031) for an example where they changed there code to use 1k batched items. I use PIT with SO object API, searches, and then composite aggregations which all support the PIT. The PIT timeouts are all set to 5 minutes and all the maximums of 10k to not increase memory more is still there. However, we should be able to increase the 10k limit at this point if we wanted to for usage collector to count beyond the 10k. The initial 10k was an elastic limitation that PIT now avoids. * This also fixes a bug where the aggregations were only returning the top 10 items instead of the full 10k. That is changed to use [composite aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html). * This restructuring the folder structure to try and do [reductionism](https://en.wikipedia.org/wiki/Reductionism#In_computer_science) best we can. I could not do reductionism with the schema as the tooling does not allow it. But the rest is self-repeating in the way hopefully developers expect it to be. And also make it easier for developers to add new telemetry usage collector counters in the same fashion. * This exchanges the hand spun TypeScript types in favor of using the `caseComments` and the `Sanitized Alerts` and the `ML job types` using Partial and other TypeScript tricks. * This removes the [Cyclomatic Complexity](https://en.wikipedia.org/wiki/Cyclomatic_complexity) warnings coming from the linters by breaking down the functions into smaller units. * This removes the "as casts" in all but 1 area which can lead to subtle TypeScript problems. * This pushes down the logger and uses the logger to report errors and some debug information ### Checklist Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
… from saved objects to exception lists (#125182) ## Summary Exposes the functionality of * search_after * point in time (pit) From saved objects to the exception lists. This _DOES NOT_ expose these to the REST API just yet. Rather this exposes it at the API level to start with and changes code that had hard limits of 10k and other limited loops. I use the batching of 1k for this at a time as I thought that would be a decent batch guess and I see other parts of the code changed to it. It's easy to change the 1k if we find we need to throttle back more as we get feedback from others. See this PR where `PIT` and `search_after` were first introduced: #89915 See these 2 issues where we should be using more paging and PIT (Point in Time) with search_after: #93770 #103944 The new methods added to the `exception_list_client.ts` client class are: * openPointInTime * closePointInTime * findExceptionListItemPointInTimeFinder * findExceptionListPointInTimeFinder * findExceptionListsItemPointInTimeFinder * findValueListExceptionListItemsPointInTimeFinder The areas of functionality that have been changed: * Exception list exports * Deletion of lists * Getting exception list items when generating signals Note that currently we use our own ways of looping over the saved objects which you can see in the codebase such as this older way below which does work but had a limitation of 10k against saved objects and did not do point in time (PIT) Older way example (deprecated): ```ts let page = 1; let ids: string[] = []; let foundExceptionListItems = await findExceptionListItem({ filter: undefined, listId, namespaceType, page, perPage: PER_PAGE, pit: undefined, savedObjectsClient, searchAfter: undefined, sortField: 'tie_breaker_id', sortOrder: 'desc', }); while (foundExceptionListItems != null && foundExceptionListItems.data.length > 0) { ids = [ ...ids, ...foundExceptionListItems.data.map((exceptionListItem) => exceptionListItem.id), ]; page += 1; foundExceptionListItems = await findExceptionListItem({ filter: undefined, listId, namespaceType, page, perPage: PER_PAGE, pit: undefined, savedObjectsClient, searchAfter: undefined, sortField: 'tie_breaker_id', sortOrder: 'desc', }); } return ids; ``` But now that is replaced with this newer way using PIT: ```ts // Stream the results from the Point In Time (PIT) finder into this array let ids: string[] = []; const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => { const responseIds = response.data.map((exceptionListItem) => exceptionListItem.id); ids = [...ids, ...responseIds]; }; await findExceptionListItemPointInTimeFinder({ executeFunctionOnStream, filter: undefined, listId, maxSize: undefined, // NOTE: This is unbounded when it is "undefined" namespaceType, perPage: 1_000, savedObjectsClient, sortField: 'tie_breaker_id', sortOrder: 'desc', }); return ids; ``` We also have areas of code that has perPage listed at 10k or a constant that represents 10k which this removes in most areas (but not all areas): ```ts const items = await client.findExceptionListsItem({ listId: listIds, namespaceType: namespaceTypes, page: 1, pit: undefined, perPage: MAX_EXCEPTION_LIST_SIZE, // <--- Really bad to send in 10k per page at a time searchAfter: undefined, filter: [], sortOrder: undefined, sortField: undefined, }); ``` That is now: ```ts // Stream the results from the Point In Time (PIT) finder into this array let items: ExceptionListItemSchema[] = []; const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => { items = [...items, ...response.data]; }; await client.findExceptionListsItemPointInTimeFinder({ executeFunctionOnStream, listId: listIds, namespaceType: namespaceTypes, perPage: 1_000, filter: [], maxSize: undefined, // NOTE: This is unbounded when it is "undefined" sortOrder: undefined, sortField: undefined, }); ``` Left over areas will be handled in separate PR's because they are in other people's code ownership areas. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
All the @elastic/kibana-telemetry items are handled in #135689 |
Until #89915 (v7.12.0) saved objects didn't support paging through large result sets. Now that we have
_search_after
support, plugins who previously paged through "all" results by settingsize: 10000
should be refactored to use search after instead.The problem with creating searches with large batches of 10000 is that it blocks the Elasticsearch thread pool for a long time which negatively impacts the performance of other search queries. Since Kibana started using system indices for the saved objects index in 7.11, this has had a much bigger impact because these searches share a thread pool with the security index. Paging with smaller batches means faster responses per request, allowing the thread pool to interleave Kibana searches with other requests.
In addition to the performance impact on Elasticsearch, large searches also mean large response payloads which blocks the Kibana thread for an extended amount of time. This causes spikes in the event loop delay which impacts the performance of all plugins.
Short term: fix all 10k searches against the saved object indices
The following is a list of plugins performing searches with
perPage: 10000
. Please audit each occurrence and mark the task as complete with a link to the PR once it has been resolved. These links are based on a quick search, if the linked code isn't searching against a saved objects index withsize > 1000
please mark the item as done.Blocked on #91175 because that will make it significantly easier for teams to address these issues.Done. Here are docs on the new point-in-time finder.
getListItemByValues
to use PIT #127688searchListItemByValues
to use PIT #127689@elastic/security-solution https://github.com/elastic/kibana/blob/master/x-pack/plugins/security_solution/public/cases/components/case_view/helpers.ts#L39findPreviousThresholdSignals
to use PIT #127690Medium term
savedObjects:listingLimit
advanced setting with a better UI pattern since users sometimes set this to 10k (some more context in Reassign ownership of plugins/saved_objects away from Core team #46435)The text was updated successfully, but these errors were encountered: