
[Alerting + Task Manager] Benchmarking 7.14 #95194

Closed
gmmorris opened this issue Mar 23, 2021 · 6 comments
Assignees
Labels
  • estimate:needs-research (Estimated as too large and requires research to break down into workable issues)
  • Feature:Alerting
  • Feature:Task Manager
  • resilience (Issues related to Platform resilience in terms of scale, performance & backwards compatibility)
  • Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

@gmmorris
Contributor

gmmorris commented Mar 23, 2021

Following the release of 7.14, and with it the fresh perf work we've done, we decided we need to do the following:

  1. Run fresh and comprehensive performance tests on the 7.14 release
  2. Use the results of the perf tests as a basis for a "sizing your Kibana cluster for alerting" blogpost, similar to https://www.elastic.co/blog/benchmarking-and-sizing-your-elasticsearch-cluster-for-logs-and-metrics

Note (2021-08-09): we split item 2 off into a separate issue: [alerting] public facing doc on alerting performance #107979

@gmmorris gmmorris added Feature:Alerting Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Mar 23, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris
Contributor Author

gmmorris commented Apr 7, 2021

Following the latest RAC leads sync (held 6th April), I think we should bump this to a later point so we can include Alerts-as-Data in the benchmarks.

This does raise some questions about how we want to measure our API performance. Should we add timing measurements for the various APIs that rely on Alerts-as-Data, such as the APIs that fetch alerts for the alerts table?
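
For concreteness, here's a minimal sketch of what such a timing measurement could look like, written as a standalone Node 18+ script (for global fetch) rather than anything inside Kibana; the endpoint path, request body, and function name are hypothetical placeholders for whichever Alerts-as-Data APIs we end up measuring:

```ts
import { performance } from 'perf_hooks';

interface TimingSample {
  endpoint: string;
  status: number;
  durationMs: number;
}

// Time a single call to an alerts API. The endpoint path and request body are
// hypothetical placeholders, not a documented Kibana API.
async function timeAlertsApi(kibanaUrl: string, apiKey: string): Promise<TimingSample> {
  const endpoint = '/internal/rac/alerts/find'; // placeholder: substitute the API backing the alerts table
  const start = performance.now();
  const res = await fetch(`${kibanaUrl}${endpoint}`, {
    method: 'POST',
    headers: {
      'kbn-xsrf': 'true', // required by Kibana for non-GET requests
      'content-type': 'application/json',
      authorization: `ApiKey ${apiKey}`,
    },
    body: JSON.stringify({ query: { match_all: {} }, size: 100 }),
  });
  await res.text(); // include reading the response body in the measurement
  return { endpoint, status: res.status, durationMs: performance.now() - start };
}
```

Reading the response body is included in the measurement on purpose, so large alert result sets count toward the reported latency.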

@gmmorris gmmorris changed the title [Alerting + Task Manager] Benchmarking 7.12 and blogpost [Alerting + Task Manager] Benchmarking 7.13 and blogpost May 4, 2021
@gmmorris
Contributor Author

gmmorris commented May 4, 2021

It looks like ES have been able to address the issue we identified in UpdateByQuery: elastic/elasticsearch#63671

We expect this to address the degradation we've identified when there are more than 30 or so Kibana instances, so it's something we should validate as part of this issue.
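
For context, Task Manager's claim cycle is built on updateByQuery, which is why that ES change matters here. Below is a minimal sketch of the shape of such a call using the 7.x Elasticsearch JS client; the index, query, and painless script are simplified stand-ins, not the actual Task Manager claim implementation:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // adjust for your cluster

// Simplified stand-in for a claim cycle: mark a batch of idle tasks as
// claimed by this Kibana instance. Field names and the claim condition are
// illustrative only.
async function claimIdleTasks(ownerId: string) {
  return client.updateByQuery({
    index: '.kibana_task_manager',
    conflicts: 'proceed', // multiple Kibana instances race on the same documents
    refresh: true,
    body: {
      query: { term: { 'task.status': 'idle' } },
      script: {
        lang: 'painless',
        source: "ctx._source.task.status = 'claiming'; ctx._source.task.ownerId = params.ownerId;",
        params: { ownerId },
      },
    },
  });
}
```

The `conflicts: 'proceed'` part is the relevant bit for scaling: with many Kibana instances polling the same index, version conflicts on the same task documents are expected and have to be tolerated rather than treated as errors.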

@gmmorris gmmorris changed the title [Alerting + Task Manager] Benchmarking 7.13 and blogpost [Alerting + Task Manager] Benchmarking 7.14 and blogpost Jul 1, 2021
@gmmorris
Contributor Author

gmmorris commented Jul 1, 2021

Bumped to 7.14 😬

@gmmorris gmmorris added the loe:needs-research This issue requires some research before it can be worked on or estimated label Jul 6, 2021
@gmmorris gmmorris added the resilience Issues related to Platform resilience in terms of scale, performance & backwards compatibility label Jul 15, 2021
@pmuellr pmuellr self-assigned this Aug 3, 2021
@pmuellr pmuellr changed the title [Alerting + Task Manager] Benchmarking 7.14 and blogpost [Alerting + Task Manager] Benchmarking 7.14 Aug 9, 2021
@pmuellr
Member

pmuellr commented Aug 9, 2021

The stress tester needed some updates, as ecctl has changed a bit since the last time we ran the tester, and a few enhancements have been added:

commits:

changes:

  • added 7.14 as the top-level version
  • changed alerts from running at a 1m interval to a 3s interval, with a
    similar change to decrease the number of rules we actually create; the
    thinking is that we can simulate more alerts by running them at a
    smaller interval (see the sketch after this list)
  • increased the wait for deployments to finish creation from 5 minutes to 10
  • changed how the kibana config is set (change to the ecctl data shape)
  • now finds the closest match to the platform's choice of RAM usage
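
To make the interval change above concrete, here's a back-of-the-envelope helper showing why far fewer rules at a 3s interval generate the same execution load as a large rule count at a 1m interval (the function name is illustrative, not part of the stress tester):

```ts
// Rule executions per minute = ruleCount * (60 / intervalSeconds).
// Holding that constant, compute how many rules the new interval needs.
function rulesNeededForSameLoad(
  originalRuleCount: number,
  originalIntervalSeconds: number,
  newIntervalSeconds: number
): number {
  const executionsPerMinute = originalRuleCount * (60 / originalIntervalSeconds);
  return Math.ceil(executionsPerMinute / (60 / newIntervalSeconds));
}

// e.g. 1000 rules at a 1m interval generate the same load as 50 rules at 3s
console.log(rulesNeededForSameLoad(1000, 60, 3)); // 50
```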

@pmuellr
Member

pmuellr commented Aug 9, 2021

Here are three runs I made with the tester; I didn't see any significant regressions, nor any significant increase in performance, as expected.

@pmuellr pmuellr closed this as completed Aug 9, 2021
@gmmorris gmmorris added the estimate:needs-research Estimated as too large and requires research to break down into workable issues label Aug 18, 2021
@gmmorris gmmorris removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022