
[Alerting] Investigate risk of performance regressions from share-capable saved object types #115197

Closed
mikecote opened this issue Oct 15, 2021 · 2 comments
Labels: estimate:needs-research, Feature:Actions, Feature:Alerting, performance, Team:ResponseOps, v8.0.0

Comments

@mikecote (Contributor) commented on Oct 15, 2021

See #113743

We should investigate the performance impact on our usage of the SO APIs (running rules, CRUD operations on rules, etc.). We should also work with the @elastic/security-detections-response team to see if they experience performance regressions when doing bulk operations through our rules client (e.g. enable all).

@mikecote added the performance and Team:ResponseOps labels on Oct 15, 2021
@elasticmachine (Contributor) commented on Oct 15, 2021
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris added the estimate:needs-research, Feature:Actions, and Feature:Alerting labels on Oct 20, 2021
@lizozom changed the title from "[Alerting] Investigate risk of performance regression from share-capable saved object types" to "[Alerting] Investigate risk of performance regressions from share-capable saved object types" on Nov 10, 2021
@chrisronline self-assigned this on Nov 16, 2021
@chrisronline (Contributor) commented

I did some investigation and synced with @mikecote on the approach here.

Rather than dive into each individual call and understand the exact performance implications from the start, we decided to look holistically at performance before and after these changes.

As a starting point, we want to measure the performance of operations that happen quite frequently, where small increases in underlying function calls could add up to a significant difference.

To test the changes, I used master as-is, which reflects the code with the potential latency, and then tested it against master with every `namespaceType: 'multiple-isolated'` changed to `namespaceType: 'single'` (thanks to the tip from @joeportner), running the same scenario. (Note: I started fresh for each creation scenario by spinning up a brand-new ES.)
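For readers unfamiliar with the change being compared, this is roughly what flipping `namespaceType` looks like in a saved object type registration. The interface below is a minimal local stand-in for Kibana's `SavedObjectsType` (only the fields relevant here), and the `alert` type shown is an illustrative sketch, not the full real definition:

```typescript
// Minimal stand-in for Kibana's SavedObjectsType interface; the real type
// lives in Kibana core and has many more fields.
interface SavedObjectTypeSketch {
  name: string;
  hidden: boolean;
  namespaceType: 'single' | 'multiple' | 'multiple-isolated' | 'agnostic';
  mappings: { properties: Record<string, unknown> };
}

// Baseline under test (master): the alerting type registered as share-capable.
const alertTypeBaseline: SavedObjectTypeSketch = {
  name: 'alert',
  hidden: true,
  namespaceType: 'multiple-isolated',
  mappings: { properties: {} }, // real field mappings elided
};

// Comparison branch: the same registration flipped back to single-namespace.
const alertTypeComparison: SavedObjectTypeSketch = {
  ...alertTypeBaseline,
  namespaceType: 'single',
};
```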

I looked at three main areas:

  1. Mass creation of connectors and rules
  2. Mass update of connectors and rules
  3. TM health stats with mass amount of rules/connectors running

For my testing, I used 200 rules and 200 connectors (one unique connector per rule).
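A sketch of how that scenario can be generated: one connector payload per rule, with the rule referencing its connector. The payload shapes loosely follow Kibana's connector/rule create API bodies, but the field names, the `.server-log` connector type, and the `example.always-firing` rule type here are illustrative assumptions, not a verbatim script from this investigation:

```typescript
// Build the mass-creation scenario: N unique connectors and N rules,
// each rule wired to its own connector.
interface ConnectorPayload {
  name: string;
  connector_type_id: string;
  config: Record<string, unknown>;
}
interface RulePayload {
  name: string;
  rule_type_id: string;
  schedule: { interval: string };
  actions: { id: string }[];
}

function buildScenario(count: number): {
  connectors: ConnectorPayload[];
  rules: RulePayload[];
} {
  const connectors: ConnectorPayload[] = [];
  const rules: RulePayload[] = [];
  for (let i = 0; i < count; i++) {
    // Placeholder id; in practice the real id comes back from the create response.
    const connectorId = `connector-${i}`;
    connectors.push({ name: connectorId, connector_type_id: '.server-log', config: {} });
    rules.push({
      name: `rule-${i}`,
      rule_type_id: 'example.always-firing', // hypothetical rule type
      schedule: { interval: '1m' },
      actions: [{ id: connectorId }],
    });
  }
  return { connectors, rules };
}
```

Each payload would then be POSTed to the corresponding create endpoint, timing the whole batch.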

| namespaceType | Creation time | Update time | p50 drift |
| --- | --- | --- | --- |
| `multiple-isolated` | 643s | 420s | 62169s |
| `single` | 637s | 422s | 63093s |
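As a quick sanity check on those numbers, the relative differences between the two configurations are all within about 1.5%, which is consistent with run-to-run noise rather than a systematic regression:

```typescript
// Relative difference (percent) between baseline and candidate measurements.
function pctDiff(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// single vs. multiple-isolated, using the values from the table above:
const creationDelta = pctDiff(643, 637); // ≈ -0.93% creation time
const updateDelta = pctDiff(420, 422); // ≈ +0.48% update time
const driftDelta = pctDiff(62169, 63093); // ≈ +1.49% p50 drift
```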

As a result of this testing, I'm concluding that there isn't a significant performance change with the new share-capable related changes.

@kobelb added the needs-team label on Jan 31, 2022
@botelastic bot removed the needs-team label on Jan 31, 2022
5 participants