Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solutions] Improves query performance of first and last events #91790

Merged

Conversation

FrankHassanabad
Copy link
Contributor

@FrankHassanabad FrankHassanabad commented Feb 18, 2021

Summary

Fixes performance issues and timeouts with first and last events. Previously we were using aggregations and max/min but that is slower in orders of magnitude vs. using sorting + size: 1 + track_total_hits: false. This PR also splits out the aggregate of first/last into two separate calls as we were calling the same aggregate twice for first/last twice rather than calling one query for first even and a second query for last event which is also faster when you don't use an aggregate even if you ran them multiple times back to back compared to using aggregates for min/max's.

For how we determined performance numbers quantifiably and how we did profiling see this comment:
#91269 (comment)

Shoutout to @dplumlee for the help and the first cut and draft of this PR that I morphed into this PR.

For any manual testing, please test:

  • Hosts overview page and look at "last events"
  • Network overview page and look at "last events"
  • "last events" on the timeline
  • Host details page the first and last seen event for a particular host. The more records the better.
  • Network details page and the first and last seen for a particular network. 127.0.0.1 is good as that has a lot of records.

For qualitative viewing you should see a noticeable difference like so where the first seen/last seen/last events show up quicker on detailed pages from the left side compared to the right side:
quantify_perf

Checklist

Delete any items that are not applicable to this PR.

@FrankHassanabad FrankHassanabad changed the title Improve query perf times [Security Solutions] Improves query performance of first and last events Feb 18, 2021
@FrankHassanabad FrankHassanabad marked this pull request as ready for review February 18, 2021 04:57
@FrankHassanabad FrankHassanabad requested a review from a team as a code owner February 18, 2021 04:57
@FrankHassanabad FrankHassanabad self-assigned this Feb 18, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@FrankHassanabad FrankHassanabad added Feature:SecurityOverview Security Solution Overview feature v7.12.0 v8.0.0 labels Feb 18, 2021
@FrankHassanabad FrankHassanabad linked an issue Feb 18, 2021 that may be closed by this pull request
Copy link
Contributor

@dplumlee dplumlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

way faster results 🚀 looks good

firstSeen: get('firstSeen.value_as_string', aggregations),
lastSeen: get('lastSeen.value_as_string', aggregations),
};
if (options.order === 'asc') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice fix with the double query

Copy link
Contributor

@peluja1012 peluja1012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@FrankHassanabad FrankHassanabad added the auto-backport Deprecated - use backport:version if exact versions are needed label Feb 18, 2021
@FrankHassanabad FrankHassanabad enabled auto-merge (squash) February 18, 2021 06:39
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 7.7MB 7.7MB +100.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@FrankHassanabad FrankHassanabad merged commit 0472367 into elastic:master Feb 18, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 18, 2021
…nts (elastic#91790)

## Summary

Fixes performance issues and timeouts with first and last events. Previously we were using aggregations and max/min but that is slower in orders of magnitude vs. using sorting + size: 1 + track_total_hits: false. This PR also splits out the aggregate of first/last into two separate calls as we were calling the same aggregate twice for first/last twice rather than calling one query for first even and a second query for last event which is also faster when you don't use an aggregate even if you ran them multiple times back to back compared to using aggregates for min/max's.

For how we determined performance numbers quantifiably and how we did profiling see this comment:
elastic#91269 (comment)

Shoutout to @dplumlee for the help and the first cut and draft of this PR that I morphed into this PR.

For any manual testing, please test:
* Hosts overview page and look at "last events"
* Network overview page and look at "last events"
* "last events" on the timeline
* Host details page the first and last seen event for a particular host. The more records the better.
* Network details page and the first and last seen for a particular network. 127.0.0.1 is good as that has a lot of records. 

For qualitative viewing you should see a noticeable difference like so where the first seen/last seen/last events show up quicker on detailed pages from the left side compared to the right side:
![quantify_perf](https://user-images.githubusercontent.com/1151048/108309335-99773600-716e-11eb-8dae-c289947b188c.gif)

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
@kibanamachine
Copy link
Contributor

💚 Backport successful

7.x / #91797

Successful backport PRs will be merged automatically after passing CI.

@FrankHassanabad FrankHassanabad deleted the improve-query-perf-times branch February 18, 2021 07:24
kibanamachine added a commit that referenced this pull request Feb 18, 2021
…nts (#91790) (#91797)

## Summary

Fixes performance issues and timeouts with first and last events. Previously we were using aggregations and max/min but that is slower in orders of magnitude vs. using sorting + size: 1 + track_total_hits: false. This PR also splits out the aggregate of first/last into two separate calls as we were calling the same aggregate twice for first/last twice rather than calling one query for first even and a second query for last event which is also faster when you don't use an aggregate even if you ran them multiple times back to back compared to using aggregates for min/max's.

For how we determined performance numbers quantifiably and how we did profiling see this comment:
#91269 (comment)

Shoutout to @dplumlee for the help and the first cut and draft of this PR that I morphed into this PR.

For any manual testing, please test:
* Hosts overview page and look at "last events"
* Network overview page and look at "last events"
* "last events" on the timeline
* Host details page the first and last seen event for a particular host. The more records the better.
* Network details page and the first and last seen for a particular network. 127.0.0.1 is good as that has a lot of records. 

For qualitative viewing you should see a noticeable difference like so where the first seen/last seen/last events show up quicker on detailed pages from the left side compared to the right side:
![quantify_perf](https://user-images.githubusercontent.com/1151048/108309335-99773600-716e-11eb-8dae-c289947b188c.gif)

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Frank Hassanabad <frank.hassanabad@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:SecurityOverview Security Solution Overview feature performance release_note:fix Team:Detections and Resp Security Detection Response Team v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Security Solution] Search Strategy slowdown
5 participants