Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Adds Bulk edit API #120472

Merged
merged 54 commits into from
Jan 6, 2022

Conversation

vitaliidm
Copy link
Contributor

@vitaliidm vitaliidm commented Dec 6, 2021

Summary

  • Adds bulk update API on server side by introducing new bulk action edit to /rules/_bulk_action API. Draft documentation is available by [Security Solution][Detections]Rules _bulk_action edit API security-docs#1317
  • Added unit and functional tests
  • Added actions for editing: tags, index, timeline(the rest of required actions can be added when needed)
  • Little cleanup of updateRules/patchRules typing(removed unused props)
  • Limited concurrency to 5, for /rules/_bulk_action API
  • Moved security_solution/server/endpoint/routes/limited_concurrency.ts to security_solution/server/routes/limited_concurrency.ts . Cleaned up unused constants in endpoint directory
  • Refactored registerLimitedConcurrencyRoutes method: rate limit is applying per route path now, rather than for all routes, as previously
  • Added error handling for single rule update in _bulk_action API, so if one rule action fails, it won't fail the whole request. Caught errors for single are returning in response
  • Added chunking of concurrent update requests. Rules are split into chunks of 50 rules, rules within each chunk are processed in parallel, chunks are processed sequentially.

Example

curl --location --request POST 'http://localhost:5601/kbn/api/detection_engine/rules/_bulk_action' \
--header 'kbn-xsrf: reporting' \
--header 'Authorization: Basic PUT_ENCODED_USER:PASSWORD' \
--header 'Content-Type: application/json' \
--data-raw '{
    "action": "edit",
    "query": "",
    "edit":  [{
        "type": "add_tags",
        "value": ["sample_tag"]
    }
    ]
}'

where PUT_ENCODED_USER:PASSWORD equals to base64 encoded string 'user:password'

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@vitaliidm vitaliidm self-assigned this Dec 6, 2021
@vitaliidm vitaliidm added v8.1.0 Feature:Rule Management Security Solution Detection Rule Management Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team labels Dec 6, 2021
@vitaliidm
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

merge conflict between base and head

@vitaliidm
Copy link
Contributor Author

@elasticmachine merge upstream

@elastic elastic deleted a comment from kibana-ci Dec 9, 2021
@vitaliidm vitaliidm added the release_note:skip Skip the PR/issue when compiling release notes label Dec 9, 2021
@banderror
Copy link
Contributor

banderror commented Dec 23, 2021

Brainstormed the issue described above with @vitaliidm @xcrzx @banderror. Ideas:

  1. Find out if it's possible to check from the route handler that its execution was cancelled by the router/kibana core.
    • done
  2. Implement our own timeout logic in the route handler. E.g. after X minutes stop executing bulk updates and return a response with the details of what rules have been updated, failed, skipped.
    • done
  3. Rate limiting on the route level (per Kibana instance per route). It should reject requests if the limit has been exceeded.
    • done
    • @banderror create ticket for supporting rate limiting at the Core level
  4. Chat with Kibana Core team on the problem and our ideas.
  5. UI should not allow the user to send a lot of concurrent bulk actions - meaning that it should be blocking for the user until the bulk request finishes (blocking modal window, blocking flyout).
  6. Do a perf audit (using APM etc) of the existing RulesClient.update. Try to identify slow methods/requests and opportunities for optimization.
  7. Optimize performance of the existing RulesClient.update.
    • implement bulk update and bulk get methods in RulesClient
    • @banderror create a ticket for supporting ES's refresh parameter in RulesClient
  8. Implement efficient bulk operations in the RulesClient: bulk get, bulk update. (existing ticket Bulk create, update, delete abilities for the rules client #53144)
  9. Process bulk actions using a queue based on Task Manager.

@vitaliidm
Copy link
Contributor Author

@elasticmachine merge upstream

@vitaliidm
Copy link
Contributor Author

@banderror , @xcrzx
A few updates:

  1. Find out if it's possible to check from the route handler that its execution was cancelled by the router/kibana core.

Kibana request has events.completed$ observable, that will get fired when request either aborted or completed. I've implemented handling of it in one of the commits: ffb568c
So now, abortSignal is getting passed to promisePool util and it will abort processing executions further, once request is aborted/completed.

  1. Rate limiting on the route level (per Kibana instance per route). It should reject requests if the limit has been exceeded.
    There is already similar method in endpoint directory of security_solution plugin (https://github.com/elastic/kibana/blob/master/x-pack/plugins/security_solution/server/endpoint/routes/limited_concurrency.ts#L45)

Looks like it's what we need, it will return 429, when number of requests exceeds limit.
But currently it works in a way that applied limit is shared between all routes, that configured to have rate limit.
For example, if we have 2 routes configured with limit 10. If first one takes 10 requests, a second one won't be accessible, and will return 429.

So, this method needs a refactoring:

  1. Rate limit can be configured per route, per tag(?)
  2. Limit should apply to a single route, not be shared with all configured routes

I working on refactoring and additional checking/testing. Would it be preferable to include rate limit part in this PR or maybe in a separate?

@vitaliidm
Copy link
Contributor Author

@elasticmachine merge upstream

@vitaliidm vitaliidm requested a review from a team as a code owner January 5, 2022 11:44
@vitaliidm
Copy link
Contributor Author

vitaliidm commented Jan 5, 2022

@xcrzx , @banderror

  • added rate limit for bulk edit API
  • moved limitedConcurrency to common plugin area from endpoint
  • removed unused code in endpoint directory

Checked as well APM transactions

Screenshot 2022-01-04 at 17 45 01

In patchRules method, rulesClient.update does 4 requests under the hood: authorization check, getting rule and 2 requests for update.
As spotted during heavy load, all of these 4 requests could start to take much longer time:

Screenshot 2022-01-05 at 12 19 45

With limited rate as 5, for ~ 600 rules, timing for requests are following:

http.requests: .............................................. 5
http.codes.200: ............................................. 1
http.responses: ............................................. 5
http.codes.500: ............................................. 4
http.response_time:
  min: ...................................................... 21323
  max: ...................................................... 24504
  median: ................................................... 23630.3
  p95: ...................................................... 24594.7
  p99: ...................................................... 24594.7

I see median request time increased to 23s from ~15s, in comparison with a single request. Visually, kibana seems slower under this load. But can take requests, and not crushing as previously, without rate limit

Further optimization could be:

  • reduce further number of parallel requests, when updating rules. From 50 to 30, 25, etc.

Having #53144 implemented, looks as the most preferable option in future.

@xcrzx
Copy link
Contributor

xcrzx commented Jan 5, 2022

With limited rate as 5, for ~ 600 rules, timing for requests are following:
http.codes.500: ............................................. 4

@vitaliidm Why the response code is 500 with the rate limit enabled? Shouldn't it be 429?

@vitaliidm
Copy link
Contributor Author

With limited rate as 5, for ~ 600 rules, timing for requests are following:
http.codes.500: ............................................. 4

@vitaliidm Why the response code is 500 with the rate limit enabled? Shouldn't it be 429?

@xcrzx , I sent only 5 requests, which within rate limit. 500 error was because of conflict for some rules updates

If I start run with 10 requests in parallel for example:

--------------------------------------
Metrics for period to: 14:33:40(+0000) (width: 0.022s)
--------------------------------------

vusers.created_by_name.bulk edit: ........................... 10
vusers.created.total: ....................................... 10
http.request_rate: .......................................... 10/sec
http.requests: .............................................. 10


--------------------------------------
Metrics for period to: 14:33:50(+0000) (width: 0.019s)
--------------------------------------
http.codes.429: ............................................. 5
http.responses: ............................................. 5
http.response_time:
  min: ...................................................... 9718
  max: ...................................................... 9735
  median: ................................................... 9801.2
  p95: ...................................................... 9801.2
  p99: ...................................................... 9801.2


--------------------------------------
Metrics for period to: 14:34:20(+0000) (width: 5.446s)
--------------------------------------

http.codes.500: ............................................. 5
http.responses: ............................................. 5
http.response_time:
  min: ...................................................... 33041
  max: ...................................................... 38486
  median: ................................................... 36691.5
  p95: ...................................................... 37432.7
  p99: ...................................................... 37432.7

5 will be completed with 429
5 with 500

also, it's captured in functional test https://github.com/elastic/kibana/pull/120472/files#diff-033a39080b5edd054aecd3ddfa4a06fc9d7e145e71bda681a8bd3647efb21956R303

Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the _bulk_action endpoint against concurrent load locally; everything seems to work fine, and Kibana doesn't hang anymore. Request abortion also works as expected. Thank you, Vitalii 👍

I added a small comment about unsubscription, not sure how it should work tbh, so maybe require further investigation.


// subscribing to completed$, because it handles both cases when request was completed and aborted.
// when route is finished by timeout, aborted$ is not getting fired
request.events.completed$.subscribe(() => abortController.abort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completed$.subscribe() returns an unsubscribe method which contains clean up logic. Do we need to call it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be fine and don't need to unsubscribe.
Under the hood completed$

    const completed$ = merge<void, void>(finish$, aborted$).pipe(shareReplay(1), first());

If I not mistaken, first operator will take care of unsubscribing

@vitaliidm
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 243.9KB 244.5KB +674.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @vitaliidm

Copy link
Member

@pzl pzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving to satisfy onboarding team block here

@vitaliidm vitaliidm merged commit 7dfad91 into elastic:main Jan 6, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 6, 2022
gbamparop pushed a commit to gbamparop/kibana that referenced this pull request Jan 12, 2022
vitaliidm added a commit to elastic/security-docs that referenced this pull request Mar 7, 2022
mergify bot pushed a commit to elastic/security-docs that referenced this pull request Mar 7, 2022
jmikell821 pushed a commit to elastic/security-docs that referenced this pull request Mar 7, 2022
…1651)

Adds edit action to rules _bulk_action API

Adresses: elastic/kibana#120472

Preview [here](https://security-docs_1317.docs-preview.app.elstc.co/guide/en/security/master/bulk-actions-rules-api.html).

(cherry picked from commit 878cb38)

Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
@vitaliidm vitaliidm deleted the security-solution/bulk-edit-api branch March 4, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Rule Management Security Solution Detection Rule Management release_note:skip Skip the PR/issue when compiling release notes Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Security Solution][Detections] Server-side implementation of bulk editing of detection rules
8 participants