[Security Solution] Smart limits for the package with prebuilt rules #187645
Labels
8.18 candidate
Feature:Prebuilt Detection Rules (Security Solution Prebuilt Detection Rules area)
Team:Detection Rule Management (Security Detection Rule Management Team)
Team:Detections and Resp (Security Detection Response Team)
Team: SecuritySolution (Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.)
v8.17.0
v8.18.0
Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168
Summary
Recently we had an incident in Serverless where Kibana instances would crash with an OOM error while installing `security_detection_engine`, the Fleet package that Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident was mitigated by temporarily decreasing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for long, because we won't be able to release Milestone 3 of the prebuilt rule customization feature with the current limit of 2 versions per rule in the package.
Before we can release Milestone 3, we need to increase the number of versions per rule that we ship in the package. In general, the more versions we ship, the better the UX for upgrading prebuilt rules; the fewer versions we ship, the lighter the package, which also improves the UX and increases reliability.
Our goal is to find a balance between reliability and good UX and achieve both. For that, we need to come up with smart and efficient limits for the package with prebuilt rules.
Ideas
Total limits for the package as a whole:
Per rule limits:
- `<= X` number of versions no matter what. Exclude older versions and keep newer ones.
- [...] `now - X days`.
- `<= X` versions created within last 3 months, `<= Y` within last 6 months, `<= Z` within last 12 months, etc.; `X < Y < Z < ...`, e.g. `4 < 6 < 7`. Time ranges grow exponentially while limits grow slower than that: logarithmically or linearly.
- [...] `X` of them within a `Y` time window. E.g. it could be more than `2` per each `3 months`. This could help prevent some "noisy" rules (in terms of the frequency of updates to them) from "eating" too much space in the package, as well as evicting older versions of a noisy rule by newer versions of the same rule.
Todo
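To make the per-rule limits listed under Ideas more concrete, here is a minimal Python sketch of three of them: a hard cap on the number of versions, an age cutoff, and tiered limits over growing time windows. It is illustrative only and is not how the package is actually built. `RuleVersion`, the function names, and the tier values are all hypothetical. It also makes three assumptions the issue does not confirm: the age cutoff drops versions created before `now - X days`, the tiers are cumulative (the 6-month window includes the 3-month one), and versions older than the largest window are dropped.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RuleVersion:
    """One historical version of a prebuilt rule (hypothetical shape)."""
    rule_id: str
    version: int
    created_at: datetime


def apply_hard_limit(versions: list[RuleVersion], max_versions: int) -> list[RuleVersion]:
    """Keep at most `max_versions` versions, preferring the newest ones."""
    newest_first = sorted(versions, key=lambda v: v.version, reverse=True)
    return newest_first[:max_versions]


def apply_age_limit(
    versions: list[RuleVersion], now: datetime, max_age_days: int
) -> list[RuleVersion]:
    """Assumption: drop every version created before `now - max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    return [v for v in versions if v.created_at >= cutoff]


def apply_tiered_limits(
    versions: list[RuleVersion],
    now: datetime,
    tiers: list[tuple[int, int]],
) -> list[RuleVersion]:
    """Apply cumulative limits over growing time windows.

    `tiers` is a list of (window_days, max_versions) pairs sorted by window,
    with non-decreasing limits, e.g. [(90, 4), (180, 6), (365, 7)].
    Assumption: versions older than the largest window are dropped.
    """
    kept: list[RuleVersion] = []
    # Walk from newest to oldest so newer versions win when a limit is hit.
    for v in sorted(versions, key=lambda v: v.created_at, reverse=True):
        age = now - v.created_at
        # Limit of the smallest window that still contains this version.
        limit = next(
            (m for days, m in tiers if age <= timedelta(days=days)), None
        )
        if limit is None:
            break  # this and all remaining versions are older than every window
        if len(kept) < limit:
            kept.append(v)
    return kept
```

With tiers `[(90, 4), (180, 6), (365, 7)]`, a rule keeps at most 4 versions from the last 90 days, at most 6 from the last 180 days, and at most 7 from the last 365 days. A frequently updated rule therefore cannot push out all of its older versions.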