[Security Solution][Detections] Inconsistent handling of gap detection and max signals #100181
Labels: 8.16 candidate, bug, Feature:Detection Alerts, Feature:Gap Remediation, impact:low, Team: CTI, Team:Detection Engine, Team:Detections and Resp, Team: SecuritySolution
The general idea behind gap detection is to run a rule over multiple time ranges in a single alert execution task if too much time has passed since the last rule execution. Each time range the rule searches is generally treated as an independent and complete execution semantically, so each time range can create up to `maxSignals` signals. Since we limit gap detection to 4 extra rule intervals, this means a rule that hasn't run in a long time could generate up to `5*maxSignals` signals. However, we limit each time range to `1*maxSignals` so that a single time range can't create all the signals and crowd out signals from other time ranges.

Threat match rules handle this slightly differently. Each `slicedChunk` of the threat list can contribute up to `maxSignals` number of signals, and since the chunks are evaluated in parallel it's possible to exceed `maxSignals`. In addition, since `searchAfterAndBulkCreate` searches all provided time ranges and sums the created signals into a single counter which the threat match logic then compares to `maxSignals` (https://github.com/elastic/kibana/blob/master/x-pack/plugins/security_solution/server/lib/detection_engine/signals/threat_mapping/create_threat_signals.ts#L140), it's possible for one time range to hit `maxSignals` and crowd out signals from the other time ranges.

To address this, we should fully move the gap remediation logic out of `searchAfterAndBulkCreate` and have the executor functions handle a single time range. The top-level executor code in `signal_rule_alert_type.ts` would then be responsible for calling the appropriate executor multiple times, once for each time range computed by the gap detection logic. This will likely incur a slight additional performance hit (exact magnitude TBD) for threat match rules when a gap is detected, since the full threat list will now be fetched once per time range instead of once overall in the executor. The benefit is that it prevents signals from one time range from crowding out signals in other time ranges. Extracting this logic also makes it possible to share with other rule types.

To handle the possibility of `maxSignals` being exceeded, we would have to recombine the parallel searches before building and indexing the signals. As we pull logic out of `searchAfterAndBulkCreate` we may find that it is simpler to not share the function as a whole between threat match and KQL rules and only share components like `bulkCreate` and `singleSearchAfter`.
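
To make the proposed split concrete, here is a rough TypeScript sketch of one way it could look. It is not the existing Kibana code: every name in it (`TimeRange`, `Signal`, `SingleRangeExecutor`, `computeTimeRanges`, `recombine`, `makeThreatMatchExecutor`, `runRule`) is hypothetical, and the gap computation is a simplified stand-in that assumes fixed-size intervals. The point is only to show a top-level loop that calls a single-range executor once per time range, and a threat match executor that recombines its parallel chunk searches and caps them at `maxSignals` before anything would be indexed.

```typescript
// Hypothetical types for illustration only; these are not Kibana APIs.
interface TimeRange {
  from: number; // epoch millis, inclusive
  to: number; // epoch millis, exclusive
}

interface Signal {
  id: string;
  timestamp: number;
}

// An executor that handles exactly one time range and returns at most maxSignals signals.
type SingleRangeExecutor = (range: TimeRange, maxSignals: number) => Promise<Signal[]>;

// Simplified gap detection (assumption: fixed-size intervals): the current interval
// plus up to `maxGapIntervals` extra intervals, oldest first.
function computeTimeRanges(
  lastRun: number,
  now: number,
  intervalMs: number,
  maxGapIntervals: number = 4
): TimeRange[] {
  const ranges: TimeRange[] = [];
  let to = now;
  while (ranges.length < maxGapIntervals + 1 && to > lastRun) {
    const from = Math.max(to - intervalMs, lastRun);
    ranges.unshift({ from, to });
    to = from;
  }
  return ranges;
}

// Recombine the results of parallel searches and cap the total before indexing.
function recombine(chunkResults: Signal[][], maxSignals: number): Signal[] {
  const merged: Signal[] = [];
  for (const chunk of chunkResults) {
    for (const signal of chunk) {
      merged.push(signal);
    }
  }
  return merged.slice(0, maxSignals);
}

// Threat match executor for a single time range: chunks are searched in parallel,
// then recombined so the range as a whole cannot exceed maxSignals.
function makeThreatMatchExecutor(
  threatChunks: string[][],
  searchChunk: (chunk: string[], range: TimeRange, maxSignals: number) => Promise<Signal[]>
): SingleRangeExecutor {
  return async (range, maxSignals) => {
    const perChunk = await Promise.all(
      threatChunks.map((chunk) => searchChunk(chunk, range, maxSignals))
    );
    return recombine(perChunk, maxSignals);
  };
}

// Top-level loop: one executor call per time range, each with its own maxSignals budget,
// so one time range cannot crowd out signals from the others.
async function runRule(
  executor: SingleRangeExecutor,
  ranges: TimeRange[],
  maxSignals: number
): Promise<Signal[][]> {
  const results: Signal[][] = [];
  for (const range of ranges) {
    results.push(await executor(range, maxSignals));
  }
  return results;
}
```

In this sketch the per-range budget lives in `runRule` rather than in a shared counter, so a rule that has fallen far behind can still produce up to `5*maxSignals` signals in total, but never more than `maxSignals` for any one time range. A real implementation would also need to decide which signals to keep when the recombined set is over the cap; the `slice` here simply keeps the first ones in chunk order.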