-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Detections] Handle dupes when processing threshold rules #83062
Conversation
@marshallmain Is there any way we can force this new mapping to get applied, or are we dependent on the migration work? <----- @rylnd Also, any opinions on this new approach? Thanks! |
It'll be applied automatically if you bump the version number in https://github.com/elastic/kibana/blob/master/x-pack/plugins/security_solution/server/lib/detection_engine/routes/index/get_signals_template.ts#L10. I think adding the new fields is fine with a version bump and no reindex - I would just leave the original |
x-pack/plugins/security_solution/server/lib/detection_engine/routes/index/signals_mapping.json
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few architectural questions but nothing that should necessarily block this being merged. Code looks good and we've talked through the logic a few times, but I'd definitely appreciate @marshallmain taking another pass here too if possible 😉 .
@@ -71,6 +71,7 @@ export const buildBulkBody = ({ | |||
...buildSignal([doc], rule), | |||
...additionalSignalFields(doc), | |||
}; | |||
delete doc._source.threshold_result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this merit a comment about preventing the "meta signals pollution"? Or should we make this more implicit by moving this into a simple filtering function, for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really about persisting values that belong to the signal before the the signal
object has been created yet. I like the idea of using a filter, or even maybe architecting this a little differently so that the signal fields can be merged into the signal
object, rather than having this function create the signal
object. Will look at this in a follow-up PR.
@@ -176,7 +179,10 @@ const getTransformedHits = ( | |||
|
|||
const source = { | |||
'@timestamp': get(timestampOverride ?? '@timestamp', hit._source), | |||
threshold_count: docCount, | |||
threshold_result: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that these "temporary" fields (@timestamp
, threshold_result
) are not intended to be persisted in this format to the final signal, but instead are used to pass context from the query to the signal generation. Had you considered making this process more explicit by nesting these fields in some declarative key like searchContext
?
This seems indicative of some broader architectural issues: getThresholdSignalQueryFields
seems analogous to additionalSignalFields
except that it exists at a different level, but I could absolutely see it being moved up as additionalSourceFields
and similarly taking from _source.searchContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't new code so definitely a NIT on the above; just curious what your thoughts are here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think I can address this with the signal properties merging code discussed above. getThresholdSignalQueryFields
is going away, per our discussion last week. Those fields are just copies of the data that can be reconstructed using signal.rule.query
and signal.rule.filter
combined with the "searchContext" contained in threshold_result
. It may be cumbersome to get them all into one searchContext
object, but I'll take a look at it. The query and filter already exist in the signal.rule
section of the signal, while threshold_result
belongs to the signal itself (it's computed after the data are aggregated). I think this can be cleaned up a bit though, so I'll see what I can do, and then run it by you. Thanks!
buildRuleMessage, | ||
}); | ||
|
||
previousSignals.aggregations.threshold.buckets.forEach((bucket: ThresholdQueryBucket) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this logic to a function? We've got similar filter-generating functions and this file is already chonky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally!
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]Distributable file count
History
To update your PR or re-run it, just comment with: |
…d rules (elastic#83062) * Fix threshold rule synthetic signal generation * Use top_hits aggregation * Find signals and aggregate over search terms * Exclude dupes * Fixes to algorithm * Sync timestamps with events/signals * Add timestampOverride * Revert changes in signal creation * Simplify query, return 10k buckets * Account for when threshold.field is not supplied * Ensure we're getting the last event when threshold.field is not provided * Add missing import * Handle case where threshold field not supplied * Fix type errors * Handle non-ECS fields * Regorganize * Address comments * Fix type error * Add unit test for buildBulkBody on threshold results * Add threshold_count back to mapping (and deprecate) * Timestamp fixes Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
…d rules (#83062) (#84466) * Fix threshold rule synthetic signal generation * Use top_hits aggregation * Find signals and aggregate over search terms * Exclude dupes * Fixes to algorithm * Sync timestamps with events/signals * Add timestampOverride * Revert changes in signal creation * Simplify query, return 10k buckets * Account for when threshold.field is not supplied * Ensure we're getting the last event when threshold.field is not provided * Add missing import * Handle case where threshold field not supplied * Fix type errors * Handle non-ECS fields * Regorganize * Address comments * Fix type error * Add unit test for buildBulkBody on threshold results * Add threshold_count back to mapping (and deprecate) * Timestamp fixes Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* master: [Security Solution] Exceptions Cypress tests (elastic#81759) [ML] Fix spaces job ID check (elastic#84404) [Security Solution][Detections] Handle dupes when processing threshold rules (elastic#83062) skip flaky suite (elastic#84440) skip flaky suite (elastic#84445) [APM] Fix missing `service.node.name` (elastic#84269) Upgrade fp-ts to 2.8.6 (elastic#83866) Added data streams privileges to better control delete actions in UI (elastic#83573) Improve short-url redirect validation (elastic#84366) TSVB offsets (elastic#83051) [Discover] Fix navigating back when changing index pattern (elastic#84061) [Logs UI] Polish the UI for the log entry examples in the anomaly table (elastic#82139) [Logs UI] Limit the height of the "view in context" container (elastic#83178) [Application Usage] Update `schema` with new `fleet` rename (elastic#84327) fix identation in list (elastic#84301)
…bana into add-metadata-to-node-details * 'add-metadata-to-node-details' of github.com:phillipb/kibana: [APM] ML anomaly detection integration: Displaying anomaly job results in the Transaction duration chart is not as intended (elastic#84415) Support for painless language autocomplete within monaco (elastic#80577) [Lens] Time scale ui (elastic#83904) removing beta callouts (elastic#84510) [Lens] (Accessibility) add aria-label to chart type icon (elastic#84493) Trusted Apps signer API. (elastic#83661) increase stdout max listeners for legacy logging (elastic#84497) [APM] Service overview: Add throughput chart (elastic#84439) [Discover] Unskip main functional tests (elastic#84300) Uptime overview overhaul (elastic#83406) [APM] Adjust time formats based on the difference between start and end (elastic#84470) [ML] Renaming saved object repair to sync (elastic#84311) [UsageCollection] Remove `formatBulkUpload` and other unused APIs (elastic#84313) [Visualizations] Adds visConfig.title and uiState to build pipeline function (elastic#84456) [Elasticsearch Migration] Update docs re UsageCollection (elastic#84322) TSVB field list performance issue on using annotations (elastic#84407) [Security Solution] Exceptions Cypress tests (elastic#81759) [ML] Fix spaces job ID check (elastic#84404) [Security Solution][Detections] Handle dupes when processing threshold rules (elastic#83062)
Addresses: #77256, #82534
Summary
This PR makes changes to processing of threshold rules in order to prevent duplicates when evaluating thresholds over long lookback intervals. To do so, we query for any signals that have already been created on the interval, and exclude those buckets from the search during the timespan that the signal(s) already covered.
This builds upon the work done in #82444.
To address in next PR
signal
subfields (see [Security Solution][Detections] Handle dupes when processing threshold rules #83062 (comment) and [Security Solution][Detections] Handle dupes when processing threshold rules #83062 (comment))Checklist
There are currently very few tests for threshold rules. More tests coming in follow-up PR.
For maintainers