-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] max_signals
validation follow up fixes
#182643
Conversation
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management) |
@jpdjere Does this mean that 1000 would become a hard-coded limit, regardless of the value of the global config setting Also would it be possible to tweak the error messages in the images above? We don't really use the name
|
Re: the 1000 limit via API validation - is it definitely the case that users can't set |
function postprocessErrorString(str: string): string { | ||
// Remove the `[request body]` prefix added by Zod for request validation errors | ||
return str.replace(/\[request body\]:/g, ''); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to remove the [request body]
part of the string? How can we distinguish errors in the body
from errors in the query
part of the request?
@@ -200,7 +200,7 @@ export type AlertsIndexNamespace = z.infer<typeof AlertsIndexNamespace>; | |||
export const AlertsIndexNamespace = z.string(); | |||
|
|||
export type MaxSignals = z.infer<typeof MaxSignals>; | |||
export const MaxSignals = z.number().int().min(1); | |||
export const MaxSignals = z.number().int().min(1).max(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be coordinated with a detection-rules release as some default to 10k as well, right?
Or do the pre-built rules flow through a different part of the API?
I'm sure you're tracking this, so don't mind me....just saw these errors with my rules this morning so that's how If found myself here 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question, I didn't know we had detection rules with that high of a max_signals
count. That could potentially cause problems when trying to duplicate rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @spong for this catch 🙏
Additionally, the way it's implemented is a breaking change in the API. If the value is higher than 1000, attempts to create or update a rule will fail with 400, which wasn't the case before. There shouldn't be a max limit at the schema level, we should calculate the final value of max_signals to use in rule executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question, I didn't know we had detection rules with that high of a max_signals count. That could potentially cause problems when trying to duplicate rules.
Yep, I think rule installation and upgrade will be broken as well for those prebuilt rules that have this value higher than 1000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there are 18 rules in the detection-rules repo that have max_signals set to 10k.
Do I get it right that even if we don't merge this PR, in the default Kibana configuration where xpack.alerting.rules.run.alerts.max == 1000
all these rules, including the Endpoint Security one, will be executing with warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be the expected action for rule executions with max_signals
> 1000 now so those 18 rules would run with warnings out of the box
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Not a strong opinion, but I think this UX might not be the most optimal. If the expectation from the TRaDE team is that those 18 rules should promote as many external alerts as possible (hense the 10k setting), I would suggest to:
- Either override the system's max limit with the rule's max_signals, for these rules. This could be done via an additional rule parameter
max_signals_force: boolean
and an additional checkbox in the UI. (not sure that it's a good idea). - Or not generate a Warning status if
max_signals > xpack.alerting.rules.run.alerts.max
AND it's a prebuilt rule. (looks like a good tradeoff to me). - Or maybe set the
max_signals
value for those 18 rules to 1000, if the TRaDE team would be ok with that. (might be not an option if the goal is to promote as many external alerts as possible and let users increase thexpack.alerting.rules.run.alerts.max
setting for that).
Maybe there's something else we could do. But I think we should try to make sure rules work out of the box with default settings. The Endpoint Security rule is the most basic one, it is installed and enabled automatically with Elastic Defend, and it would be weird to see it not succeeding, unless the user changes some system setting in the Kibana config, for all Kibana instances in their cluster.
@jpdjere @dplumlee Could we please sync on this in the simplified protections channel with the TRADE team, Kseniia and Alex and see what they have to say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think options 2 or 3 would be my preferences off the top of my head. Something to consider is that even if we don't display a warning, the rule would still be hard capped by the xpack.alerting.rules.run.alerts.max
config value out of the box and, should the rule ever reach that value (however unlikely), would throw an error from the alerting side of things letting us know we hit it. With that in mind, maybe option 3 would be best, but we'd need to check with TRaDE team for the intent behind that 10k setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like option 3 is viable based on Justin's response:
As an example if an environment had say 100k hosts and an upstream integration, such as elastic endpoint happens to generate a batch of alerts over that time, we want to allow them all to match
As you can see, the larger the deployment, the greater the likelihood of hitting that capacity.
So I guess my vote goes to the 2nd option + documenting this in multiple places + maybe even showing a warning on the Rule Details page (not a warning rule status, but a static callout or something like that with a link to the public docs). I added a topic to discuss this to the Simplified Protection's agenda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: #173593 (comment)
@@ -200,7 +200,7 @@ export type AlertsIndexNamespace = z.infer<typeof AlertsIndexNamespace>; | |||
export const AlertsIndexNamespace = z.string(); | |||
|
|||
export type MaxSignals = z.infer<typeof MaxSignals>; | |||
export const MaxSignals = z.number().int().min(1); | |||
export const MaxSignals = z.number().int().min(1).max(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question, I didn't know we had detection rules with that high of a max_signals
count. That could potentially cause problems when trying to duplicate rules.
@@ -200,7 +200,7 @@ export type AlertsIndexNamespace = z.infer<typeof AlertsIndexNamespace>; | |||
export const AlertsIndexNamespace = z.string(); | |||
|
|||
export type MaxSignals = z.infer<typeof MaxSignals>; | |||
export const MaxSignals = z.number().int().min(1); | |||
export const MaxSignals = z.number().int().min(1).max(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the argument for changing this to max of 1000? In addition to Garrett's comment, I'm not entirely sure we can't still set xpack.alerting.rules.run.alerts.max
to something greater than 1000, I think we just "don't support it" as in we can't guarantee things don't break on your cluster. Is there something else that would limit it outside of that config value? I get the argument that it's weird to give users the ability to put insanely big numbers if we just use the minimum of 2 values but we could do that same kind of wacky stuff with other form components too that wouldn't actually end up having an impact on the rule. And we also have the same workflow if the config value was set to 10 and the user entered a value like 50 - we would just use the value they didn't enter.
The initial reason we uncapped the limit was to make it easier to still create/edit rules with the default or existing values even after the alerting config value had changed. Now users can change unrelated fields in existing rules without having to go in and also modify max_signals
in order to get the rule update to pass API validation. I also understand that setting it that high makes it a very rare edge case so I don't care too much about it (outside of it potentially causing issues with prebuilt rules), but just wanted to explain the genesis of why it was originally set that way!
@@ -200,7 +200,7 @@ export type AlertsIndexNamespace = z.infer<typeof AlertsIndexNamespace>; | |||
export const AlertsIndexNamespace = z.string(); | |||
|
|||
export type MaxSignals = z.infer<typeof MaxSignals>; | |||
export const MaxSignals = z.number().int().min(1); | |||
export const MaxSignals = z.number().int().min(1).max(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @spong for this catch 🙏
Additionally, the way it's implemented is a breaking change in the API. If the value is higher than 1000, attempts to create or update a rule will fail with 400, which wasn't the case before. There shouldn't be a max limit at the schema level, we should calculate the final value of max_signals to use in rule executors.
@dplumlee @banderror @joepeeples I had misunderstood the behaviour of Kibana: I thought that 1000 was a hard limit in the rule executors, which overrided both the rule's value and the Kibana config. I understand now that it's just the default Kibana config and that the config can be set to a value higher than that. Given the feedback here, I'll discard setting the validation on the API schema layer.
I'll give this a try with an actual prebuilt rule.
@banderror
then calculated value in the rule executor should be 1000? Knowing now that there are Prebuilt Rules with a max_signals of 10,000, this doesn't look very clean now. |
@jpdjere I think he's saying we already calculate the effective "max signals" during rule execution from logic added in the previous PR. Right now we take the minimum of |
Thanks @dplumlee! Update on testing a prebuilt rule
First, as soon as I install a rule with a default Second, I indexed 10,000 documents to force the rule to create that amount of alerts, but I got an unexpected result: it indexed only 100 rules (not 1,000 and not 10,000). I did it two times, and each time it created 100 alerts: I'll delve a little deeper to understand what's going on here. |
Thank you @jpdjere!
I meant what Davis says in #182643 (comment). For a 10k example it should be: rule value: 10000 |
Ok, so, interesting discovery about the amount of alerts that are actually created. As I mentioned in this comment I have been testing the Ransomware - Detected - Elastic Endgame prebuilt rule, that comes with 10,000 as value for With the current behaviour of the app, the amount of alerts that we thought should be created is 1,000, which is the default value of the Digging into the code I found out: 1. In Line 150 in 810e820
That is passed over to the rule executor: Lines 408 to 441 in 810e820
2. In the query executor, the kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rule_types/query/query.ts Lines 67 to 97 in 810e820
3. The method Lines 91 to 107 in 810e820
Lines 160 to 176 in 810e820
The problem that I see here is that what is passed to In the screenshot above, you can see that 4. Back in Lines 20 to 36 in aa1df6d
I'm trying to understand if this is by design or I bumped into a bug here. Maybe @marshallmain or @vitaliidm or someone else from Detection Engine team can guide us here. It looks from the code above that any value of |
@jpdjere We have an API integration test that tries to ensure that max_signals values greater than 100 are respected, can you verify if that test is working on this branch? It's expected that a maximum of 100 alerts would be created on the first iteration of the loop in |
@marshallmain Thanks for your reply. Yes, indeed that bug was the issue. I was running a script to index my docs and all had identical timestamps, randomizing their secs and milisecs did the trick. So I guess that leaves us back with the question of how we want to handle this issue. @banderror's three alternatives:
are probably a good starting point to resume the conversation with the TRaDE team and Product in the next Simplified Protections meeting next week. I will revert this PR back to draft since the changes are not what we want, keeping it open because of the discussion. |
Closing this PR, see #173593 (comment) |
Follow up to: #179680
Summary
max_signals
as editable field. [Security Solution] Allow users to edit max_signals field for custom rules #179680Changes
max_signals
to 1000 via Zod validation. This aligns the API contract with Kibana's behaviour: even if a user setsmax_signals
to a value over 1000, only a maximum of 1000 alerts are generated. This change aligns the contract with the actual behaviour.[request body]
prefix.Before:
After:
Docs note
FYI @joepeeples
If we en up merging this PR with the API validation for 1000 being the max value for
max_signals
, we'll have to revisit the docs that are being added in this PR, which explains that values higher than 1000 will be ignored by Kibana, which will only generate a maximum of 1000 alerts.Checklist
Delete any items that are not applicable to this PR.
For maintainers