Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Rule's EQL query validation fails unexpectedly when fields missing #178611

Closed
Tracked by #201502
jpdjere opened this issue Mar 13, 2024 · 17 comments
Closed
Tracked by #201502
Assignees
Labels
8.16 candidate bug Fixes for quality problems that affect the customer experience Feature:Rule Creation Security Solution Detection Rule Creation workflow Feature:Rule Edit Security Solution Detection Rule Editing workflow fixed impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.16.0

Comments

@jpdjere
Copy link
Contributor

jpdjere commented Mar 13, 2024

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168
Kibana version: Latest and all previous

Summary

Describe the bug:

When creating or editing an EQL rule, and typing in a valid query, we currently run validation every 300ms.

This validation checks that the fields mentioned in the EQL query exist in a document that should be already indexed into one of the rule's Index Patterns. If it's missing, we show a (rather cryptic) toaster error and prevent from saving the rule.

image

Steps to reproduce:

  1. On a clean Kibana and ES, navigate to Rule Creation and select EQL rule.
  2. Copy the EQL query file where host.os.type == "linux" into the query field.

Expected behavior:

  • Validation shouldn't throw an error, mark the field as containing an error or block the saving of the rule.
  • If the validation fails, show a Warning toaster explaining that no such fields exist.

Provide logs and/or server output (if relevant):

An error occurred while validating your EQL query

index_not_found_exception Caused by: verification_exception: Found 1 problem line -1:-1: Unknown index [*,-*] Root causes: verification_exception: Found 1 problem line -1:-1: Unknown index [*,-*]

Any additional context:

This becomes especially important in the context of the Prebuilt Rule Customization epic, since we will allow users to customize their EQL prebuilt rule. It is almost a certainty that users will install EQL rules that have queries that mention fields for which they have no data load, which will block the customization of the field (even for other fields). This is bad UX and should be changed.

@jpdjere jpdjere added bug Fixes for quality problems that affect the customer experience triage_needed Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area labels Mar 13, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@yctercero
Copy link
Contributor

I think this might be on detection engine side (though will happily accept the help). In thinking how to fix this, perhaps warning the user that a result of a field missing, the rule will error and could be very noisy - see https://github.com/elastic/security-team/issues/7871. That issue was closed because a different one is being tracked but we weren't able to come to decision on a good way of mitigating the behavior a rule exhibits when validation fails because a document is not yet written that would contain the field.

@rylnd
Copy link
Contributor

rylnd commented Mar 14, 2024

If I'm understanding correctly, it sounds like we now want rules to be editable independent of any underlying data. EQL validation breaks this pattern, since it requires an index (and potentially data) to work. Required fields also doesn't (currently) allow this behavior, and I imagine other validations might also cause issue.

Should we consider some app-level "ignore rule validations" configuration to allow all of these validations to a) still function and give info but b) not prevent rules from being edited/created? Ignoring all validations might be a little too crazy, but perhaps we could make a slightly more specific "ignore rule data validations" (if that line between "data" and "other" validations is clear-cut) configuration?

@jpdjere I'll have to read through your RFC, but: by not requiring a valid field, we're effectively allowing a (potentially) invalid field. How do we intend that to impact UX, and do we have plans to e.g. audit code paths that might not be expecting invalid rules? Rule Preview, for example, isn't going to provide any useful information if there's no source data (or an invalid query).

@rylnd
Copy link
Contributor

rylnd commented Mar 14, 2024

@jpdjere I looked through the prebuilt customization RFC and wasn't able to find any specific references to the "data validation" issue, so happy to discuss more here.

If editing rules independent of source data is the requirement (and it sounds like it is), we should probably create a dedicated issue to discuss all the consequences of that change (what features currently rely on source data, and how they might behave in the absence of data).

In terms of this ticket: I think we need a little product/UX guidance on how we convey this situation to the user. It does look like the form hook lib has a concept of validation warnings, so we could potentially leverage that, but we still need to decide whether this should only be in the case of no source data, and how exactly to present to the user "We can't validate this query, so you're on your own."

@jpdjere
Copy link
Contributor Author

jpdjere commented Mar 18, 2024

@rylnd Thanks for looking into this!

we should probably create a dedicated issue to discuss all the consequences of that change (what features currently rely on source data, and how they might behave in the absence of data).

This sounds reasonable, since I am surely not aware of all the features that might be impacted by this change, or the consequences that it might have. I opened this discussion thread, we can list all of them and have the conversation there: #178867

Rule Preview, for example, isn't going to provide any useful information if there's no source data (or an invalid query).

For example, if you have a clean ES (no docs indexed) and install a Prebuilt Rule such as ESXI Timestomping using Touch Command, then go to its Rule Editing page, you can successfully run Rule Preview for it, even though there's no data for it, and the EQL validation of its data wouldn't pass. In fact, if you try to save the rule with no changes, the request fails because the EQL query validation failed, even though the user cannot even modify the query, since its an immutable (prebuilt) rule.

@marshallmain
Copy link
Contributor

For example, if you have a clean ES (no docs indexed) and install a Prebuilt Rule such as ESXI Timestomping using Touch Command, then go to its Rule Editing page, you can successfully run Rule Preview for it, even though there's no data for it, and the EQL validation of its data wouldn't pass. In fact, if you try to save the rule with no changes, the request fails because the EQL query validation failed, even though the user cannot even modify the query, since its an immutable (prebuilt) rule.

The rule preview and real rule execution do a pre-query check to see if any indices exist matching the rule's pattern, and skip the whole query if no indices are found, which is why we don't see the validation error in rule preview. Maybe using the same preflight check for the UI validation would be convenient for displaying a warning in the UI if the indices don't exist? There could be some benefit to having consistency between the rule create/edit UI and the rule executions. If the indices don't exist -> warning in create/edit UI, warning at rule execution time. If the indices exist and a field in the query doesn't -> error in create/edit UI, error at rule execution time.

@jpdjere
Copy link
Contributor Author

jpdjere commented Apr 9, 2024

@marshallmain

If the indices don't exist -> warning in create/edit UI, warning at rule execution time.
If the indices exist and a field in the query doesn't -> error in create/edit UI, error at rule execution time.

But our problem here, in the scenario where indices exist and a field in the query doesn't, is that the create/edit UI doesn't allow the user to save if an error is thrown. And this will cause many users to be unable to edit their prebuilt rules until matching indices with matching data are ingested.

We strongly feel that this should be changed: the user should be able to edit and save their prebuilt rule (or any rule, for that matter) even if that validation fails.

Do you see any reasons why we shouldn't change this validation behaviour? Or any other feature that might behave incorrectly if we allow to save rules anyway?

In summary: I think the behaviour should be:

  • If the indices don't exist -> warning in create/edit UI, warning at rule execution time. (agree with you here)
  • If the indices exist and a field in the query doesn't -> warning in create/edit UI, error at rule execution time.

@banderror banderror added needs product Feature:Rule Creation Security Solution Detection Rule Creation workflow Feature:Rule Edit Security Solution Detection Rule Editing workflow labels Apr 9, 2024
@banderror
Copy link
Contributor

banderror commented Apr 9, 2024

Moved the general discussion to #180407. Let's chat about EQL specifics here.

I think @rylnd had a good point:

If I'm understanding correctly, it sounds like we now want rules to be editable independent of any underlying data. EQL validation breaks this pattern, since it requires an index (and potentially data) to work.

@jpdjere We would need to find a way to validate that an EQL query is syntactically valid, regardless of the data stored in Elasticsearch.

@marshallmain
Copy link
Contributor

marshallmain commented Apr 9, 2024

In summary: I think the behaviour should be:

  • If the indices don't exist -> warning in create/edit UI, warning at rule execution time. (agree with you here)
  • If the indices exist and a field in the query doesn't -> warning in create/edit UI, error at rule execution time.

Once the indices exist, I expected that the appropriate fields for the query should exist and if they don't then an error would be appropriate. Do we see customers in situations where maybe some of the indices exist, but don't have all the fields in the query?

I could see that happening for data sources that use dynamic mappings (but dynamic mappings are really not advisable, so hopefully this is rare). I could also imagine in some cases maybe the rule references a variety of data sources, one data source is set up but does not contain all the fields and another data source that will contain all fields in the query is not set up yet. In that case, I'd consider the rule index pattern to be overly broad and include data sources that can't possibly match the query, since a referenced data source is missing some fields, so the real root of the problem is the index pattern but the error would appear with the EQL query. I was thinking that this scenario would be rare as well.

Perhaps a robust solution would be to add a confirmation modal when users attempt to save a rule that has validation errors like this that do not violate the rule schema. Something like "This rule has <validation errors/references a data view that doesn't exist/whatever other checks we want>, save anyway?" Then we could still present the validation error that will cause an error at runtime error as an error in the create/edit UI, so the importance is conveyed to the user, but their immediate workflow can continue.

Edit: the confirmation modal is not really specific to EQL, I'll add a summary and link to #180407 for continued discussion there

@yctercero
Copy link
Contributor

++ @marshallmain 's suggestion. I think that had been floated around of adding some kind of dismissible warning in the UI letting a user know that "this rule will be noisy due to missing indices or missing fields, etc - we suggest disabling or updating rule". Obviously something way more eloquent than that :) And with Marshall's latest PR flagging these errors as userErrors for alerting's dashboards, we should stop getting pinged about them.

@banderror banderror added impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. 8.15 candidate and removed triage_needed Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area labels Apr 13, 2024
@banderror banderror removed their assignment Apr 13, 2024
@banderror banderror changed the title [Security Solution] EQL query validation fails unexpectedly when fields missing [Security Solution] Rule's EQL query validation fails unexpectedly when fields missing Apr 13, 2024
e40pud added a commit that referenced this issue Aug 15, 2024
…error type (#10181) (#190149)

## Summary

Addresses elastic/security-team#10181

This PR is a refactoring of EQL query validator:
* to separate different validation errors passed from ES. Before we
marked `parsing_exception`, `verification_exception` and
`mapping_exception` as the same error of type `ERR_INVALID_EQL`. After
these changes we will split these errors into separate ones: syntax
(`parsing_exception`), invalid EQL (`verification_exception` and
`mapping_exception`; can be split in future if needed)
* to handle missing data source as a new EQL error of type
`MISSING_DATA_SOURCE`. Before
`data.search.search<EqlSearchStrategyRequest,
EqlSearchStrategyResponse>()` call would throw an exception in case data
source does not exist and we would handle it as a failed request and
show an error toast (see relevant ticket
#178611). After these changes we
would not show a toast and handle missing data source error as other EQL
validation errors - showing an error message in the EQL query bar.

This will allow us to distinguish between different types of EQL
validation errors and will help to decide on whether certain errors are
blocking during the rule creation/editing flow
(#180407).

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [Integration: Rule
execution](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6754)
(100 ESS & 100 Serverless)
- [Cypress: Detection
Engine](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6766)
(100 ESS & 100 Serverless)

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
e40pud added a commit that referenced this issue Aug 30, 2024
…) (#191487)

## Summary

Addresses #180407
Addresses #178611

With these changes we allow user to create and update a rule even if
there are certain query bar validation error exist. Right now, we will
make any non-syntax validation errors in `EQL` and `ES|QL` rules types
to be non-blocking during the rule creation and rule updating workflows.

### Screenshot of the EQL rule creation workflow with existing
non-blocking validation errors:


https://github.com/user-attachments/assets/06b7f76c-e600-4a99-8ead-1445d429e9d3

### Screenshot of the EQL rule updating workflow with existing
non-blocking validation errors:


https://github.com/user-attachments/assets/9b35e113-b127-487b-bc23-afecf704db9d

## UPDATE

After discussing confirmation modal with @approksiu, we decided to
simplify it and show only title with generic description to avoid too be
too literal in the modal. User can see the full error description during
rule creation/editing workflows in the query bar where we show each
validation error as part of the query bar form item.

<img width="702" alt="Screenshot 2024-08-28 at 12 50 14"
src="https://github.com/user-attachments/assets/edfb791e-4e45-4fa5-8a46-c7e2772abdf9">

### Some test cases for local testing

<details>
  <summary><b>Create EQL rule with missing data source</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select EQL rule type
4. Set non-existing index in index patterns field
5. Add some valid EQL query (for example `any where true`)
6. Continue with other steps
7. Click create rule button

**Expected**: You will see the confirmation modal that warns user about
potentially failing rule executions. Clicking `Confirm` button will
create a rule.

</details>

<details>
  <summary><b>Create EQL rule with missing data field</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select EQL rule type
4. Set existing indices in index patterns field
5. Add some valid EQL query referring non-existing data field (for
example `any where agent.non_existing_field`)
6. Continue with other steps
7. Click create rule button

**Expected**: You will see the confirmation modal that warns user about
potentially failing rule executions. Clicking `Confirm` button will
create a rule.

</details>

<details>
<summary><b>Create EQL rule with syntax error in the query</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select EQL rule type
4. Set existing indices in index patterns field
5. Add some syntactically invalid EQL query (for example `hello world`)

**Expected**: The continue button does not allow user to proceed to the
About step due to existing syntax error.

</details>

<details>
  <summary><b>Create ES|QL rule with missing data source</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select ES|QL rule type
4. Add some valid ES|QL query with non-existing data source (for example
`from non-existing-index-* metadata _id, _version, _index | SORT
@timestamp`)
6. Continue with other steps
7. Click create rule button

**Expected**: You will see the confirmation modal that warns user about
potentially failing rule executions. Clicking `Confirm` button will
create a rule.

</details>

<details>
  <summary><b>Create ES|QL rule with missing data field</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select ES|QL rule type
4. Add some valid ES|QL query with non-existing data field (for example
`from logs-* metadata _id, _version, _index | SORT
agent.non_existing_field`)
6. Continue with other steps
7. Click create rule button

**Expected**: You will see the confirmation modal that warns user about
potentially failing rule executions. Clicking `Confirm` button will
create a rule.

</details>

<details>
<summary><b>Create ES|QL rule with syntax error in the
query</b></summary>

#### Steps:
1. Open rules management page
2. Click create new rule button
3. Select ES|QL rule type
4. Add some syntactically invalid ES|QL query (for example `hello
world`)

**Expected**: The continue button does not allow user to proceed to the
About step due to existing syntax error.

</details>

Same behaviour applies to the rule updating workflow. For example, you
can try to install one of the EQL or ES|QL rules that point to
non-existing data source or uses non-existing data field. User can still
update (add rule actions) to such installed pre-built rules.

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
  * elastic/security-docs#5758
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
* [Detection Engine -
Cypress](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6831)
(100 ESS & 100 Serverless)
* [Rule Management -
Cypress](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6830)
(100 ESS & 100 Serverless)

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
@e40pud e40pud added the fixed label Aug 30, 2024
@e40pud
Copy link
Contributor

e40pud commented Aug 30, 2024

@MadameSheema @banderror this ticket was fixed and my changes were merged.

@MadameSheema
Copy link
Member

@pborgonovi can you please take a look at this? Thanks! :)

@pborgonovi
Copy link
Contributor

pborgonovi commented Sep 3, 2024

HI @e40pud
I've validated this fix in main and here's what I got:

When attempting to save the rule, user gets a confirmation alert that there's a validation error which can lead to failure executions:

Image

Rule saved without any errors:

Image

Could you just confirm this is the expected behavior?

@e40pud
Copy link
Contributor

e40pud commented Sep 4, 2024

Thank you @pborgonovi! This is indeed the expected behaviour.

@pborgonovi
Copy link
Contributor

Perfect. Thank you @e40pud
I'm closing this bug as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.16 candidate bug Fixes for quality problems that affect the customer experience Feature:Rule Creation Security Solution Detection Rule Creation workflow Feature:Rule Edit Security Solution Detection Rule Editing workflow fixed impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.16.0
Projects
None yet
Development

No branches or pull requests

10 participants