Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asset criticality alert enrichment #171241

Merged
merged 46 commits into from
Dec 22, 2023

Conversation

oatkiller
Copy link
Contributor

@oatkiller oatkiller commented Nov 14, 2023

Summary

This will allow analysts to filter alerts by the analyst-defined criticality of the related host or user.

This introduces two new kibana fields to alerts. These fields allow us to model the criticality of the alert's most relevant host and user, in context of analyst workflows.

field name type description example
kibana.alert.host.criticality_level keyword Contains an enum describing the criticality of the host, as defined by the analyst in a previously used workflow. Used by analysts to filter alerts by host criticality. 'very important'
kibana.alert.user.criticality_level keyword Contains an enum describing the criticality of the user, as defined by the analyst in a previously used workflow. Used by analysts to filter alerts by user criticality. 'very important'

Design

Analysts can assign criticality to their assets. These are stored in the asset criticality index (.asset-criticality.asset-criticality-${space}). This PR will detect user names and host names in events, query the asset criticality index for the associated criticality (if any) and then add that value to the resulting alert under kibana.alert.host.criticality_level or kibana.alert.user.criticality_level

Design Exploration

What if we want to filter by other criticalities?

We may want to allow analysts to filter on the criticality of other entities. For example, alerts can refer to multiple users and multiple hosts. Also, we may introduce other types of entities, e.g. IP addresses, files, registry keys.

If we were to follow the same approach as we did here, that could lead to a mapping explosion if we had a lot. However:

  • we don't think we'll add very many classes of entities. less than 8 more.
  • we don't necessarily want to filter alerts by other ones.

Using keyword vs nested or flattened

We could use a flattened or nested field here to store more criticalities and have them all indexed. However we don't need to sort or order alerts by their entity's criticalities so we don't see a need.

KQL support

The fields we proposed here are of keyword type. This type of field works intuitively for KQL and since analyst workflows are the focus of this change, this lines up well.

How to test

until asset criticality UI is not enabled let's create a mapping for asset criticality and index some docs

PUT .asset-criticality.asset-criticality-default
{
  "mappings": {
    "properties": {
      "id_value": {
        "type": "keyword"
      },
      "id_field": {
        "type": "keyword"
      },
      "criticality_level": {
        "type": "keyword"
      },
      "@timestamp": {
        "type": "date"
      },
      "updated_at": {
        "type": "date"
      }
    }
  }
}
POST .asset-criticality.asset-criticality-default/_doc
{
  "id_field": "user.name",
  "id_value": "User 1",
  "criticality_level": "very important",
  "@timestamp": 1701860267617
}

POST .asset-criticality.asset-criticality-default/_doc
{
  "id_field": "host.name",
  "id_value": "Host 3",
  "criticality_level": "normal",
  "@timestamp": 1701860267617
}

Then create rules, which have alerts from events with host.name or user.name which match asset criticality documents.

Then add fields to the alerts table, you should see some values in those columns

Screenshot 2023-12-07 at 11 24 24

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@oatkiller oatkiller force-pushed the asset-criticality-alert-enrichment branch 2 times, most recently from 95bb2e1 to 577bc3d Compare November 28, 2023 15:54
@nkhristinin nkhristinin force-pushed the asset-criticality-alert-enrichment branch from 3777033 to 12aa184 Compare November 30, 2023 09:54
@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

Copy link
Contributor

@e40pud e40pud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -23,7 +23,7 @@ export const applyEnrichmentsToEvents: ApplyEnrichmentsToEvents = ({
enrichmentsList,
logger,
}) => {
const mergedEnrichments = mergeEnrichments(enrichmentsList);
const mergedEnrichments: EventsMapByEnrichments = mergeEnrichments(enrichmentsList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need : EventsMapByEnrichments here? it looks like mergeEnrichments return type is explicitly set to EventsMapByEnrichments, so the type should be deduced automatically?

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

Copy link
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made another pass here; enrichment additions make sense and look good (although I can't speak to performance here; how many more ES queries does this add to each rule execution?)

My broad concern is still the fields in which we're placing these enrichments: I made this comment a few weeks ago that wasn't addressed. If we define e.g. host.risk.criticality_level to generally be "the criticality level of the relevant entity at the time the document was created," that should cover both of these use cases and eliminate the need for these additional non-ECS fields.

Comment on lines 30 to 31
// gets just the events we will enrich
const eventsWithField = events.filter((event) => getEventValue(event, mappingField.eventField));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oatkiller were these meant to stick around, or were these just notes for you? Typically when I see comments like this in the code, it's making up for an unhelpful variable name. IMO these variables are good at describing what they are, but not why they are. Perhaps the comments wouldn't be needed if the intention was more clear?

Suggested change
// gets just the events we will enrich
const eventsWithField = events.filter((event) => getEventValue(event, mappingField.eventField));
const eventsToEnrich = events.filter((event) => getEventValue(event, mappingField.eventField));

@@ -26,6 +31,7 @@ export const makeSingleFieldMatchQuery: MakeSingleFieldMatchQuery = ({ values, s
query: {
bool: {
should: shouldClauses,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkhristinin I know this isn't part of the PR, but I was exploring how best to construct a query like this for asset criticality, and found that wrapping the should in a filter query has the same behavior with the performance benefit of not scoring the results.

@@ -17,6 +17,7 @@ export default createTestConfig({
'/testing_regex*/',
])}`, // See tests within the file "ignore_fields.ts" which use these values in "alertIgnoreFields"
`--xpack.securitySolution.enableExperimental=${JSON.stringify([
'entityAnalyticsAssetCriticalityEnabled',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these flags actually apply in serverless? If not, they should be deleted. If they do, but serverless doesn't provide an actual way to set these in production, then they should also be deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as I understand those tests will run in a serverless environment anyway, which allows us to test and integrate our functionality during development, and catch some serverless issues earlier.

If we remove the flag and skip tests, then in time when we enable it and try to run it serverless we can have some problems. For example we didn't have serverless tests for risk engine enablement, and then there were errors for permissions

const previewAlerts = await getPreviewAlerts({ es, previewId });
const fullAlert = previewAlerts[0]._source;
if (!fullAlert) {
return expect(fullAlert).to.be.ok();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the early return needed here? At first I thought it was so that line 645 doesn't throw a ReferenceError, but shouldn't the optional chaining prevent that anyway?


describe('with asset criticality', async () => {
before(async () => {
await esArchiver.load('x-pack/test/functional/es_archives/asset_criticality');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these tests hinge on a specific foreign key across these data sources, it would be good to call that out (similar to how @marshallmain does with ID here.

Comment on lines 72 to 75
.map((result) => (result as PromiseFulfilledResult<EnrichmentType[]>)?.value);

// search hits.
const enrichments = flatten(enrichmentsResults);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map((result) => (result as PromiseFulfilledResult<EnrichmentType[]>)?.value);
// search hits.
const enrichments = flatten(enrichmentsResults);
.flatMap((result) => (result as PromiseFulfilledResult<EnrichmentType[]>)?.value);

(you'd also need to rename enrichmentResults to just enrichments, but I couldn't include that line in this comment)

Comment on lines 35 to 38
jest.mock('./enrichment_by_type/asset_criticality', () => ({
...jest.requireActual('./enrichment_by_type/asset_criticality'),
doesAssetCriticalityIndexExist: jest.fn(),
}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this is an indication that doesAssetCriticalityIndexExists (sic) should live independent to the enrichments themselves: we're trying to mock a file that we also want to test.

@nkhristinin
Copy link
Contributor

My broad concern is still the fields in which we're placing these enrichments: I made this comment a few weeks ago that wasn't addressed if we define e.g. host.risk.criticality_level to generally be "the criticality level of the relevant entity at the time the document was created," that should cover both of these use cases and eliminate the need for these additional non-ECS fields.

@oatkiller @kobelb I know that you have some discussion about field names, can you provide here some additional context?

@nkhristinin
Copy link
Contributor

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 11.3MB 11.3MB +2.6KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@maryam-saeidi maryam-saeidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nkhristinin nkhristinin merged commit cf9ff4d into elastic:main Dec 22, 2023
40 checks passed
@kibanamachine kibanamachine added v8.13.0 backport:skip This commit does not require backporting labels Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants