-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Alerts] Replace schemas derived from FieldMaps with versioned alert schema #127218
[Security Solution][Alerts] Replace schemas derived from FieldMaps with versioned alert schema #127218
Conversation
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This is a really cool idea and I thought that these comments were very helpful.
On a slightly related note I was wondering about what you may think about versioning alert instances. On the product roadmap we have "Persistence of Investigation Time Enrichments", but I can also see it being useful for things like Host Risk Score trend of an alert etc. And conceptually we could have a record of the migrations that an alert may go through.
ALERT_RULE_TAGS, | ||
TIMESTAMP, | ||
]; | ||
export type CommonAlertFieldName800 = Values<typeof commonAlertFieldNames>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious to hear your thoughts about the naming convention here with CommonAlertFieldName800
. If we plan to associate the version numbers with the release numbers I can see it getting a little awkward with no separators for the 800
part. If we are going to bump it by one with each change then we probably don't need to start with 800. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that ending with 800
is a bit awkward. I initially wanted to use something more like CommonAlertFieldName_8_0_0
at first, but the naming convention linter complains about using underscores if the type isn't all caps. Alerting uses a similar convention in the SO migrations logic which seems to have worked so far so I adopted that instead of trying to disable the linter rule for these type names.
If we are going to bump it by one with each change then we probably don't need to start with 800
With this proposal we wouldn't bump the number by 1, instead we'd increment it to the version of Kibana that's shipping the change, e.g. the next one might be CommonAlertFieldName830
for 8.3.0 if we make changes soon. But we haven't made changes in 8.1 or 8.2 so there would never be a CommonAlertFieldName810
or 820
.
In the past we used integer version numbers, starting at 1
and incrementing on each Kibana version that shipped changes, for the legacy signals mappings and it was pretty confusing IMO. Most of the time when looking at version numbers what I wanted to now was what version of Kibana a particular version number shipped with so it would have saved time to just version the signals mappings with the Kibana version. Instead with the integer it has to be cross-referenced with Git history to figure out when it shipped.
The other possible option I considered was dropping the 800
from the name entirely and relying on the 8.0.0
in the folder path to differentiate this CommonAlertFieldName
type from possible future CommonAlertFieldName
types. I worried that that would make it too easy to accidentally import the wrong type though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty nitpicky, but I'd be in favor of dropping the suffix entirely. This naming schema will become ambiguous if there's ever a minor or patch version > 10, or once we hit 10.x. Sure, it makes it easier to import the wrong version, but import paths matter and devs should be able to navigate that. If nothing else, code reviewers should be able to mitigate this possibility.
@@ -27,7 +29,9 @@ export type PersistenceAlertService = <T>( | |||
) => Promise<PersistenceAlertServiceResult<T>>; | |||
|
|||
export interface PersistenceAlertServiceResult<T> { | |||
createdAlerts: Array<T & { _id: string; _index: string }>; | |||
createdAlerts: Array< | |||
AlertWithCommonFieldsLatest<T> & { _id: string; _index: string; [key: string]: SearchTypes } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it not be possible to include { _id: string; _index: string; [key: string]: SearchTypes }
in AlertWithCommonFieldsLatest
by default, or use a more specific T
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's happening here is the PersistenceAlertService
takes in documents of type T
and writes them to Elasticsearch after injecting the common fields into each document - making them AlertWithCommonFieldsLatest<T>
. AlertWithCommonFieldsLatest<T>
is intended to be the type of the _source
of the written documents. When the function returns though, it also includes the _id
and _index
metadata that Elasticsearch returns so that we can make those values available in the actions context for the Cases connector, even though those metadata fields aren't part of _source
.
I think you're right that [key: string]: SearchTypes
index signature is unnecessary though so I'll remove it.
@elasticmachine merge upstream |
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes that have been made to the rule_registry
plugin are fine.
Having separate types for creating new alerts versus reading/updating existing alerts makes a lot of sense to me and I see the benefit there. Additionally, I get the reason why we'd want a historical record of the alerts schema during the different versions. However, the approach that this PR implements will result in quite a few type definitions that aren't used, as only the latest types should actually be used. It'll be interesting to see how this approach works out, but it seems fine for now.
If you are adding new fields for a new release of Kibana, create a new sibling folder to this one | ||
for the version to be released and add the field(s) to the schema in that folder. | ||
|
||
Then, update `../index.ts` to import from the new folder that has the latest schemas and add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also want to mention updating the *Latest
exports below. https://github.com/elastic/kibana/pull/127218/files#diff-f3b181c03305d4fb9648d9940a91dc9e1ac8411d6e1be7dee7b1013700892032R15
|
||
export type GenericAlert800 = AlertWithCommonFields800<BaseFields800>; | ||
|
||
// This is the type of the final generated alert including base fields, common fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all really great, thank you!
for the version to be released and add the field(s) to the schema in that folder. | ||
|
||
Then, update `../index.ts` to import from the new folder that has the latest schemas and add the | ||
new schemas to the union of all alert schemas. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
export type NotificationRuleTypeParams = RuleParams & { | ||
id: string; | ||
name: string; | ||
}; | ||
|
||
const convertToLegacyAlert = (alert: RACAlert) => | ||
const convertToLegacyAlert = (alert: DetectionAlert) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I believe the comment above is inaccurate, as every schema will be used at least for reading (in the intersection of types). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM... couple minor things that could be addressed later, but seems to work as-is. There's a lot here, so it's possible something got by me... but I performed some sanity checks, such as: created query, threshold, and eql rules. Verified that the generated alerts have a proper UUID and ancestry tree. Building block alerts are shown correctly for EQL alerts. Timelines load properly for each rule type. Pre-built rules load and can be activated. Types resolve correctly in IDE (VSCode). Removing a required field from an alert or changing a value to an invalid type causes static type errors.
This was a monumental undertaking and will be really, really great for development moving forward. Thanks, @marshallmain !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marshallmain I think it's a fantastic proposal and I find the concept itself very much right.
Have some questions about the implementation details though 🙂
I went through most of the code changes and left some comments, some of which I find important, especially the one about the union types.
I'm going to post another bunch of comments and questions that are related to some of the ideas mentioned in the PR description.
const commonAlertIdFieldNames = [ALERT_INSTANCE_ID, ALERT_UUID]; | ||
export type CommonAlertIdFieldName800 = Values<typeof commonAlertIdFieldNames>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these two id fields are not included in CommonAlertFields800
and have their own union type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are used by Observability for the lifecycle rule executor logic. I'm not sure why they're separated out, but I kept them the same way they were in get_common_alert_fields.ts
x-pack/plugins/security_solution/common/detection_engine/schemas/alerts/8.0.0/index.ts
Outdated
Show resolved
Hide resolved
[ALERT_RULE_CONSUMER]: string; | ||
[ALERT_ANCESTORS]: Ancestor800[]; | ||
[ALERT_STATUS]: string; | ||
[ALERT_WORKFLOW_STATUS]: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-topic, but it would be great to eventually have a jsdoc comment for every field in the alert schema containing a description (what it means), examples of values, and any other important information.
|
||
// This is the type of the final generated alert including base fields, common fields | ||
// added by the alertWithPersistence function, and arbitrary fields copied from source documents | ||
export type DetectionAlert800 = GenericAlert800 | EqlShellAlert800 | EqlBuildingBlockAlert800; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how union type is gonna work here, maybe missing something.
Please check this example where properties x
, y
and z
are not accessible without an additional type casting:
I'd expect similar issues with access to fields in the real DetectionAlert800
: ALERT_GROUP_INDEX
is defined as a number
but available as SearchTypes
in DetectionAlert800
export interface EqlBuildingBlockFields800 extends BaseFields800 {
[ALERT_GROUP_ID]: string;
[ALERT_GROUP_INDEX]: number;
[ALERT_BUILDING_BLOCK_TYPE]: 'default';
}
I think it's going to get worse when we'll need to start adding more versions of DetectionAlert
to the final union type:
// When new Alert schemas are created for new Kibana versions, add the DetectionAlert type from the new version
// here, e.g. `export type DetectionAlert = DetectionAlert800 | DetectionAlert820` if a new schema is created in 8.2.0
export type DetectionAlert = DetectionAlert800;
I think what we need here instead of using the union type is constructing an interface manually that would:
- Make all the common (base) fields required
- Make all the alert type-specific fields optional (
x | undefined
)
export type DetectionAlert800 =
CommonAlertFields800 &
BaseFields800 &
Partial<EqlShellFields800> &
Partial<EqlBuildingBlockFields800>;
Still, combining multiple versions into a single DetectionAlert
interface is a little bit unclear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, TS has a pretty nasty bug in the implementation of deeply nested type intersection (microsoft/TypeScript#47935) which can become a source of bugs in the code (unless all the fields in the alert schema will be flat).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect similar issues with access to fields in the real
DetectionAlert800
:ALERT_GROUP_INDEX
is defined as anumber
but available asSearchTypes
inDetectionAlert800
This is working as intended IMO - when an alert is first retrieved, without additional checks at runtime the type of ALERT_GROUP_INDEX
really could be anything. Once we add the full types for ALERT_RULE_PARAMETERS
though, the field alert[ALERT_RULE_PARAMETERS].type
can be used as a discriminant at runtime to narrow the type down to EQL alerts only. At that point the type should be EqlBuildingBlockAlert800 | EqlShellAlert800
, so we'd still need some other discriminant between those 2 types. But in general for alerts from different types of rules, we can use a known field like alert[ALERT_RULE_PARAMETERS].type
as a discriminant.
Alternatively, developers can fetch ALERT_GROUP_INDEX
, accept that its type is SearchTypes
, and then do extra runtime validation on the retrieved value to ensure that it's not undefined or an object or some other unexpected value.
In both cases though I think it's a feature that ALERT_GROUP_INDEX
has a very general type if it's retrieved from DetectionAlert
- we're warning developers that the value could be anything and they need to do more validation there. For fields that are common and required across all alert types, e.g. ALERT_RULE_DESCRIPTION
, the type system can convey that the value received from any DetectionAlert
will always be string
and extra validation isn't needed.
A few other thoughts and questions.
Fully support the idea of having separate write and read models! How could we make it more explicit in the code? E.g. it's not immediately obvious that
I remember discussions within RAC around breaking changes in the alert schema and how we could support schema evolution with runtime fields. By adding runtime fields to an old concrete index, we'd effectively change the schema of the old alerts and would have to reflect it in the corresponding Regarding breaking changes. I remember the last decision was "we're not gonna introduce them". But let's say at some point it happens. How would we support (in the static TS schema) something like a type change of a field, field rename, etc?
Extends in the sense of
It would definitely be great to fix our rule schemas (#80792). Regarding versioning, I think whatever is written to
|
💚 Build SucceededMetrics [docs]Public APIs missing comments
Async chunks
Public APIs missing exports
Page load bundle
Unknown metric groupsAPI count
ESLint disabled in files
ESLint disabled line counts
References to deprecated APIs
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: |
I'm not sure how far we want to go on naming types based on how they're used (read vs write) rather than what they are (v8.0.0, Latest, etc). E.g. if we end up wanting to write an old version for some reason, maybe updating alerts at some point, it may break the naming model if we try to name the types based on how we think they're being used. That's not to say we can never change the names, but the
For these types we wouldn't support runtime fields. The types here only represent the
Afaik
All great points here. The rule schemas are fairly complex in their own right (part of why I avoided including them in this PR) so I'm a bit nervous about re-implementing any parts of them and attempting to ensure that the re-implementation is in sync with the source-of-truth io-ts schemas. I agree that there isn't a ton of use in rule management/executors for old rule schemas since we can ensure that rules get migrated, but we'll need them in some form to include in the alert schemas. |
Thanks to everyone who reviewed this for the thoughtful comments and feedback! |
Summary
The goal of this PR is to create a system of schemas that are:
Current alert schemas fail these criteria because they are generated from FieldMaps (adding complexity), updated in place, fail to document all fields accurately (e.g.
kibana.alert.group.index
is not required by the schema, but should exist for EQL alerts), and are defined server side.Motivation - Development speed and quality
We have already run into one bug where a required field was not populated in some alert documents. Once a bug ships that creates documents incorrectly, any fix requires user action to initiate a re-index of the alerts in addition to the developer time to create and validate the fix. The changes proposed here would catch this bug at compile time. These sorts of bugs become harder to catch as the schema evolves over time and fields get added, removed, and changed. Keeping the schemas separated by version will help reduce the risk of repeated schema changes over time causing fields to be incorrectly included in or omitted from alert documents.
We are also spending more time than necessary communicating details of the alerts schema over Slack and Zoom. It will be far more efficient for the code to clearly communicate more details about the alert schema. With a more comprehensive static schema, the knowledge will transfer to new developers more efficiently.
Static types are a powerful tool for ensuring code correctness. However, each deviation of the static type from the actual runtime structure adds places where developers may need to cast, assert, or use conditional logic to satisfy the compiler. The current static types require frequent workarounds when the static types don't match what developers know or believe is true about the runtime type of the alert documents. These runtime workarounds establish patterns that evade the type system - costing developer time to create and maintain in addition to increasing the risk of bugs due to the additional complexity. Accurate static types are excellent documentation of the data structures we use but it's crucial that the static types are comprehensive to minimize cases where runtime checks are needed.
Current Structure
Alert schemas are defined by the FieldMap structures that also define the Elasticsearch alert index mappings. These FieldMaps are unversioned and may be modified in the future to add more fields. In addition, the Elasticsearch index mappings do not always align perfectly with the
_source
of the alerts that we intend to write. For example, thekibana.alert.rule.parameters
field is typeflattened
in Elasticsearch mappings but we know that alerts from a particular rule type will populatekibana.alert.rule.parameters
with some specific set of fields defined by the rule type. Also, multiple types of alerts may live in the same alerts index with similar but not identical sets of fields.Proposed structure - Common Alert Schema Directory
This PR has 2 primary code changes: (1) separate the alert document schemas from the FieldMaps, and (2) set up a code structure that enables easy versioning of alert schemas. During the Detection Engine migration to the rule registry we used the FieldMaps to define the alert schema, but ended up with numerous type casts and some bugs in the process. This PR creates a new common directory
x-pack/plugins/security_solution/common/detection_engine/schemas/alerts
to store the various Security alert schemas by Kibana version. This PR also fully removes theRACAlert
andWrappedRACAlert
types that were derived from the FieldMaps. However, the schemas defined in this PR are not 100% comprehensive - some fields are not typed as strictly as they could be, particularly in thekibana.alert.rule.parameters
field. Follow up work will add more detail to the schemas and provide more comprehensive coverage.x-pack/plugins/security_solution/common/detection_engine/schemas/alerts
initially containsindex.ts
and one folder,8.0.0
.index.ts
imports the schemas from8.0.0
and re-exports them as...Latest
, denoting that those are the "write" schemas. The reason for this is that as we add new schemas, there are many places server side where we want to ensure that we're writing the latest alert schema. By havingindex.ts
re-export8.0.0
schemas, when we add make a new alert schema in the future (e.g. adding an additional field in 8.x) we can simply updateindex.ts
to re-export the new schema instead of the previous schema.index.ts
also exports aDetectionAlert
which is the "read" schema - this type will be maintained as a union of all versioned alert schemas, which is needed to accurately type alerts that are read from the alerts index.Reading vs writing alerts
When writing code that deals with creating a new alert document, always use the schema from
alerts/index.ts
, not from a specific version folder. This way when the schema is updated in the future, your code will automatically use the latest alert schema and the static type system will tell us if code is writing alerts that don't conform to the new schema.When writing code that deals with reading alerts, it must be able to handle alerts from any schema version. The "read schema" in
index.ts
DetectionAlert
is a union of all of the versioned alert schemas since a valid alert from the.alerts
index could be from any version. Initially there is only one versioned schema, soDetectionAlert
is identical toDetectionAlert800
.Generally, Solution code should not be directly importing alert schemas from a specific version. Alert writing code should use the latest schema, and alert reading code should use the union of all schemas.
Adding new schemas
In the future, when we want to add new fields, we should create a new folder named with the version the field is being added in, create the updated schema in the new folder, and update
index.ts
to re-export the schemas for the new version instead of the previous version. Also, update the "read schema"DetectionAlert
type inindex.ts
to include the new schema in addition to the previous schemas. The schema in the new version folder can either build on the previous version, e.g. 8.4.0 could import the schema from 8.0.0 and simply add a few new fields, or for larger changes the new version could build the schema from scratch. Old schemas should not change when new fields are added!Changing existing schemas
The schema in the 8.0.0 folder, and any future versioned folders after the version is released, should not be updated with new fields. Old schemas should only be updated if a bug is discovered and it is determined that the schema does not accurately represent the alert documents that were actually written by that version, e.g. if a field is typed as
string
in the schema but was actually written asstring[]
. The goal of these schemas is to represent documents accurately as they were written and since we aren't changing the documents that already exist, the schema should generally not change.No changes
If a version of Kibana makes no changes to the schema, a new folder for that version is not needed.
Design decisions
FieldMaps are integrated tightly with Elasticsearch mappings already, with minimal support for accurate TypeScript types of the fields. I wanted to avoid adding tons of extra information in to the FieldMaps that would not be used for the Elasticsearch mappings. Instead later we can write a bit of code to ensure that the alert schemas are compatible with the FieldMap schemas, essentially ensuring that the alert schemas extend the FieldMap schemas.
| undefined
used in field definitions instead of making fields optional?Making all fields required, but some
| undefined
in the type, helps ensure that we don't forget to copy over fields that may be undefined. If the field is optional, e.g.[ALERT_RULE_NOTE]?: string
, then the compiler won't complain if the field is completely left out when we build the alert document. However, when it's defined as[ALERT_RULE_NOTE]: string | undefined
instead, the field must be explicitly provided when creating an object literal of the alert type - even if the value is undefined. This makes it harder to forget to populate all of the fields. This can be seen inbuild_alert.ts
where removing one of the optional fields from thereturn
value results in a compiler error.| undefined
?Adding new fields as
| undefined
when they're actually required reduces the accuracy of the schema, which makes it less useful and harder to work with. If we decide to add a new field and always populate it going forward then accurately representing that in the static type makes it easier to work with alerts during the alert creation process. When a field is typed as| undefined
but a developer knows that it should always exist, it encourages patterns that fight the type system through type-casting, assertions, using?? <some default value>
, etc. This makes the code harder to read, harder to reason about, and thus harder to maintain because the knowledge of "this field is typed as| undefined
but actually always exists here" is not represented in the code and only lives in developers minds. Versioned alert schemas aim to turn the static types into an asset that precisely documents what the alert document structure is.Potential Future Work
kibana.alert.rule.parameters
type for each type of alert