[Alerting][Event Log] Consider adding `uuid` to active alert spans #101749
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
I'm hesitant to use …

Currently it feels like alerting should be creating the UUID for the new "span" of alerts, and then making it available to the rule registry somehow, for its uses. Not quite sure yet how we'll thread the value through, but you can see the place the changes would go for RAC, around the following code. This is where the rule executor is actually invoked, and that code will be calling:

kibana/x-pack/plugins/rule_registry/server/utils/create_lifecycle_executor.ts Lines 162 to 175 in b58054c
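To make that idea concrete, here's a minimal sketch of how the framework could own the span UUIDs before handing them off to the rule registry. All names (`AlertSpanContext`, `startAlertSpan`) are hypothetical, not existing Kibana APIs:

```ts
import { v4 as uuidv4 } from 'uuid';

// Hypothetical sketch: the alerting framework generates a UUID when an alert
// "span" begins (the alert becomes active) and keeps it for the life of the span.
interface AlertSpanContext {
  // key: alert instance ID; value: UUID for the current active span
  spanUuids: Map<string, string>;
}

function startAlertSpan(ctx: AlertSpanContext, alertInstanceId: string): string {
  let uuid = ctx.spanUuids.get(alertInstanceId);
  if (!uuid) {
    // new span: the alert was not active in the previous execution
    uuid = uuidv4();
    ctx.spanUuids.set(alertInstanceId, uuid);
  }
  return uuid;
}
```

The resulting map could then be passed (or exposed via services) to the wrapped executor in create_lifecycle_executor.ts instead of having the rule registry generate its own UUIDs.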
Taking another peek at this. Looks like RAC creates the UUIDs for lifecycle alerts here:

kibana/x-pack/plugins/rule_registry/server/utils/create_lifecycle_executor.ts Lines 259 to 262 in b58054c
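That logic is roughly the following (paraphrased, not the exact code at the permalink): reuse the UUID tracked from the previous execution if the alert was already active, otherwise generate a new one.

```ts
import { v4 } from 'uuid';

interface TrackedAlertState {
  alertUuid: string;
}

// Paraphrased sketch: an alert that was already being tracked keeps its UUID
// from the previous execution; a newly active alert gets a fresh one.
function getAlertUuid(
  trackedAlerts: Record<string, TrackedAlertState | undefined>,
  alertId: string
): string {
  return trackedAlerts[alertId]?.alertUuid ?? v4();
}
```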
So it appears the UUIDs are created after running the executor, so I think we can create/manage the UUIDs in the alerting framework after the executor runs, and expose them to the rule registry via something like:

```ts
interface AlertServices {
  ...
  getInstances(): Map<string, string>; // key: existing alert instance ID; value: new alert instance UUID
}
```
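If the framework owned those UUIDs, the rule registry side could consume the map rather than generating its own. A rough usage sketch, using the proposed (not yet existing) `getInstances()` method and illustrative field names:

```ts
interface AlertServices {
  // proposed API from the comment above
  getInstances(): Map<string, string>; // key: alert instance ID; value: span UUID
}

// Rough sketch: build the alert documents to index using the framework-provided
// span UUIDs instead of UUIDs generated inside the rule registry.
function buildAlertDocs(services: AlertServices): Array<Record<string, string>> {
  const docs: Array<Record<string, string>> = [];
  for (const [instanceId, spanUuid] of services.getInstances()) {
    docs.push({
      'kibana.rac.alert.id': instanceId,
      'kibana.rac.alert.uuid': spanUuid,
    });
  }
  return docs;
}
```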
Happened to remember a similar issue we had open a while back: #64268. For that one, we realized that some rule types were already using UUIDs as their instance ids, so we thought we should add a new "human readable" name to associate with an instance. I think that ship has sailed at this point, since we now have an "official" UUID - we should continue to aim to make the alert instance ids as human readable as possible. But we may need to revisit that over time; perhaps adding an explicit "description" to these alert instances would make sense later.
It's worth noting that without this there is actually no way of using the span as part of a dedup key in connectors such as PagerDuty. This means that a customer can't set up actions on a rule so that they get a new incident whenever a specific alert ID reappears (so, for instance, get a new incident whenever the CPU exceeds 90% on Host #1, rather than reopening the incident from the last time it exceeded 90%). This feels like a relatively basic missing feature.
I agree, allowing access to some span ID would make it possible to mimic alerts-as-data on an external system and create new incidents whenever an alert comes back. @arisonl should this even become the default dedup key, instead of …
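For illustration, here's the kind of dedup key a connector action could derive if the span UUID were exposed. This is a hypothetical sketch, not the current PagerDuty connector behavior:

```ts
// Hypothetical sketch: with a span UUID, each new active span of the same alert
// instance maps to a distinct incident, while repeated executions within one
// span dedupe to the same incident. Without it, we fall back to one incident
// per rule + instance ID, which reopens the previous incident.
function buildDedupKey(ruleId: string, alertInstanceId: string, spanUuid?: string): string {
  return spanUuid ?? `${ruleId}:${alertInstanceId}`;
}
```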
We have since added …
For this issue, we added start/duration/end times to the `*-instance` actions in the event log and considered adding a `uuid` to identify unique active spans for an alert. We decided to hold off after reviewing what SIEM and RAC were doing for this and how they are using `event.id`.

Currently, the lifecycle rule type in the rule registry is doing something similar, but storing it in the `kibana.rac.alert.uuid` field. SIEM is using `event.id` to store the original source document id when a source document is copied into the signals index; when the generated signal is an aggregate over multiple source documents, the `event.id` field is not populated.

Given these other usages, do we want to add a `uuid` field to identify active alert spans? If we do, should we use the `event.id` field to store it? Or consolidate it with a RAC field?
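To make the options concrete, here's a hedged sketch of what an `active-instance` event log entry could look like with a span UUID. Field names are approximate, and both placements (reusing `event.id` vs. a dedicated/RAC field) are shown only as illustrations of the open question:

```ts
// Illustrative only: a hypothetical active-instance event log document carrying
// a span UUID. Where that UUID should live is exactly what this issue is asking.
const exampleEventLogDoc = {
  event: {
    action: 'active-instance',
    // Option 1: reuse event.id (conflicts with SIEM's source-document usage)
    id: '9b1a2c3d-4e5f-6789-abcd-ef0123456789',
    start: '2021-06-08T00:00:00.000Z',
    duration: 300000000000, // ECS event.duration is expressed in nanoseconds
  },
  kibana: {
    alerting: {
      instance_id: 'host-1-cpu',
      // Option 2 (commented out): a dedicated field, or consolidate with
      // kibana.rac.alert.uuid on the RAC side
      // instance_uuid: '9b1a2c3d-4e5f-6789-abcd-ef0123456789',
    },
  },
};
```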