[discuss] extending event log for faster/easier access to active instance date information #93704
Labels
discuss
Feature:EventLog
Team:ResponseOps
Currently in alerting, we generate events for the event log as follows:
- `execute` - when the alert executes
- `active-instance` - for every active instance, after alert execution
- `new-instance` - for every newly found instance, after alert execution
- `recovered-instance` - for all previously active instances that are no longer active, after alert execution

These are all "stateless" documents, so to get the duration of an active instance, you need to find its `new-instance` document to figure out when it started. Likewise, if you wanted to know the duration of the last time an instance was active, you'd have to start with its `recovered-instance` document, and then search back in time to find the `new-instance` document. We've built some aggs to do this, but ... it's complicated - see `alerts_instance_summary_from_event_log.ts` in PR #89681.

The "stateless" events were easy to implement, as none of the
`*-instance` documents had to know anything about the previous state; they just wrote what they knew at the time.

However, this greatly complicates calculating these date ranges, which we show in the alert details page. We optimized writing the documents, and made it really hard to pull useful information back out.

It struck me that we are not using the `event` fields `start`, `end`, and `duration` for these `*-instance` events, since they are point-in-time events. The action and alert `execute` events, however, do use those fields to document the execution time of the alert and action type executors.

Maybe we should start using those fields for the
`*-instance` events as well?

The `new-instance` event would not use those fields, but the `active-instance` and `recovered-instance` events could. Both could store the timestamp of the `new-instance` event in `start`, and then `duration` would essentially be `currentTime - start`. The `recovered-instance` event could certainly store the `end` date as well, but I'm not sure about putting that in `active-instance`, since the active state has not really "ended" yet - but that's just semantics. Would it be weird to have `start` and `duration` but no `end`? It may also be confusing to have the `duration` in `active-instance`, since it's really just the "duration relative to the event's timestamp", and so would change with every subsequent `active-instance` document. But it would be very useful to have.

It seems like this would make calculation of the data for the alert details page a lot easier, since it wouldn't involve having to do searches over the
`new-instance` documents at all.

It would also be more useful when accessing the event log via Discover or Lens, since the `duration` is available for the interesting events, without having to search for earlier `new-instance` events.

I think this would involve storing the `new-instance` timestamp in the instance state, which I believe is typed here: `alert_instance.ts`. That seems straightforward. We would need to deal with migration issues - older events and older instance state won't have these fields, so we can't rely on them ALWAYS being there.
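
To make the idea concrete, here is a rough sketch of how the instance state and the event fields could fit together. Everything in it is hypothetical: `InstanceState`, `InstanceEventFields`, `startInstance`, `buildInstanceEvent` and `durationMs` are illustration-only names, not the actual types in `alert_instance.ts` or the event log schema. The duration is kept in milliseconds for readability; ECS defines `event.duration` in nanoseconds, so a real implementation would presumably convert.

```typescript
// Hypothetical shapes for illustration only; not the real Kibana types.
type InstanceAction = 'new-instance' | 'active-instance' | 'recovered-instance';

interface InstanceState {
  // ISO timestamp of the new-instance event. Optional, because instance
  // state persisted before this change will not have it.
  start?: string;
}

interface InstanceEventFields {
  action: InstanceAction;
  timestamp: string;
  start?: string;
  end?: string;
  durationMs?: number; // assumption: milliseconds, chosen for readability
}

// Called when an instance is first seen: remember when it started.
function startInstance(now: Date): InstanceState {
  return { start: now.toISOString() };
}

// Derive the event fields from the stored state at event-writing time.
function buildInstanceEvent(
  action: InstanceAction,
  state: InstanceState,
  now: Date
): InstanceEventFields {
  const timestamp = now.toISOString();
  const base: InstanceEventFields = { action, timestamp };
  const { start } = state;

  // new-instance stays a point-in-time event; older state has no start,
  // so fall back to the current field-less event in that case.
  if (action === 'new-instance' || start === undefined) {
    return base;
  }

  const durationMs = now.getTime() - Date.parse(start);
  if (Number.isNaN(durationMs)) {
    return base; // unparsable start: treat it like missing state
  }

  const withRange: InstanceEventFields = { ...base, start, durationMs };
  // Only recovered-instance gets an end, since an active instance
  // has not really "ended" yet.
  return action === 'recovered-instance' ? { ...withRange, end: timestamp } : withRange;
}

// Usage: an instance that starts at 10:00 and is still active at 10:05.
const state = startInstance(new Date('2021-03-05T10:00:00.000Z'));
const active = buildInstanceEvent('active-instance', state, new Date('2021-03-05T10:05:00.000Z'));
console.log(active.durationMs); // 300000 (5 minutes)
```

With this shape, readers of the event log would have to treat `start`, `end` and `duration` as optional on `*-instance` events, which is one way to handle the migration concern: documents and state written before the change simply lack the fields.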