
[Alerting] Write executionStatus property to kibana event log #79785

Closed
dhurley14 opened this issue Oct 6, 2020 · 6 comments · Fixed by #82401
Labels: Feature:Alerting, Team:ResponseOps

Comments

@dhurley14
Contributor

Describe the feature:

The executionStatus property on alerting saved objects (introduced here #75553) is a view into the current execution status of a Kibana alert. It would be nice if each executionStatus were written to the Kibana event log index .kibana-space-event-log-8.0.0 so we could query it for historical purposes.

Describe a specific use case for the feature:

The security solution currently keeps track of failures in a list-like structure of saved objects. With the addition of the executionStatus property to Kibana alerts, we now have to manage merging each executionStatus into our rule status failure tracking system. It would be nice to have a separate place to query for historical executions of Kibana alerts rather than having to pull it directly off of the alert.

@dhurley14 added the Team:ResponseOps label Oct 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Member

pmuellr commented Oct 6, 2020

This is a great idea. We'll need to find a place in ECS where we can put this, or add a new extension field. I think we'd want to support the status, error.reason, and error.message fields; the date is redundant, since the event doc is built at the same time as the execution status and the event doc already has a timestamp. But it might be easier to duplicate the entire structure. Not sure.
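
A rough sketch of the mapping being discussed, with illustrative (not final) field names and a simplified executionStatus type:

// Illustrative only: how the executionStatus fields named above might be
// copied onto an event log doc. The target field names are placeholders,
// not a final ECS mapping.
interface AlertExecutionStatus {
  status: 'ok' | 'active' | 'error';
  error?: { reason: string; message: string };
}

interface EventDoc {
  '@timestamp': string;
  event: { provider: string; action: string; outcome?: string; reason?: string };
  kibana?: { alerting?: { status?: string } };
  error?: { message?: string };
}

function applyExecutionStatus(event: EventDoc, executionStatus: AlertExecutionStatus): EventDoc {
  // copy the status; the executionStatus date is dropped because the event
  // doc already carries its own @timestamp
  event.kibana = { ...event.kibana, alerting: { status: executionStatus.status } };
  if (executionStatus.error) {
    event.event.reason = executionStatus.error.reason; // e.g. 'decrypt'
    event.error = { message: executionStatus.error.message };
  }
  return event;
}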

@mikecote assigned mikecote and pmuellr and unassigned mikecote Oct 13, 2020
@pmuellr
Member

pmuellr commented Oct 22, 2020

Here are the locations of some of the relevant spots in the code for this:

in run(), executionStatus is updated in the alert SO:

try {
  // persist the updated executionStatus on the alert saved object
  await partiallyUpdateAlert(client, alertId, attributes, {
    ignore404: true,
    namespace,
  });
} catch (err) {
  this.logger.error(
    `error updating alert execution status for ${this.alertType.id}:${alertId} ${err.message}`
  );
}

in executeAlertInstances(), the event for the alert execute action is logged:

eventLogger.stopTiming(event);
event.message = `alert executed: ${alertLabel}`;
event.event = event.event || {};
event.event.outcome = 'success';
eventLogger.logEvent(event);

and the call tree looks like:

  • run()
    • loadAlertAttributesAndRun()
      • validateAndExecuteAlert()
        • executeAlertInstances()

However, loadAlertAttributesAndRun() is called in run() (and thus the event doc is written) before the code in run() that calculates the execution status. So it will require refactoring some bits to get the execution status calculated before the event doc is written.
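
A very rough sketch of the ordering that refactor is after; none of these names are the real task runner functions:

// Hypothetical sketch only - the real run() / loadAlertAttributesAndRun()
// flow is structured differently, as the call tree above shows.
type ExecutionStatus = { status: 'ok' | 'active' | 'error'; reason?: string };

async function runSketch(
  executeAlert: () => Promise<{ activeInstanceCount: number }>,
  logEvent: (doc: Record<string, unknown>) => void,
  updateAlertSavedObject: (status: ExecutionStatus) => Promise<void>
) {
  const startedAt = new Date().toISOString();
  let executionStatus: ExecutionStatus;
  try {
    const { activeInstanceCount } = await executeAlert();
    executionStatus = { status: activeInstanceCount > 0 ? 'active' : 'ok' };
  } catch (err) {
    // failures before or during execution (e.g. a decrypt error) still
    // produce an execution status
    executionStatus = {
      status: 'error',
      reason: err instanceof Error ? err.message : String(err),
    };
  }

  // the event doc is only written once the execution status is known, so it
  // can carry the status / error reason
  logEvent({
    '@timestamp': startedAt,
    event: {
      action: 'execute',
      outcome: executionStatus.status === 'error' ? 'failure' : 'success',
    },
    kibana: { alerting: { status: executionStatus.status } },
  });

  // then persist the status onto the alert saved object, as run() does today
  await updateAlertSavedObject(executionStatus);
}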

@pmuellr
Member

pmuellr commented Oct 22, 2020

Ya, looking to see how to refactor to do this and ... "it's complicated". :-)

One thing that would be straightforward to do is to add an indication of "ok" | "active" - i.e., there are no active instances | there are active instances. But at that point, it might as well be a field indicating the number of active instances, which would be 0 for an alert status of ok and > 0 for an alert status of active. That provides more "precise" data.

Note that the other interesting case from the alert execution status is the error conditions, but errors will already be reported in the event anyway. The event won't have the reason (like decrypt) that the alert execution status has, but it's not clear to me that that's very important.

@dhurley14 thoughts? The use case described is to get failure information from the event log. I think some "errors" won't show up today in the event log; the alerting:execute event only gets logged when the executor actually runs, so on a decrypt error, I'm guessing there won't be an event log doc currently.

If that's the case, another option is to generate a new type of event that would basically be for "we wanted to run an alert, but before we could even try, there was an error, and this is what it was".

@dhurley14
Contributor Author

dhurley14 commented Oct 28, 2020

alerting:execute event only gets logged when the executor actually runs, so on a decrypt error, I'm guessing there won't be an event log doc currently.

Yeah, this is what we've noticed: the decrypt errors aren't showing up in the event log.

If that's the case, another option is to generate a new type of event that would basically be for "we wanted to run an alert, but before we could even try, there was an error, and this is what it was".

I think focusing on the "errors" piece of this is the more important part from the security solution perspective. Knowing via the "ok / active" statuses that there are long-running rules that never seem to complete would be great too, but I think the priority is to have some queryable log of failures for the rules. We keep track of the "last five failures" that occur within the functions we run in our alert executor, stored as a saved object separate from the rule; being able to integrate historical failures from the event log with our custom "last five failures" queue would be a nice-to-have.
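
For illustration, the integration we have in mind is roughly a merge like the following (types and field names are made up for the example):

// Illustrative only: merge our "last five failures" queue with failure docs
// pulled from the event log, newest first, capped at five entries.
interface FailureEntry {
  timestamp: string; // ISO 8601 date
  message: string;
}

function mergeFailures(
  lastFiveFailures: FailureEntry[],
  eventLogFailures: FailureEntry[],
  limit = 5
): FailureEntry[] {
  return [...lastFiveFailures, ...eventLogFailures]
    .sort((a, b) => b.timestamp.localeCompare(a.timestamp)) // newest first
    .slice(0, limit);
}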

@pmuellr
Member

pmuellr commented Oct 28, 2020

At this point I'm wondering if an easy route is a new event type (i.e., event action) for alerts called error, which we could use to indicate errors that aren't conveniently handled by things like the execute action. It would mean that looking for errors involves a more involved search in the event log: looking for both alerting:execute docs that have an error indicator AND alerting:error docs.

I feel like we'll need something like this eventually anyway - there are too many things outside of the execution of alerts that can have "problems" that we have no way of reporting via the event log, and this would be a way of getting them in.

I wanna take another look at getting this into the execute action though as well - it seems like we should be able to make this work somehow, and it is associated with the execution.
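
For illustration, that "more involved search" might look roughly like the query body below; the error action name is hypothetical at this point, and the event.provider / event.action / event.outcome fields are assumed from the existing event log ECS fields.

// Illustrative Elasticsearch query body: failed alerting:execute docs, plus
// docs for the proposed (hypothetical) alerting:error event type.
const failureQuery = {
  bool: {
    filter: [{ term: { 'event.provider': 'alerting' } }],
    should: [
      {
        bool: {
          filter: [
            { term: { 'event.action': 'execute' } },
            { term: { 'event.outcome': 'failure' } },
          ],
        },
      },
      { term: { 'event.action': 'error' } }, // the hypothetical new event type
    ],
    minimum_should_match: 1,
  },
};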

pmuellr added a commit to pmuellr/kibana that referenced this issue Nov 10, 2020
resolves elastic#79785

Until now, the execution status was not available in the event
log document for the execute action. In this PR we add it.

The event log is extended to add the following fields:

- `kibana.alerting.status` - from executionStatus.status
- `event.reason`           - from executionStatus.error.reason

The date from the executionStatus and start date in the event
log will be set to the same value.

Previously, errors encountered while trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated.  Now they will.

In addition, there were a few places where events that could
have had the action group in them did not, and one where the
instance id was undefined - those were fixed up.
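
For illustration, an execute event for a failed run would then carry something like the following (values are invented; only kibana.alerting.status and event.reason are the fields this PR adds):

// Invented example document - not copied from a real event log index.
const exampleExecuteEvent = {
  '@timestamp': '2020-11-10T12:00:00.000Z',
  event: {
    provider: 'alerting',
    action: 'execute',
    outcome: 'failure',
    reason: 'decrypt', // from executionStatus.error.reason
  },
  kibana: {
    alerting: {
      status: 'error', // from executionStatus.status
    },
  },
  message: 'example: alert execution failed',
};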
pmuellr added a commit that referenced this issue Nov 12, 2020, with the same commit message as above.

pmuellr added a commit to pmuellr/kibana that referenced this issue Nov 12, 2020, with the same commit message as above.

pmuellr added a commit that referenced this issue Nov 12, 2020 (#83289), with the same commit message as above.
@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022