[Alerts] Calling alertInstanceFactory and then scheduling no actions causes a recovery notification loop #91047

Zacqary · 2021-02-10T21:46:59Z

Root cause of #86507 and #91035. When an alert executor calls alertInstanceFactory but then fires no actions, the alert seems to perpetually trigger a recovery "status change," and will continue to send notifications over and over.

This can be reproduced in 7.11.0 with either the Inventory or Metric Threshold alert types.

A workaround is to only call alertInstanceFactory behind a conditional, when you're sure that scheduleActions is going to be called. Unfortunately, we don't have an easy linting rule for that, so it would be safer to fix this at the alerting plugin level.
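A minimal sketch of that workaround, for illustration only. The `AlertInstance` and `AlertInstanceFactory` shapes, the `runExecutor` wrapper, the `'default'` action group, and the `makeRecordingFactory` helper are all assumptions made up for this example; they are simplified stand-ins, not the real Kibana alerting types.

```typescript
// Hypothetical, simplified shapes; the real Kibana alerting types are richer.
interface AlertInstance {
  scheduleActions(actionGroupId: string, context?: Record<string, unknown>): void;
}
type AlertInstanceFactory = (id: string) => AlertInstance;

// Workaround: create the instance only on the branch that schedules actions.
function runExecutor(
  alertInstanceFactory: AlertInstanceFactory,
  shouldFire: boolean
): void {
  if (shouldFire) {
    const alertInstance = alertInstanceFactory('example');
    // 'default' is an assumed action group id for this sketch.
    alertInstance.scheduleActions('default', { reason: 'threshold crossed' });
  }
  // When shouldFire is false, alertInstanceFactory is never called.
}

// In-memory stand-in for the framework, used only to observe the call pattern.
function makeRecordingFactory() {
  const created: string[] = [];
  const scheduled: string[] = [];
  const factory: AlertInstanceFactory = (id) => {
    created.push(id);
    return {
      scheduleActions: (group) => {
        scheduled.push(`${id}:${group}`);
      },
    };
  };
  return { factory, created, scheduled };
}
```

With this shape, a run where `shouldFire` is false leaves no instance behind without scheduled actions, which is the condition described above as triggering the loop.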


elasticmachine · 2021-02-10T21:47:01Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Zacqary · 2021-02-10T22:02:06Z

To reduce this to minimum complexity, test this by creating an alert executor that just does:

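// The instance is created unconditionally, even when no actions get scheduled
const alertInstance = alertInstanceFactory('example');
// Run some code to read an ES document that returns either `true` or `false`
// (readShouldFire is a hypothetical placeholder for that code)
const shouldFire = await readShouldFire();
if (shouldFire) {
  alertInstance.scheduleActions(
    // ...
  );
}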

Get the alert to fire once, then let it recover. It will continue to dispatch the "Recovered" action every time the executor runs.

pmuellr · 2021-02-11T14:11:57Z

Tried a repro with my typical setup using the index threshold - not able to repro. However, it only creates the instance when needed, so presumably doesn't have this problem:

kibana/x-pack/plugins/stack_alerts/server/alert_types/index_threshold/alert_type.ts

Lines 195 to 196 in e3f6729

    
const alertInstance = options.services.alertInstanceFactory(instanceId);
alertInstance.scheduleActions(ActionGroupId, actionContext);

Zacqary · 2021-02-11T16:09:10Z

Duplicate of #91117

Zacqary added the bug, Feature:Alerting, and Team:ResponseOps labels Feb 10, 2021

Zacqary marked this as a duplicate of #91117 Feb 11, 2021

Zacqary closed this as completed Feb 11, 2021

kobelb added the needs-team label Jan 31, 2022

botelastic bot removed the needs-team label Jan 31, 2022
