[Alerts] Calling alertInstanceFactory and then scheduling no actions causes a recovery notification loop #91047

Closed
Zacqary opened this issue Feb 10, 2021 · 4 comments
Labels: bug, Feature:Alerting, Team:ResponseOps

Comments

@Zacqary
Contributor

Zacqary commented Feb 10, 2021

Root cause of #86507 and #91035. When an alert executor calls alertInstanceFactory but then schedules no actions on the resulting instance, the alert seems to trigger a recovery "status change" on every execution, and it keeps sending the same notification over and over.

This can be reproduced in 7.11.0 with either the Inventory or Metric Threshold alert types.

A workaround is to call alertInstanceFactory only behind a conditional, once you're sure that scheduleActions is going to be called. Unfortunately, we don't have an easy linting rule for that, so it would be safer to fix this at the alerting plugin level.
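A minimal sketch of that workaround, for illustration only. The `AlertInstance` and `AlertInstanceFactory` types, the `reportInstance` helper, the `'default'` action group, and the recording factory are all hypothetical stand-ins rather than the alerting plugin's real types. The only point is that the factory is never called unless actions are about to be scheduled.

```typescript
// Hypothetical stand-ins for the alerting services, for illustration only.
interface AlertInstance {
  scheduleActions(actionGroup: string, context?: Record<string, unknown>): void;
}

type AlertInstanceFactory = (id: string) => AlertInstance;

// Workaround: only touch the factory once we know actions will be scheduled.
function reportInstance(
  alertInstanceFactory: AlertInstanceFactory,
  instanceId: string,
  shouldFire: boolean
): void {
  if (!shouldFire) {
    return; // No factory call, so no instance is created during this run.
  }
  // 'default' is an assumed action group name.
  alertInstanceFactory(instanceId).scheduleActions('default', {
    reason: 'threshold crossed',
  });
}

// Recording factory (hypothetical) that tracks which ids were created and scheduled.
const createdIds: string[] = [];
const scheduledIds: string[] = [];
const recordingFactory: AlertInstanceFactory = (id) => {
  createdIds.push(id);
  return {
    scheduleActions: () => {
      scheduledIds.push(id);
    },
  };
};

reportInstance(recordingFactory, 'host-a', false); // factory is never called
reportInstance(recordingFactory, 'host-b', true); // instance created and scheduled
```

With this shape, an instance that does not fire is never created, so the framework has nothing to treat as "created but inactive".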

@Zacqary added the bug, Feature:Alerting, and Team:ResponseOps labels Feb 10, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@Zacqary
Contributor Author

Zacqary commented Feb 10, 2021

To reduce this to minimum complexity, test this by creating an alert executor that just does:

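// The instance is always created, even when nothing will be scheduled.
const alertInstance = alertInstanceFactory('example');
// readShouldFireFromEs is a hypothetical helper: run some code to read an
// ES document that returns either `true` or `false`.
const shouldFire = await readShouldFireFromEs();
if (shouldFire) {
  alertInstance.scheduleActions(
    // ...
  );
}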

Get the alert to fire once, then let it recover. It will continue to dispatch the "Recovered" action every time the executor runs.
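The following toy model shows one way a loop like this could arise. It is not the Kibana task runner, and every name in it (`ToyInstance`, `runOnce`, `nextPreviousIds`) is hypothetical. It assumes, purely for illustration, that the framework remembers every instance the factory created, even ones that scheduled nothing, and on the next run treats any remembered instance without scheduled actions as recovered.

```typescript
// Toy model only: NOT the Kibana task runner. All names are hypothetical.
class ToyInstance {
  private scheduled = false;
  scheduleActions(): void {
    this.scheduled = true;
  }
  hasScheduledActions(): boolean {
    return this.scheduled;
  }
}

interface RunResult {
  recoveredIds: string[];
  nextPreviousIds: string[];
}

function runOnce(previousIds: string[], shouldFire: boolean): RunResult {
  const created: Record<string, ToyInstance> = {};
  const factory = (id: string): ToyInstance => {
    const instance = new ToyInstance();
    created[id] = instance;
    return instance;
  };

  // Executor from the repro: always create the instance, schedule conditionally.
  const instance = factory('example');
  if (shouldFire) {
    instance.scheduleActions();
  }

  const activeIds = Object.keys(created).filter((id) =>
    created[id].hasScheduledActions()
  );
  // Anything remembered from the previous run that is not active now counts as recovered.
  const recoveredIds = previousIds.filter((id) => activeIds.indexOf(id) === -1);

  // Assumed flaw: every created instance is remembered, whether or not it scheduled actions.
  return { recoveredIds, nextPreviousIds: Object.keys(created) };
}

const run1 = runOnce([], true); // fires
const run2 = runOnce(run1.nextPreviousIds, false); // recovers once, as expected
const run3 = runOnce(run2.nextPreviousIds, false); // reports recovery again
```

In this model, `run3` reports `'example'` as recovered a second time because the inactive instance was carried forward. Remembering only instances that scheduled actions would break the cycle in the toy model; whether that matches the real fix is not established here.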

@pmuellr
Member

pmuellr commented Feb 11, 2021

I tried to reproduce this with my usual setup, using the index threshold alert type, and was not able to. However, that alert type only creates the instance when it is needed, so presumably it doesn't have this problem:

const alertInstance = options.services.alertInstanceFactory(instanceId);
alertInstance.scheduleActions(ActionGroupId, actionContext);

@Zacqary
Contributor Author

Zacqary commented Feb 11, 2021

Duplicate of #91117

@Zacqary marked this as a duplicate of #91117 Feb 11, 2021
@Zacqary closed this as completed Feb 11, 2021
@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022