
[Alerting] Sometimes, when there are multiple Kibana, some instances fail to pick up any tasks #87808

Closed
gmmorris opened this issue Jan 11, 2021 · 3 comments · Fixed by #88020
Labels: Feature:Alerting, Team:ResponseOps (label for the ResponseOps team, formerly the Cases and Alerting teams)

@gmmorris (Contributor)

We've identified an issue where multiple Kibana instances running in parallel can clash in a way that forces some of them to skip all work.
We're not sure how to recreate this consistently, but it appears to be a clash during the UpdateByQuery step.

We think this explains the drift that we sometimes see climb to 20s+.

[Screenshot: 2021-01-11 09:46]

The easiest way to recreate this is to run our perf tests at a high rate (1200+ alerts per minute) with at least 6 Kibana instances; it can happen with fewer instances, but it is less likely.
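To illustrate the failure mode, here is a minimal sketch of a claim cycle built on Elasticsearch's updateByQuery. It is hypothetical: the index layout, field names, and painless script are illustrative, not the actual Task Manager code. The point is that when several instances poll in lock-step, they race on the same candidate documents, and an unlucky instance can lose every optimistic-concurrency race and claim nothing.

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical claim cycle: each Kibana instance tries to mark a batch of
// idle task documents as "claiming" for itself in a single updateByQuery.
async function claimTasks(ownerId: string, batchSize: number) {
  const res = await client.updateByQuery({
    index: '.kibana_task_manager',
    conflicts: 'proceed', // count version conflicts instead of aborting
    max_docs: batchSize,
    refresh: true,
    query: { term: { 'task.status': 'idle' } },
    script: {
      lang: 'painless',
      source:
        "ctx._source.task.status = 'claiming'; ctx._source.task.ownerId = params.ownerId;",
      params: { ownerId },
    },
  });

  // When instances poll on the same cadence, their queries return the same
  // candidate documents. One instance wins each race; another can finish the
  // cycle with updated === 0 and version_conflicts === batchSize, i.e. it
  // picks up no work at all during that poll.
  return { claimed: res.updated ?? 0, conflicts: res.version_conflicts ?? 0 };
}
```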

@gmmorris added the Feature:Alerting and Team:ResponseOps labels Jan 11, 2021
@gmmorris self-assigned this Jan 11, 2021
@elasticmachine (Contributor)

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris (Contributor, Author)

We've confirmed that the most likely root cause is version conflicts during the UpdateByQuery step, so a new PR is now under review that shifts the polling interval by a random amount to redistribute polling when needed.
[Screenshot: 2021-01-12 11:45]
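A minimal sketch of that mitigation, assuming a simple polling loop (the names pollInterval, nextDelay, and runClaimCycle, and the jitter range, are illustrative rather than Kibana's actual configuration): when a claim cycle hits version conflicts, the next poll is delayed by a random offset so that instances drift apart instead of polling in lock-step.

```ts
const pollInterval = 3000; // assumed cadence in ms, for illustration only

// If the previous cycle hit version conflicts, push the next poll out by a
// random offset (0..pollInterval) so instances that have synchronised their
// polling spread back out; otherwise keep the normal cadence.
function nextDelay(hadConflicts: boolean): number {
  return hadConflicts
    ? pollInterval + Math.floor(Math.random() * pollInterval)
    : pollInterval;
}

// runClaimCycle would be something like the claimTasks sketch above.
async function pollLoop(runClaimCycle: () => Promise<{ conflicts: number }>) {
  for (;;) {
    const { conflicts } = await runClaimCycle();
    await new Promise<void>((resolve) =>
      setTimeout(resolve, nextDelay(conflicts > 0))
    );
  }
}
```

The actual change in #88020 may differ in detail; the idea is only that a random offset breaks the lock-step polling that produces the conflicts.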

@gmmorris (Contributor, Author) commented Jan 15, 2021

With 32 Kibana instances, all of them are now balancing work; the clashes are addressed by shifting the polling.

[Screenshot: 2021-01-15 10:35]

@kobelb added the needs-team label Jan 31, 2022
@botelastic removed the needs-team label Jan 31, 2022