Bucket-Level Alerting #86

Closed
adityaj1107 opened this issue Jun 2, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@adityaj1107
Contributor

Issue by qreshi
Friday Dec 18, 2020 at 23:27 GMT
Originally opened as opendistro-for-elasticsearch/alerting#326


The Bucket-Level Alerting feature enhancement seeks to address the concerns raised in #13 and #145, among others. Creating this issue to centralize discussion.

@adityaj1107 adityaj1107 added the enhancement New feature or request label Jun 2, 2021
@adityaj1107
Contributor Author

Comment by mgiammarco
Friday Dec 25, 2020 at 11:12 GMT


Thank you for this thread.
To be really useful at work, an alerting system should have these features:

  1. Easy and scalable: I should not need to create a new monitor/alert when I start monitoring a new host.
  2. Independent alerts for each host/resource.
  3. The option to choose whether an alert is auto-closed or not.

Consider this (I hope typical) use case:

  • 100 hosts to monitor
  • each host sends data with several agents (syslog, collectd, and so on) in different formats
  • if host1 and host2 both have a failed backup, I must get two alerts
  • if I monitor average CPU usage on host1 and host2, I need an easy way to group by host and, again, get two separate alerts
  • some alerts should not return to the normal state automatically. For example, CPU usage is high at 3am and drops at 5am. When I check at 2pm, I need to see the alert still in the red state.
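
The use case above can be illustrated with a minimal Python sketch, assuming samples arrive as `(host, cpu_percent)` pairs. The function name `evaluate_cpu`, its parameters, and the data shape are hypothetical and are not part of any alerting plugin's API. It groups samples by host, opens one alert per host whose average exceeds the threshold, and keeps alerts open unless `auto_close` is requested.

```python
from collections import defaultdict
from statistics import mean


def evaluate_cpu(samples, threshold, open_alerts=None, auto_close=False):
    """Return the per-host alert state after evaluating one batch of samples.

    samples: iterable of (host, cpu_percent) pairs (hypothetical shape).
    open_alerts: dict mapping host -> average that triggered the alert.
    auto_close: if False, an alert stays open after the host recovers.
    """
    alerts = dict(open_alerts or {})

    # Group the samples by host so every host is evaluated independently.
    by_host = defaultdict(list)
    for host, cpu in samples:
        by_host[host].append(cpu)

    for host, values in by_host.items():
        avg = mean(values)
        if avg > threshold:
            alerts[host] = avg          # one alert per offending host
        elif auto_close:
            alerts.pop(host, None)      # close only when explicitly allowed
    return alerts


# Both hosts are above the threshold at night: two separate alerts.
night = [("host1", 95), ("host1", 97), ("host2", 92)]
alerts = evaluate_cpu(night, threshold=90)

# host1 recovers later, but its alert stays open because auto_close is False.
alerts = evaluate_cpu([("host1", 10)], threshold=90, open_alerts=alerts)
```

A new host needs no new monitor in this sketch: it simply shows up as a new key in the grouping.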

One piece of software that meets the above criteria is InfluxDB. Another is the ElastAlert plugin for Elasticsearch. Please consider this one and possibly integrate it, because it meets all of these needs.
Grafana has alerting too, but it completely misses point 2.

@adityaj1107
Contributor Author

Comment by verbecee
Monday Jan 11, 2021 at 19:03 GMT


Just got off the community forum and wanted to post 3 recommendations for alerting:

  1. For aggregation, there should be flexibility on the groupby field. In our alerting implementation (we are using something besides Open Distro's alerting to accomplish our goals), we had an alert set up that would aggregate on field X. Initially, that field came in as a string, but then it started coming in as an array of strings, so we had to accommodate this.
  2. The aggregation should be able to deal with dirty data. Similar to the example above, this same index started receiving logs with arrays composed of strings and the value null. At least in our implementation, null really screwed up our aggregation and needed to be handled. In our case, too, we also had to deal with ECS special characters in logs, but that also might only be an issue for us because we are interfacing with Elasticsearch.
  3. Suppression: provide context about what the alert is suppressing. Is it a misconfigured server or a malware outbreak in the network?
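
Points 1 and 2 can be illustrated with a small Python sketch, assuming documents are plain dictionaries and that the group-by field may hold a string, a list of strings, `None`, or be missing. The helper names `group_keys` and `count_by_field` are hypothetical; this is not how any particular alerting plugin handles the problem.

```python
from collections import Counter


def group_keys(value):
    """Normalize a group-by value into a list of clean string keys."""
    if value is None:
        return []                       # missing or null field: no key
    if isinstance(value, str):
        return [value]                  # scalar string: a single key
    # Array case: keep strings only, dropping nulls and other dirty entries.
    return [v for v in value if isinstance(v, str)]


def count_by_field(docs, field):
    """Count documents per key, tolerating mixed and dirty field values."""
    counts = Counter()
    for doc in docs:
        for key in group_keys(doc.get(field)):
            counts[key] += 1
    return counts


docs = [
    {"x": "a"},                # field as a string
    {"x": ["a", "b", None]},   # field as an array containing null
    {"x": None},               # null field
    {},                        # field absent
]
counts = count_by_field(docs, "x")
```

Normalizing every value to a list before aggregating means the later switch from string to array needs no change to the alert definition.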

@adityaj1107
Contributor Author

Comment by rafael-gumiero
Tuesday Jan 19, 2021 at 01:07 GMT


Basically our use case is very similar to the ones listed above.

  1. Generate separate alerts based on a key to be defined (host, device type, etc).
  2. Grouping categorizes alerts of similar nature into a single notification. This is especially useful during larger outages when many systems fail at once and hundreds to thousands of alerts may be firing simultaneously.
  3. Inhibition is a concept of suppressing notifications for certain alerts if certain other alerts are already firing.
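
Grouping and inhibition can be sketched in a few lines of Python, assuming each alert is a dictionary with a `name` and some labels. The functions `inhibit` and `group`, and the alert names used below, are hypothetical and only illustrate the two concepts; they are not taken from an existing tool.

```python
from collections import defaultdict


def inhibit(alerts, source_name, target_name, equal):
    """Drop target alerts while a source alert with the same label value fires."""
    firing = {a[equal] for a in alerts if a["name"] == source_name and equal in a}
    return [
        a for a in alerts
        if not (a["name"] == target_name and a.get(equal) in firing)
    ]


def group(alerts, by):
    """Bundle alerts that share a label value into a single notification."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.get(by)].append(alert)
    return dict(groups)


alerts = [
    {"name": "HostDown", "host": "h1"},
    {"name": "HighLatency", "host": "h1"},   # inhibited: h1 is already down
    {"name": "HighLatency", "host": "h2"},
]
kept = inhibit(alerts, "HostDown", "HighLatency", equal="host")
notifications = group(kept, by="name")       # one notification per alert name
```

Running inhibition before grouping keeps suppressed alerts out of the bundled notifications.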

Use case breakdown:

  • 100+ hosts;
  • Metrics are captured via Metricbeat and Filebeat;
  • Separate alerts must be generated for each host/device or specific key that falls outside the desired condition;
  • Alerts should be as standardized as possible, to avoid creating endless separate rules (costly to maintain);
  • Alerts based on anomaly detection and threshold.

@qreshi
Contributor

qreshi commented Nov 16, 2021

Closing this as this feature was launched as part of the OpenSearch 1.1 release
