Elasticsearch Query Stack Alert Aggregation Support #95161

Open
christophercutajar opened this issue Mar 23, 2021 · 16 comments
Labels
  • enhancement - New value added to drive a business result
  • estimate:needs-research - Estimated as too large and requires research to break down into workable issues
  • Feature:Alerting/RuleTypes - Issues related to specific Alerting Rules Types
  • R&D - Research and development ticket (not meant to produce code, but to make a decision)
  • research
  • Team:ResponseOps - Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@christophercutajar

Describe the feature:

With 7.12.0, the alerting engine now supports creating an alert based on a query using the Elasticsearch query alert type. It would be very useful if this alert type also supported aggregations, so that the contents of the aggregation could be added to the action.

Describe a specific use case for the feature:

  • Monitoring Kubernetes CronJob failures

This is a very simple watcher that we have deployed. It monitors Metricbeat data for k8s CronJobs and alerts the team with the CronJob name and how many times it failed during the past X minutes. If a CronJob failed, for example, 10 times during the past 10 minutes, we don't want a list containing the same name 10 times, but a single message saying:

CronJob X had X number of failures in the last X minutes
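
A minimal sketch, in Python, of the summarising step described above. It assumes the failures have already been counted per CronJob by a terms aggregation; the aggregation name `cronjobs`, the function name, and the sample response are hypothetical and not taken from the deployed watcher.

```python
def summarize_cronjob_failures(aggregations, window_minutes):
    """Turn terms-aggregation buckets into one message per CronJob."""
    # "cronjobs" is a hypothetical aggregation name used for illustration.
    buckets = aggregations["cronjobs"]["buckets"]
    return [
        f"CronJob {b['key']} had {b['doc_count']} failures "
        f"in the last {window_minutes} minutes"
        for b in buckets
    ]


# Hypothetical response fragment in the shape of a terms aggregation.
sample = {"cronjobs": {"buckets": [
    {"key": "nightly-backup", "doc_count": 10},
    {"key": "report-export", "doc_count": 2},
]}}

messages = summarize_cronjob_failures(sample, 10)
```

Each bucket yields exactly one line, no matter how many failure documents matched, which is the behaviour the plain `context.hits` list cannot give.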
  • Monitoring whether a third-party agent is deployed within our environment

This is much more complex than the previous use case. In a nutshell, we're ingesting third-party data into Elasticsearch for vulnerability management of Elastic's infrastructure. We have a watcher that checks this data to verify that all assets in our asset inventory have a particular agent deployed. Assets that don't have the agent installed trigger an action that includes the results from the aggregation below and sends a message to the respective team:

"aggs": {
  "cloud_provider": {
    "terms": {
      "field": "cloud.provider",
      "missing": "unknown-cloud-provider",
      "size": 100
    },
    "aggs": {
      "cloud_project_id": {
        "terms": {
          "field": "cloud.project.name",
          "missing": "unknown-cloud-project",
          "size": 100
        },
        "aggs": {
          "hostname": {
            "terms": {
              "field": "host.name",
              "size": 100
            }
          }
        }
      }
    }
  }
}
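
To illustrate what an action would need from this aggregation, here is a rough Python sketch that flattens the nested buckets into (provider, project, hostname) rows. The aggregation names come from the query above; the function name and the sample response are made up for illustration.

```python
def flatten_hosts(aggregations):
    """Yield (provider, project, hostname) for every leaf bucket."""
    for provider in aggregations["cloud_provider"]["buckets"]:
        for project in provider["cloud_project_id"]["buckets"]:
            for host in project["hostname"]["buckets"]:
                yield provider["key"], project["key"], host["key"]


# Hypothetical response fragment; real keys depend on the indexed data.
sample = {"cloud_provider": {"buckets": [
    {"key": "gcp", "cloud_project_id": {"buckets": [
        {"key": "project-a", "hostname": {"buckets": [
            {"key": "host-1", "doc_count": 1},
            {"key": "host-2", "doc_count": 1},
        ]}},
    ]}},
    {"key": "unknown-cloud-provider", "cloud_project_id": {"buckets": [
        {"key": "unknown-cloud-project", "hostname": {"buckets": [
            {"key": "host-3", "doc_count": 1},
        ]}},
    ]}},
]}}

rows = list(flatten_hosts(sample))
```

The rows could then be grouped by provider or project to address the message to the respective team.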
@christophercutajar christophercutajar added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Mar 23, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris gmmorris added the Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types label Jul 1, 2021
@gmmorris gmmorris added the loe:needs-research This issue requires some research before it can be worked on or estimated label Jul 15, 2021
@gmmorris gmmorris added enhancement New value added to drive a business result estimate:needs-research Estimated as too large and requires research to break down into workable issues and removed Feature:Alerting labels Aug 13, 2021
@gmmorris gmmorris removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@clement-fouque

clement-fouque commented Feb 8, 2022

We have another use case for Vulnerability Management where we want to notify the different teams. We would need to have aggregations to notify based on the severity.

The Mustache template allows us to access documents through context.hits. We would like to be able to use aggregations in the Actions section as well.

@mikecote mikecote added R&D Research and development ticket (not meant to produce code, but to make a decision) research labels Sep 23, 2022
@kobelb
Contributor

kobelb commented Sep 23, 2022

@mikeh-elastic recently ran into this issue as well. He wanted to do a "group by", which was impossible using the Elasticsearch Query rule.

@jeffvestal

Since this issue has some recent activity I'm going to mention an issue I've had open for a while to support derivative aggregation.
There are a couple examples in the linked issue. Adding general aggs support looks like it would close mine also.

@mikeh-elastic

If we could basically get the Logs UI alert, but with the ability to specify our own index to search, that would solve this. I too would like to be able to supply the DSL of the aggs, since I can do some very powerful things with bucket and pipeline aggregations, which today are only possible in Watcher for alerting and Vega for visualizing.

@billfnt

billfnt commented Nov 2, 2022

Can confirm that @mikeh-elastic's approach would solve this issue for us as well. We were actually able to implement one alert in Logs UI for a very specific use case, but for general use, it would be handy to have this option in the basic elasticsearch query alert type.

@berglh

berglh commented Dec 22, 2022

We're wanting to be able to alert on things like:

  • for a time range, for the date histogram buckets, if the min of the cardinality for a field value > threshold; then send alert via a connector.

I can accomplish something like this with Watcher, although Watcher seems unable to make use of the Rules and Connectors in Observability, and needs separate alerting methods defined in elasticsearch.yml 🤦

Example Watcher Query
{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metrics-kubernetes.state_service*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-24h",
                      "lte": "now"
                    }
                  }
                },
                {
                  "term": {
                    "data_stream.dataset": {
                      "value": "kubernetes.state_service"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "service_counts": {
              "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "1h"
              },
              "aggs": {
                "services": {
                  "cardinality": {
                    "field": "kubernetes.service.name"
                  }
                }
              }
            },
            "min_service_count": {
              "min_bucket": {
                "buckets_path": "service_counts>services"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.aggregations.min_service_count.value": {
        "gte": 150
      }
    }
  },
  "actions": {
    "my-logging-action": {
      "logging": {
        "level": "info",
        "text": "Minimum of {{ctx.payload.aggregations.min_service_count.value}} Amazon EKS services in the past 24 hours. Threshold is 225."
      }
    }
  }
}
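
For readers less familiar with pipeline aggregations, the following Python sketch re-computes on the client side what the watch above evaluates: the minimum of the per-hour cardinality values, compared against the `gte` threshold from the condition. The aggregation names and the value 150 are from the watch; the function names and sample buckets are hypothetical, and skipping empty buckets is assumed to mirror the default gap handling of `min_bucket`.

```python
def min_bucket(aggregations, histogram="service_counts", metric="services"):
    """Smallest metric value across the histogram buckets, or None."""
    buckets = aggregations[histogram]["buckets"]
    # Buckets without a value are skipped (assumed default gap handling).
    values = [b[metric]["value"] for b in buckets
              if b[metric]["value"] is not None]
    return min(values) if values else None


def condition_met(aggregations, threshold=150):
    """Mirror of the compare condition: min value >= threshold."""
    value = min_bucket(aggregations)
    return value is not None and value >= threshold


# Hypothetical date_histogram response with one cardinality per hour.
sample = {"service_counts": {"buckets": [
    {"key_as_string": "2022-12-22T00:00:00.000Z", "services": {"value": 160}},
    {"key_as_string": "2022-12-22T01:00:00.000Z", "services": {"value": 152}},
    {"key_as_string": "2022-12-22T02:00:00.000Z", "services": {"value": 171}},
]}}

lowest = min_bucket(sample)
fires = condition_met(sample)
```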

I've made extensive use of things like this in other vis & alerting tools like Grafana in the past, so I was surprised that something like this would be so difficult after paying for Elastic Cloud Enterprise.

@mikecote
Contributor

mikecote commented Mar 8, 2023

In 8.7 we will be adding support to the Elasticsearch Query rule to select a field to "group by". This will make the rule measure the aggregation per group and create an alert per grouped value (#144689). Are there use cases where this wouldn't work?

@billfnt

billfnt commented Mar 9, 2023

Will the solution in 8.7 be able to handle multiple levels of aggregations or will it be limited to aggregation on a single field? The use case I've seen for aggregations would not be covered by #144689 if it is limited to aggregations on a single field.

@mikecote
Contributor

mikecote commented Mar 9, 2023

> Will the solution in 8.7 be able to handle multiple levels of aggregations or will it be limited to aggregation on a single field? The use case I've seen for aggregations would not be covered by #144689 if it is limited to aggregations on a single field.

It will be limited to aggregating on a single field as of 8.7. Thanks for confirming your use case 🙏

@sorenlouv
Member

sorenlouv commented Mar 9, 2023

Another use case that I don't think will be solved with the suggested approach:

I'm monitoring a system where a document is created per metric. I want an alert to trigger if the most recent states of two specific metrics each have a specific value. Therefore I cannot simply use doc counts, but have to use aggregations and then some custom logic to parse the agg response.

View query
GET all-hass-events/_search
{
  "track_total_hits": false,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "hass.entity_id": [
              "binary_sensor.passat_gte_charging_cable_locked",
              "binary_sensor.passat_gte_charging_cable_connected"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "entities": {
      "terms": {
        "field": "hass.entity_id",
        "size": 10
      },
      "aggs": {
        "tm": {
          "top_metrics": {
            "metrics": [ { "field": "hass.value.float" }],
            "sort": [ { "@timestamp": "desc"} ]
          }
        }
      }
    }
  }
}

Pseudo code for when an alert should trigger:

const triggerAlert = resp.metricA === 0 && resp.metricB === 1
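
A slightly fuller sketch of that custom logic, written in Python. It assumes the `top_metrics` response shape of a `top` array whose entries carry a `metrics` object keyed by field name. Mapping `metricA` to the "locked" sensor and `metricB` to the "connected" sensor is a guess made for illustration, as are the function names and sample data.

```python
CABLE_LOCKED = "binary_sensor.passat_gte_charging_cable_locked"
CABLE_CONNECTED = "binary_sensor.passat_gte_charging_cable_connected"


def latest_values(aggregations):
    """Map each entity id to its most recent hass.value.float."""
    latest = {}
    for bucket in aggregations["entities"]["buckets"]:
        top = bucket["tm"]["top"]
        if top:  # an entity may have no metric value at all
            latest[bucket["key"]] = top[0]["metrics"]["hass.value.float"]
    return latest


def should_trigger(aggregations):
    latest = latest_values(aggregations)
    # Assumption: metricA is the "locked" sensor, metricB the "connected" one.
    return latest.get(CABLE_LOCKED) == 0 and latest.get(CABLE_CONNECTED) == 1


# Hypothetical response fragment for the query above.
sample = {"entities": {"buckets": [
    {"key": CABLE_LOCKED, "doc_count": 4,
     "tm": {"top": [{"sort": ["2023-03-09T10:00:00.000Z"],
                     "metrics": {"hass.value.float": 0.0}}]}},
    {"key": CABLE_CONNECTED, "doc_count": 3,
     "tm": {"top": [{"sort": ["2023-03-09T10:01:00.000Z"],
                     "metrics": {"hass.value.float": 1.0}}]}},
]}}

trigger_alert = should_trigger(sample)
```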

@mikecote
Contributor

Thanks @sqren! We'll keep this issue open to track requests beyond the terms aggregation that is releasing in 8.7 👍

cc @shanisagiv1

@TomonoriSoejima
Contributor

@mikecote can you share any related issue link to that?

@mikecote
Contributor

> @mikecote can you share any related issue link to that?

@TomonoriSoejima you can find the basic capability here: #144689

@TomonoriSoejima
Contributor

Right, I was reading it now!!

@juvenalguevara

Has this feature been released, and if so, in which version?
We want to put aggregations (group by and count) in place as part of the Query Alerts in Kibana.
