Spike alert not using specified timeframe for querying event counts #3041

piehld · 2020-11-16T15:29:03Z

When running a spike rule with the following configuration (and verbose stdout), the time window over which elastalert is counting events appears to be that of the run_every time interval as opposed to the timeframe interval.

Global configuration: config.yaml

rules_folder: /home/ubuntu/elastalert/rules
run_every:
  minutes: 5
buffer_time:
  hours: 1
es_host: localhost
es_port: 9200
use_ssl: False
verify_certs: True
writeback_index: elastalert
writeback_alias: elastalert_alerts
alert_time_limit:
  days: 2

Spike rule configuration: web-traffic-spike.yaml

name: "Top-level LB stability - Dramatic change in web traffic"
es_host: my-elasticsearch-host.domain.org
es_port: 9200
index: my-index-*
type: spike
buffer_time:
  hours: 3
timeframe:
  hours: 3
spike_height: 2.5
spike_type: both
threshold_ref: 75000
# threshold_cur:
query_key: service.name
alert_on_new_data: False
use_count_query: True
doc_type: _doc
realert:
  hours: 3
alert:
- "email"
email:
- "my_email@domain.org"
from_addr: "it-services@domain.org"

After allowing this to run for 3 days (thus clearly surpassing the initial timeframe duration), my stdout is still reporting as follows:

INFO:elastalert:Background configuration change check run at 2020-11-16 07:21 PST
INFO:elastalert:Background alerts thread 0 pending alerts sent at 2020-11-16 07:21 PST
INFO:elastalert:Queried rule Top-level LB stability - Dramatic change in web traffic from 2020-11-16 07:16 PST to 2020-11-16 07:21 PST: 5661 hits
INFO:elastalert:Ran Top-level LB stability - Dramatic change in web traffic from 2020-11-16 07:16 PST to 2020-11-16 07:21 PST: 5661 query hits (0 already seen), 0 matches, 0 alerts sent
INFO:elastalert:Disabled rules are: []
INFO:elastalert:Sleeping for 299.999776 seconds

As you can see, the queries are only being performed over 5 min intervals (as defined for run_every), not 3 hours as specified for timeframe. Any idea on why this is happening?

The text was updated successfully, but these errors were encountered:

nsano-rururu · 2020-11-17T12:09:42Z

We also recommend asking questions on the Gitter channel
https://gitter.im/Yelp/elastalert

piehld · 2020-11-17T17:52:52Z

@nsano-rururu Thank you, I posted it there as well. However, after investigating this myself, I believe this actually is indicative of a bug in the verbose output.

It seems that the verbose output is reporting the count from the run_every time window, while the alert itself is also checking the count from the timeframe window but simply not printing that to the standard output. I noticed this because I ended up receiving an alert that explicitly stated the full time window, for which the count was accurate:

An abnormal number (921000) of events occurred around 2020-11-16 15:53 PST.
Preceding that time, there were only 366000 events within 3:00:00

So I guess there are two bugs:
(1) The program shouldn't be wasting time querying the run_every time window anyway, and
(2) The count from the timeframe window query should be what is printed to the verbose standard output

nsano-rururu · 2020-11-18T15:23:51Z

You have to choose between continuing to use it as it is or fixing the bug yourself.

piehld · 2020-11-18T15:32:39Z

OK I understand. It's not a major bug as it does not appear to affect the actual alert, just the verbose output. So it's definitely still usable as is. I will try to look into fixing it as I continue setting up new alerts for our systems.

Thank you for your response and all your support of this project.

findmyname666 · 2021-02-25T20:47:11Z

Hello all,

I did a few tests and it really seems that rules are executed based on run_every configuration and rule timeframe isn't respected.

The following example is for terms query.

We can see from and to dates in logs for each query - link.
The same start and end time is used a few lines above for reconstruction of elastic query - link.

It means that these dates should be used for query itself.
I run elastalert with argument --es_debug_trace to see executed queries.
There are really used these dates, probably calculated from run_every settings.
FYI - run_every is set to 1 minute in this case.

  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gt": "2021-02-25T16:35:30.857161Z",
                  "lte": "2021-02-25T16:35:34.273806Z"
                }
              }
            },

I see that there exists function called set_starttime and it seems that it takes into validation timeframe or buffer_time but only when rule don't have option use_count_query or use_terms_query - link. If the rule have one of these options the start time is set:

Based on run_every configuration option when scan_entire_timeframe isn't set in rule.
Based on timeframe interval if scan_entire_timeframe is set in rule.

Therefore the solution is to set scan_entire_timeframe in rule.

nsano-rururu mentioned this issue Nov 18, 2020

Please don't let this die. ElastAlert usecase is very strong. #2947

Open

findmyname666 mentioned this issue Mar 2, 2021

use rule timeframe when scan_entire_timeframe is set #3141

Open

rschirin mentioned this issue Jan 31, 2023

fix scan timeframe with use_terms_query jertel/elastalert2#1097

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spike alert not using specified timeframe for querying event counts #3041

Spike alert not using specified timeframe for querying event counts #3041

piehld commented Nov 16, 2020

nsano-rururu commented Nov 17, 2020

piehld commented Nov 17, 2020

nsano-rururu commented Nov 18, 2020

piehld commented Nov 18, 2020

findmyname666 commented Feb 25, 2021 •

edited

Loading

Spike alert not using specified timeframe for querying event counts #3041

Spike alert not using specified timeframe for querying event counts #3041

Comments

piehld commented Nov 16, 2020

nsano-rururu commented Nov 17, 2020

piehld commented Nov 17, 2020

nsano-rururu commented Nov 18, 2020

piehld commented Nov 18, 2020

findmyname666 commented Feb 25, 2021 • edited Loading

findmyname666 commented Feb 25, 2021 •

edited

Loading