Using Elastalert to notify us when "Services are back online" #1919

Purfakt · 2018-09-20T07:57:18Z

I created a flatline rule that sends an email when no more messages arrive from a triplet of fields that identifies one of our services (this one works):

name: Service Down
type: flatline

index: 'ourindex'

timestamp_field: 'ourtimestamp'
timestamp_type: unix_ms

query_key: ["s_serviceInfo-instanceId", "s_serviceInfo-replicaId", "s_serviceInfo-serviceName"]

realert:
  hours: 1

filter:
  - query_string:
      query: "s_qs_item-name: machine.cpu"

timeframe:
  minutes: 5

threshold: 1

alert:
  - email
from_addr: "alert@domain.com"
email: "me@domain.com"

alert_subject: "Service down on {0}"
# key represents the query key
alert_subject_args: 
  - "key"

alert_text: |
    Service not answering on {0} at {1}
alert_text_args: 
  - "key"
  - "ourtimestamp"

alert_text_type: alert_text_only
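To make the per-triplet behavior of this rule concrete, here is a minimal Python sketch of what the flatline check amounts to: for each compound key, alert when fewer than `threshold` events arrived in the last `timeframe`. It is a simplified model under stated assumptions, not ElastAlert's implementation; the helpers `compound_key`, `flatlined` and `make_event` are hypothetical, and passing in the set of already-known keys reflects the assumption that a flatline can only report keys it has seen before.

```python
# Simplified, hypothetical model of the "Service Down" flatline rule above.
# This is NOT ElastAlert's code: it only illustrates the check per query_key.
from collections import Counter

QUERY_KEY = (
    "s_serviceInfo-instanceId",
    "s_serviceInfo-replicaId",
    "s_serviceInfo-serviceName",
)
TIMEFRAME_MS = 5 * 60 * 1000  # timeframe: minutes: 5 (ourtimestamp is unix_ms)
THRESHOLD = 1                 # threshold: 1


def compound_key(event):
    """Build the (instanceId, replicaId, serviceName) triplet for an event."""
    return tuple(event[field] for field in QUERY_KEY)


def flatlined(events, known_keys, now_ms):
    """Return known keys with fewer than THRESHOLD events in (now - timeframe, now]."""
    counts = Counter(
        compound_key(e)
        for e in events
        if now_ms - TIMEFRAME_MS < e["ourtimestamp"] <= now_ms
    )
    # Assumption: only keys seen earlier can be reported, so they are passed in.
    return sorted(key for key in known_keys if counts[key] < THRESHOLD)


def make_event(instance, replica, service, ts_ms):
    """Hypothetical helper that builds a document with the fields used above."""
    return {
        "s_serviceInfo-instanceId": instance,
        "s_serviceInfo-replicaId": replica,
        "s_serviceInfo-serviceName": service,
        "ourtimestamp": ts_ms,
    }


if __name__ == "__main__":
    now = 10 * 60 * 1000  # t = 10 minutes
    events = [
        make_event("i1", "r1", "api", now - 60_000),         # reported 1 min ago
        make_event("i2", "r1", "worker", now - 8 * 60_000),  # last seen 8 min ago
    ]
    known = {("i1", "r1", "api"), ("i2", "r1", "worker")}
    print(flatlined(events, known, now))  # [('i2', 'r1', 'worker')]
```

Only the worker triplet is reported, because its last document falls outside the 5-minute window.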

Now I need to create an alarm that notifies us when the service is back online.
I thought I'd be clever and create a new_term rule with a terms_window_size matching the timeframe of the flatline alarm:

name: Service up

type: new_term

index: 'ourindex'

timestamp_field: 'ourtimestamp'
timestamp_type: unix_ms

fields: 
  - "s_serviceInfo-instanceId" 
  - "s_serviceInfo-replicaId" 
  - "s_serviceInfo-serviceName"

query_key: ["s_serviceInfo-instanceId", "s_serviceInfo-replicaId", "s_serviceInfo-serviceName"]

realert:
  hours: 1

filter:
  - query_string:
      query: "s_qs_item-name: machine.cpu"

terms_window_size:
  minutes: 5

window_step_size:
  minutes: 1

alert:
  - email
from_addr: "alert@domain.com"
email: "me@domain.com"

alert_subject: "Service up on {0}"
alert_subject_args: 
  - "s_serviceInfo-instanceId"

alert_text: |
    Service is now up and running on {0}, {1} at {2}
alert_text_args: 
  - "s_serviceInfo-instanceId"
  - "s_serviceInfo-replicaId"
  - "ourtimestamp"

alert_text_type: alert_text_only

Obviously I'm misunderstanding or misusing something: the first alarm works great for all services, but the second one only triggered when it was first added to the rules folder. Since then there have been 0 matches, and the rule isn't silenced.

What am I doing wrong? Is there a less convoluted way to achieve this?
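One way to see why the second rule stays silent: as the reply below puts it, new_term alerts only the first time a value appears. As far as we understand ElastAlert's documentation, terms_window_size sets the lookback of the initial query that collects already-existing terms, rather than acting as a sliding window. The sketch below is a hypothetical, simplified model of that behavior (the `NewTermModel` class and the term values are illustrative, not ElastAlert's code).

```python
# Simplified, hypothetical model of new_term semantics (NOT ElastAlert's code).
# Assumption: terms found by the initial terms_window_size query form a baseline,
# and after that a term alerts only the first time it is seen.


class NewTermModel:
    def __init__(self, baseline_terms):
        # Terms that already occurred during the initial lookback never alert.
        self.seen = set(baseline_terms)

    def process(self, term):
        """Return True (alert) only if this term has never been seen before."""
        if term in self.seen:
            return False
        self.seen.add(term)
        return True


if __name__ == "__main__":
    rule = NewTermModel(baseline_terms=[])  # nothing found in the initial lookback
    print(rule.process("instance-1"))  # True: first appearance alerts
    # ... instance-1 stops reporting for an hour, then comes back online ...
    print(rule.process("instance-1"))  # False: already seen, so no "Service up"
```

Because a returning service reuses the same field values, it never counts as "new" again, which matches the 0 matches observed after the first run.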


Qmando · 2018-09-20T17:55:40Z

new_term alerts only the first time a new value appears, so I don't think it's right for this purpose. Unfortunately there's no nice mechanism for this, but there is a slightly less convoluted way: create a flatline rule on the matches of the flatline rule itself.

Roughly, something like this:

type: flatline
index: elastalert_status # (may be different for you)
filter:
  - term:
      rule_name: "Service Down"
  - term:
      _type: elastalert # (not needed in ES 6)
forget_keys: true
timeframe:
  minutes: 70
threshold: 1

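I.e., "Alert if 'Service Down' hasn't alerted in at least 70 minutes." Because 'Service Down' re-alerts roughly once an hour (realert: hours: 1) while a service stays down, a gap of more than 70 minutes means those alerts have stopped. forget_keys makes it alert only once after the 'Service Down' alerts stop, until they happen again.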
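A minute-by-minute Python sketch of the idea, for a single service: given the minutes at which "Service Down" wrote an alert, fire once when more than 70 minutes pass without one, then stay quiet until the alerts resume. This is a hypothetical, simplified model of the suggested rule, not ElastAlert's implementation; `service_up_alerts` is an illustrative name, and "only fire after at least one Service Down alert" is an assumption meant to mimic the forget_keys behavior described above.

```python
# Simplified, hypothetical model of the "flatline on the flatline" rule
# (NOT ElastAlert's code). Input: minutes at which "Service Down" alerted,
# as they would be recorded in elastalert_status.
TIMEFRAME_MIN = 70  # timeframe: minutes: 70, a bit more than realert: hours: 1


def service_up_alerts(down_alert_minutes, end_minute):
    """Return the minutes at which the 'Service up' rule would fire."""
    alerts = set(down_alert_minutes)
    fired = []
    last_down = None
    armed = False  # assumption: only fire after at least one "Service Down" alert
    for minute in range(end_minute + 1):
        if minute in alerts:
            last_down = minute
            armed = True
        elif armed and minute - last_down > TIMEFRAME_MIN:
            fired.append(minute)
            armed = False  # stay quiet until "Service Down" alerts start again
    return fired


if __name__ == "__main__":
    # Service down from minute 0, so "Service Down" alerts at 0, 60 and 120;
    # it recovers before minute 180, so no alert is written at 180.
    print(service_up_alerts([0, 60, 120], 300))  # [191]
```

The recovery is reported at minute 191, i.e. up to about 70 minutes after the last "Service Down" alert; shortening the delay would mean shortening realert and the timeframe together.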

Purfakt · 2018-09-21T08:31:51Z

Thank you for the quick answer! I like the emphasis on "slightly less convoluted". I see how this would work, but there is one big problem: see how I'm using a triplet for the fields?

fields: 
  - "s_serviceInfo-instanceId" 
  - "s_serviceInfo-replicaId" 
  - "s_serviceInfo-serviceName"

Looking at the elastalert_status index, I found no way to tell which service is down or back up.
So what happens when several services go down and only some of them come back up? Or if service A goes down and then comes back up while B goes down, the alert won't be triggered, because B keeps producing "Service Down" alerts.
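The A/B scenario can be checked with a small Python sketch that applies the same "more than 70 minutes without a Service Down alert" test once to all alerts merged together and once per service. It is hypothetical and simplified: `recovery_minute` and `per_service_recoveries` are illustrative names, and the per-service variant assumes each alert record carries the service key, which, as noted above, is not apparent in elastalert_status.

```python
# Hypothetical sketch (NOT ElastAlert's code) contrasting one global gap check
# with a per-service gap check. ASSUMPTION: each alert record carries the
# service key, which the thread notes is not apparent in elastalert_status.
TIMEFRAME_MIN = 70


def recovery_minute(alert_minutes, end_minute):
    """First minute after a gap of more than TIMEFRAME_MIN, or None."""
    times = sorted(alert_minutes)
    for i, t in enumerate(times):
        fire_at = t + TIMEFRAME_MIN + 1
        next_alert = times[i + 1] if i + 1 < len(times) else None
        # The gap check fires only if no new alert arrives before fire_at.
        if (next_alert is None or next_alert > fire_at) and fire_at <= end_minute:
            return fire_at
    return None


def per_service_recoveries(alerts_by_service, end_minute):
    """Apply the same gap check independently to each service key."""
    return {
        service: recovery_minute(minutes, end_minute)
        for service, minutes in alerts_by_service.items()
    }


if __name__ == "__main__":
    # A is down from 0 and recovers around 90; B goes down at 100 and stays down.
    # With realert: hours: 1, each emits a "Service Down" alert every 60 minutes.
    by_service = {"A": [0, 60], "B": [100, 160, 220, 280]}
    merged = sorted(m for minutes in by_service.values() for m in minutes)
    print(recovery_minute(merged, 300))             # None: B's alerts hide A's recovery
    print(per_service_recoveries(by_service, 300))  # {'A': 131, 'B': None}
```

With the alerts merged, the gap never exceeds 70 minutes, so A's recovery goes unnoticed; keyed per service, A is reported at minute 131 and B correctly stays down.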

Purfakt · 2018-09-26T07:46:26Z

Hi,
Is there any update on this issue?
Thanks in advance!

damioune123 · 2019-02-14T15:24:26Z

up

damioune123 mentioned this issue Feb 28, 2019

Es6 writebackindex fix #2153

Closed

damioune123 mentioned this issue Mar 13, 2019

ES6 writeback index fix + extra features #2168

Closed
