An issue that was a warning that goes to critical doesn't send the resolution state #30

Open
DaveWitchalls opened this issue Aug 10, 2016 · 19 comments

Comments

@DaveWitchalls

I think this is semi by design, but any views appreciated.

We've come across this during testing, using CPU usage as an example:

If we set a warning threshold of 50%, an email gets sent. If usage then reaches the critical threshold of 70%, the alert in PagerDuty is triggered. The problem for us is that if the CPU drops back below the critical threshold but stays above the warning threshold, the resolution isn't sent to PagerDuty: the incident is still live, even though the check is no longer in a critical state.

Any thoughts or workarounds for this?

Thanks,
Dave.

@majormoses
Member

Hmm, I am seeing the same and need to spend some time looking into this. I wonder if we can work around it using event suppression rules?

@majormoses
Member

Taking a look, these are the only valid event actions: https://sensuapp.org/docs/0.25/reference/events.html#how-are-sensu-events-created

  • create
  • resolve
  • flapping
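
To see why no resolution reaches PagerDuty, it can help to trace which action each check result would produce. The following is a minimal sketch, not Sensu's actual code: `action_for` is a hypothetical helper, flapping is ignored, and it assumes that any non-zero status yields `create` while only a return to status 0 yields `resolve`.

```ruby
# Hypothetical helper: maps a check status transition to an event action.
# Assumptions: a non-zero status produces "create"; a return to 0 after a
# non-zero status produces "resolve"; flapping detection is left out.
def action_for(previous_status, current_status)
  return 'create' unless current_status.zero?

  previous_status.zero? ? nil : 'resolve'
end

# OK -> WARNING -> CRITICAL -> WARNING -> OK
statuses = [0, 1, 2, 1, 0]
actions = statuses.each_cons(2).map { |prev, cur| action_for(prev, cur) }
puts actions.inspect
```

Under these assumptions the CRITICAL -> WARNING step is just another `create`, so a handler that only clears incidents on `resolve` leaves the PagerDuty incident open until the check returns to OK.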

I honestly have not played much with flapping in Sensu, but it might help with some of these situations, though not all. By default a handler has handle_flapping set to false, so it would need to be enabled explicitly. https://sensuapp.org/docs/0.24/reference/handlers.html#handler-configuration

@majormoses
Member

I checked on the pagerduty side and event suppression can only be done on initial ingest.

@majormoses
Member

@eheydrick what are your thoughts? This is not really pagerduty specific nor can I think of a pagerduty specific work around.

@majormoses
Member

I checked with PagerDuty, and event suppression will not serve as a workaround, unless we want to resolve on state change (I am torn on this). I think we might want to pose a generic question to the Sensu community and see what people think.

@DaveWitchalls
Author

Hi,

The workaround we had to go for was to include the resolved status in warning alerts. Far from ideal, but it does the job!

Dave.

@majormoses
Member

majormoses commented Mar 27, 2017

@DaveWitchalls did you apply this via a filter or mutator? I have not had any time to really look into this much and would like to try out your workaround and see if it works for us.

@DaveWitchalls
Author

DaveWitchalls commented Mar 28, 2017

Hi @majormoses

I asked the guy who did the work for me and got the notes below. They are reasonably long-winded, as the setup doesn't make much sense out of context. Hope it helps!

NOTES ABOUT OUR USE OF SENSU

  1. By default we only send alerts if the occurrences count reaches 5, however this is configurable using the "occurrences" check variable.
  2. By default we send reminder emails if the occurrences count is divisible by 20, again this is configurable using a "remind_every" custom check variable.
  3. PagerDuty handles its own reminders/escalations.
  4. We send all alerts (WARNING/CRITICAL/RESOLVED) via email.
  5. We only send CRITICAL alerts for specific checks via PagerDuty (as defined by the pagerduty_alert_filter).

Therefore, if a check's state changes from CRITICAL to WARNING, we use a mutator to generate a fake RESOLVE message that is sent to PagerDuty to clear the alert.
In this situation PagerDuty will send a RESOLVED alert that shows the new status text as WARNING rather than OK; an email will also be sent showing the status text as WARNING.

For checks which only require email alerts use the following handlers list:
"handlers": ["default", "mail_alert_handler", "mail_recovery_handler", "mail_resolve_on_warning_handler"]

For checks which require both email and PagerDuty alerts (critical only) use the following handlers list:
"handlers": ["default", "mail_alert_handler", "mail_recovery_handler", "mail_resolve_on_warning_handler", "pagerduty_alert_handler", "pagerduty_recovery_handler", "pagerduty_resolve_on_warning_handler"]

FILTER CONFIGURATION

/etc/sensu/conf.d/filters/alert_filters.json

NOTE the use of a custom key/value pair, remind_every, which is set up on the check. If it is not set, reminders default to being emailed every 20 occurrences.

{
  "filters": {
    "mail_alert_filter": {
      "negate": false,
      "attributes": {
        "action": "create",
        "occurrences": "eval: value == :::check.occurrences|5::: || value % :::check.remind_every|20::: == 0"
      }
    },
    "pagerduty_alert_filter": {
      "negate": false,
      "attributes": {
        "check": {
          "status": 2
        },
        "action": "create",
        "occurrences": "eval: value == :::check.occurrences|5:::"
      }
    }
  }
}
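
As a rough illustration of the mail_alert_filter expression above, the sketch below evaluates the same condition in plain Ruby. `mail_alert?` is a hypothetical stand-in, and it assumes the two tokens resolve to their defaults of 5 and 20 when the check does not set them.

```ruby
# Hypothetical stand-in for the mail_alert_filter "occurrences" expression,
# with the token defaults (5 and 20) substituted in as keyword arguments.
def mail_alert?(value, occurrences: 5, remind_every: 20)
  value == occurrences || (value % remind_every).zero?
end

# Which occurrence counts out of the first 60 would send an email?
alerting = (1..60).select { |n| mail_alert?(n) }
puts alerting.inspect
```

With the defaults, the first email goes out on the 5th occurrence and reminders follow on every multiple of 20. The pagerduty_alert_filter drops the reminder clause, which matches the note that PagerDuty handles its own reminders.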

/etc/sensu/conf.d/filters/recovery_filters.json

{
  "filters": {
    "recovery_filter": {
      "negate": false,
      "attributes": {
        "action": "resolve",
        "occurrences": "eval: value >= :::check.occurrences|5:::"
      }
    },
    "resolve_on_warning_filter": {
      "negate": false,
      "attributes": {
        "check": {
          "status": 1
        },
        "action": "create",
        "occurrences": "eval: value == 1"
      }
    }
  }
}

HANDLER CONFIGURATION

/etc/sensu/conf.d/handlers/mail_handlers.json

{
  "handlers": {
    "mail_alert_handler": {
      "type": "pipe",
      "command": "handler-mailer.rb -s 'SENSU TEST'",
      "filter": "mail_alert_filter"
    },
    "mail_recovery_handler": {
      "type": "pipe",
      "command": "handler-mailer.rb -s 'SENSU TEST'",
      "filter": "recovery_filter"
    },
    "mail_resolve_on_warning_handler": {
      "type": "pipe",
      "command": "handler-mailer.rb -s 'SENSU TEST'",
      "filter": "resolve_on_warning_filter",
      "mutator": "mail_resolve_on_warning_mutator"
    },
    "mail_handler": {
      "type": "pipe",
      "command": "handler-mailer.rb -s 'SENSU TEST'"
    }
  },
  "mailer": {
    "admin_gui": "https://xxx.xxx.xxx.xxx/",
    "mail_from": "sensu@xxxxxxxxx.xxxxxxxxx.xxxxxxxxx",
    "mail_to": ["xxxxxxxxx@xxxxxxxxx.xxxxxxxxx"],
    "smtp_address": "127.0.0.1",
    "smtp_port": "25",
    "smtp_domain": "xxxxxxxxx.xxxxxxxxx"
  }
}

/etc/sensu/conf.d/handlers/pagerduty_handlers.json

{
  "handlers": {
    "pagerduty_alert_handler": {
      "type": "pipe",
      "command": "handler-pagerduty.rb",
      "filter": "pagerduty_alert_filter"
    },
    "pagerduty_recovery_handler": {
      "type": "pipe",
      "command": "handler-pagerduty.rb",
      "filter": "recovery_filter"
    },
    "pagerduty_resolve_on_warning_handler": {
      "type": "pipe",
      "command": "handler-pagerduty.rb",
      "filter": "resolve_on_warning_filter",
      "mutator": "pagerduty_resolve_on_warning_mutator"
    },
    "pagerduty_handler": {
      "type": "pipe",
      "command": "handler-pagerduty.rb"
    }
  },
  "pagerduty": {
    "api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }
}

MUTATOR CONFIGURATION

/etc/sensu/conf.d/mutators/mail_mutators.json

{
  "mutators": {
    "mail_resolve_on_warning_mutator": {
      "command": "/etc/sensu/mutators/mutator-mail-resolve-on-warning.rb"
    }
  }
}

/etc/sensu/conf.d/mutators/pagerduty_mutators.json

{
  "mutators": {
    "pagerduty_priority_override_mutator": {
      "command": "mutator-pagerduty-priority-override.rb"
    },
    "pagerduty_resolve_on_warning_mutator": {
      "command": "/etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb"
    }
  }
}

MUTATORS

/etc/sensu/mutators/mutator-mail-resolve-on-warning.rb

#!/usr/bin/env ruby

require 'json'

module Sensu
  module Mutator
    class Mail
      class ResolveOnWarning
        def execute(input = STDIN)
          event = JSON.parse(input.read, symbolize_names: true)
          occurrences = event[:check][:occurrences]
          history = event[:check][:history]
          ### EDGE CASE CHECKS ###
          # Check if number of occurrences is > 20 (max history length) - if so change it to 20.
          occurrences = 20 if occurrences > 20
          # Exit if length of history < occurrences + 1 as it can't have been in an alert state beforehand.
          exit 1 if history.length < (occurrences+1)
          ########################
          test_array = Array.new(occurrences, "2")
          if history[0-(occurrences+1), occurrences] == test_array
            event[:action] = 'resolve'
            event[:check][:occurrences] = 1
            JSON.dump(event)
          else
            exit 1
          end
        end
      end
    end
  end
end

## Is called from Gem script. Program name is full path to this script
### __FILE__ is the initial script ran, which is
### /etc/sensu/mutators/mutator-mail-resolve-on-warning.rb
if $PROGRAM_NAME.include?(__FILE__.split('/').last)
  mutator = Sensu::Mutator::Mail::ResolveOnWarning.new
  puts mutator.execute
end

/etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb

#!/usr/bin/env ruby

require 'json'

module Sensu
  module Mutator
    class PagerDuty
      class ResolveOnWarning
        def execute(input = STDIN)
          event = JSON.parse(input.read, symbolize_names: true)
          occurrences = event[:check][:occurrences]
          history = event[:check][:history]
          ### EDGE CASE CHECKS ###
          # Check if number of occurrences is > 20 (max history length) - if so change it to 20.
          occurrences = 20 if occurrences > 20
          # Exit if length of history < occurrences + 1 as it can't have been in an alert state beforehand.
          exit 1 if history.length < (occurrences+1)
          ########################
          test_array = Array.new(occurrences, "2")
          if history[0-(occurrences+1), occurrences] == test_array
            event[:action] = 'resolve'
            event[:check][:occurrences] = 1
            JSON.dump(event)
          else
            exit 1
          end
        end
      end
    end
  end
end

## Is called from Gem script. Program name is full path to this script
### __FILE__ is the initial script ran, which is
### /etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb
if $PROGRAM_NAME.include?(__FILE__.split('/').last)
  mutator = Sensu::Mutator::PagerDuty::ResolveOnWarning.new
  puts mutator.execute
end
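
For anyone wanting to try the history check without piping an event through STDIN, here is a minimal sketch of the same logic as a pure function. `resolve_on_warning` is a hypothetical name, it returns nil where the scripts above call `exit 1`, and it assumes history entries are strings, as the comparison against "2" in the scripts implies.

```ruby
require 'json'

# Hypothetical pure-function version of the mutators' logic. Takes a parsed
# event hash and returns the rewritten event, or nil where the original
# scripts would exit with status 1.
def resolve_on_warning(event)
  occurrences = [event[:check][:occurrences], 20].min
  history = event[:check][:history]
  return nil if history.length < occurrences + 1

  # The `occurrences` results just before the latest one must all be "2".
  previous = history[-(occurrences + 1), occurrences]
  return nil unless previous == Array.new(occurrences, '2')

  event.merge(action: 'resolve', check: event[:check].merge(occurrences: 1))
end

event = JSON.parse(
  '{"action":"create","check":{"status":1,"occurrences":1,"history":["0","2","2","1"]}}',
  symbolize_names: true
)
mutated = resolve_on_warning(event)
puts mutated[:action]

# Previous result was WARNING, not CRITICAL, so nothing is rewritten.
unchanged = event.merge(check: event[:check].merge(history: %w[0 1 1 1]))
puts resolve_on_warning(unchanged).inspect
```

Because resolve_on_warning_filter only passes events whose occurrences count is 1, the slice in practice looks at the single result before the latest one.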

@majormoses
Member

@DaveWitchalls thanks for the info I will take a look and see if I can find some bastard amalgamation based on yours that works for us.

@majormoses
Member

majormoses commented Mar 28, 2017

Interesting, though considering this I am not sure it's worth the effort for me right now to use a filter, now that occurrences is an extension: https://github.com/sensu-extensions/sensu-extensions-occurrences

@portertech

Sensu events now have an "occurrences_watermark"; the built-in Sensu "occurrences" filter now uses it instead of "occurrences" for the purpose of the resolve action. These changes are in the Sensu Core 0.29 release.
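
A minimal sketch of how a filter might use the watermark on resolve follows. This is not the extension's actual code: `handle_resolve?` is a hypothetical helper, and it assumes the watermark holds the highest occurrence count the event reached before it resolved.

```ruby
# Hypothetical helper: decide whether a resolve event should reach a handler.
# Assumption: :occurrences_watermark is the highest occurrence count the
# event reached before resolving, so it shows whether an alert was ever sent.
def handle_resolve?(event, threshold)
  event[:action] == 'resolve' && event[:occurrences_watermark] >= threshold
end

alerted = { action: 'resolve', occurrences: 1, occurrences_watermark: 8 }
brief   = { action: 'resolve', occurrences: 1, occurrences_watermark: 2 }
puts handle_resolve?(alerted, 5)
puts handle_resolve?(brief, 5)
```

The point of the sketch is that the occurrences count alone cannot tell the two resolve events apart, while the watermark can.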

@majormoses
Member

ok, cool we can have a reasonable path forward when 0.29 is supported by this plugin...

@kshep

kshep commented Sep 26, 2017

> ok, cool we can have a reasonable path forward when 0.29 is supported by this plugin...

@majormoses Any thoughts on whether/when that might happen? (for the record, Sensu skipped from 0.29 to 1.0.0 and subsequently 1.0.2 in July)

@majormoses
Member

> @majormoses Any thoughts on whether/when that might happen? (for the record, Sensu skipped from 0.29 to 1.0.0 and subsequently 1.0.2 in July)

When someone is motivated enough and has the time to work on it. There are ~200 plugins and realistically 2 active maintainers (neither of us working for sensu) and we rely mostly on other community members to contribute.

Now beyond my boilerplate 🤷‍♂️ answer of when...I don't think it would be too hard taking a quick look at the plugin.

@joshbenner

Does this plugin not work with Sensu >= 0.29? Or is there just work required to support changes related specifically to this issue?

@majormoses
Member

This plugin does work with all recent versions of Sensu; I am currently running Sensu 1.1.1 and do not have issues. The comment referenced is about a workaround for a check moving from warning -> critical -> warning. The idea was to auto-resolve the incident and create a new one based on occurrences_watermark. This is still a hack, and I do see that PagerDuty potentially has a better solution now: https://www.pagerduty.com/blog/dynamic-notifications but I have not looked into that enough to know what the limitations are.

@ashleyabrooks

Hello from the PagerDuty Team! Following up to confirm this is a limitation in the PagerDuty API rather than in the Sensu integration itself. At the moment, incidents are immutable so the parent incident can't be updated when the severity changes. You can click into the newest alert itself to get the latest data.

I've submitted a feature request from the maintainer of this integration so our product team knows mutable incidents are important to our customers. If you have any questions, please feel free to reach out to support@pagerduty.com.

@majormoses
Member

Thanks for confirming this.

@joe-armstrong

I recently did some testing with this issue, and it looks like PagerDuty has updated their API to allow escalating an incident from warning/low urgency to critical/high urgency if the "Dynamic notifications based on alert severity" option is used in the Assign and Notify configuration for a service.

For testing, we triggered an alert that started as a warning, which opened a low-urgency incident in PagerDuty. I then raised the alert to the critical threshold, which escalated the incident to high urgency in PagerDuty. One thing to note is that it will not de-escalate back to a low-urgency incident.
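
For severity-based notifications to work, the event sent to PagerDuty has to carry a severity. Below is a minimal sketch of building such a body for the PagerDuty Events API v2, without sending it. Nothing in this thread confirms that handler-pagerduty.rb sends this; `pagerduty_body`, the status-to-severity mapping, the dedup_key format and the routing key placeholder are all assumptions for illustration.

```ruby
require 'json'

# Assumed mapping from Sensu check status to Events API v2 severity.
# Any other non-zero status falls back to "error" (also an assumption).
SEVERITY = { 1 => 'warning', 2 => 'critical' }.freeze

# Hypothetical helper: builds (but does not send) an Events API v2 body.
# The dedup_key format "client/check" is an assumption.
def pagerduty_body(event, routing_key)
  check = event[:check]
  source = event[:client][:name]
  {
    routing_key: routing_key,
    event_action: event[:action] == 'resolve' ? 'resolve' : 'trigger',
    dedup_key: "#{source}/#{check[:name]}",
    payload: {
      summary: check[:output].to_s.strip,
      source: source,
      severity: SEVERITY.fetch(check[:status], 'error')
    }
  }
end

event = {
  action: 'create',
  client: { name: 'web01' },
  check: { name: 'check_cpu', status: 1, output: "CheckCPU WARNING: 55%\n" }
}
body = pagerduty_body(event, 'ROUTING_KEY_PLACEHOLDER')
puts JSON.pretty_generate(body)
```

Keeping the same dedup_key across the warning and critical events is what would let PagerDuty treat them as updates to one alert rather than as separate incidents.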


8 participants