Initial stab at a new aggregate output check. #52

icebourg · 2018-01-17T23:21:03Z

Pull Request Checklist

Is this in reference to an existing issue?
This is related to issue #19, check-aggregate.rb does not work with newer Sensu servers to show collected output. That script looks like it has grown quite a lot over time and I don't use most of those features, so I wanted to get something out that supports output and some minimum number of default options. To that end, I've created this as a new check so that it won't break anybody that is currently relying on the features in the current check.

General

Update Changelog following the conventions laid out on Our CHANGELOG Guidelines
Update README with any necessary configuration snippets
[N/A] Binstubs are created if needed
RuboCop passes
Existing tests pass

New Plugins

Tests
Add the plugin to the README
Does it have a complete header as outlined here

Purpose

Very similar to check-aggregate.rb but it actually works with newer versions of the Sensu API to support output. It does not support all of the same features of check-aggregate.rb mostly because I don't use all of those features and some of the check-aggregate.rb code depends on API functionality that no longer exists. (such as the summarize feature) I'm not opposed to adding those features to this check, and have tried to leave room to enable that to be simple, but I have not taken that on in this PR.

Known Compatibility Issues

Requires Sensu 0.24 or greater

icebourg · 2018-01-17T23:44:15Z

Here's an example of it running against my staging sensu:

./check-aggregate-output.rb -a http://staging-sensu:4567 -c web_health --critical_count 3 -A 180 --message "Web Server Health Check"
CheckAggregateOutput OK: Web Server Health Check

majormoses · 2018-01-18T02:36:36Z

Thanks for talking through some of the stuff in slack and submitting this. I will take a closer look shortly.

majormoses

Overall I think this looks good, I do have some suggestions some in the form of preventing bugs others are things that we can do to enhance the check.

majormoses · 2018-01-18T02:48:24Z

bin/check-aggregate-output.rb

+         description: 'Collects all non-ok outputs',
+         default: true
+
+  option :warning,


While I realize this is following the existing pattern it might be a good time to make some would be breaking changes and make this a bit better. Rather than having four options I believe we can reduce this by having --warning, --critical, and --alert-on which we can default to whatever the community thinks makes the most sense. This reduces the required knobs to get the check working.

I read through this again and maybe it does make sense as is so that you can monitor both. For example you might want to alert on one handler definition for SLA threshold but another for do early detection. This really is expanding on the differences between severity and urgency. I do think that it would still be better served as 2 seperate check definitions as it provides more control and clarity to operators.

majormoses · 2018-01-18T02:57:43Z

bin/check-aggregate-output.rb

+         short: '-C PERCENT',
+         long: '--critical PERCENT',
+         description: 'PERCENT critical before critical (can be change with --ignore-severity)',
+         proc: proc(&:to_i)


same comment regarding floats vs integers.

majormoses · 2018-01-18T03:00:50Z

bin/check-aggregate-output.rb

+         short: '-W PERCENT',
+         long: '--warning PERCENT',
+         description: 'PERCENT warning before warning (can be change with --ignore-severity)',
+         proc: proc(&:to_i)


I think these should be floats rather than integers so that you can report and alert on SLA figures which are usually represented in the number of 9's. 99% of 1,000,000 events is not the same as 99.99% of the same number of events.

majormoses · 2018-01-18T03:05:16Z

bin/check-aggregate-output.rb

+         description: 'PERCENT warning before warning (can be change with --ignore-severity)',
+         proc: proc(&:to_i)
+
+  option :warning_count,


While I realize this is following the existing pattern it might be a good time to make some would be breaking changes and make this a bit better. Rather than having four options I believe we can reduce this by having --warning, --critical, and something like --alert-on to specify count or percentages. I would suggest default of percentages as we can make better sane defaults for small and larger setups. This reduces the required knobs to get the check working.

majormoses · 2018-01-18T03:05:35Z

bin/check-aggregate-output.rb

+         description: 'PERCENT critical before critical (can be change with --ignore-severity)',
+         proc: proc(&:to_i)
+
+  option :critical_count,


same comment about combining options.

majormoses · 2018-01-18T03:10:17Z

bin/check-aggregate-output.rb

+                                                                verify_ssl: verify_mode)
+    JSON.parse(request.get)
+  rescue Errno::ECONNREFUSED
+    warning 'Connection refused'


You don't need to change anything here but in general it is better to return a status and a message from a function rather than executing immediately as it makes it harder to write tests for small units of code.

majormoses · 2018-01-18T03:23:20Z

bin/check-aggregate-output.rb

+  end
+
+  def output_results
+    severities = %w(critical warning unknown)


might make more sense to leverage something like this for future proofing new severities, unlike the previous one we would want to reject ok. While this is unlikely to occur in the short term as sensu has adhered to the check spec of nagios. With sensu 2.x on the horizon I believe they will likely enhance upon the nagios check spec at some point in the future.

majormoses · 2018-01-18T03:23:47Z

bin/check-aggregate-output.rb

+
+  def aggregate_results
+    results = api_request("/aggregates/#{config[:check]}?max_age=#{config[:age]}")['results']
+    warning "No aggregates found in last #{config[:age]} seconds" if %w(ok warning critical unknown).all? { |x| results[x].zero? }


might make more sense to leverage something like this for future proofing new severities. While this is unlikely to occur in the short term as sensu has adhered to the check spec of nagios with sensu 2.x on the horizon it's possible they might want to enhance on top of the nagios check spec.

majormoses · 2018-09-10T07:39:31Z

closing due to lack of response, to get this through you can either re-open the PR or open a new one.

icebourg added 2 commits January 17, 2018 16:06

Initial stab at a new aggregate output check.

0c5e530

Update readme and changelog.

1515f6a

Fix rubocop issue.

1720409

majormoses self-requested a review January 18, 2018 02:35

majormoses added Type: Enhancement Status: Needs Review labels Jan 18, 2018

majormoses requested changes Jan 18, 2018

View reviewed changes

majormoses added Status: Awaiting Response and removed Status: Needs Review labels Jan 18, 2018

mbbroberg removed the Status: Awaiting Response label Feb 20, 2018

majormoses added the Status: Awaiting Response label Mar 20, 2018

majormoses closed this Sep 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial stab at a new aggregate output check. #52

Initial stab at a new aggregate output check. #52

icebourg commented Jan 17, 2018

icebourg commented Jan 17, 2018 •

edited

Loading

majormoses commented Jan 18, 2018

majormoses left a comment

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses Jan 18, 2018 •

edited

Loading

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses Jan 18, 2018

majormoses commented Sep 10, 2018

Initial stab at a new aggregate output check. #52

Initial stab at a new aggregate output check. #52

Conversation

icebourg commented Jan 17, 2018

Pull Request Checklist

General

New Plugins

Purpose

Known Compatibility Issues

icebourg commented Jan 17, 2018 • edited Loading

majormoses commented Jan 18, 2018

majormoses left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

majormoses Jan 18, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

majormoses commented Sep 10, 2018

icebourg commented Jan 17, 2018 •

edited

Loading

majormoses Jan 18, 2018 •

edited

Loading