Checks against projected future values #1

bittrance · 2017-01-30T22:27:30Z

Support alerting against a linear projection into the future of the time series retrieved from Graphite. E.g. --projection 30min means that a future value will be calculated using linear regression and that value will be offered to nagios_check for evaluation against warning/critical thresholds. The check also presents the p-value of that projection so that ops people has a chance of judging whether the alarm is the result of noisy data.

Using a simple projection is superior in some use cases where the rate of change is relatively stable, because it saves us from having to estimate the rate of change. Instead we can just say e.g.: "Give me 6 hours advance notice on the disk filling up."

iconara · 2017-01-31T11:36:38Z

check_graphite.gemspec

@@ -20,6 +20,7 @@ Gem::Specification.new do |s|

  # specify any dependencies here; for example:
  # s.add_development_dependency "rspec"
+  s.add_runtime_dependency "linear-regression"


How much code would it add if the linear regression code was inlined (or rewritten in this gem)? I'm asking mostly so that we don't add dependencies unless absolutely necessary.

Hm, philosophy. Define absolutely necessary. (If it added significant code, I would not have used the external module.) You are right however that the module needs to have its version fixed in the gemspec.

iconara · 2017-01-31T11:37:19Z

lib/attime.rb

+    raise "Unparseable time period #{text}" unless match
+    sign, scalar, unit = match[1..3]
+    raise "Missing scalar in time #{text}" if scalar == ""
+    raise "Missing unit in #{text}" if unit == ""


Use scalar.empty? instead of comparing to an empty string.

iconara · 2017-01-31T11:39:11Z

lib/attime.rb

+      lookup(x)
+    else
+      x
+    end


I would rewrite like this:

case x when String lookup(x) when nil raise "Bad unit #{unit}" else x end

to avoid a very easy-to-overlook trailing unless on the line just before the if.

Absolutely. Thx.

iconara · 2017-01-31T11:44:32Z

lib/attime.rb

+    "year" => 365 * DAY,
+    "years" => "year",
+  }
+  EXPR = /([+-])?([0-9]*)(.*)/


Shouldn't this be /\A([+-])?([0-9]*)(.*)\Z/ (i.e. anchored at the start and end of the string so that "1s 2d 3y" doesn't match?

The current implementation will yield "Bad unit s 2d 3y" which I feel is a good enough error message for this. .* at end is already anchored to end of input (unless there are newlines). However, the current implementation does not handle floats: 1.5days => Bad unit .5days which are easier to support than fail intelligently. Fixed.

iconara · 2017-01-31T11:48:16Z

lib/attime.rb

+    raise "Missing scalar in time #{text}" if scalar == ""
+    raise "Missing unit in #{text}" if unit == ""
+    time + Integer(scalar) * lookup(unit) * (sign == "-" ? -1 : 1)
+  end


I'm on the fence about trailing ifs and unlesss. I think they have their place and can add readability, but using them too much in a method, especially with calculations in between, IMO makes the code noisy.

Why does the expression even allow an empty scalar and/or unit? Does it really add that much to the error message to differentiate between the cases? Could it be enough to just say that the input is malformed when the regex doesn't match?

Yes, giving intelligent error messages on option parsing for data where it is not trivial to know what is possible is critical.

iconara · 2017-01-31T15:25:16Z

lib/check_graphite/projection.rb

+      p = Regression::CorrelationCoefficient.new(xs, ys).pearson
+      store_value options.name, value
+      # Unfortunately, nagios_check converts both primary and
+      # secondary values to float. Hence manual assignment instead:


I think this explanation belongs in a test and/or a commit message. Comments usually lie because they remain after the code is changed.

I disagree. While comments should be used sparingly, If I break or circumvent a contract, I need to explain why. If you try to change this to store_value 'p-value', p.abs you will get an exception from the undefined test. Creating a test with the title "make sure the contract is circumvented" is just guessing on the reason for a future breakage; there is no way a test can be that surgical.

If a developer disregards an immediately adjacent comment, that is no different from disregarding an immediately adjacent line of code. I don't really see how a review can objectively object to a useful comment when the review could equally have objected to that comment if it was orphaned.

iconara · 2017-01-31T15:26:09Z

lib/check_graphite/projection.rb

+    def self.included(base)
+      base.on '--projection FUTURE_TIMEFRAME', :default => '2days' do |timeframe|
+        options.send('processor=', method(:projected_value))
+        options.send('timeframe=', timeframe)


The Ruby-ish thing to do is to use :processor= and :timeframe= (i.e. symbols not strings when referring to methods).

Indeed. Fixed.

iconara · 2017-01-31T15:29:11Z

spec/attime_spec.rb

+  end
+
+  def timedelta(n)
+    Time.now + n


Let's hope there's no GC pauses during the test run... since .attime takes the current time as second argument I think you should have a let :now and use that here and in the .attime call to make sure that time doesn't elapse during the test run.

It hardly takes 0.9 secs to GC a test execution? But you are right, it makes for better specs. In fact, attime() should probably require the time since that is how it is used in production code.

iconara · 2017-01-31T15:30:49Z

spec/attime_spec.rb

@@ -0,0 +1,31 @@
+require 'attime'
+
+describe '#attime' do


I think this should be:

describe CheckGraphite do describe '.attime' do

or

describe 'CheckGraphite.attime' do

I.e. the description should include the module/class name, and in Ruby you refer to class/module methods as .foo and only instance methods as #foo.

Survived from a previous incarnation; changed to second version.

iconara · 2017-01-31T15:34:19Z

spec/projection_spec.rb

+      })
+      c = CheckGraphite::Command.new
+      STDOUT.should_receive(:puts).with(match(/ze-name.*in 5min.*p-value=0.9/))
+      lambda { c.run }.should raise_error SystemExit


I kind of get why this is needed, but boy is this a brittle test... both the stub of stdout and then rescuing SystemExit... It's not very much less ugly, but perhaps Kernel.should_receive(:exit).with(0)?

This is copied from check_graphite_spec.rb. I do not approve of the style, but solving the i/o problem would require to introduce a way to inject a StringIO. raise_error SystemExit is not so fragile because you will probably notice if it is not satisfied (but not_to raise_error SystemExit would be essentially worthless).

iconara · 2017-01-31T15:35:59Z

In general this is great 👍 , but I had a few comments.

…into the future

- support float timeframes e.g. --projection 1.5days - style improvements as suggested by iconara on PR #1

bittrance · 2017-01-31T22:50:29Z

Thanks for taking the time to review this. It was concieved in relative haste and is somewhat shoddy.

iconara

I'm missing a test.

iconara · 2017-02-04T15:27:44Z

lib/check_graphite/projection.rb

+       if v.nan?
+         'undefined'
+       else
+         BigDecimal.new(v, 3).to_s('F')


This thing needs a test. It needs to be documented why the formatting uses three significant digits (and not five, three or 99). Also, why not for example sprintf('%.3f', v) (because I assume this method only ever expects numbers in the 0.0-1.0 range)?

This line has actually been bugging me for a few days, but it was not until today when it just popped into my head that I realised why. It's one of those things that you kind of gloss over, like "yeah, sure and then there's some formatting". But it's not just formatting, it's very deliberate formatting, but with no test to motivate it.

sprintf() can only do number of decimals, but here we deal with numbers that can be any float, so there is no way to know how many decimals are relevant, if any. Thus we instead do number of significant digits. Why 3? It works for our use case; it will need to be configurable eventually, but I think it would be a rare use case that cares about changes < 1/1000. I don't think the helper merits being named a test unit in its own right. However, this means that the "integration" tests that tests against the full output should check that the number is formatted correctly. Fixed.

It turned out to make for messy rebases, so this is actually fixed on b2b1738 not in this PR.

- support float timeframes e.g. --projection 1.5days - style improvements as suggested by iconara on PR #1

bittrance changed the title ~~Checks agains projected future values~~ Checks against projected future values Jan 30, 2017

iconara reviewed Jan 31, 2017

View reviewed changes

Bittrance added 2 commits January 31, 2017 22:07

Allow controlling how retrieved datapoints are processed

42fb3aa

Check warn/crit against a linear projection of retrieved data series …

c377ea2

…into the future

bittrance force-pushed the linear-regression-projection-checks branch from 945b80d to 1bb2015 Compare January 31, 2017 22:42

bittrance pushed a commit that referenced this pull request Jan 31, 2017

- lock linear-regression to 0.0.2

1bb2015

- support float timeframes e.g. --projection 1.5days - style improvements as suggested by iconara on PR #1

iconara suggested changes Feb 4, 2017

View reviewed changes

bittrance force-pushed the linear-regression-projection-checks branch from 1bb2015 to 3e6219e Compare February 5, 2017 23:01

bittrance pushed a commit that referenced this pull request Feb 5, 2017

- lock linear-regression to 0.0.2

3e6219e

- support float timeframes e.g. --projection 1.5days - style improvements as suggested by iconara on PR #1

- lock linear-regression to 0.0.2

a42d93b

- support float timeframes e.g. --projection 1.5days - style improvements as suggested by iconara on PR #1

bittrance force-pushed the linear-regression-projection-checks branch from 3e6219e to a42d93b Compare February 5, 2017 23:07

bittrance merged commit 7145d73 into master Feb 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Checks against projected future values #1

Checks against projected future values #1

bittrance commented Jan 30, 2017 •

edited

Loading

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara Jan 31, 2017

bittrance Jan 31, 2017

iconara commented Jan 31, 2017

bittrance commented Jan 31, 2017

iconara left a comment

iconara Feb 4, 2017

bittrance Feb 6, 2017

bittrance Feb 6, 2017

Checks against projected future values #1

Checks against projected future values #1

Conversation

bittrance commented Jan 30, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iconara commented Jan 31, 2017

bittrance commented Jan 31, 2017

iconara left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bittrance commented Jan 30, 2017 •

edited

Loading