Skip to content
This repository has been archived by the owner on Mar 21, 2023. It is now read-only.

Checks against projected future values #1

Merged
merged 3 commits into from
Feb 6, 2017

Conversation

bittrance
Copy link

@bittrance bittrance commented Jan 30, 2017

Support alerting against a linear projection into the future of the time series retrieved from Graphite. E.g. --projection 30min means that a future value will be calculated using linear regression and that value will be offered to nagios_check for evaluation against warning/critical thresholds. The check also presents the p-value of that projection so that ops people has a chance of judging whether the alarm is the result of noisy data.

Using a simple projection is superior in some use cases where the rate of change is relatively stable, because it saves us from having to estimate the rate of change. Instead we can just say e.g.: "Give me 6 hours advance notice on the disk filling up."

@bittrance bittrance changed the title Checks agains projected future values Checks against projected future values Jan 30, 2017
@@ -20,6 +20,7 @@ Gem::Specification.new do |s|

# specify any dependencies here; for example:
# s.add_development_dependency "rspec"
s.add_runtime_dependency "linear-regression"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much code would it add if the linear regression code was inlined (or rewritten in this gem)? I'm asking mostly so that we don't add dependencies unless absolutely necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, philosophy. Define absolutely necessary. (If it added significant code, I would not have used the external module.) You are right however that the module needs to have its version fixed in the gemspec.

lib/attime.rb Outdated
raise "Unparseable time period #{text}" unless match
sign, scalar, unit = match[1..3]
raise "Missing scalar in time #{text}" if scalar == ""
raise "Missing unit in #{text}" if unit == ""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use scalar.empty? instead of comparing to an empty string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

lib/attime.rb Outdated
lookup(x)
else
x
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rewrite like this:

case x
when String
  lookup(x)
when nil
  raise "Bad unit #{unit}"
else
  x
end

to avoid a very easy-to-overlook trailing unless on the line just before the if.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Thx.

lib/attime.rb Outdated
"year" => 365 * DAY,
"years" => "year",
}
EXPR = /([+-])?([0-9]*)(.*)/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be /\A([+-])?([0-9]*)(.*)\Z/ (i.e. anchored at the start and end of the string so that "1s 2d 3y" doesn't match?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation will yield "Bad unit s 2d 3y" which I feel is a good enough error message for this. .* at end is already anchored to end of input (unless there are newlines). However, the current implementation does not handle floats: 1.5days => Bad unit .5days which are easier to support than fail intelligently. Fixed.

lib/attime.rb Outdated
raise "Missing scalar in time #{text}" if scalar == ""
raise "Missing unit in #{text}" if unit == ""
time + Integer(scalar) * lookup(unit) * (sign == "-" ? -1 : 1)
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence about trailing ifs and unlesss. I think they have their place and can add readability, but using them too much in a method, especially with calculations in between, IMO makes the code noisy.

Why does the expression even allow an empty scalar and/or unit? Does it really add that much to the error message to differentiate between the cases? Could it be enough to just say that the input is malformed when the regex doesn't match?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, giving intelligent error messages on option parsing for data where it is not trivial to know what is possible is critical.

p = Regression::CorrelationCoefficient.new(xs, ys).pearson
store_value options.name, value
# Unfortunately, nagios_check converts both primary and
# secondary values to float. Hence manual assignment instead:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this explanation belongs in a test and/or a commit message. Comments usually lie because they remain after the code is changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. While comments should be used sparingly, If I break or circumvent a contract, I need to explain why. If you try to change this to store_value 'p-value', p.abs you will get an exception from the undefined test. Creating a test with the title "make sure the contract is circumvented" is just guessing on the reason for a future breakage; there is no way a test can be that surgical.

If a developer disregards an immediately adjacent comment, that is no different from disregarding an immediately adjacent line of code. I don't really see how a review can objectively object to a useful comment when the review could equally have objected to that comment if it was orphaned.

def self.included(base)
base.on '--projection FUTURE_TIMEFRAME', :default => '2days' do |timeframe|
options.send('processor=', method(:projected_value))
options.send('timeframe=', timeframe)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ruby-ish thing to do is to use :processor= and :timeframe= (i.e. symbols not strings when referring to methods).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Fixed.

end

def timedelta(n)
Time.now + n
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hope there's no GC pauses during the test run... since .attime takes the current time as second argument I think you should have a let :now and use that here and in the .attime call to make sure that time doesn't elapse during the test run.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hardly takes 0.9 secs to GC a test execution? But you are right, it makes for better specs. In fact, attime() should probably require the time since that is how it is used in production code.

@@ -0,0 +1,31 @@
require 'attime'

describe '#attime' do
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be:

describe CheckGraphite do
  describe '.attime' do

or

describe 'CheckGraphite.attime' do

I.e. the description should include the module/class name, and in Ruby you refer to class/module methods as .foo and only instance methods as #foo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Survived from a previous incarnation; changed to second version.

})
c = CheckGraphite::Command.new
STDOUT.should_receive(:puts).with(match(/ze-name.*in 5min.*p-value=0.9/))
lambda { c.run }.should raise_error SystemExit
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of get why this is needed, but boy is this a brittle test... both the stub of stdout and then rescuing SystemExit... It's not very much less ugly, but perhaps Kernel.should_receive(:exit).with(0)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from check_graphite_spec.rb. I do not approve of the style, but solving the i/o problem would require to introduce a way to inject a StringIO. raise_error SystemExit is not so fragile because you will probably notice if it is not satisfied (but not_to raise_error SystemExit would be essentially worthless).

@iconara
Copy link

iconara commented Jan 31, 2017

In general this is great 👍 , but I had a few comments.

@bittrance bittrance force-pushed the linear-regression-projection-checks branch from 945b80d to 1bb2015 Compare January 31, 2017 22:42
bittrance pushed a commit that referenced this pull request Jan 31, 2017
- support float timeframes e.g. --projection 1.5days
- style improvements as suggested by iconara on PR #1
@bittrance
Copy link
Author

Thanks for taking the time to review this. It was concieved in relative haste and is somewhat shoddy.

Copy link

@iconara iconara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a test.

if v.nan?
'undefined'
else
BigDecimal.new(v, 3).to_s('F')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing needs a test. It needs to be documented why the formatting uses three significant digits (and not five, three or 99). Also, why not for example sprintf('%.3f', v) (because I assume this method only ever expects numbers in the 0.0-1.0 range)?

This line has actually been bugging me for a few days, but it was not until today when it just popped into my head that I realised why. It's one of those things that you kind of gloss over, like "yeah, sure and then there's some formatting". But it's not just formatting, it's very deliberate formatting, but with no test to motivate it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sprintf() can only do number of decimals, but here we deal with numbers that can be any float, so there is no way to know how many decimals are relevant, if any. Thus we instead do number of significant digits. Why 3? It works for our use case; it will need to be configurable eventually, but I think it would be a rare use case that cares about changes < 1/1000. I don't think the helper merits being named a test unit in its own right. However, this means that the "integration" tests that tests against the full output should check that the number is formatted correctly. Fixed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out to make for messy rebases, so this is actually fixed on b2b1738 not in this PR.

@bittrance bittrance force-pushed the linear-regression-projection-checks branch from 1bb2015 to 3e6219e Compare February 5, 2017 23:01
bittrance pushed a commit that referenced this pull request Feb 5, 2017
- support float timeframes e.g. --projection 1.5days
- style improvements as suggested by iconara on PR #1
- support float timeframes e.g. --projection 1.5days
- style improvements as suggested by iconara on PR #1
@bittrance bittrance force-pushed the linear-regression-projection-checks branch from 3e6219e to a42d93b Compare February 5, 2017 23:07
@bittrance bittrance merged commit 7145d73 into master Feb 6, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants