Adding a generic performance counter checker and metric retriever #5

magmax · 2015-06-17T15:21:31Z

Hello.

I've being doing a lot of separated plugins to ask for performance counters, but I had a lot of performance problems: metrics required near 80% CPU.

So I decided to create an unique checker/metric retriever that can do both in just one step. And here it is.

The idea is to have a YAML file that configures which metrics to retrieve, how should they be stored and the limits of a check. Everything but the performance counter is optional.

So, you can do things like these:


---
\Memory\Free System Page Table Entries:
  scheme: memory.free.system_page_table_entries
\Memory\Pool Paged Bytes:
  scheme: memory.pool_paged_bytes
\Processor(_Total)\% Processor Time:
  scheme: cpu.usage
  max: 95
\Web Service(Default Web Site)\Total Connection Attempts (all instances)
  scheme: iis.default.instances
  min: 1

what will print all the performance counters and errors if there is any, and will return an error code if necessary.

I'm not a good Ruby programmer, but I did my best. Please, feel free to improve it. Indeed, I do not know what do you think about this idea.

mattyjones · 2015-06-27T14:20:57Z

@magmax Can you take a look at the Travis output please

Thanks!

magmax · 2015-07-03T08:57:52Z

I'm afraid no. I have no idea about what is wrong with:

bin/performance-counters.rb:67:13: C: Use next to skip iteration.
        row.each do |k, v|
            ^^^^

Can you help me with this?

hurrycaine · 2015-07-30T02:58:59Z

Using my google kung fu I found a recent similar complaint to Rubocop (which is the test breaking by the way). The issue outcome seems to leave it in Rubocop and just disable in .rubocop.yml on a case by case basis.

rubocop/rubocop#1238

example of disabling the next check
walles/moar@6984850

The example of the issue is not 100% like yours... if I had to guess its the use of
one of these

next unless v && k

next unless data

You could try switching those two to if statements or something

hurrycaine · 2015-07-30T03:12:56Z

I really like the idea of grouping metrics together. I also like the idea of grouping a metric and check (I was wondering if that was doable but had not investigated yet).

Combining the two (errr four) ... I am not so crazy about. It looks like in your example in your first comment if you got a critical, would it be clear on which measure(s)? Maybe putting scheme in the output? What if there were multiple failures, would there be multiple lines of what errored?

I would love to hear about some examples of how you are using and the ouput of the messages you would see in sensu.

Of course even if my concerns are valid I think this is really cool and could not use the check part or have logical groupings like all cpu mearsures in one, disk in another, and memory in another and so on.

Also for the kicker, I was just looking at all the stuff you can get out of typeperf. Something like this would prevent us from having 20 different typeperf related checks all passing in a different counter/object.

Great idea @magmax

…again

magmax · 2015-07-30T06:52:37Z

I really, really tried to fix the problem with several options, but no luck. I do not want to add any exception to style without others permission.

Thank you @hurrycaine . I really hate to check it twice, because sometimes I see alerts, but my graphs doesn't show them, and others the opposite happens. In addition, there are some PC that are hard to retrieve, and require a lot of time or resources. Why asking for them twice?

I know about the mess you can see in the alerts. But:

you can sort the script output in order to see the errors at the beginning.
I do not want to use just one configuration for all alerts, but one for some related metrics.
I do not use Uchiwa to see the errors, because it has no history. The error comes in a mail or a Slack message. You can filter the metrics in the handler or even use a mutator to do that.

IMHO, the minimum/maximum pattern is so used that it should be used as part of the configuration. Indeed, I see the necessity of taking just the metric and decide about the alert in the Sensu core, configuring it in the configuration XML instead.

I really hated to add a new file to configure my plugin... but I thought that it was just "one" solution. Maybe you can think in other better :)

hurrycaine · 2015-07-30T12:13:05Z

@magmax that is frustrating! I will see if I can find some time today to reproduce and see if I can fix it. I just skimmed the style guide and I think I gave bad advice, looks like we both lack style :O) It did give me some more ideas.

I agree the check X and metric X seemed inefficient to me as well.

I think some would moan the yaml file and I think we could add some command line arguments. If we really wanted. I have some odd cases where some servers min/maxes may need to differ and it looks like from the sensu docs a check definition on a agent/client overrides the one on the server. So this yaml file might be helpful for me.

Let me try using this check for a bit instead of throwing out theoretical suggestions.

mattyjones · 2015-07-30T12:46:47Z

@magmax If we can't figure out the issue, we can add an exception and then circle around to it when time arises.

I think overall the idea is certainly interesting and maybe even useful ;) I will try and find some time this weekend maybe and take a look at it as well.

hurrycaine · 2015-07-31T14:00:14Z

I think I am close on figuring out the style issue though IMO it might be harder to follow.

So the style is secondary, the check actually working is the important point, I can't get the check to work.

A little belated I tried the check wanting to compare my refactor to make sure the logic was the same.

@magmax I am getting a "undefined method 'shift" error when I run the original (and mine)

I am curious if you are using this check as its currently written? If so which version of windows?

This shift error is on the row object, it appears it should be a hash or an array but its just one line.
The CSV.parse by the Ruby docs will do line by line... so thats your loop line by line, not an array but I also don't see where you split the line. You can comment the entire loop body and replace with puts to see it just prints out the line.

CSV.parse(io.read, headers: true) do |row|
        puts " row = #{row}"
        # row.shift
        # row.each do |k, v|

It looks like you want to skip the first line since it is blank. Doing something like this (or make a counter)

CSV.parse(io.read, headers: true).each_with_index do |row, i|
  next if i == 0

hurrycaine · 2015-07-31T14:02:35Z

Here is what I did yesterday trying to fix the syntax issue before I tested the actual check. I am not sure it as clean as it was before

https://github.com/hurrycaine/sensu-plugins-windows/blob/generic-checker/bin/performance-counters.rb

First download Rubocop gem and run it ... see their site.
If you comment out the if statements at the bottom (inside the loop) it was happy.

These come from the Rubocop style guide

Prefer next in loops instead of conditional blocks
Avoid use of nested conditionals for flow of control. [link]
Prefer a guard clause when you can assert invalid data. A guard clause is a conditional statement at the top of a function that bails out as soon as it can.

This is all good, however our problem is we had multiple checks so what I had to do was make 2 small methods with a guard clause. I also had to add two new variables for a min_ok and max_ok because this new way the is_ok could be set to false by the min but then set back to true by max method.

As I just wrote this I guess we could go next on the loop if min trips the status to false because nothing should be below min and above max

magmax · 2015-07-31T16:55:03Z

the shift is the same than next if k == row.headers[0], but I thought was prettier. I do not know ruby enought to say which version does support it. I'm using it before using the shift, because I have no Windows at home, but it may be related to the ruby version. You can check it out at commit 2797dab.

If you find a version that has no check-style problems and do the same... come on.

About changing the configuration program by command line arguments: I thought that, but it is a mess. Think about adding PF, the prefix for graphite and their mins and maxs. I already think it is a mess just with 1 counter!

hurrycaine · 2015-08-03T13:04:21Z

I agree the command line options would only work on metric/check combo not a list of metric/checks, did not think that through when I said that.

There does not appear to be a shift method for your row. I dug a little deeper into the CSV class and it looks like you would use shift or parse not both together. When you have

CSV.parse(io.read, headers: true) do |row|

This already is looping through the "CSV" so there is nothing to shift, shift grabs the next "row".
So I took shift out and it works like a charm. I sent you a pull request, check it out.

I did some checks to make the sure the checks fired if they are below the min or above the max. I also made the decision to not check for max if the value was below min.

analytically · 2015-12-10T12:02:45Z

I'd love to get this merged. Can we have a bit more documentation and some testing?

magmax · 2015-12-10T14:29:23Z

I do not know why is it failing in rubocop. Maybe it is the "break" I'm using, but it is exactly what I want.

By the way... Finally I moved to https://github.com/carllindelof/sensu-client and wrote an extension in order to keep performance counters run in a thread instead of a process. Result: Less CPU and RAM consumption.

Anyways... I think this branch is ready to be merged but the Rubocop warn.

majormoses · 2017-05-29T03:56:33Z

closing due to inactivity, please feel free to re-open if you plan on getting this over the line.

Adding a generic performance checker and metric

165b9f9

mattyjones added the Investigation Required label Jun 17, 2015

magmax added 6 commits July 30, 2015 08:34

Trying to avoid the problem with style

2797dab

Trying to avoid the problem with style again

e0f05d6

Trying to avoid the problem with style again and again

18c76b9

Trying to avoid the problem with style again and again and again

bd9bc18

Trying to avoid the problem with style again and again and again and …

e36fcb5

…again

Reverting to the more readable version

6ad5042

mattyjones added In Progress labels Jul 30, 2015

mattyjones removed the Developer Feedback Requested label Aug 7, 2015

sensu-plugin assigned mattyjones Feb 5, 2016

majormoses closed this May 29, 2017

majormoses added Status: In Progress Status: Needs Review and removed In Progress labels Apr 18, 2018

majormoses removed the Investigation Required label Apr 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding a generic performance counter checker and metric retriever #5

Adding a generic performance counter checker and metric retriever #5

magmax commented Jun 17, 2015

mattyjones commented Jun 27, 2015

magmax commented Jul 3, 2015

hurrycaine commented Jul 30, 2015

hurrycaine commented Jul 30, 2015

magmax commented Jul 30, 2015

hurrycaine commented Jul 30, 2015

mattyjones commented Jul 30, 2015

hurrycaine commented Jul 31, 2015

hurrycaine commented Jul 31, 2015

magmax commented Jul 31, 2015

hurrycaine commented Aug 3, 2015

analytically commented Dec 10, 2015

magmax commented Dec 10, 2015

majormoses commented May 29, 2017

Adding a generic performance counter checker and metric retriever #5

Adding a generic performance counter checker and metric retriever #5

Conversation

magmax commented Jun 17, 2015

mattyjones commented Jun 27, 2015

magmax commented Jul 3, 2015

hurrycaine commented Jul 30, 2015

hurrycaine commented Jul 30, 2015

magmax commented Jul 30, 2015

hurrycaine commented Jul 30, 2015

mattyjones commented Jul 30, 2015

hurrycaine commented Jul 31, 2015

hurrycaine commented Jul 31, 2015

magmax commented Jul 31, 2015

hurrycaine commented Aug 3, 2015

analytically commented Dec 10, 2015

magmax commented Dec 10, 2015

majormoses commented May 29, 2017