Plugin/LoadAvg threshold for multi cpu #21

setop · 2016-06-07T16:36:14Z

The LoadAvg threshold should be multiplied by the number of CPUs.
Plus, I think 0.8 is a bit low.

Divide loadavg by nproc. GitHub #21.

dolmen · 2016-06-07T23:42:40Z

I chose to normalize the loadavg value to the [0, 1] range by dividing it by the number of processors.
This allows the same settings to work on systems with different numbers of CPUs.

If 0.8 is a bit low, what do you suggest instead?

setop · 2016-06-08T06:50:11Z

The load average is the average number of processes the system has had to schedule over a period of time (e.g. one minute). It has no upper bound, so dividing it by the number of CPUs will not normalize it to 0-1.

For example, on a two-CPU system, the load average can be three. It just means that the system is overloaded. And that gives you the threshold: red if AvgLoad > number of CPUs.
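To make the arithmetic concrete, here is a minimal Python sketch (not the plugin's actual code; the function names are hypothetical) of the two policies discussed so far: dividing the load by the CPU count, and flagging when the raw load exceeds the CPU count.

```python
def normalized_load(load, ncpu):
    """Load divided by CPU count. Hypothetical helper, not the plugin's API."""
    return load / ncpu


def is_overloaded(load, ncpu):
    """Red flag when the raw load exceeds the number of CPUs."""
    return load > ncpu


# The two-CPU example from above: a load of 3 normalizes to 1.5,
# which falls outside the [0, 1] range, and it is flagged as overloaded.
ratio = normalized_load(3.0, 2)
flagged = is_overloaded(3.0, 2)
```

On a live system the inputs could come from `os.getloadavg()[0]` and `os.cpu_count()`, keeping in mind that `os.cpu_count()` may return `None`.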

Moreover, as a sysadmin, I don't want a tool to hide information from me, so I'm not keen on trying to "normalise" this figure.

dolmen · 2016-06-08T09:01:02Z

The aim of this indicator is to show if the system is overloaded before it is too late.
Remember that the indicator is shown in a shell prompt on that system. If the system is overloaded according to your threshold (LoadAvg > number of CPUs), the shell is already unresponsive for interactive use. That is too late. For good interactive use, the system must be idling often.

So I think the indicator has value as is. I concede that the "LoadAvg" name is misleading. Any idea for a better name?

setop · 2016-06-08T09:33:07Z

Then the problem may not be the name but the indicator itself. The LoadAvg plugin uses /proc/loadavg (also shown by uptime). For some systems, it is perfectly fine to have a load equal to the number of CPUs. For others it may not be.

Another option would be to use a different indicator, such as /proc/stat; see SO for a way to compute the CPU percentage (the second answer is my favorite). But again, it is perfectly fine to have 100% CPU usage if you start a multi-threaded video encoding.
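As a rough illustration of the /proc/stat approach (a sketch under my own assumptions, not the SO answer itself), CPU usage can be estimated from two samples of the aggregate "cpu" line. The function names are hypothetical, and counting idle + iowait as idle time is one common convention among several.

```python
def parse_cpu_line(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Fields: user nice system idle iowait irq softirq steal
            values = [int(v) for v in fields[1:9]]
            # Assumption: treat idle + iowait as idle time.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(before, after):
    """Busy percentage between two /proc/stat snapshots (as text)."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    dtotal = total1 - total0
    if dtotal <= 0:
        return 0.0
    return 100.0 * (dtotal - (idle1 - idle0)) / dtotal


# Synthetic snapshots, for illustration only.
before = "cpu  100 0 50 800 50 0 0 0 0 0\n"
after = "cpu  200 0 100 850 50 0 0 0 0 0\n"
usage = cpu_percent(before, after)
```

In practice the two snapshots would be read from /proc/stat a short interval apart, which is an extra delay a shell prompt may not be able to afford.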

You could also make this indicator parameterized, e.g. LoadAvg(myThreshold).
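A parameterized indicator might look roughly like the following Python sketch. Everything here is hypothetical (the function name, the `read_load` hook, the "!" marker); it only shows the idea of letting the user pick the threshold while still displaying the raw figure.

```python
import os


def make_loadavg_indicator(threshold, read_load=None):
    """Build an indicator that flags the 1-minute load above `threshold`.

    `read_load` is a hypothetical hook for supplying the load value;
    by default it reads the 1-minute average via os.getloadavg().
    """
    if read_load is None:
        read_load = lambda: os.getloadavg()[0]

    def indicator():
        load = read_load()
        # Show the raw figure and append a marker when over the threshold.
        flag = "!" if load > threshold else ""
        return f"{load:.2f}{flag}"

    return indicator


# Same load, two different user-chosen thresholds.
strict = make_loadavg_indicator(2.0, read_load=lambda: 3.0)
relaxed = make_loadavg_indicator(4.0, read_load=lambda: 3.0)
```

A user on a two-CPU machine could pass the CPU count as the threshold, while someone who wants an earlier warning could pass a lower value.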

You aim to produce an indicator which says "uh-oh, my system is getting bad!". But that is far more complicated than "what time is it?" :) because it really depends on how the system is used.

It's going to be very hard to put an alerting mechanism into a shell prompt, because the trend matters more than the instantaneous figure. Btw, loadavg is itself a very basic kind of trend.

Personally, I like to have the raw loadavg figure and a red flag when it is greater than the number of CPUs. That is powerful enough for me to make a decision.

I hope it helps.

dolmen added a commit that referenced this issue Jun 7, 2016

Plugin::LoadAvg: normalize with CPU count

4cc6799

Divide loadavg by nproc. GitHub #21.

dolmen self-assigned this Jun 7, 2016
