Plugin/LoadAvg threshold for multi cpu #21

Open
setop opened this issue Jun 7, 2016 · 4 comments

setop commented Jun 7, 2016

The LoadAvg threshold should be multiplied by the number of CPUs.
Plus, I think 0.8 is a bit low.

dolmen added a commit that referenced this issue Jun 7, 2016
Divide loadavg by nproc. GitHub #21.
dolmen self-assigned this Jun 7, 2016

dolmen (Owner) commented Jun 7, 2016

I chose to normalize the loadavg value to the [0, 1] range by dividing it by the number of processors.
This allows the same settings to work on systems with different numbers of CPUs.

If 0.8 is a bit low, what do you suggest instead?
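
As a minimal sketch of the normalization described above (this is not the plugin's actual code, and the helper name `normalized_load` is made up for illustration), the 1-minute load average can be divided by the CPU count and compared with the 0.8 threshold:

```python
import os


def normalized_load(load_1min: float, ncpu: int) -> float:
    """Return the load average per CPU (hypothetical helper)."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return load_1min / ncpu


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load_1min, _, _ = os.getloadavg()
    ncpu = os.cpu_count() or 1  # cpu_count() may return None
    value = normalized_load(load_1min, ncpu)
    # 0.8 is the threshold discussed in this issue.
    print(f"{value:.2f}", "HIGH" if value > 0.8 else "ok")
```

With this approach the same threshold applies whether the machine has 2 or 32 CPUs, which is the portability argument made above.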

setop (Author) commented Jun 8, 2016

The load average is the average number of processes that the system has had to schedule over a period of time (e.g. one minute). It has no upper bound, so dividing it by the number of CPUs will not normalize it to the 0-1 range.

For example, on a two-CPU machine, the load average can be three. It just means that the system is overloaded. And that gives you the threshold: red if AvgLoad > number of CPUs.

Moreover, as a sysadmin, I don't want a tool to hide information from me, so I'm not keen on trying to "normalise" this figure.
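
To make the point concrete, here is a small sketch (the helper names and the sample line are illustrative, not taken from the plugin) that reads the 1-minute figure from a `/proc/loadavg`-style line and applies the threshold proposed above. With a load of 3 on two CPUs, the per-CPU ratio is 1.5, which is outside [0, 1]:

```python
def parse_loadavg(line: str) -> float:
    """Extract the 1-minute load average from a /proc/loadavg line."""
    # The first three fields are the 1, 5 and 15 minute averages.
    return float(line.split()[0])


def is_overloaded(load: float, ncpu: int) -> bool:
    """Threshold from the comment above: red if load > number of CPUs."""
    return load > ncpu


sample = "3.00 2.10 1.50 4/321 12345"  # made-up sample line
load = parse_loadavg(sample)
print(load / 2)                # 1.5
print(is_overloaded(load, 2))  # True
```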

dolmen (Owner) commented Jun 8, 2016

The aim of this indicator is to show if the system is overloaded before it is too late.
Remember that the indicator is shown in a shell prompt on that system. If the system is overloaded by your threshold (LoadAvg > number of CPUs), the shell is already unresponsive for interactive use. That is too late. For good interactive use, the system must be idle often.

So I think the indicator has value as is. I concede that the "LoadAvg" name is misleading. Any idea for a better name?

setop (Author) commented Jun 8, 2016

Then the problem may not be the name but the indicator itself. The LoadAvg plugin uses /proc/loadavg (also shown by uptime). For some systems, it is perfectly fine to have a load equal to the number of CPUs. For others, it may not be.

Another option would be to use a different indicator, such as /proc/stat; see Stack Overflow for a way to compute the CPU percentage (the second answer is my favorite). But again, it is perfectly fine to have 100% CPU usage if you trigger a multi-threaded video encoding.
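
For illustration, one common way to derive a CPU percentage from /proc/stat is to take two samples of the aggregate `cpu` line and compare how much of the elapsed time was idle. The sketch below is not necessarily the method of the Stack Overflow answer mentioned above; counting iowait as idle time is an assumption, and the function names and sample lines are made up:

```python
from typing import Tuple


def cpu_times(stat_line: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields: user nice system idle iowait irq softirq steal.
    # The guest fields that follow are skipped because guest time
    # is already included in user time.
    values = [int(v) for v in fields[1:9]]
    # Assumption: iowait is treated as idle time.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_percent(before: str, after: str) -> float:
    """CPU usage in percent between two samples of the 'cpu' line."""
    idle0, total0 = cpu_times(before)
    idle1, total1 = cpu_times(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / total_delta)


# Two made-up samples: 200 jiffies elapsed, 100 of them idle or in iowait.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  170 0 80 890 60 0 0 0 0 0"
print(cpu_percent(before, after))  # 50.0
```

On a real system the two lines would be read from /proc/stat a short interval apart.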

You could also make this indicator parameterized, such as LoadAvg(myThreshold).

You aim to produce an indicator which says "uh-oh, my system is getting bad!". But that is far more complicated than "what time is it?" :) because it really depends on how the system is used.

It's going to be very hard to put an alerting mechanism into a shell prompt, as the trend matters more than the instantaneous figure. By the way, loadavg is itself a very basic kind of trend.

Personally, I like to have the raw loadavg figure and a red flag when it is greater than the number of CPUs. That is enough for me to make a decision.
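
A rough sketch of what a parameterized indicator in the spirit of LoadAvg(myThreshold) could look like: it keeps the raw figure and only adds a red flag above the chosen threshold. The name `make_loadavg_indicator` is hypothetical and this is not the plugin's API:

```python
RED = "\033[31m"   # ANSI escape sequence for red foreground
RESET = "\033[0m"  # ANSI escape sequence to reset attributes


def make_loadavg_indicator(threshold: float):
    """Build a renderer that shows the raw load and turns red above threshold."""
    def render(load: float) -> str:
        text = f"{load:.2f}"
        return f"{RED}{text}{RESET}" if load > threshold else text
    return render


indicator = make_loadavg_indicator(threshold=4)  # e.g. on a 4-CPU machine
print(indicator(1.25))  # plain figure
print(indicator(5.5))   # same figure format, wrapped in red escape codes
```

Passing the number of CPUs as the threshold gives the behaviour described above, while any other value can be chosen per system.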

I hope it helps.
