Feature: CPU and resource throttling on Beats #2789

Closed
gmoskovicz opened this issue Oct 14, 2016 · 4 comments
Labels: discuss, enhancement

Comments

@gmoskovicz

To make Beats installable as a lightweight shipper in any host environment, I would like to know whether resource throttling could be added to the Beats platform, so that a Beat cannot consume all of the host machine's resources.

The proposal is for Beats to do this natively, without any third-party tool or configuration, so that anyone can install Beats as an agent without worrying about resource consumption.

The questions are:

  1. Is this doable in the Beats platform?
  2. What's the effort for this?

tsg added the discuss label on Oct 14, 2016

@tsg
Contributor

tsg commented Oct 14, 2016

Something that works today is to limit the Beats to a single CPU core, via the max_procs setting. Of course, that can still be too much, so I understand the feature request.
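
As a rough illustration of what a single-core limit amounts to inside a Go program, here is a minimal sketch. It assumes that `max_procs` ends up as a call to Go's `runtime.GOMAXPROCS`; the helper name `limitCores` is hypothetical and not taken from the Beats code base.

```go
package main

import "runtime"

// limitCores is a hypothetical helper: it caps the number of OS threads
// that may execute Go code simultaneously and returns the value in effect.
// A value below 1 leaves the current setting untouched.
func limitCores(n int) int {
	if n >= 1 {
		runtime.GOMAXPROCS(n)
	}
	// GOMAXPROCS(0) only queries the current setting.
	return runtime.GOMAXPROCS(0)
}
```

Even after `limitCores(1)`, the process can still keep that one core fully busy, which is why this setting alone may not be enough on a small host.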

I guess there are two ways we could go about this (just dumping my thoughts to start a discussion on this):

  1. Add a variable sleep where the data is generated (i.e. line reading in Filebeat or packet sniffing in Packetbeat) that we increase or decrease depending on the current CPU usage.
  2. Use OS tools to limit the CPU (e.g. cgroups on Linux) but hide that in our programs, so the user doesn't even have to know we're using OS features. This could be done, for example, as a feature of the init script or during the daemonization phase.

Notes:

  • Option 1 has the advantage that it works on all OSes. Option 2 probably doesn't exist on macOS and I'm not sure about Windows.
  • Option 2 has the advantage that it works for all Beats (e.g. I'm not sure how to apply Option 1 to Metricbeat).
  • In Option 1 the feedback loop could be quite long, causing the feature to seem broken. For example, we know that multiline events add a significant CPU overhead (because we need to apply regexps). If these expensive events show up from time to time, they will push the CPU usage above the limit for a while. We will adjust the sleeps, but by the time the new sleeps take effect the expensive events have passed, so now we're way below the limit. I can see this coming back at us as a bug report.
  • Another example where Option 1 will seem broken: if the Beat needs to do heavy processing that's not synchronous to the events (lines in Filebeat, packets in Packetbeat), for example garbage collection, increasing the sleeps will eventually bring the Beat to a complete halt without reducing the CPU time below the limit.
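
To make Option 1 concrete, here is a minimal sketch of the variable sleep as a simple step controller. The `Throttle` type, its fields, and the `Adjust` method are all hypothetical names chosen for illustration; nothing here is existing Beats code, and measuring the CPU usage itself is left out.

```go
package main

import "time"

// Throttle is a hypothetical adaptive-sleep controller for Option 1.
type Throttle struct {
	Limit    float64       // target CPU usage as a fraction (0.25 = 25%)
	Step     time.Duration // how much the sleep changes per adjustment
	MaxSleep time.Duration // upper bound, so the reader never stalls forever
	sleep    time.Duration // current sleep applied between reads
}

// Adjust takes the most recently measured CPU usage (a fraction) and
// returns the sleep to apply before the next line or packet is read.
func (t *Throttle) Adjust(usage float64) time.Duration {
	switch {
	case usage > t.Limit:
		t.sleep += t.Step // over the limit: slow down
	case usage < t.Limit:
		t.sleep -= t.Step // under the limit: speed up
	}
	// Clamp the sleep to the range [0, MaxSleep].
	if t.sleep < 0 {
		t.sleep = 0
	}
	if t.sleep > t.MaxSleep {
		t.sleep = t.MaxSleep
	}
	return t.sleep
}
```

A reader loop would call `time.Sleep(th.Adjust(usage))` after each event. The sketch also shows the weaknesses listed in the notes: the sleep only reacts to usage that has already been measured, so a short burst of expensive events is over before the sleep catches up, and CPU time spent outside the read loop is not reduced by sleeping longer.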

@gmoskovicz
Author

@tsg I'm +1 on option 1, but I see the disadvantages, and there are probably further technical problems that your notes do not cover yet. That said, automatic cgroups configuration is a good idea, and we can take a look at Job Objects (https://msdn.microsoft.com/en-us/library/windows/desktop/ms684161(v=vs.85).aspx) as a possible solution for Windows environments.

I believe that handling this outside the process (managing it at the OS level) is better and prevents possible issues later on.

@tsg
Contributor

tsg commented May 26, 2017

We've debated this again, and I still think OS-level solutions (cgroups and/or nice) are better suited for this task, because the kernel has better data on resource usage and needs, and can apply the limits more accurately. This means an OS-level solution is a lot less likely to introduce negative side effects.

Happy to reconsider if the OS level solutions don't prove to be applicable or to productize them if they do prove applicable.
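
For the cgroups route on Linux, a minimal sketch of what "hiding" the limit in the program could look like. It assumes cgroup v2, where a group's `cpu.max` file holds a quota and a period in microseconds. The function names and the idea of passing in a ready-made cgroup directory are hypothetical; creating the group and adding the Beat's PID to its `cgroup.procs` file are not shown.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cpuMaxValue formats a cgroup v2 cpu.max entry: "<quota> <period>", both in
// microseconds. fraction is the share of one CPU (0.25 = 25%).
func cpuMaxValue(fraction float64, periodUs int) string {
	quotaUs := int(fraction * float64(periodUs))
	return fmt.Sprintf("%d %d", quotaUs, periodUs)
}

// applyCPULimit writes the limit into an existing cgroup directory, using a
// 100 ms period. The caller is assumed to have write access to the directory.
func applyCPULimit(cgroupDir string, fraction float64) error {
	value := cpuMaxValue(fraction, 100000)
	return os.WriteFile(filepath.Join(cgroupDir, "cpu.max"), []byte(value), 0o644)
}
```

With this approach the kernel enforces the limit for the whole process, including work such as garbage collection that a sleep in the read loop cannot reach.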

@mostlyjason

I created a new meta-issue to track this: https://github.com/elastic/beats/issues/17716. Closing this issue for now.
