Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.cpu.cores value should be the actual number of cores, not number usable. #24600

Closed
kevinsmithwrs opened this issue Mar 17, 2021 · 4 comments
Labels
Team:Integrations Label for the Integrations team

Comments

@kevinsmithwrs
Copy link

Re-opening #23453 as a bug.

This request is related to request: #22885 resolved with PR: #23154.

The issue is that the metricbeat system.cpu module is using the go runtime.NumCPU() function to get the number of cores. According to the documentation for this function (https://golang.org/pkg/runtime/#NumCPU) , "NumCPU returns the number of logical CPUs usable by the current process." As system.cpu is using gosigar to retrieve cpu statistics, and in the case of linux this is via polling /proc/stat, NumCPU() may not return the same value if cores are reserved via cpusets.

In a kubernetes environment with cpu-manager-policy=static launch a pod with Guaranteed QoS Class in order to reserve cpus.
In the metricbeat container, /proc/stat will still show all cpu cores, yet NumCPU() will report a reduced number of cores due to cores being reserved by the Guaranteed Qos Class pod. The system.core metricset will however (correctly) report information for all cpus, not the reduced set.

For the linux case, in the below:
https://github.com/elastic/gosigar/blob/master/sigar_linux_common.go
A possible fix would be a function similar to: (self *CpuList) Get() that just returns the number of lines in /proc/stat with "cpux", or else modifying (self *Cpu) Get() to also return a count of the lines of "cpux". This would work for both #22885 and the cpuset case.

For now though, the system.cpu.cores and the non-normalized system.cpu values will be incorrect any time cpusets are used to reserve cpus.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 17, 2021
@ycombinator ycombinator added the Team:Integrations Label for the Integrations team label Mar 17, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@andresrc andresrc removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 18, 2021
@fearful-symmetry
Copy link
Contributor

So, correct me if I'm wrong, but it looks like the issue is that we're using NumCPU at all when it comes to the metric gathering itself, since we can "mix" two different values. It looks like we might want our own API for metrics, to get the actual /proc/stat value.

@kevinsmithwrs
Copy link
Author

Correct, just a new API to return the count of "cpux" lines from /proc/stat would do it I think.

@fearful-symmetry
Copy link
Contributor

Closing this, as none of the metricbeat reporting code is runtime.NumCPU() anymore, and we'll only resort to it as a fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Integrations Label for the Integrations team
Projects
None yet
Development

No branches or pull requests

5 participants