-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discussion]system.cpu.cores
metric definition
#24942
Comments
Pinging @elastic/integrations (Team:Integrations) |
So, you're right that linux is doing something fundamentally similar when it comes to
Again, similar on linux, at least at the conceptual level of how an OS can manage processes. If we look at the actual godoc for (emphasis mine)
This isn't really a great way to get actual hardware and OS state information. Compare this with something like The real issue here is "we need more sophisticated means to determine actual CPU count," but if we want to talk about how we define |
Many of the system metricsets only return “correct” information when it runs on the host, without any kind of namespacing. I guess we are in the same situation here.
I would put it the other way around, do we have a way to get the total number of logical processors when running on a group? If there is a different API call we can make to get real numbers with a reasonable effort I think it would make sense to use it, but probably the kernel hides this info from you when you are running on a subset of the cores. In any case I agree we can clarify the docs to make it clear that we report the number of cores that Metricbeat can see. |
@fearful-symmetry , @exekias , It would be interesting to understand why we went for this approach from the beginning but I do not think the value we return reflects overall system metrics and that is the goal of this metricset. This metric value fits better in the
I don't think users are interested in this metric unless they are actively monitoring the Metricbeat process, most of the time this metric value matches the total number of logical cores (if we have a single process group) so maybe this is why it not was an issue before. I think we should put some effort in finding out how tricky it is to provide a better calculation for this metric before we settle on updating documentation. What do you folks think? |
Yes, I'm on board with that! |
Agreed, we need to do this differently. The way #22827 does it is fundamentally correct, and it's also what |
I'm gonna err on the side of clean-up and close this, as now all of metricbeat as been refactored to better emulate how |
Documentation provides the following information on the field:
From the definition above it looks like we are talking about the total number of physical cores on a machine (?) or are we interested in providing the number of logical cores/processors (logical cores are the number of physical cores times the number of threads that can run on each core through the use of hyperthreading) instead?
That said, looking at the Windows implementation:
We are calling the
GetProcessAffinityMask
win 32 api, if that returns 0 then we retrieve the value fromGetSystemInfo
win32 api.The value returned represents the number of processors that a process is allowed to run on in the current processor group.
If there are not multiple groups, the result would be the total number of logical processors.
On the linux side there seems to call
sched_getaffinity
which looks to be similar to the Windows implementation.Is the goal that
system.cpu.cores
returns the number of logical processor from the current processor group? Should the goal be returning the total number of logical processors or physical cores?Should documentation be altered here to be more clear on the value returned?
cc: @sorantis , @fearful-symmetry
The text was updated successfully, but these errors were encountered: