system/core: add CPU information for Linux hosts #31643

belimawr · 2022-05-17T09:59:10Z

What does this PR do?

It adds the following information from /proc/cpuinfo to system.core metrics on Linux hosts:

model_number
model_name
mhz
core_id
physical_id

Below is an example of the information added to the events from a laptop CPU with 8 cores and 16 threads.

Note that our current system.core.id is effectively a "virtual core ID" (it matches the processor field in /proc/cpuinfo) and is unique even across different CPU sockets. I'm adding system.core.core_id, which is the "physical core ID" within a given "physical CPU".

"@timestamp"	"system.core.model_name"	"system.core.model_num"	"system.core.id"	"system.core.core_id"	"system.core.mhz"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	0	0	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	1	1	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	2	2	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	3	3	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	4	4	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	5	5	"4,443.456"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	6	6	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	7	7	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	8	0	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	9	1	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	10	2	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	11	3	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	12	4	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	13	5	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	14	6	"2,400"
"May 18, 2022 @ 15:15:52.326"	"Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz"	165	15	7	"2,400"
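To make the difference between system.core.id and system.core.core_id concrete, here is a minimal Go sketch, not the PR's code, that counts logical processors versus physical cores. It assumes a physical core is identified by the (physical_id, core_id) pair, since core IDs can repeat across sockets. The `processor` type and `countCores` function are hypothetical, and because the table does not show physical_id, the sketch assumes every row is on socket 0.

```go
package main

import "fmt"

// processor keeps the /proc/cpuinfo fields relevant to core counting.
// Hypothetical type, for illustration only.
type processor struct {
	ID         int // "processor" line, reported as system.core.id
	PhysicalID int // "physical id": the CPU package (socket)
	CoreID     int // "core id": the physical core inside that package
}

// countCores returns the number of logical processors and the number of
// distinct physical cores. A physical core is keyed by (physical id, core id)
// because the same core ID can appear on more than one socket.
func countCores(procs []processor) (logical, physical int) {
	type coreKey struct{ pkg, core int }
	seen := make(map[coreKey]struct{})
	for _, p := range procs {
		seen[coreKey{p.PhysicalID, p.CoreID}] = struct{}{}
	}
	return len(procs), len(seen)
}

func main() {
	// Rebuild the table above: system.core.id 0-15, core_id 0-7 twice.
	// physical_id 0 for every row is an assumption (single-socket laptop).
	var procs []processor
	for id := 0; id < 16; id++ {
		procs = append(procs, processor{ID: id, PhysicalID: 0, CoreID: id % 8})
	}
	logical, physical := countCores(procs)
	fmt.Printf("logical=%d physical=%d\n", logical, physical) // logical=16 physical=8
}
```

This matches the description of the laptop CPU above: 16 threads (logical processors) on 8 physical cores.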

The Go port of Sigar we use (gosigar) does not support reading the CPU info, so after some research I decided to read /proc/cpuinfo directly. This works well for Linux on x86/x86_64.

Questions

A few questions I have:

Should we add this information to system.cpu? There isn't a single clock value I can read, so for now I'm averaging the clock across all cores.

No. The values can be quite different among different CPU/cores, it does not make sense trying to aggregate them.

If some processors have different core types (like the Apple M1), having this info in system/cpu might be quite inaccurate. This might not be an immediate issue, but we should keep it in mind.

It won't be an issue, as we will not add this info to system.cpu.

Anything against reading /proc/cpuinfo on Linux?

No

Do we need to consider non-x86 CPUs at the moment? Like ARM?

No, this PR focuses only on Linux x86/x86_64 CPUs.

Why is it important?

It gives more visibility into the CPUs in use; see the related issue for more details.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

Naming of the metrics, especially core_id
Linux

How to test this PR locally

Run Metricbeat with the system/core metricset enabled on one of the supported platforms and check for the new metrics.

Related issues

Closes [Metricbeat] Add support for cpuinfo metricset #25471


elasticmachine · 2022-05-17T10:37:27Z

💚 Build Succeeded

Build stats

Start Time: 2022-05-23T15:06:49.615+0000
Duration: 57 min 35 sec

Test stats 🧪

Test	Results
Failed	0
Passed	3588
Skipped	887
Total	4475

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

kvch · 2022-05-17T12:41:53Z

metricbeat/internal/metrics/cpu/metrics.go

@@ -91,7 +104,18 @@ func (m *Monitor) Fetch() (Metrics, error) {
 	oldLastSample := m.lastSample
 	m.lastSample = metric

-	return Metrics{previousSample: oldLastSample.totals, currentSample: metric.totals, count: len(metric.list), isTotals: true}, nil
+	// There isn't a 'total' for the CPU/Core frequency, so we average all the


Why not report all of the values in separate metrics?

system.cpu.* reports metrics aggregated across all cores, treating the CPU as a single unit. The fact that the OS already reports the metrics that way makes things much easier.

I just followed the same approach.

I think it would be odd to see metrics like:
system.cpu.core0.mhz
system.cpu.core1.mhz
system.cpu.core2.mhz
system.cpu.core3.mhz

We could also omit the clock frequency from system.cpu, and only report system.cpu.model_name and system.cpu.model_number if they're the same for all cores.

What do you think?
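The "only report it if it's the same for all cores" option could look roughly like the following minimal Go sketch. It assumes per-processor key/value maps as parsed from /proc/cpuinfo; `commonValue` is a hypothetical helper, not code from this PR, and "CPU A"/"CPU B" are placeholder model names.

```go
package main

import "fmt"

// commonValue returns the value shared by every processor for key, and
// false if there are no processors, a processor lacks the key, or the
// values differ. Hypothetical helper, for illustration only.
func commonValue(procs []map[string]string, key string) (string, bool) {
	if len(procs) == 0 {
		return "", false
	}
	first, ok := procs[0][key]
	if !ok {
		return "", false
	}
	for _, p := range procs[1:] {
		if v, ok := p[key]; !ok || v != first {
			return "", false
		}
	}
	return first, true
}

func main() {
	homogeneous := []map[string]string{
		{"model name": "CPU A", "model": "165"},
		{"model name": "CPU A", "model": "165"},
	}
	mixed := []map[string]string{
		{"model name": "CPU A", "model": "165"},
		{"model name": "CPU B", "model": "166"},
	}
	if name, ok := commonValue(homogeneous, "model name"); ok {
		fmt.Println("report system.cpu.model_name =", name)
	}
	if _, ok := commonValue(mixed, "model name"); !ok {
		fmt.Println("cores disagree: omit system.cpu.model_name")
	}
}
```

The design choice here is to omit the field rather than pick one core's value, so a heterogeneous host never reports a misleading single model.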

You might want to consider looking at how the system/core metricset does things, that might be a better way to report a lot of this.

It's important to remember that CPU data that comes from sources like /proc/cpuinfo can be weirdly heterogeneous, particularly on multi-socket systems and VMs, I would strongly advise against trying to "sum" things.

Mhz should probably be reported individually, even if we also wanted to have an averaging metric somewhere. Wildly different core speeds across a system could indicate inefficiencies, scheduling issues, etc. A precise count from the system itself, as opposed to an average, is also useful for verifying that the CPU is turbo'ing as it should under load.
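The example table in the PR description illustrates the point: 15 cores report 2,400 MHz and core 5 reports 4,443.456 MHz. The minimal Go sketch below (`mhzSummary` is a hypothetical helper) shows that the average, about 2,527.716 MHz, hides the one core that is boosting, while the per-core values and the min/max spread make it obvious.

```go
package main

import "fmt"

// mhzSummary returns the minimum, maximum and mean of per-core clock
// frequencies in MHz. Hypothetical helper, for illustration only.
func mhzSummary(mhz []float64) (lo, hi, mean float64) {
	if len(mhz) == 0 {
		return 0, 0, 0
	}
	lo, hi = mhz[0], mhz[0]
	var sum float64
	for _, v := range mhz {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
		sum += v
	}
	return lo, hi, sum / float64(len(mhz))
}

func main() {
	// Values from the example table: 16 logical cores, core 5 boosting.
	mhz := make([]float64, 16)
	for i := range mhz {
		mhz[i] = 2400
	}
	mhz[5] = 4443.456
	lo, hi, mean := mhzSummary(mhz)
	fmt.Printf("min=%.3f max=%.3f mean=%.3f\n", lo, hi, mean)
	// Output: min=2400.000 max=4443.456 mean=2527.716
}
```

The mean sits barely above the base clock, so a dashboard showing only the average would suggest that no core is turbo boosting.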

You might want to consider looking at how the system/core metricset does things, that might be a better way to report a lot of this.

I looked at it: it reads the aggregated metrics the OS already provides (at least on Linux, where /proc/stat has an aggregated line). That really makes things easy.
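For context on the /proc/stat point: on Linux the first `cpu` line already holds the totals across all cores, followed by one `cpuN` line per core, each a list of tick counters (user, nice, system, idle, ...). The minimal Go sketch below separates the two; `cpuTimes` and `parseProcStat` are hypothetical names rather than the system/core metricset's code, and the sample counters are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpuTimes holds the tick counters from one "cpu" line of /proc/stat.
type cpuTimes struct {
	Name  string // "cpu" for the aggregate line, "cpu0", "cpu1", ... per core
	Ticks []uint64
}

// parseProcStat returns the aggregate "cpu" line and the per-core lines,
// ignoring non-CPU lines such as intr, ctxt or btime.
func parseProcStat(text string) (total cpuTimes, perCore []cpuTimes, err error) {
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || !strings.HasPrefix(fields[0], "cpu") {
			continue
		}
		ct := cpuTimes{Name: fields[0]}
		for _, f := range fields[1:] {
			v, perr := strconv.ParseUint(f, 10, 64)
			if perr != nil {
				return total, nil, perr
			}
			ct.Ticks = append(ct.Ticks, v)
		}
		if ct.Name == "cpu" {
			total = ct // the kernel already aggregated this one
		} else {
			perCore = append(perCore, ct)
		}
	}
	return total, perCore, scanner.Err()
}

func main() {
	// Made-up counters: user nice system idle.
	sample := "cpu  300 0 150 1000\ncpu0 100 0 50 500\ncpu1 200 0 100 500\nintr 12345\n"
	total, perCore, err := parseProcStat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(total.Name, total.Ticks, len(perCore))
}
```

/proc/cpuinfo has no equivalent aggregate block, which is why the clock frequency does not fit system.cpu in the same way.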

It's important to remember that CPU data that comes from sources like /proc/cpuinfo can be weirdly heterogeneous, particularly on multi-socket systems and VMs, I would strongly advise against trying to "sum" things.

Indeed. I think I'll just keep them out of system/cpu, at least for now.
It feels pretty odd to introduce this 'core' concept into system/cpu.

fearful-symmetry · 2022-05-17T17:45:38Z

metricbeat/internal/metrics/cpu/metrics.go

metricbeat/internal/metrics/cpu/metrics.go

metricbeat/internal/metrics/cpu/metrics_procfs_common.go

belimawr · 2022-05-20T14:21:14Z

libbeat/metric/system/cgroup/reader.go

@@ -157,7 +156,6 @@ func (r *Reader) CgroupsVersion(pid int) (CgroupsVersion, error) {
 		// V1 and V2 controllers on a cgroup. If the V2 controller has no actual controllers associated with it,
 		// We revert to V1. If it does, report V2. In the future, we may want to "combine" V2 and V1 metrics somehow.
 		if len(controllers) > 0 {
-			fmt.Printf("fetching V2 controller: %#v for pid %d\n", controllers, pid)


That looked like a forgotten debug print. Is it OK to remove it, @fearful-symmetry?

Yah, just confused as to why the linter is touching this file to begin with. I don't see any changes besides this one?

That's the only change on this file, but the linter runs on all files that had any changes.

This file is going to be removed in #31615

Please move this fix to the elastic-agent-system-metrics repo.

belimawr · 2022-05-20T14:35:07Z

metricbeat/module/system/test_system.py

@@ -280,7 +282,6 @@ def test_filesystem(self):
        self.assertGreater(len(output), 0)

        for evt in output:
-            print(evt)


That also looked like a forgotten debug line, so I removed it.

elasticmachine · 2022-05-20T14:54:11Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

belimawr · 2022-05-20T15:03:49Z

@kvch @fearful-symmetry, it's ready for review. I reduced the scope to only include Linux so this PR can make it into 8.3.

I'm also ignoring the linter, because it's mostly noise about pkg/errors in a file where I removed what seemed to be a forgotten debug print.

metricbeat/module/system/core/_meta/fields.yml

fearful-symmetry

A few small nits. @kvch, should we merge this here, and then later migrate all the code in internal/ over to the elastic-agent-system-metrics repo?

fearful-symmetry · 2022-05-20T20:40:43Z

libbeat/metric/system/cgroup/reader.go

metricbeat/internal/metrics/cpu/metrics_procfs_common.go

kvch · 2022-05-23T08:53:32Z

Let's merge it here. We can move it around after FF.

kvch

Please remove the fix from libbeat/metric/system/cgroup/reader.go and rather address it in the new repo elastic-agent-system-metrics.

mergify · 2022-05-23T10:43:58Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b add-cpuinfo-metricset upstream/add-cpuinfo-metricset
git merge upstream/main
git push upstream add-cpuinfo-metricset

belimawr · 2022-05-23T10:44:34Z

fmt.Printf("fetching V2 controller: %#v for pid %d\n", controllers, pid)

Done on e329ee891f

This commit adds the following information from `/proc/cpuinfo` to `system.core` metrics on Linux hosts:
- `model_number`
- `model_name`
- `mhz`
- `core_id`
- `physical_id`

botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) May 17, 2022

mergify bot assigned belimawr May 17, 2022

belimawr requested review from kvch and fearful-symmetry May 17, 2022 10:37

kvch reviewed May 17, 2022

belimawr force-pushed the add-cpuinfo-metricset branch from 643876b to 3432e92 May 17, 2022 16:10

fearful-symmetry reviewed May 17, 2022

belimawr changed the title from "[WIP] system/cpu add cpuinfo" to "[WIP] system/core: add cpuinfo" May 18, 2022

belimawr force-pushed the add-cpuinfo-metricset branch 3 times, most recently from cda193d to d8ec7cf May 20, 2022 14:16

belimawr commented May 20, 2022

belimawr force-pushed the add-cpuinfo-metricset branch from 7608419 to 5a0a0ca May 20, 2022 14:47

belimawr changed the title from "[WIP] system/core: add cpuinfo" to "system/core: add CPU information for Linux hosts" May 20, 2022

belimawr added the review, Metricbeat, and Team:Elastic-Agent-Data-Plane labels May 20, 2022

botelastic bot removed the needs_team label May 20, 2022

belimawr requested review from fearful-symmetry and kvch May 20, 2022 14:54

belimawr marked this pull request as ready for review May 20, 2022 14:54

belimawr requested a review from a team as a code owner May 20, 2022 14:54

kvch reviewed May 20, 2022

metricbeat/module/system/core/_meta/fields.yml (outdated, resolved)

belimawr force-pushed the add-cpuinfo-metricset branch from 5a0a0ca to b7a1706 May 20, 2022 15:19

fearful-symmetry reviewed May 20, 2022

kvch suggested changes May 23, 2022

belimawr force-pushed the add-cpuinfo-metricset branch from ebfe830 to f9f5ab7 May 23, 2022 10:23

belimawr requested a review from kvch May 23, 2022 10:45

belimawr added 12 commits May 23, 2022 12:45

[WIP] system/core add cpuinfo

0d04ea2

Fixing tests

3270610

Some log debugs were removed

fix tests

1758858

fix fields

816a277

exclude metrics from system/cpu

787cdab

adding test data and finishing tests

467067b

update documentation

0c416c7

PR review updates

3f3cc02

PR review

83c5c3d

PR improvements

57b767f

- fix error message
- better variable naming

fix tests on Windows

fd283af

fix naming

38ec0f5

belimawr force-pushed the add-cpuinfo-metricset branch from e329ee8 to 38ec0f5 May 23, 2022 10:50

kvch approved these changes May 23, 2022

do not add metrics to event if they're empty

72afa74

belimawr merged commit 108be1d into elastic:main May 23, 2022

belimawr deleted the add-cpuinfo-metricset branch May 23, 2022 16:13

kvch added a commit to kvch/elastic-agent-system-metrics that referenced this pull request Jun 9, 2022

Pick changes from elastic/beats#31643

31b34c1

kvch mentioned this pull request Jun 9, 2022

Pick changes from https://github.com/elastic/beats/pull/31643 to add support for CPU info elastic/elastic-agent-system-metrics#36

Merged

kvch added a commit to kvch/elastic-agent-system-metrics that referenced this pull request Jun 9, 2022

Pick changes from elastic/beats#31643

2b4aadc

kvch added a commit to elastic/elastic-agent-system-metrics that referenced this pull request Jun 9, 2022

Pick changes from elastic/beats#31643 to add support for CPU info (#36)

b510c44
