cleanup(engines): detach per-cpu kernel metrics from global kernel metrics #2031

Andreagit97 · 2024-08-28T09:25:25Z

What type of PR is this?

/kind cleanup

Any specific area of the project related to this PR?

/area libscap-engine-bpf

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

As explained in issue #2028, it is better to split the per-CPU counters from the global counters to reduce verbosity.
In more detail: when the per-CPU counters are enabled, libscap also enables the global counters under the hood. This avoids a double loop over all the CPUs and keeps the code simpler, without duplication. The rationale is that a user typically enables the per-CPU stats to get more insight than the global ones provide, so the global ones should already be enabled...

Which issue(s) this PR fixes:

Fixes #2028

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

cleanup(engines): detach per-cpu kernel metrics from global kernel metrics

github-actions · 2024-08-28T09:25:49Z

Please double check driver/API_VERSION file. See versioning.

/hold

codecov · 2024-08-28T09:37:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.30%. Comparing base (bf3c89b) to head (6aaec6c).
Report is 26 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2031   +/-   ##
=======================================
  Coverage   74.30%   74.30%           
=======================================
  Files         253      253           
  Lines       30966    30966           
  Branches     5397     5400    +3     
=======================================
  Hits        23010    23010           
- Misses       7932     7946   +14     
+ Partials       24       10   -14

Flag	Coverage Δ
libsinsp	`74.30% <100.00%> (ø)`

github-actions · 2024-08-28T09:41:35Z

Perf diff from master - unit tests

     6.32%     -1.50%  [.] sinsp_evt::get_type
     5.83%     -1.31%  [.] next
     4.81%     +0.78%  [.] sinsp_parser::process_event
     0.82%     +0.73%  [.] 0x00000000000e93c0
     9.74%     +0.72%  [.] sinsp_parser::reset
     6.77%     +0.60%  [.] sinsp::next
     0.23%     +0.58%  [.] sinsp_parser::parse_rw_exit
     2.02%     +0.56%  [.] scap_event_decode_params
     4.21%     -0.44%  [.] gzfile_read
     0.50%     +0.42%  [.] sinsp_container_info::sinsp_container_info

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            +0.0297         +0.0299           145           150           145           150
BM_sinsp_split_median                                          +0.0316         +0.0317           145           150           145           150
BM_sinsp_split_stddev                                          -0.3063         -0.3074             1             1             1             1
BM_sinsp_split_cv                                              -0.3263         -0.3275             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0053         +0.0055            42            42            42            42
BM_sinsp_concatenate_paths_relative_path_median                +0.0056         +0.0057            42            42            42            42
BM_sinsp_concatenate_paths_relative_path_stddev                -0.4610         -0.4609             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.4639         -0.4638             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0223         -0.0222            17            17            17            17
BM_sinsp_concatenate_paths_empty_path_median                   -0.0185         -0.0184            17            17            17            17
BM_sinsp_concatenate_paths_empty_path_stddev                   -0.8604         -0.8600             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       -0.8572         -0.8568             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0392         +0.0393            43            44            43            44
BM_sinsp_concatenate_paths_absolute_path_median                +0.0433         +0.0434            43            45            43            45
BM_sinsp_concatenate_paths_absolute_path_stddev                +3.9479         +3.9463             0             1             0             1
BM_sinsp_concatenate_paths_absolute_path_cv                    +3.7615         +3.7594             0             0             0             0
BM_sinsp_split_container_image_mean                            +0.0009         +0.0011           349           349           349           349
BM_sinsp_split_container_image_median                          -0.0026         -0.0025           350           349           349           349
BM_sinsp_split_container_image_stddev                          -0.1712         -0.1718             3             3             3             3
BM_sinsp_split_container_image_cv                              -0.1720         -0.1727             0             0             0             0

Andreagit97 · 2024-08-28T10:04:34Z

userspace/libpman/src/stats.c

+ * The following `if` handles the case in which we want to get the metrics per CPU but not the global ones.
+ * It is an unusual case but at the moment we support it.
+ */
+ if ((flags & METRICS_V2_KERNEL_COUNTERS_PER_CPU) && !(flags & METRICS_V2_KERNEL_COUNTERS))


[EARLY FEEDBACK]

Actually, I'm also handling the odd case in which the per-CPU stats are enabled but the global ones are not... I cannot think of a real use case for it, so I'm not sure we want to keep it. WDYT? @FedeDP @incertum

I'd say that if KERNEL_COUNTERS are disabled, KERNEL_COUNTERS_PER_CPU must be disabled too!

I see 3 options:

1. Handle the 2 stats separately (so we loop over the CPUs twice if both are enabled).

2. Handle the 2 stats together, so METRICS_V2_KERNEL_COUNTERS_PER_CPU can be enabled only if METRICS_V2_KERNEL_COUNTERS is enabled. This creates a dependency between the two flags.

3. Like option 1, but with some duplicated logic that allows us to loop just once if both metrics are enabled (implemented in this PR).
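
To illustrate why a single pass over the CPUs is attractive when both kinds of counters are wanted, here is a minimal C sketch. The struct, its fields, and the function name are hypothetical and are not taken from libscap or libpman; the real engines read richer per-CPU structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU counter record, for illustration only. */
struct cpu_counters
{
	uint64_t n_evts;
	uint64_t n_drops;
};

/* Walk the CPUs once: always accumulate the global totals and, when
 * `per_cpu_out` is not NULL, also copy each CPU's values into it. */
void collect_counters(const struct cpu_counters* cpus, size_t n_cpus,
                      struct cpu_counters* global_out, struct cpu_counters* per_cpu_out)
{
	assert(global_out != NULL);
	global_out->n_evts = 0;
	global_out->n_drops = 0;
	for(size_t i = 0; i < n_cpus; i++)
	{
		global_out->n_evts += cpus[i].n_evts;
		global_out->n_drops += cpus[i].n_drops;
		if(per_cpu_out != NULL)
		{
			per_cpu_out[i] = cpus[i];
		}
	}
}
```

Passing NULL as `per_cpu_out` models the case where only the global counters are requested; the loop body is otherwise identical, so nothing is duplicated.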

I'd go with 2: it is easier, and expected since both metric flags share the same prefix.

Yep, makes sense. We just need to log a warning somewhere if the user enables one flag without the other.

We can also say that if only METRICS_V2_KERNEL_COUNTERS_PER_CPU is passed, we silently enable METRICS_V2_KERNEL_COUNTERS too.

> We can also say that if only METRICS_V2_KERNEL_COUNTERS_PER_CPU is passed, we silently enable METRICS_V2_KERNEL_COUNTERS too.

Great idea, I will go for it!
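
A minimal sketch of the "silently enable" behavior agreed on in this thread. The bit values and the helper name `normalize_metrics_flags` are assumptions made for illustration; the real flag definitions presumably live in userspace/libscap/metrics_v2.h and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; the real values may differ. */
#define METRICS_V2_KERNEL_COUNTERS (1u << 0)
#define METRICS_V2_KERNEL_COUNTERS_PER_CPU (1u << 1)

/* Hypothetical helper: if the per-CPU flag is set, force the global flag too,
 * so callers never see per-CPU counters without the global ones. */
uint32_t normalize_metrics_flags(uint32_t flags)
{
	if(flags & METRICS_V2_KERNEL_COUNTERS_PER_CPU)
	{
		flags |= METRICS_V2_KERNEL_COUNTERS;
	}
	/* Postcondition: per-CPU implies global. */
	assert(!(flags & METRICS_V2_KERNEL_COUNTERS_PER_CPU) || (flags & METRICS_V2_KERNEL_COUNTERS));
	return flags;
}
```

Normalizing the flags once at the entry point means the engines only need to handle two cases (global only, or global plus per-CPU) instead of three.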

userspace/libpman/src/stats.c

FedeDP · 2024-08-28T10:21:18Z

/milestone 0.18.0

Andreagit97 · 2024-08-28T16:40:53Z

test/libscap/test_suites/engines/kmod/kmod.cpp

@@ -193,8 +193,8 @@ TEST(kmod, metrics_v2_check_per_CPU_stats)

 ssize_t num_online_CPUs = sysconf(_SC_NPROCESSORS_ONLN);

- // We want to check our CPUs counters
- uint32_t flags = METRICS_V2_KERNEL_COUNTERS;
+ // Enabling `METRICS_V2_KERNEL_COUNTERS_PER_CPU` we also enable `METRICS_V2_KERNEL_COUNTERS`


We should definitely unify the tests for the 3 engines in some way, because we are copying and pasting the same code 3 times for every test and, in the end, the interface is the same... BTW, I'm not doing it in this PR :/

userspace/libpman/src/stats.c

Andreagit97 · 2024-08-28T16:42:41Z

userspace/libscap/scap.c

@@ -302,6 +302,12 @@ int32_t scap_get_stats(scap_t* handle, scap_stats* stats)
 //
 const struct metrics_v2* scap_get_stats_v2(scap_t* handle, uint32_t flags, uint32_t* nstats, int32_t* rc)
 {
+ // If we enable per-cpu counters, we also enable kernel global counters by default.


as suggested by @FedeDP

userspace/libscap/metrics_v2.h

FedeDP · 2024-08-29T06:36:21Z

Left a minor suggestion, otherwise LGTM!

userspace/libsinsp/metrics_collector.h

Andreagit97 · 2024-08-30T08:07:27Z

We should be fine now :)

FedeDP · 2024-08-31T08:11:58Z

There is a "tmp" commit still 😄

Andreagit97 · 2024-09-02T07:57:39Z

> There is a "tmp" commit still 😄

Reworded it, thank you!

FedeDP

/approve

poiana · 2024-09-02T08:04:05Z

LGTM label has been added.

Git tree hash: 8b865d3317e02cb4c9dddb77f2c8fda7cfae22d6

userspace/libpman/src/stats.c

incertum · 2024-09-03T17:58:04Z

userspace/libscap/engine/bpf/scap_bpf.c

 switch(stat)
 {
 case RUN_CNT:
- strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_CNT], sizeof(stats[offset].name));


Follow up here: shouldn't this stay here, since we concatenate the name according to the switch statement?
Also, we seem to use stat both for the loop and here for the switch statement. Perhaps let's use separate names for clarity?

The idea is to call strlcat just once with the generic variable stat

strlcat(stats[offset].name, bpf_libbpf_stats_names[stat], sizeof(stats[offset].name));

instead of repeating the same line 3 times, each with an explicit enum value

strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_CNT], sizeof(stats[offset].name));
strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_TIME_NS], sizeof(stats[offset].name));
strlcat(stats[offset].name, bpf_libbpf_stats_names[AVG_TIME_NS], sizeof(stats[offset].name));

> Also we seem to use stat for the loop and here for the switch statement. Perhaps let's use separate wording for clarity?

I am not sure I got this: we are using stat (the index of the array) in the switch case to select the right metric.

Looked again and yes, stat is the index to the bpf_libbpf_stats ... I suppose I badly confused stat with offset. Thanks for clarifying and also for working on this.
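
The single-call approach discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code from scap_bpf.c: the enum mirrors the three names mentioned above, the suffix strings and `build_stat_name` are hypothetical, and `snprintf` is used in place of `strlcat` because `strlcat` is not available in every libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the libbpf stats enum discussed in the thread. */
enum libbpf_stat
{
	RUN_CNT = 0,
	RUN_TIME_NS,
	AVG_TIME_NS,
	LIBBPF_STAT_MAX
};

/* Illustrative suffixes; the real strings are in bpf_libbpf_stats_names. */
static const char* const stat_suffixes[LIBBPF_STAT_MAX] = {
	[RUN_CNT] = ".run_cnt",
	[RUN_TIME_NS] = ".run_time_ns",
	[AVG_TIME_NS] = ".avg_time_ns",
};

/* Build "<prog_name><suffix>" with one call, indexing the suffix table by
 * `stat` instead of repeating the call in every switch case.
 * Returns 0 on success, -1 if the result did not fit in `dst`. */
int build_stat_name(char* dst, size_t dst_size, const char* prog_name, enum libbpf_stat stat)
{
	assert(stat < LIBBPF_STAT_MAX);
	int n = snprintf(dst, dst_size, "%s%s", prog_name, stat_suffixes[stat]);
	return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Because the table is indexed by the same enum the loop iterates over, adding a new stat only requires one new table entry rather than one more copy of the concatenation line.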

incertum · 2024-09-03T18:03:32Z

userspace/libscap/engine/bpf/scap_bpf.c

@@ -1849,22 +1856,20 @@ const struct metrics_v2* scap_bpf_get_stats_v2(struct scap_engine_handle engine,
 {
 strlcpy(stats[offset].name, info.name, METRIC_NAME_MAX);
 }
+ strlcat(stats[offset].name, bpf_libbpf_stats_names[stat], sizeof(stats[offset].name));


[nit] Re the comment here https://github.com/falcosecurity/libs/pull/2031/files#diff-12833abd4271488260dae0ba178c6ad3f0bc63642f793a20b06ab4eb10d02cf9L1839: libbpf stats were introduced with kernel 5.1, so folks on older kernels can't reach this code, since we check that libbpf stats are enabled.

You are right, but since many BPF features are often backported, I'm not so confident about removing it... I found this commit 957ab1c; unfortunately, I don't remember why I added it, but I bet I had found an issue on some old machines...

Fair, yes the backports.

incertum

/approve

poiana · 2024-09-05T04:44:38Z

LGTM label has been added.

Git tree hash: 667ee5b9ecfd8e1c9ef82047cc3269bfe7a37aac

FedeDP

/approve

poiana · 2024-09-05T06:06:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, incertum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Andreagit97,FedeDP,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

FedeDP · 2024-09-05T06:06:16Z

/unhold

poiana added kind/cleanup do-not-merge/work-in-progress release-note dco-signoff: yes area/libscap-engine-bpf area/libscap-engine-kmod labels Aug 28, 2024

Andreagit97 marked this pull request as draft August 28, 2024 09:25

poiana added area/libscap-engine-modern-bpf size/L approved labels Aug 28, 2024

poiana requested review from hbrueckner and Molter73 August 28, 2024 09:25

poiana added the do-not-merge/hold label Aug 28, 2024

Andreagit97 commented Aug 28, 2024

poiana added this to the 0.18.0 milestone Aug 28, 2024

Andreagit97 force-pushed the cleanup_per_cpu_metrics branch from 5827698 to 35f2152 Compare August 28, 2024 16:43

Andreagit97 changed the title ~~[WIP] cleanup(engines): detach per-cpu kernel metrics from global kernel metrics~~ cleanup(engines): detach per-cpu kernel metrics from global kernel metrics Aug 28, 2024

Andreagit97 marked this pull request as ready for review August 28, 2024 16:44

poiana removed the do-not-merge/work-in-progress label Aug 28, 2024

poiana requested a review from leogr August 28, 2024 16:44

Andreagit97 force-pushed the cleanup_per_cpu_metrics branch from 35f2152 to e3442fc Compare August 28, 2024 16:54

Andreagit97 commented Aug 28, 2024

FedeDP reviewed Aug 29, 2024

Andreagit97 commented Aug 29, 2024

Andreagit97 and others added 4 commits September 2, 2024 09:56

cleanup(libscap): detach per-CPU counters from global kernel counters

fe0ffd5

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>

fix(pman): remove a wrong flag

6224bcd

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>

cleanup(libscap): always enable global counters when per-cpu ones are enabled

c431314

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>

docs(libscap): add a comment

2300299

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com> Co-authored-by: Federico Di Pierro <nierro92@gmail.com>

Andreagit97 force-pushed the cleanup_per_cpu_metrics branch from e85976c to 2300299 Compare September 2, 2024 07:57

FedeDP previously approved these changes Sep 2, 2024

poiana assigned FedeDP Sep 2, 2024

poiana added the lgtm label Sep 2, 2024

incertum reviewed Sep 2, 2024

cleanup: rename a parameter

c27ef9c

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com> Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

Andreagit97 dismissed FedeDP’s stale review via c27ef9c September 3, 2024 06:59

poiana removed the lgtm label Sep 3, 2024

poiana requested a review from FedeDP September 3, 2024 07:00

fix: use correct index for libbpf stats

6aaec6c

Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com> Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum reviewed Sep 3, 2024

incertum approved these changes Sep 5, 2024

poiana assigned incertum Sep 5, 2024

poiana added the lgtm label Sep 5, 2024

FedeDP approved these changes Sep 5, 2024

poiana removed the do-not-merge/hold label Sep 5, 2024

poiana merged commit b632379 into falcosecurity:master Sep 5, 2024
45 of 49 checks passed
