
new(libscap): save attached_progs + new libbpf stats -> complete scap_stats_v2 for modern bpf (new metrics 3a/n) #1044

Merged: 12 commits into falcosecurity:master from libbpf-stats-modern-bpf on May 12, 2023

Conversation

incertum (Contributor)

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Continuation of #1021 (review) for modern bpf, plus trying to reach parity with the existing stats in the old bpf probe.

CC @Andreagit97

Which issue(s) this PR fixes:

falcosecurity/falco#2222

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

new(libscap): save attached_progs + new libbpf stats for modern bpf (new metrics 3a/n) 

@incertum (Contributor, Author)

@Andreagit97 I started looking into also adding the missing kernel-side counters for modern_bpf. In the old probe we have the termination filler and state->tail_ctx.prev_res and state->tail_ctx.evt_type available; it's unclear what the best translation for modern_bpf would be. Figured let's ask the master of creation of modern_bpf first 😉

Would adding bug and page-fault drop counters also make sense?

struct counter_map
{
	uint64_t n_evts;		 /* Number of events correctly sent to userspace. */
	uint64_t n_drops_buffer;	 /* Number of drops due to a full ringbuf. */
	uint64_t n_drops_buffer_clone_fork_enter;
	uint64_t n_drops_buffer_clone_fork_exit;
	uint64_t n_drops_buffer_execve_enter;
	uint64_t n_drops_buffer_execve_exit;
	uint64_t n_drops_buffer_connect_enter;
	uint64_t n_drops_buffer_connect_exit;
	uint64_t n_drops_buffer_open_enter;
	uint64_t n_drops_buffer_open_exit;
	uint64_t n_drops_buffer_dir_file_enter;
	uint64_t n_drops_buffer_dir_file_exit;
	uint64_t n_drops_buffer_other_interest_enter;
	uint64_t n_drops_buffer_other_interest_exit;
	uint64_t n_drops_max_event_size; /* Number of drops due to an excessive event size (>64KB). */
	uint64_t n_drops_pf;		 /* Number of drops due to invalid memory access (page faults). */
	uint64_t n_drops_bug;		 /* Number of drops due to an invalid condition in the kernel instrumentation. */
};

@incertum force-pushed the libbpf-stats-modern-bpf branch from a2ccff5 to 8eac22c on April 12, 2023 05:09
@Andreagit97 (Member)

Andreagit97 commented Apr 12, 2023

what the best translation for modern_bpf would be?

Uhm, I would say that the best destinations are ringbuf__reserve_space and auxmap__submit_event; they both have an if similar to this:

	int err = bpf_ringbuf_output(rb, auxmap->data, auxmap->payload_pos, BPF_RB_NO_WAKEUP);
	if(err)
	{
		counter->n_drops_buffer++;
	}

I would patch it in the following way:

	int err = bpf_ringbuf_output(rb, auxmap->data, auxmap->payload_pos, BPF_RB_NO_WAKEUP);
	if(err)
	{
               /* Here we can add a helper that resolves the syscall and updates the right counter */
                compute_syscall_stat(ctx, counter);
		counter->n_drops_buffer++;
	}


/// ...

static __always_inline void compute_syscall_stat(void *ctx, struct counter_map *counter)
{
    int id = extract__syscall_id(ctx->regs);
    switch(maps__get_ppm_sc(id)) /* here we can obtain ppm_sc to avoid ifdefs */
    {
        case PPM_EXECVE:
             counter->n_execve...++;

        ....
    }
}

The bad news is that these 2 helpers, ringbuf__reserve_space and auxmap__submit_event, are also used by non-syscall hooks, so I think that at a certain point the verifier will complain, since the ctx is different for these programs and doesn't contain the regs field. BTW, at the moment this seems the most reasonable way to address the problem; maybe in the future we can find something better when we face the issue.

Would adding bug and page-fault drop counters also make sense?

I would say no 🤔

  • PPM_FAILURE_BUG is something we have because, in the old drivers, we have runtime checks on the number of parameters each event must send, but we want to deprecate it, so I would avoid similar logic in the modern probe. Probably a simple debug print in some places (for example, when we are not able to obtain a bpf map) would be enough; there is no need for a counter.
  • PPM_FAILURE_INVALID_USER_MEMORY is something I have tried to remove since my first day 😆 This code causes the whole event to be discarded, so in many cases we patch it on the fly before it reaches the final switch with the counters. So yes, I would avoid a counter for this stuff.

@poiana added size/XXL and removed size/M labels on Apr 13, 2023
@incertum added this to the 0.12.0 milestone on Apr 13, 2023
@incertum (Contributor, Author)

Since we are touching the modern bpf driver, we can defer this one to the 0.12.0 milestone.

@incertum force-pushed the libbpf-stats-modern-bpf branch from 6a2917d to 4ff1ab8 on April 27, 2023 21:39
@incertum (Contributor, Author)

Moving out of WIP as it is now feature-complete, @Andreagit97.

Here is an example output on my test machine:

[SCAP-OPEN]: General statistics

Events correctly captured (SCAP_SUCCESS): 13160187
Seen by driver (kernel side events): 13160291
Time elapsed: 246 s
Rate of userspace events (events/second): 53496
Rate of kernel side events (events/second): 53497
Number of timeouts: 320272
Number of 'next' calls: 13480459

[SCAP-OPEN]: Stats v2.

[SCAP-OPEN]: 37 metrics in total
[SCAP-OPEN]: [1] kernel-side counters
[SCAP-OPEN]: [2] libbpf stats (compare to `bpftool prog show` CLI)

[1] n_evts: 13160291
[1] n_drops_buffer_total: 0
[1] n_drops_buffer_clone_fork_enter: 0
[1] n_drops_buffer_clone_fork_exit: 0
[1] n_drops_buffer_execve_enter: 0
[1] n_drops_buffer_execve_exit: 0
[1] n_drops_buffer_connect_enter: 0
[1] n_drops_buffer_connect_exit: 0
[1] n_drops_buffer_open_enter: 0
[1] n_drops_buffer_open_exit: 0
[1] n_drops_buffer_dir_file_enter: 0
[1] n_drops_buffer_dir_file_exit: 0
[1] n_drops_buffer_other_interest_enter: 0
[1] n_drops_buffer_other_interest_exit: 0
[1] n_drops_scratch_map: 0
[1] n_drops: 0
[2] sys_enter.run_cnt: 4293383
[2] sys_enter.run_time_ns: 2132921040
[2] sys_enter.avg_time_ns: 496
[2] sys_exit.run_cnt: 4293559
[2] sys_exit.run_time_ns: 2872383191
[2] sys_exit.avg_time_ns: 668
[2] sched_proc_exit.run_cnt: 219
[2] sched_proc_exit.run_time_ns: 135208
[2] sched_proc_exit.avg_time_ns: 617
[2] sched_switch.run_cnt: 2002123
[2] sched_switch.run_time_ns: 2528098684
[2] sched_switch.avg_time_ns: 1262
[2] pf_user.run_cnt: 2567124
[2] pf_user.run_time_ns: 339707867
[2] pf_user.avg_time_ns: 132
[2] pf_kernel.run_cnt: 3861
[2] pf_kernel.run_time_ns: 1099374
[2] pf_kernel.avg_time_ns: 284
[2] signal_deliver.run_cnt: 262
[2] signal_deliver.run_time_ns: 153263
[2] signal_deliver.avg_time_ns: 584
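
For readers who want to reproduce a report like the one above, here is a hedged consumer sketch (not code from this PR) that walks the buffer returned by scap_get_stats_v2(). The PPM_SCAP_STATS_KERNEL_COUNTERS flag name and the value.u64 union member are assumptions on my side; name, type, flags, STATS_VALUE_TYPE_U64 and PPM_SCAP_STATS_LIBBPF_STATS all appear in the diffs reviewed below.

#include <inttypes.h>
#include <stdio.h>
/* plus the libscap headers declaring scap_t, scap_stats_v2 and scap_get_stats_v2() */

static void print_stats_v2(scap_t* handle)
{
	uint32_t nstats = 0;
	int32_t rc = 0;

	/* Request both kernel-side counters and libbpf stats in one call. */
	const struct scap_stats_v2* stats = scap_get_stats_v2(
		handle, PPM_SCAP_STATS_KERNEL_COUNTERS | PPM_SCAP_STATS_LIBBPF_STATS, &nstats, &rc);

	if(stats == NULL || rc != SCAP_SUCCESS)
	{
		return;
	}

	for(uint32_t i = 0; i < nstats; i++)
	{
		if(stats[i].type == STATS_VALUE_TYPE_U64)
		{
			printf("%s: %" PRIu64 "\n", stats[i].name, stats[i].value.u64);
		}
	}
}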

@incertum changed the title from "wip: new(libscap): save attached_progs + new libbpf stats for modern bpf (new metrics 3a/n)" to "new(libscap): save attached_progs + new libbpf stats for modern bpf (new metrics 3a/n)" on Apr 27, 2023
@incertum changed the title from "new(libscap): save attached_progs + new libbpf stats for modern bpf (new metrics 3a/n)" to "new(libscap): save attached_progs + new libbpf stats -> complete scap_stats_v2 for modern bpf (new metrics 3a/n)" on Apr 27, 2023
@incertum (Contributor, Author)

incertum commented May 2, 2023

Small note: Handing this PR off to @Andreagit97 as I currently have limited availability. Thanks so much for your help Andrea 🚀 !

@Andreagit97 modified the milestones: 0.12.0, 0.11.0 on May 8, 2023
@Andreagit97 force-pushed the libbpf-stats-modern-bpf branch from 4ff1ab8 to 5dc917f on May 10, 2023 09:42
@Andreagit97 (Member)

This should be ready for review :)

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@Andreagit97 (Member)

The last commit improves some CMake logs and modern bpf tests.

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@Andreagit97 force-pushed the libbpf-stats-modern-bpf branch from d86959d to 957ab1c on May 10, 2023 13:35
@Andreagit97 (Member)

The last commit fixes missing names in bpf stats on old kernels.

@Andreagit97 (Member)

There are some issues with GitHub Actions; we will try again tomorrow, maybe we will be luckier 🤞

@incertum (Contributor, Author) left a comment:

Thanks @Andreagit97! Proposing that @FedeDP take a look at the changes regarding available CPUs; the other changes LGTM!

*/
static __always_inline void ringbuf__store_event_header(struct ringbuf_struct *ringbuf, u32 event_type)
static __always_inline void ringbuf__store_event_header(struct ringbuf_struct *ringbuf)
incertum (Contributor, Author):

I like this general refactor to include event_type in the structs throughout, much easier and cleaner :)

* This is extracted from `libbpf_num_possible_cpus()`.
* We avoid to include libbpf just for this helper.
*/
static int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz)
incertum (Contributor, Author):

@FedeDP could you check on these changes as you worked on these parts in the past as well? Thanks!

Member:

This is a copy-and-paste of what libbpf_num_possible_cpus does; it is more precise than sysconf(_SC_NPROCESSORS_CONF), and I faced some cases in which we need that extra precision to have a correct assertion in these tests.
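
To illustrate what that helper does: /sys/devices/system/cpu/possible contains a mask string such as "0-3,5", and libbpf_num_possible_cpus() counts the CPUs it lists. A minimal standalone sketch of that parsing idea (not the code from this PR) could look like this:

#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs listed in a mask string like "0-3,5,7-8".
 * Returns -1 on malformed input. */
static int count_cpus_in_mask(const char *s)
{
	int n = 0;

	while(*s != '\0')
	{
		char *end = NULL;
		long start = strtol(s, &end, 10);
		long stop = start;

		if(end == s)
		{
			return -1;
		}
		if(*end == '-')
		{
			s = end + 1;
			stop = strtol(s, &end, 10);
			if(end == s || stop < start)
			{
				return -1;
			}
		}
		n += (int)(stop - start + 1);
		s = end;
		if(*s == ',' || *s == '\n')
		{
			s++;
		}
		else if(*s != '\0')
		{
			return -1;
		}
	}
	return n;
}

int main(void)
{
	printf("%d\n", count_cpus_in_mask("0-3,5,7-8")); /* prints 7 */
	return 0;
}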

ASSERT_GT(nstats, 0);

/* These names should always be available */
std::unordered_set<std::string> minimal_stats_name = {"n_evts", "sys_enter.run_cnt", "sys_enter.run_time_ns", "sys_exit.run_cnt", "sys_exit.run_time_ns", "signal_deliver.run_cnt", "signal_deliver.run_time_ns"};
incertum (Contributor, Author):

This is a much better test, thanks!

@@ -25,6 +25,10 @@ extern "C"
{
#endif

/* Forward decleare them */
incertum (Contributor, Author):

Not needed, but just in case you wanted to fix some typos: "decleare" should be "declare", and also the "Return a scap_stats_v2 ..." comment below.

Member:

Sure, I will do it, thanks!

*/
int pman_get_scap_stats_v2(void* scap_stats_v2_struct, uint32_t flags, uint32_t* nstats);
struct scap_stats_v2* pman_get_scap_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc);
incertum (Contributor, Author):

Hmm, we now change this again? Didn't we discuss in the initial scap stats v2 PR that we want to return the error code? Either way is fine with me; I just wanted to point out that it initially was similar to how you changed it, and then it was changed based on reviewers' comments.

Member:

That's true, you are right. In the end, this approach seems simpler to manage in the code, so I would go for it! If anyone has something against it, I can change it again, of course.

incertum (Contributor, Author):

I am fine with leaving as is :)
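
For readers following along, a small illustrative sketch of how the new signature quoted above would be consumed (the flag value is just an example taken from this PR's diffs; the stats buffer is now the return value and the error code comes back through rc):

	uint32_t nstats = 0;
	int32_t rc = SCAP_SUCCESS;
	uint32_t flags = PPM_SCAP_STATS_LIBBPF_STATS; /* example flag */

	struct scap_stats_v2* stats = pman_get_scap_stats_v2(flags, &nstats, &rc);
	if(stats == NULL || rc != SCAP_SUCCESS)
	{
		/* handle the failure: nstats is not meaningful here */
	}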

/* offset in stats buffer */
int offset = 0;

/* If it is the first time we call this function we populate the stats */
incertum (Contributor, Author):

Oh nice, yes this is a good alternative way to allocate the stats buffer :)

@@ -1756,7 +1756,18 @@ const struct scap_stats_v2* scap_bpf_get_stats_v2(struct scap_engine_handle engi
}
stats[offset].type = STATS_VALUE_TYPE_U64;
stats[offset].flags = PPM_SCAP_STATS_LIBBPF_STATS;
strlcpy(stats[offset].name, info.name, STATS_NAME_MAX);
/* This could happen on old kernels where we don't have names inside the info struct
incertum (Contributor, Author):

@Andreagit97 on that note: it just occurred to me that perhaps we could add some more comments throughout noting that libbpf stats are only supported starting from kernel 5.1, or when those features were backported. We check for the kernel settings, and in the scap-open example I give hints, but maybe we can do even more?

incertum (Contributor, Author):

In addition, could you clarify what "old kernels" here means?

Member:

good point! Done :)
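
For reference, here is a hedged sketch (not the PR's exact code) of how libbpf-style stats can be read for a single attached program. run_cnt and run_time_ns are only populated when the kernel supports BPF stats (>= 5.1, or a backport) and collection is enabled (e.g. kernel.bpf_stats_enabled=1); on old kernels info.name may also be empty, which is the fallback case discussed in this thread.

#include <stdio.h>
#include <bpf/bpf.h>
#include <linux/bpf.h>

static void print_prog_stats(int prog_fd, const char *fallback_name)
{
	struct bpf_prog_info info = {0};
	__u32 len = sizeof(info);

	if(bpf_obj_get_info_by_fd(prog_fd, &info, &len) != 0)
	{
		return;
	}

	/* Fall back to a known program name when the kernel did not fill info.name. */
	const char *name = (info.name[0] != '\0') ? info.name : fallback_name;

	printf("%s.run_cnt: %llu\n", name, (unsigned long long)info.run_cnt);
	printf("%s.run_time_ns: %llu\n", name, (unsigned long long)info.run_time_ns);
	if(info.run_cnt > 0)
	{
		printf("%s.avg_time_ns: %llu\n", name,
		       (unsigned long long)(info.run_time_ns / info.run_cnt));
	}
}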

@FedeDP (Contributor) left a comment:

LGTM; left some minor comments!

@@ -24,12 +24,12 @@ int BPF_PROG(pf_kernel,
}

struct ringbuf_struct ringbuf;
if(!ringbuf__reserve_space(&ringbuf, ctx, PAGE_FAULT_SIZE))
if(!ringbuf__reserve_space(&ringbuf, ctx, PAGE_FAULT_SIZE, PPME_PAGE_FAULT_E))
Contributor:

I am looking forward to filling in the event size somewhere (event_table perhaps?) so that we can drop all the EVENT_SIZE defines :)
Now that we pass the event type too, it should be a quick change and would be even safer! (i.e., no typos allowed!)

Member:

Yep, we definitely need to do that!

@@ -15,12 +15,12 @@ int BPF_PROG(bpf_e,
long id)
{
struct ringbuf_struct ringbuf;
if(!ringbuf__reserve_space(&ringbuf, ctx, BPF_E_SIZE))
if(!ringbuf__reserve_space(&ringbuf, ctx, BPF_E_SIZE, PPME_SYSCALL_BPF_2_E))
Contributor:

I am also wondering whether we can move ringbuf__store_event_header directly inside the reserve_space function.
I have no strong opinion, BTW.

Member:

That's a good point. There are some real corner cases in which they are not used together (see the hotplug.bpf.c file), but yes, we can think of merging them one day; this would simplify the flow. I would postpone it to another PR, though :)

Contributor:

Yep, it will be part of the aforementioned big event_table event-sizes refactor, I think, if we ever get to it :D

userspace/libpman/include/libpman.h (outdated review thread, resolved)
@@ -813,6 +813,11 @@ scap_threadinfo* scap_get_proc_table(scap_t* handle)
//
int32_t scap_get_stats(scap_t* handle, OUT scap_stats* stats)
{
if(stats == NULL)
Contributor:

Can we add the same check to scap_get_stats_v2?

Member:

In scap_get_stats_v2 we receive the pointers from the engine, so it cannot be NULL; or rather, a NULL would be the return value of scap_get_stats_v2 itself, so it's probably the caller who needs to check that it is not NULL:

const struct scap_stats_v2* scap_get_stats_v2(scap_t* handle, uint32_t flags, OUT uint32_t* nstats, OUT int32_t* rc)
{
	if(handle->m_vtable)
	{
		return handle->m_vtable->get_stats_v2(handle->m_engine, flags, nstats, rc);
	}
	ASSERT(false);
	*nstats = 0;
	*rc = SCAP_FAILURE;
	return NULL;
}

and so we check it in sinsp

const struct scap_stats_v2* sinsp::get_capture_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc) const
{
	/* On purpose ignoring failures to not interrupt in case of stats retrieval failure. */
	const struct scap_stats_v2* stats_v2 = scap_get_stats_v2(m_h, flags, nstats, rc);
	if (!stats_v2)
	{
		*nstats = 0;
		return NULL;
	}
	return stats_v2;
}

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Co-authored-by: Federico Di Pierro <nierro92@gmail.com>
Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum (Contributor, Author) left a comment:

LGTM! Thank you!

@FedeDP (Contributor) left a comment:

/approve

@poiana (Contributor)

poiana commented May 11, 2023

LGTM label has been added.

Git tree hash: e420cefb120fdedd958f63f322d4e068fe94e8da

@leogr (Member)

leogr commented May 11, 2023

Closing and reopening to trigger @poiana

@leogr closed this May 11, 2023
@leogr reopened this May 11, 2023
@leogr (Member)

leogr commented May 12, 2023

Closing and reopening to trigger @poiana

Again, hoping GitHub is working today 👼

/close

@poiana closed this May 12, 2023
@poiana (Contributor)

poiana commented May 12, 2023

@leogr: Closed this PR.

In response to this:

Closing and reopening to trigger @poiana

Again, hoping GitHub is working today 👼

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leogr (Member)

leogr commented May 12, 2023

/reopen

@poiana reopened this May 12, 2023
@poiana (Contributor)

poiana commented May 12, 2023

@leogr: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leogr (Member) left a comment:

/approve

@poiana (Contributor)

poiana commented May 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, incertum, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [FedeDP,incertum,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@FedeDP (Contributor)

FedeDP commented May 12, 2023

tide plugin is still 💀

@poiana merged commit 8fde485 into falcosecurity:master on May 12, 2023
@incertum deleted the libbpf-stats-modern-bpf branch on December 8, 2023 20:40