-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new(libscap): save attached_progs + new libbpf stats -> complete scap_stats_v2 for modern bpf (new metrics 3a/n) #1044
new(libscap): save attached_progs + new libbpf stats -> complete scap_stats_v2 for modern bpf (new metrics 3a/n) #1044
Conversation
@Andreagit97 started looking into also adding the missing kernel side counters for modern_bpf. In the old probe we have that termination filler and Would adding bug and page faults drops also make sense?
|
a2ccff5
to
8eac22c
Compare
Uhm I would say that the best destinations are int err = bpf_ringbuf_output(rb, auxmap->data, auxmap->payload_pos, BPF_RB_NO_WAKEUP);
if(err)
{
counter->n_drops_buffer++;
} I would patch it in the following way: int err = bpf_ringbuf_output(rb, auxmap->data, auxmap->payload_pos, BPF_RB_NO_WAKEUP);
if(err)
{
/* Here we can add an helper that resolves the syscall and add the counter */
compute_syscall_stat(ctx, counter);
counter->n_drops_buffer++;
}
/// ...
static __always_inline void compute_syscall_stat(void *ctx, struct counter_map *counter)
{
int id = extract__syscall_id(ctx->regs);
switch(maps__get_ppm_sc(id)) /* here we can obtain ppm_sc to avoid ifdefs */
{
case PPM_EXECVE:
counter->n_execve...++;
....
}
} The bad news is that these 2 helpers
I would say no 🤔
|
Since we are touching the modern bpf driver we can defer this one to 0.12.0 milestone. |
6a2917d
to
4ff1ab8
Compare
Moving out of wip as it is now feature complete @Andreagit97 . Here is an example output on my test machine:
|
Small note: Handing this PR off to @Andreagit97 as I currently have limited availability. Thanks so much for your help Andrea 🚀 ! |
4ff1ab8
to
5dc917f
Compare
This should be ready for review :) |
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it> Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
The last commit improves some cmake logs and modern bpf tests |
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
d86959d
to
957ab1c
Compare
the last commit fixes missing names in bpf stats on old kernels |
There are some issues with GitHub actions we will try again tomorrow maybe we will be luckier 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Andreagit97 proposing @FedeDP takes a look at the changes re available CPUs, the other changes LGTM!
*/ | ||
static __always_inline void ringbuf__store_event_header(struct ringbuf_struct *ringbuf, u32 event_type) | ||
static __always_inline void ringbuf__store_event_header(struct ringbuf_struct *ringbuf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this general refactor to include event_type in the structs throughout, much easier and cleaner :)
* This is extracted from `libbpf_num_possible_cpus()`. | ||
* We avoid to include libbpf just for this helper. | ||
*/ | ||
static int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FedeDP could you check on these changes as you worked on these parts in the past as well? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this a copy and paste of what libbpf_num_possible_cpus
does, it is more precise than sysconf(_SC_NPROCESSORS_CONF)
and i faced some cases in which we need more precision to have a correct assertion in these tests
ASSERT_GT(nstats, 0); | ||
|
||
/* These names should always be available */ | ||
std::unordered_set<std::string> minimal_stats_name = {"n_evts", "sys_enter.run_cnt", "sys_enter.run_time_ns", "sys_exit.run_cnt", "sys_exit.run_time_ns", "signal_deliver.run_cnt", "signal_deliver.run_time_ns"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a much better test, thanks!
userspace/libpman/include/libpman.h
Outdated
@@ -25,6 +25,10 @@ extern "C" | |||
{ | |||
#endif | |||
|
|||
/* Forward decleare them */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, but just in case you wanted to fix some types, declare
and also Return a
scap_stats_v2 ...
below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure i will do it, thanks!
*/ | ||
int pman_get_scap_stats_v2(void* scap_stats_v2_struct, uint32_t flags, uint32_t* nstats); | ||
struct scap_stats_v2* pman_get_scap_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm we now change this again? Didn't we discuss in the initial scap stats v2 PR that we want to return the error code? Either way is fine with me just wanted to point out that it initially was similar to how you changed it and then based on reviewers comments changed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's true you are right, in the end, this approach seems more simple to manage in the code so I would go for it! If anyone has something against it I can change it again of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with leaving as is :)
/* offset in stats buffer */ | ||
int offset = 0; | ||
|
||
/* If it is the first time we call this function we populate the stats */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, yes this is a good alternative way to allocate the stats buffer :)
@@ -1756,7 +1756,18 @@ const struct scap_stats_v2* scap_bpf_get_stats_v2(struct scap_engine_handle engi | |||
} | |||
stats[offset].type = STATS_VALUE_TYPE_U64; | |||
stats[offset].flags = PPM_SCAP_STATS_LIBBPF_STATS; | |||
strlcpy(stats[offset].name, info.name, STATS_NAME_MAX); | |||
/* This could happen on old kernels where we don't have names inside the info struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Andreagit97 on that note: Just occurred to me, perhaps we could add some more comments throughout that libbpf stats are only supported starting kernel 5.1 or if those feature were back ported. We check for the kernel settings plus in the scap open example I give hints, but maybe we can do more even more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, could you clarify what "old kernels" here means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! Done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; left some minor comments!
@@ -24,12 +24,12 @@ int BPF_PROG(pf_kernel, | |||
} | |||
|
|||
struct ringbuf_struct ringbuf; | |||
if(!ringbuf__reserve_space(&ringbuf, ctx, PAGE_FAULT_SIZE)) | |||
if(!ringbuf__reserve_space(&ringbuf, ctx, PAGE_FAULT_SIZE, PPME_PAGE_FAULT_E)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am looking forward to fill the event size somewhere (event_table perhaps?) so that we can drop all the EVENT_SIZE
defines :)
Now that we pass the event type too, it should be a quick change but would be even safer too! (ie: no typos allowed!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep we definitely need to do that!
@@ -15,12 +15,12 @@ int BPF_PROG(bpf_e, | |||
long id) | |||
{ | |||
struct ringbuf_struct ringbuf; | |||
if(!ringbuf__reserve_space(&ringbuf, ctx, BPF_E_SIZE)) | |||
if(!ringbuf__reserve_space(&ringbuf, ctx, BPF_E_SIZE, PPME_SYSCALL_BPF_2_E)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also wondering whether we can move ringbuf__store_event_header
directly inside the reserve_space
function.
I have no strong opinion btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, there are some real corner cases in which they are not used together, see the hotplug.bpf.c
file, btw yes we can think of merging them one day, this would simplify the flow, but i would postpone it in another PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it will be part of the aforementioned big event_table
events sizes refactor i think, if we'll ever get to it :D
@@ -813,6 +813,11 @@ scap_threadinfo* scap_get_proc_table(scap_t* handle) | |||
// | |||
int32_t scap_get_stats(scap_t* handle, OUT scap_stats* stats) | |||
{ | |||
if(stats == NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the same check to scap_get_stats_v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in scap_get_stats_v2
we receive the pointers from the engine so it cannot be null or at least it would be the return value of scap_get_stats_v2
so probably the caller needs to check that is not null
const struct scap_stats_v2* scap_get_stats_v2(scap_t* handle, uint32_t flags, OUT uint32_t* nstats, OUT int32_t* rc)
{
if(handle->m_vtable)
{
return handle->m_vtable->get_stats_v2(handle->m_engine, flags, nstats, rc);
}
ASSERT(false);
*nstats = 0;
*rc = SCAP_FAILURE;
return NULL;
}
and so we check it in sinsp
const struct scap_stats_v2* sinsp::get_capture_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc) const
{
/* On purpose ignoring failures to not interrupt in case of stats retrieval failure. */
const struct scap_stats_v2* stats_v2 = scap_get_stats_v2(m_h, flags, nstats, rc);
if (!stats_v2)
{
*nstats = 0;
return NULL;
}
return stats_v2;
}
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it> Co-authored-by: Federico Di Pierro <nierro92@gmail.com> Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: e420cefb120fdedd958f63f322d4e068fe94e8da
|
Closing and reopening to trigger @poiana |
Again, hoping GitHub is working today 👼 /close |
@leogr: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@leogr: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: FedeDP, incertum, leogr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
What type of PR is this?
/kind cleanup
/kind feature
Any specific area of the project related to this PR?
/area libscap-engine-modern-bpf
/area libscap
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Continuation of #1021 (review) for modern bpf plus try reaching parity in existing stats in old bpf.
CC @Andreagit97
Which issue(s) this PR fixes:
falcosecurity/falco#2222
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: