new(libscap, libsinsp): new scap_stats_v2 API, featuring new libbpf bpftool prog show-like stats for bpf (new metrics 3/n) #1021
The new scap-open test binary stats output now looks like this:
Hmm, I have some doubts about this PR... Let's say that in this release we have removed the … If we want to expose some tracepoint information again, I would probably enrich the current struct along these lines:

typedef struct bpf_prog_stats
{
	uint64_t run_cnt;     // number of times the program ran
	uint64_t run_time_ns; // cumulative run time in nanoseconds
	// ...
} bpf_prog_stats;

/*!
  \brief Statistics about an in progress capture
*/
typedef struct scap_stats
{
	uint64_t n_evts; ///< Total number of events that were received by the driver.
	uint64_t n_drops; ///< Number of dropped events.
	uint64_t n_drops_buffer; ///< Number of dropped events caused by full buffer.
	uint64_t n_drops_buffer_clone_fork_enter;
	uint64_t n_drops_buffer_clone_fork_exit;
	uint64_t n_drops_buffer_execve_enter;
	uint64_t n_drops_buffer_execve_exit;
	uint64_t n_drops_buffer_connect_enter;
	uint64_t n_drops_buffer_connect_exit;
	uint64_t n_drops_buffer_open_enter;
	uint64_t n_drops_buffer_open_exit;
	uint64_t n_drops_buffer_dir_file_enter;
	uint64_t n_drops_buffer_dir_file_exit;
	uint64_t n_drops_buffer_other_interest_enter;
	uint64_t n_drops_buffer_other_interest_exit;
	uint64_t n_drops_scratch_map; ///< Number of dropped events caused by full frame scratch map.
	uint64_t n_drops_pf; ///< Number of dropped events caused by invalid memory access.
	uint64_t n_drops_bug; ///< Number of dropped events caused by an invalid condition in the kernel instrumentation.
	uint64_t n_preemptions; ///< Number of preemptions.
	uint64_t n_suppressed; ///< Number of events skipped due to the tid being in a set of suppressed tids.
	uint64_t n_tids_suppressed; ///< Number of threads currently being suppressed.
	bpf_prog_stats sys_enter;
	bpf_prog_stats sys_exit;
	// ...
} scap_stats;

In the old probe you can just copy and paste the current implementation, while in the modern BPF probe we would need something like this:

for(int index = 0; index < g_state.n_possible_cpus; index++)
{
	if(bpf_map_lookup_elem(counter_maps_fd, &index, &cnt_map) < 0)
	{
		snprintf(error_message, MAX_ERROR_MESSAGE_LEN, "unable to get the counter map for CPU %d", index);
		pman_print_error((const char *)error_message);
		goto clean_print_stats;
	}
	stats->n_evts += cnt_map.n_evts;
	stats->n_drops_buffer += cnt_map.n_drops_buffer;
	stats->n_drops_scratch_map += cnt_map.n_drops_max_event_size;
	stats->n_drops += (cnt_map.n_drops_buffer + cnt_map.n_drops_max_event_size);
}

/* Enhanced logic: per-program stats (only populated when kernel.bpf_stats_enabled=1) */
int ret = 0;
struct bpf_prog_info info = {0};
__u32 info_len = sizeof(info);
ret = bpf_obj_get_info_by_fd(bpf_program__fd(g_state.skel->progs.sys_enter), &info, &info_len);
if(ret == 0) /* bpf_obj_get_info_by_fd() returns 0 on success */
{
	stats->sys_enter.run_cnt = info.run_cnt;
	stats->sys_enter.run_time_ns = info.run_time_ns;
	// ...
}

I don't really like this approach, but I don't see many other options, at least right now 🤔 Some random thoughts:
Thank you @Andreagit97 for your input. Offering some additional context:
Curious to hear more feedback. The thinking behind introducing a new libbpf stats struct is that we want to query the libbpf stats less frequently and on an opt-in basis. We could implement some gating within the existing get_stats, but would we want to, or would keeping them separate be cleaner? Regarding the implementation suggestion above, I would be hesitant to hard-code a struct with one field per tracepoint name, as that would create a lot more maintenance overhead. The simple array that we populate with the new bpf struct is more dynamic and automatically extends to new tracepoints, since the name is just another field of the struct.
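To make the trade-off above concrete, here is a minimal sketch of the array-based approach, under the assumption that each entry carries the program name next to its counters: a new tracepoint then adds an array entry instead of a new struct field. The names prog_stats_entry, set_prog_stats, find_prog_stats and PROG_NAME_MAX are hypothetical and are not part of the libs API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical maximum name length for a BPF program / tracepoint. */
#define PROG_NAME_MAX 64

/* Hypothetical entry: the name travels with the counters, so adding a
 * tracepoint never requires a new struct field. */
typedef struct prog_stats_entry
{
	char name[PROG_NAME_MAX]; /* BPF program / tracepoint name */
	uint64_t run_cnt;         /* number of times the program ran */
	uint64_t run_time_ns;     /* cumulative run time in nanoseconds */
} prog_stats_entry;

/* Fill one entry, truncating the name if it is too long. */
static void set_prog_stats(prog_stats_entry *e, const char *name,
                           uint64_t run_cnt, uint64_t run_time_ns)
{
	strncpy(e->name, name, PROG_NAME_MAX - 1);
	e->name[PROG_NAME_MAX - 1] = '\0';
	e->run_cnt = run_cnt;
	e->run_time_ns = run_time_ns;
}

/* Linear lookup by name; returns NULL if no such program was reported. */
static const prog_stats_entry *find_prog_stats(const prog_stats_entry *entries,
                                               size_t n, const char *name)
{
	for(size_t i = 0; i < n; i++)
	{
		if(strcmp(entries[i].name, name) == 0)
		{
			return &entries[i];
		}
	}
	return NULL;
}
```

Consumers iterate over (or search) the array instead of accessing named fields, which is what keeps the schema stable as tracepoints come and go.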
Heard, see my comment above. In the end either way will be fine; let's collect more feedback and make a decision together.
Yay, another tracepoint discussion :D As much as I'm against exposing tracepoints in the scap/sinsp API, I'm all for returning info about them in resource utilization stats, as long as we explicitly document them as unstable (not part of any public contract). On an implementation level, personally I would add yet another vtable method (eventually maybe drop the current get_stats one) with an API like:
where the engine has full control over the names of the stats fields and they need not have any specific structure (unless we choose to do so). It would be used like this:
i.e. the caller provides a buffer of arbitrary size and the engine fills it with as many stats as it likes, using the … Of course, there are many decisions here, like …
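A minimal sketch of the caller-provided-buffer idea described above, assuming a flat name/value entry type. The names stat_entry, demo_engine_get_stats and STAT_NAME_MAX are hypothetical, and the real vtable signature was still under discussion at this point. The engine writes at most buf_len entries and reports how many it actually filled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STAT_NAME_MAX 64

/* Hypothetical flat stats entry: the engine chooses the names. */
typedef struct stat_entry
{
	char name[STAT_NAME_MAX];
	uint64_t value;
} stat_entry;

/* Hypothetical engine method: the caller owns the buffer, the engine fills
 * up to buf_len entries and reports how many it wrote.
 * Returns 0 on success, -1 if buf is NULL while buf_len > 0. */
static int demo_engine_get_stats(stat_entry *buf, size_t buf_len, size_t *n_written)
{
	/* Values an engine might expose; hard-coded here for illustration. */
	static const struct
	{
		const char *name;
		uint64_t value;
	} engine_stats[] = {
		{"n_evts", 1000},
		{"n_drops", 3},
		{"sys_enter.run_cnt", 1000},
	};
	const size_t available = sizeof(engine_stats) / sizeof(engine_stats[0]);

	*n_written = 0;
	if(buf == NULL && buf_len > 0)
	{
		return -1;
	}
	for(size_t i = 0; i < available && i < buf_len; i++)
	{
		snprintf(buf[i].name, STAT_NAME_MAX, "%s", engine_stats[i].name);
		buf[i].value = engine_stats[i].value;
		(*n_written)++;
	}
	return 0;
}
```

A caller would allocate a buffer (on the stack or heap), call the method, and then only look at the first n_written entries; a buffer smaller than what the engine offers simply yields fewer stats.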
I'll take some time to review this more deeply, but I'm 100% on board with @gnosek's proposal. It would also be the first step toward a dynamic, implementation-agnostic stats collection, which we still want to achieve in libsinsp at some point. I think it also satisfies both the information-hiding and flexibility needs that we brought to the table.
Thanks @gnosek and @jasondellaluce for the preview. That makes three votes for consolidating into a single stats API, so I would call this aspect decided. As for the implementation details, let's gather more feedback; something dynamic like @gnosek suggested would make me happy :) We also need gates in place to select which stats categories to look up, so that the API can be called frequently without unnecessary lookups, for example when all you want is the counters and not the libbpf stats...
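A minimal sketch of the gating idea, assuming the caller passes a flags bitmask that selects stats categories. The flag names STATS_FLAG_COUNTERS and STATS_FLAG_LIBBPF, the demo_stats struct and the lookup stand-ins are all hypothetical. Only categories whose bit is set are collected, so a frequent caller that only wants the counters never pays for the libbpf lookups.

```c
#include <stdint.h>

/* Hypothetical category flags. */
#define STATS_FLAG_COUNTERS (1u << 0) /* cheap kernel-side counters */
#define STATS_FLAG_LIBBPF (1u << 1)   /* per-program libbpf stats (more expensive) */

typedef struct demo_stats
{
	uint64_t n_evts;
	uint64_t sys_enter_run_cnt;
	unsigned int libbpf_lookups; /* how many (simulated) libbpf queries were made */
} demo_stats;

/* Stand-ins for the real lookups. */
static uint64_t read_counters(void)
{
	return 42;
}

static uint64_t read_libbpf_run_cnt(demo_stats *s)
{
	s->libbpf_lookups++;
	return 7;
}

/* Collect only the categories requested in flags. */
static void collect_stats(uint32_t flags, demo_stats *s)
{
	if(flags & STATS_FLAG_COUNTERS)
	{
		s->n_evts = read_counters();
	}
	if(flags & STATS_FLAG_LIBBPF)
	{
		s->sys_enter_run_cnt = read_libbpf_run_cnt(s);
	}
}
```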
Just a thought, but the engines have a ->configure method; we can easily add new settings there.
Great, let me move this PR to WIP, since we need to refactor it quite a bit before it's ready! The more feedback the better; I'll start a refactor that tries to take everyone's input into consideration.
Pushed a staging commit for a check-in. It is not yet as elegant as @gnosek's suggestion, because Grzegorz is light years ahead in C coding skills 🤣 Furthermore, could we ease the transition with an interim … Flattening out the schema works pretty well; any more ideas around the schema? The chosen metadata field names are generic enough to cover many metrics use cases.
Current scap-open test output:
If we agree that this is the right first iteration, I would go ahead, remove all the libbpf_stats references and clean up further. I also wasn't sure where to initialize the stats buffer, including where to define the buffer size, so feedback is appreciated :)
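To illustrate what a flattened schema can look like, here is a sketch in which every metric is one self-describing entry: metadata (name, flags, value type) plus a tagged value. The type and enumerator names below are illustrative assumptions, not the definitions from the commit; the point is that counters, run times and averages all fit the same shape.

```c
#include <stdint.h>
#include <stdio.h>

#define METRIC_NAME_MAX 64

/* Tells the reader which union member holds the value. */
typedef enum metric_value_type
{
	METRIC_U64,
	METRIC_DOUBLE,
} metric_value_type;

typedef struct metric_entry
{
	/* Metadata */
	char name[METRIC_NAME_MAX];
	uint32_t flags;         /* e.g. which stats category produced it */
	metric_value_type type; /* selects the valid union member */
	/* Value */
	union
	{
		uint64_t u64;
		double d;
	} value;
} metric_entry;

/* Format one entry as "name: value"; returns snprintf's result (length). */
static int format_metric(const metric_entry *m, char *out, size_t out_len)
{
	switch(m->type)
	{
	case METRIC_U64:
		return snprintf(out, out_len, "%s: %llu", m->name, (unsigned long long)m->value.u64);
	case METRIC_DOUBLE:
		return snprintf(out, out_len, "%s: %.1f", m->name, m->value.d);
	}
	return -1;
}
```

Because the type tag travels with the value, a generic consumer (such as the scap-open test binary) can print every metric without knowing its name in advance.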
…tering Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
* apply reviewers feedback
* stats static const sized allocated in each engine for now
* general cleanup, such as re-audit returns on possible failures
* more comments
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com> Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com> Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
if (offset > nstats_allocated - 1)
{
	break;
}
It seems like a bug if we end up here, right?
Yes, I would say so; we want an error message here.
Yes, we probably need to set an error message here and return SCAP_FAILURE immediately, since we should never reach this point.
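A minimal sketch of what the reviewers suggest: treat running past the allocated stats buffer as an internal error, record a message and return SCAP_FAILURE immediately instead of silently breaking out of the loop. SCAP_SUCCESS and SCAP_FAILURE are defined locally here only so the snippet compiles on its own (in libscap they come from its headers); fill_stats and ERR_BUF_LEN are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins so the snippet is self-contained. */
#ifndef SCAP_SUCCESS
#define SCAP_SUCCESS 0
#endif
#ifndef SCAP_FAILURE
#define SCAP_FAILURE 1
#endif

#define ERR_BUF_LEN 256

/* Hypothetical fill routine: writes n_requested values into a buffer of
 * nstats_allocated slots. Overrunning the buffer should never happen, so it
 * is reported as an error rather than silently truncated. */
static int32_t fill_stats(uint64_t *buf, uint32_t nstats_allocated,
                          uint32_t n_requested, uint32_t *nstats, char *lasterr)
{
	*nstats = 0;
	for(uint32_t offset = 0; offset < n_requested; offset++)
	{
		/* Same condition as "offset > nstats_allocated - 1", but without
		 * unsigned underflow when nstats_allocated is 0. */
		if(offset >= nstats_allocated)
		{
			snprintf(lasterr, ERR_BUF_LEN,
			         "stats buffer overflow: offset %" PRIu32 " exceeds %" PRIu32 " allocated slots",
			         offset, nstats_allocated);
			return SCAP_FAILURE;
		}
		buf[offset] = offset; /* placeholder value */
		*nstats = offset + 1;
	}
	return SCAP_SUCCESS;
}
```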
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
LGTM label has been added. Git tree hash: 5e266b79eed23903660fde26efa7a8ce017f3e86
I will approve it so we can go on with the next steps, but I would like to see these comments addressed before releasing the final version :)
/approve
snprintf(error_message, MAX_ERROR_MESSAGE_LEN, "unable to get the counter map for CPU %d", index);
pman_print_error((const char *)error_message);
close(counter_maps_fd);
return -ret;
Suggested change:
- return -ret;
+ return SCAP_FAILURE;
uint32_t nstats;
int32_t rc;
const scap_stats_v2* stats_v2;
stats_v2 = scap_get_stats_v2(h, flags, &nstats, &rc);
For future PRs: we could also test failure cases, such as a NULL stats vector.
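Along the lines of that suggestion, a sketch of how a caller can defend against the failure cases: check both the returned pointer and rc before iterating over nstats entries. The getter below is a hypothetical stand-in with the same shape as the call above (pointer return, OUT nstats, OUT rc); it is not scap_get_stats_v2 itself, and SCAP_SUCCESS/SCAP_FAILURE are defined locally only so the snippet compiles standalone.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef SCAP_SUCCESS
#define SCAP_SUCCESS 0
#endif
#ifndef SCAP_FAILURE
#define SCAP_FAILURE 1
#endif

typedef struct demo_stat
{
	const char *name;
	uint64_t value;
} demo_stat;

/* Hypothetical getter; 'fail' lets a test force the error path. */
static const demo_stat *demo_get_stats(int fail, uint32_t *nstats, int32_t *rc)
{
	static const demo_stat stats[] = {{"n_evts", 10}, {"n_drops", 1}};
	if(fail)
	{
		*nstats = 0;
		*rc = SCAP_FAILURE;
		return NULL;
	}
	*nstats = sizeof(stats) / sizeof(stats[0]);
	*rc = SCAP_SUCCESS;
	return stats;
}

/* Sum all values; returns -1 on failure so callers never walk a NULL vector. */
static int64_t sum_stats(int fail)
{
	uint32_t nstats = 0;
	int32_t rc = SCAP_FAILURE;
	const demo_stat *stats = demo_get_stats(fail, &nstats, &rc);
	if(stats == NULL || rc != SCAP_SUCCESS)
	{
		return -1;
	}
	int64_t total = 0;
	for(uint32_t i = 0; i < nstats; i++)
	{
		total += (int64_t)stats[i].value;
	}
	return total;
}
```

A test for the real API would follow the same pattern: force the error path and assert that both the NULL return and the non-success rc are observed.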
const struct scap_stats_v2* scap_modern_bpf__get_stats_v2(struct scap_engine_handle engine, uint32_t flags, OUT uint32_t* nstats, OUT int32_t* rc)
{
	*rc = SCAP_SUCCESS;
	if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, (void*)nstats))
Suggested change:
- if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, (void*)nstats))
+ if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, nstats))
{
	struct scap_stats_v2 *stats = (struct scap_stats_v2 *)scap_stats_v2_struct;
	*nstats = 0;
	int ret;
If we use SCAP_FAILURE, this should no longer be necessary.
Suggested change:
- int ret;
	RUN_TIME_NS,
	AVG_TIME_NS,
	BPF_MAX_LIBBPF_STATS,
};
We can add the typedef here too, as with bpf_kernel_counters_stats.
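Following the typedef suggestion, a sketch of the enum with a typedef name, plus how the derived AVG_TIME_NS slot could be computed from the other two. RUN_CNT as the first enumerator and the names bpf_libbpf_stats and fill_libbpf_stats are assumptions (only RUN_TIME_NS, AVG_TIME_NS and BPF_MAX_LIBBPF_STATS appear in the diff excerpt). The zero check matters because the kernel only accounts run_cnt and run_time_ns while bpf_stats_enabled is on, so both can legitimately be 0.

```c
#include <stdint.h>

/* Hypothetical typedef'd enum; each enumerator indexes one stat slot. */
typedef enum bpf_libbpf_stats
{
	RUN_CNT = 0,
	RUN_TIME_NS,
	AVG_TIME_NS,
	BPF_MAX_LIBBPF_STATS,
} bpf_libbpf_stats;

/* Fill the slots for one program from bpf_prog_info-style values.
 * AVG_TIME_NS is an integer (truncated) average and stays 0 when the
 * program never ran, avoiding a division by zero. */
static void fill_libbpf_stats(uint64_t out[BPF_MAX_LIBBPF_STATS],
                              uint64_t run_cnt, uint64_t run_time_ns)
{
	out[RUN_CNT] = run_cnt;
	out[RUN_TIME_NS] = run_time_ns;
	out[AVG_TIME_NS] = run_cnt ? run_time_ns / run_cnt : 0;
}
```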
const struct scap_stats_v2* engine::get_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc)
{
	*nstats = scap_gvisor::stats::MAX_GVISOR_COUNTERS_STATS;
	scap_stats_v2* stats = engine::m_stats;
Suggested change:
- scap_stats_v2* stats = engine::m_stats;
+ scap_stats_v2* stats = this->m_stats;
@@ -29,6 +29,13 @@ limitations under the License.
#include <sys/utsname.h>
#include "ringbuffer/ringbuffer.h"

const char * const modern_bpf_kernel_counters_stats_names[] = {
capture.c in libpman seems like the right place for this; we don't use it here.
	MODERN_BPF_MAX_KERNEL_COUNTERS_STATS
}modern_bpf_kernel_counters_stats;

extern const char * const modern_bpf_kernel_counters_stats_names[];
If we move the array into capture.c, we don't need this extern declaration.
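A sketch of keeping the names array private to the one translation unit that uses it (capture.c in the suggestion), with a compile-time check that it stays in sync with the enum. Only the type name, the array name and MODERN_BPF_MAX_KERNEL_COUNTERS_STATS come from the diff; the individual enumerators, the name strings and kernel_counter_name are placeholders. _Static_assert requires C11.

```c
#include <stddef.h>

/* Enum as it might appear in a shared header (enumerators are placeholders). */
typedef enum modern_bpf_kernel_counters_stats
{
	MODERN_BPF_N_EVTS = 0,
	MODERN_BPF_N_DROPS_BUFFER_TOTAL,
	MODERN_BPF_N_DROPS,
	MODERN_BPF_MAX_KERNEL_COUNTERS_STATS
} modern_bpf_kernel_counters_stats;

/* Internal linkage in the only file that needs it, so the header does not
 * need an extern declaration. Order must match the enum above. */
static const char *const modern_bpf_kernel_counters_stats_names[] = {
	"n_evts",               /* MODERN_BPF_N_EVTS */
	"n_drops_buffer_total", /* MODERN_BPF_N_DROPS_BUFFER_TOTAL */
	"n_drops",              /* MODERN_BPF_N_DROPS */
};

/* Catches an enumerator added without a matching name (or vice versa);
 * keeping the order aligned is still up to the author. */
_Static_assert(sizeof(modern_bpf_kernel_counters_stats_names) /
                       sizeof(modern_bpf_kernel_counters_stats_names[0]) ==
                   MODERN_BPF_MAX_KERNEL_COUNTERS_STATS,
               "names array out of sync with modern_bpf_kernel_counters_stats");

/* Bounds-checked accessor; returns NULL for out-of-range indexes. */
static const char *kernel_counter_name(modern_bpf_kernel_counters_stats idx)
{
	return (idx < MODERN_BPF_MAX_KERNEL_COUNTERS_STATS) ? modern_bpf_kernel_counters_stats_names[idx] : NULL;
}
```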
//
// machine_info flags
//
#define PPM_BPF_STATS_ENABLED (1 << 0)
We could drop the PPM_ prefix and just use SCAP_. WDYT?
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Andreagit97, incertum, jasondellaluce.
What type of PR is this?
/kind cleanup
/kind feature
Any specific area of the project related to this PR?
/area driver-bpf
/area driver-modern-bpf
/area libscap-engine-bpf
/area libscap-engine-modern-bpf
/area libscap
/area libpman
/area libsinsp
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Introduce native support for libbpf bpftool prog show-like statistics, as discussed in falcosecurity/falco#2222 (comment).
Approach:
libbpf is already bundled in libs. As @gnosek anticipated, it was an easy integration, only requiring adding bpf_obj_get_info_by_fd … fds directly from the engine-specific handle (m_attached_progs); therefore I don't anticipate any issues … bpf_prog_info is straightforward. … echo 1 > /proc/sys/kernel/bpf_stats_enabled is enabled on your machine, otherwise no stats will be available. Also, test with the old bpf probe for now, as modern_bpf support is pending some additional extensions to reach parity with the old bpf.
CC @falcosecurity/libs-maintainers for awareness.
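Since the PR relies on kernel.bpf_stats_enabled, a sketch of how an engine could check the sysctl before bothering with libbpf lookups, recording the result as a bit flag (in the spirit of the PPM_BPF_STATS_ENABLED machine_info flag discussed in the review). The helper bpf_stats_flag_from and the DEMO_BPF_STATS_ENABLED bit are hypothetical; the path is a parameter so the helper can be exercised against any file.

```c
#include <stdint.h>
#include <stdio.h>

#define DEMO_BPF_STATS_ENABLED (1u << 0) /* hypothetical flag bit */
#define BPF_STATS_SYSCTL_PATH "/proc/sys/kernel/bpf_stats_enabled"

/* Returns DEMO_BPF_STATS_ENABLED if the file at 'path' contains a non-zero
 * integer; returns 0 if it contains 0, is missing, or cannot be parsed. */
static uint32_t bpf_stats_flag_from(const char *path)
{
	FILE *f = fopen(path, "r");
	if(f == NULL)
	{
		return 0; /* e.g. kernels without this sysctl */
	}
	int enabled = 0;
	int parsed = fscanf(f, "%d", &enabled);
	fclose(f);
	return (parsed == 1 && enabled != 0) ? DEMO_BPF_STATS_ENABLED : 0;
}
```

An engine would call bpf_stats_flag_from(BPF_STATS_SYSCTL_PATH) once at open time. Turning the stats on still requires root, either with echo 1 > /proc/sys/kernel/bpf_stats_enabled or with sysctl -w kernel.bpf_stats_enabled=1.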
Which issue(s) this PR fixes:
falcosecurity/falco#2222
Fixes #
Special notes for your reviewer:
@Andreagit97 we still need to support m_attached_progs in struct modern_bpf_engine; should we do it in this PR or in a follow-up PR? I started all the scaffolding so that at least we don't break the scap-open test binary in the meantime. In addition, since we only recently separated out the bpf programs for each driver, and you and @FedeDP worked on this, are there other implications I may be missing? Thanks in advance for checking!
Does this PR introduce a user-facing change?: