
new(libscap, libsinsp): new scap_stats_v2 API, featuring new libbpf bpftool prog show like stats for bpf (new metrics 3/n) #1021

Merged
merged 26 commits into falcosecurity:master from libbpf-stats
Apr 17, 2023

Conversation

incertum
Contributor

@incertum incertum commented Apr 2, 2023

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Introduce native support for libbpf `bpftool prog show` statistics, as discussed in falcosecurity/falco#2222 (comment).

Approach:

  • libbpf is already bundled in libs. As @gnosek anticipated, it was an easy integration, only requiring the addition of bpf_obj_get_info_by_fd (see the sketch after this list).
  • We get the fds directly from the engine-specific handle (m_attached_progs), therefore I don't anticipate any issues ...
  • The way we look up the libbpf stats properties should naturally extend to any future programs we support without requiring a refactor, and extracting more fields from bpf_prog_info is straightforward.
  • Please note these libbpf stats are currently only available at per-program granularity AFAIK, so we won't get filler / tail call resolution, but maybe one day :)
  • These stats are going to be very valuable when inspecting the trade-offs between sys enter/exit tracepoints and LSM hooks. LSM hooks are planned for libs 0.12.0, see "Support Linux Security Modules hook for more accurate security tracing" #252
  • When testing, please ensure kernel bpf stats are enabled on your machine (echo 1 > /proc/sys/kernel/bpf_stats_enabled), else no stats will be available. Also test with the old bpf driver for now, as modern_bpf support is pending some additional extensions in order to reach parity with the old bpf.
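
For reference, a minimal sketch of the libbpf lookup the bullets above refer to. It is illustrative only (the helper name and surrounding code are not this PR's actual implementation), but bpf_obj_get_info_by_fd and the bpf_prog_info fields are the real libbpf/kernel API, and run_cnt / run_time_ns stay at 0 unless kernel bpf stats are enabled:

#include <bpf/bpf.h>    /* bpf_obj_get_info_by_fd() */
#include <linux/bpf.h>  /* struct bpf_prog_info */
#include <stdio.h>

/* Hypothetical helper: read and print per-program runtime stats for one
 * already-attached program fd. */
static int print_prog_stats(int prog_fd)
{
	struct bpf_prog_info info = {0};
	__u32 info_len = sizeof(info);

	if(bpf_obj_get_info_by_fd(prog_fd, &info, &info_len) != 0)
	{
		return -1;
	}

	printf("Prog [%s] run_cnt: %llu run_time_ns: %llu avg_time_ns: %llu\n",
	       info.name,
	       (unsigned long long)info.run_cnt,
	       (unsigned long long)info.run_time_ns,
	       info.run_cnt ? (unsigned long long)(info.run_time_ns / info.run_cnt) : 0ULL);
	return 0;
}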

CC @falcosecurity/libs-maintainers for awareness.

Which issue(s) this PR fixes:

falcosecurity/falco#2222

Fixes #

Special notes for your reviewer:

@Andreagit97 still need to support m_attached_progs in struct modern_bpf_engine, was wondering if we should do it in this PR or a follow up PR? Started all the scaffolding so that at least we don't break the scap-open test binary in the meantime.

In addition, since we just recently separated out the bpf programs for each driver and you and @FedeDP worked on this, are there other implications I may be missing? Thanks in advance for checking!

Does this PR introduce a user-facing change?:

new(libscap, libsinsp): add libbpf `bpftool prog show` like scap_libbpf_stats

@incertum
Copy link
Contributor Author

incertum commented Apr 3, 2023

New scap-open test binary stats output now looks like this

sudo libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o

[SCAP-OPEN]: Hello!

---------------------- SCAP SOURCE ----------------------
* BPF probe: 'driver/bpf/probe.o'
-----------------------------------------------------------

--------------------- CONFIGURATIONS ----------------------
* Print single event type: -1 (`-1` means no event to print).
* Run until '18446744073709551615' events are catched.
-----------------------------------------------------------

---------------------- INTERESTING SYSCALLS ----------------------
* All sc codes are enabled!
------------------------------------------------------------------

* OK! BPF probe correctly loaded: NO VERIFIER ISSUES :)
* Live capture in progress...
* Press CTRL+C to stop the capture
^C
---------------------- STATS -----------------------

[SCAP-OPEN]: General statistics

Events correctly captured (SCAP_SUCCESS): 56201039
Seen by driver (kernel side events): 56201398
Time elapsed: 1994 s
Rate of userspace events (events/second): 28185
Rate of kernel side events (events/second): 28185
Number of dropped events: 0
Number of timeouts: 65845
Number of 'next' calls: 56266884
Number of preemptions: 0
Number of events skipped due to the tid being in a set of suppressed tids: 0
Number of threads currently being suppressed: 0

[SCAP-OPEN]: libbpf statistics

Number of total bpf program invocations (sum run_cnt): 56206994
Nanoseconds spent in total across bpf programs (sum run_time_ns): 107288950041
Average time spent across bpf programs (avg_time_ns): 1908
Prog [sys_enter] run_cnt: 20741581
Prog [sys_enter] run_time_ns: 32088344143
Prog [sys_enter] avg_time_ns: 1547
Prog [sys_exit] run_cnt: 20743009
Prog [sys_exit] run_time_ns: 40765096995
Prog [sys_exit] avg_time_ns: 1965
Prog [sched_process_e] run_cnt: 1850
Prog [sched_process_e] run_time_ns: 6870257
Prog [sched_process_e] avg_time_ns: 3713
Prog [sched_switch] run_cnt: 6650672
Prog [sched_switch] run_time_ns: 30832591912
Prog [sched_switch] avg_time_ns: 4636
Prog [page_fault_user] run_cnt: 8031096
Prog [page_fault_user] run_time_ns: 3572066118
Prog [page_fault_user] avg_time_ns: 444
Prog [page_fault_kern] run_cnt: 36638
Prog [page_fault_kern] run_time_ns: 18727841
Prog [page_fault_kern] avg_time_ns: 511
Prog [signal_deliver] run_cnt: 2148
Prog [signal_deliver] run_time_ns: 5252775
Prog [signal_deliver] avg_time_ns: 2445

[SCAP-OPEN]: Kernel side buffer drop statistics

Number of dropped events caused by full buffer (total / all buffer drops - includes all categories below, likely higher than sum of syscall categories): 0
Number of dropped events caused by full buffer (n_drops_buffer_clone_fork_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_clone_fork_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_execve_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_execve_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_connect_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_connect_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_open_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_open_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_dir_file_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_dir_file_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_other_interest_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_other_interest_exit syscall category): 0
Number of dropped events caused by full scratch map: 0
Number of dropped events caused by invalid memory access (page faults): 0
Number of dropped events caused by an invalid condition in the kernel instrumentation (bug): 0
-----------------------------------------------------

[SCAP-OPEN]: Bye!
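
(For reference, avg_time_ns in the output above appears to be simply run_time_ns / run_cnt, e.g. 107288950041 ns / 56206994 invocations ≈ 1908 ns per bpf program invocation.)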

@incertum incertum changed the title from "new(libscap, libsinsp): add libbpf bpftool prog show like scap_libbpf_stats" to "new(libscap, libsinsp): add libbpf bpftool prog show like scap_libbpf_stats (new metrics 3/n)" on Apr 3, 2023
@Andreagit97
Member

Uhm I have some doubts about this PR... Let's say in this release we have removed the tp logic from sinsp and we have pushed it into the engines, this is because the tracepoint concept is something that the user should not know about. Now in this PR, we are exposing stats about tracepoints to userspace... this is in some way against what we did some weeks ago :/ do we need again a shared enum between engines to loop over tracepoints (called now "attached progs")? This is precisely what we removed some changes ago, now every engine has its enum/logic because every engine could have different progs attached...
WDYT? @jasondellaluce @FedeDP

If we want to expose again some tracepoints information, probably I would enrich the actual scap_stats with new info. I would avoid another v-table method since the new logic is just related to bpf engines :/ Maybe we could have something like this

typedef struct bpf_prog_stats
{
  uint64_t run_cnt;
  uint64_t run_time_ns;
  // ...
}bpf_prog_stats;

/*!
  \brief Statistics about an in progress capture
*/
typedef struct scap_stats
{
	uint64_t n_evts; ///< Total number of events that were received by the driver.
	uint64_t n_drops; ///< Number of dropped events.
	uint64_t n_drops_buffer; ///< Number of dropped events caused by full buffer.
	uint64_t n_drops_buffer_clone_fork_enter;
	uint64_t n_drops_buffer_clone_fork_exit;
	uint64_t n_drops_buffer_execve_enter;
	uint64_t n_drops_buffer_execve_exit;
	uint64_t n_drops_buffer_connect_enter;
	uint64_t n_drops_buffer_connect_exit;
	uint64_t n_drops_buffer_open_enter;
	uint64_t n_drops_buffer_open_exit;
	uint64_t n_drops_buffer_dir_file_enter;
	uint64_t n_drops_buffer_dir_file_exit;
	uint64_t n_drops_buffer_other_interest_enter;
	uint64_t n_drops_buffer_other_interest_exit;
	uint64_t n_drops_scratch_map; ///< Number of dropped events caused by full frame scratch map.
	uint64_t n_drops_pf; ///< Number of dropped events caused by invalid memory access.
	uint64_t n_drops_bug; ///< Number of dropped events caused by an invalid condition in the kernel instrumentation.
	uint64_t n_preemptions; ///< Number of preemptions.
	uint64_t n_suppressed; ///< Number of events skipped due to the tid being in a set of suppressed tids.
	uint64_t n_tids_suppressed; ///< Number of threads currently being suppressed.
        bpf_prog_stats sys_enter;
        bpf_prog_stats sys_exit;
        //...
}scap_stats;

in the old probe, you can just cut and paste the actual implementation, while in the modern bpf we should do something like this

	for(int index = 0; index < g_state.n_possible_cpus; index++)
	{
		if(bpf_map_lookup_elem(counter_maps_fd, &index, &cnt_map) < 0)
		{
			snprintf(error_message, MAX_ERROR_MESSAGE_LEN, "unable to get the counter map for CPU %d", index);
			pman_print_error((const char *)error_message);
			goto clean_print_stats;
		}
		stats->n_evts += cnt_map.n_evts;
		stats->n_drops_buffer += cnt_map.n_drops_buffer;
		stats->n_drops_scratch_map += cnt_map.n_drops_max_event_size;
		stats->n_drops += (cnt_map.n_drops_buffer + cnt_map.n_drops_max_event_size);
	}

	/* Enhanced logic... */
	int ret = 0;
	struct bpf_prog_info info = {0};
	u32 info_len = sizeof(info);
	ret = bpf_obj_get_info_by_fd(bpf_program__fd(g_state.skel->progs.sys_enter), &info, &info_len);
	if(ret == 0)
	{
		stats->sys_enter.run_cnt = info.run_cnt;
		stats->sys_enter.run_time_ns = info.run_time_ns;
		// ...
	}

I don't like this approach too much, but I don't see many other ways, at least right now 🤔

Some random thoughts:

  • doing all this stuff in scap_get_stats could be time-consuming if we do it frequently, so maybe we need to add a flag into the scap_stats struct, something like bool enhanced_stats, to collect these new stats only on demand. WDYT?
  • I would try to converge all the new stats into the scap_get_stats method. We are adding different methods to get different metrics from different places; maybe it would be better to find a single place and enable this new logic on demand (?) ☝️

@incertum
Contributor Author

incertum commented Apr 3, 2023

Uhm I have some doubts about this PR... Let's say in this release we have removed the tp logic from sinsp and we have pushed it into the engines, this is because the tracepoint concept is something that the user should not know about. Now in this PR, we are exposing stats about tracepoints to userspace... this is in some way against what we did some weeks ago :/ do we need again a shared enum between engines to loop over tracepoints (called now "attached progs")? This is precisely what we removed some changes ago, now every engine has its enum/logic because every engine could have different progs attached... WDYT? @jasondellaluce @FedeDP

Thank you @Andreagit97 for your input. Offering some additional context:

  • Agreed that for the instrumentation behind Falco rules the end user does not need to know about tracepoints. However, when it comes to resource utilization and additional performance metrics such as the ones introduced here, the end user absolutely does need to know about the tracepoints, because you need to ensure large-scale production stability.
  • The good news is that the shared enum here appears fine, as it's just for passive stats collection: the indices don't mean anything to any of the engines and don't map to an exact tracepoint; each is just an arbitrary index. We would, however, need to know the absolute MAX number of tracepoints to ensure we allocate appropriately for both bpf drivers. A given index may then mean a different tracepoint between bpf and modern_bpf, and that is fine, because for the stats reporting we just loop over the programs and extract the tracepoint name and respective metrics dynamically from the struct, so there is no hard-coding (see the rough sketch after this list). There is also no hard-coding for the Falco logs: we will use the name as the log key dynamically (with appropriate checks in place). Such an auto-discovery approach is very common in ETL data pipelines.
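
A rough sketch of that dynamic loop, under stated assumptions: the prog_fds array and the libbpf_prog_stats record are hypothetical stand-ins for the engine's real m_attached_progs layout and this PR's actual structs, while bpf_obj_get_info_by_fd and struct bpf_prog_info are the real libbpf/kernel API:

#include <bpf/bpf.h>    /* bpf_obj_get_info_by_fd() */
#include <linux/bpf.h>  /* struct bpf_prog_info, BPF_OBJ_NAME_LEN */
#include <stdint.h>
#include <string.h>

struct libbpf_prog_stats
{
	char name[BPF_OBJ_NAME_LEN];  /* program name as reported by the kernel */
	uint64_t run_cnt;
	uint64_t run_time_ns;
};

/* Fill stats_out[] by iterating whatever programs happen to be attached;
 * no per-tracepoint hard-coding, the name comes straight from the kernel. */
static uint32_t collect_libbpf_stats(const int *prog_fds, uint32_t nprogs,
				     struct libbpf_prog_stats *stats_out)
{
	uint32_t filled = 0;
	for(uint32_t i = 0; i < nprogs; i++)
	{
		struct bpf_prog_info info = {0};
		__u32 len = sizeof(info);
		if(prog_fds[i] < 0 || bpf_obj_get_info_by_fd(prog_fds[i], &info, &len) != 0)
		{
			continue;
		}
		memcpy(stats_out[filled].name, info.name, BPF_OBJ_NAME_LEN);
		stats_out[filled].run_cnt = info.run_cnt;
		stats_out[filled].run_time_ns = info.run_time_ns;
		filled++;
	}
	return filled;
}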

If we want to expose again some tracepoints information, probably I would enrich the actual scap_stats with new info. I would avoid another v-table method since the new logic is just related to bpf engines :/ Maybe we could have something like this

typedef struct bpf_prog_stats
{
  uint64_t run_cnt;
  uint64_t run_time_ns;
  // ...
}bpf_prog_stats;

/*!
  \brief Statistics about an in progress capture
*/
typedef struct scap_stats
{
	uint64_t n_evts; ///< Total number of events that were received by the driver.
	uint64_t n_drops; ///< Number of dropped events.
	uint64_t n_drops_buffer; ///< Number of dropped events caused by full buffer.
	uint64_t n_drops_buffer_clone_fork_enter;
	uint64_t n_drops_buffer_clone_fork_exit;
	uint64_t n_drops_buffer_execve_enter;
	uint64_t n_drops_buffer_execve_exit;
	uint64_t n_drops_buffer_connect_enter;
	uint64_t n_drops_buffer_connect_exit;
	uint64_t n_drops_buffer_open_enter;
	uint64_t n_drops_buffer_open_exit;
	uint64_t n_drops_buffer_dir_file_enter;
	uint64_t n_drops_buffer_dir_file_exit;
	uint64_t n_drops_buffer_other_interest_enter;
	uint64_t n_drops_buffer_other_interest_exit;
	uint64_t n_drops_scratch_map; ///< Number of dropped events caused by full frame scratch map.
	uint64_t n_drops_pf; ///< Number of dropped events caused by invalid memory access.
	uint64_t n_drops_bug; ///< Number of dropped events caused by an invalid condition in the kernel instrumentation.
	uint64_t n_preemptions; ///< Number of preemptions.
	uint64_t n_suppressed; ///< Number of events skipped due to the tid being in a set of suppressed tids.
	uint64_t n_tids_suppressed; ///< Number of threads currently being suppressed.
        bpf_prog_stats sys_enter;
        bpf_prog_stats sys_exit;
        //...
}scap_stats;

in the old probe, you can just cut and paste the actual implementation, while in the modern bpf we should do something like this

	for(int index = 0; index < g_state.n_possible_cpus; index++)
	{
		if(bpf_map_lookup_elem(counter_maps_fd, &index, &cnt_map) < 0)
		{
			snprintf(error_message, MAX_ERROR_MESSAGE_LEN, "unable to get the counter map for CPU %d", index);
			pman_print_error((const char *)error_message);
			goto clean_print_stats;
		}
		stats->n_evts += cnt_map.n_evts;
		stats->n_drops_buffer += cnt_map.n_drops_buffer;
		stats->n_drops_scratch_map += cnt_map.n_drops_max_event_size;
		stats->n_drops += (cnt_map.n_drops_buffer + cnt_map.n_drops_max_event_size);
	}

	/* Enhanced logic... */
	int ret = 0;
	struct bpf_prog_info info = {0};
	u32 info_len = sizeof(info);
	ret = bpf_obj_get_info_by_fd(bpf_program__fd(g_state.skel->progs.sys_enter), &info, &info_len);
	if(ret == 0)
	{
		stats->sys_enter.run_cnt = info.run_cnt;
		stats->sys_enter.run_time_ns = info.run_time_ns;
		// ...
	}

I don't like this approach too much, but I don't see many other ways, at least right now 🤔

Curious to hear more feedback. The thinking behind introducing a new libbpf stats struct is that we want to call the libbpf stats less frequently and opt-in. We could accomplish some gating within the existing get_stats, but would we want to, or would keeping them separate be cleaner?

Re the implementation suggestion above: I would be hesitant to hard-code an exact struct per tracepoint name, as it's going to generate a lot more maintenance overhead. The simple array that we populate with the new bpf struct is more dynamic and will automatically extend to new tracepoints, since the name is merely part of the struct.

Some random thoughts:

  • doing all this stuff in scap_get_stats could be time-consuming if we do it frequently, so maybe we need to add a flag into the scap_stats struct, something like bool enhanced_stats, to collect these new stats only on demand. WDYT?
  • I would try to converge all the new stats into the scap_get_stats method. We are adding different methods to get different metrics from different places; maybe it would be better to find a single place and enable this new logic on demand (?) ☝️

Heard, see comment above. In the end either way will be fine; let's collect more feedback and collectively make a decision.

@gnosek
Contributor

gnosek commented Apr 3, 2023

Yay, another tracepoint discussion :D

As much as I'm against exposing tracepoints in the scap/sinsp API, I'm all for returning info about them in resource utilization stats, as long as we explicitly document them as unstable (not part of any public contract).

On an implementation level, personally I would add yet another vtable method (eventually maybe drop the current get_stats one) with an API like:

struct scap_stat
{
  const char* name;
  // maybe some metadata here, like the unit as an enum
  int64_t value;
};

int32_t get_stats(struct engine_handle engine, void** cursor, struct scap_stat* buf, size_t buf_size);

where the engine has full control over the names of the stats fields and they need not have any specific structure (unless we choose to do so)

It would be used like this:

#define BUF_SIZE 42
void* cursor = NULL;
struct scap_stat stats[BUF_SIZE];

while(1) {
  int32_t ret = get_stats(engine, &cursor, stats, sizeof(stats));
  if(ret < 0) {
    return SCAP_FAILURE;
  }

  if(ret == 0)
  {
    // all done
    break;
  }

  ASSERT(ret <= BUF_SIZE);
  for(int32_t i = 0; i < ret; ++i)
  {
    printf("%s = %lld\n", stats[i].name, (long long)stats[i].value);
  }
}

i.e. the caller provides a buffer of arbitrary size and the engine fills it with as many stats as it likes, using the cursor variable to keep track across calls (if it has more stats than would fit in the buffer).

Of course there are many decisions here, like const char* vs char[SOME_SIZE] (or maybe an enum for the well-known, currently built-in stats and a string for custom stuff), etc., but overall it feels:

  • simple enough to be used from C
  • flexible enough to provide stats at different layers with low overhead
  • composable (one component can delegate a part of its stats to another one with minimal ceremony)

@jasondellaluce
Contributor

Will take some time to review this deeper, but I'm 100% on board with @gnosek's proposal. This will also be the first step toward dynamic and implementation-agnostic stats collection, which we would still want to achieve in libsinsp somewhere in the future. I think it also satisfies both the information-hiding and flexibility needs that we brought to the table.

@incertum
Contributor Author

incertum commented Apr 3, 2023

Thanks @gnosek and @jasondellaluce for the preview. That makes three rounds of feedback in favor of consolidating into one stats API, so I would call this aspect decided. As far as the implementation details go, let's gather more feedback; something dynamic like @gnosek suggested would make me happy :) Plus we need gates in place to tell which stats category to look up, so that it can be called frequently without unnecessary lookups if, for example, all you want is the counters and not the libbpf stats (see the sketch below) ...
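
One possible shape for such a gate, sketched with hypothetical flag and helper names (the real flag names and plumbing were settled later in this PR):

#include <stdint.h>

/* Hypothetical category gates; callers OR together the categories they want,
 * so frequent calls don't pay for stats they don't need. */
#define STATS_KERNEL_COUNTERS (1U << 0)
#define STATS_LIBBPF_STATS    (1U << 1)

struct stat_entry { const char *name; uint64_t value; };  /* placeholder shape */

/* Hypothetical per-category collectors; each appends entries to buf and
 * returns the new offset. */
extern uint32_t fill_kernel_counters(struct stat_entry *buf, uint32_t offset);
extern uint32_t fill_libbpf_stats(struct stat_entry *buf, uint32_t offset);

uint32_t get_stats_gated(uint32_t flags, struct stat_entry *buf)
{
	uint32_t offset = 0;
	if(flags & STATS_KERNEL_COUNTERS)
	{
		offset = fill_kernel_counters(buf, offset);  /* cheap counter map reads */
	}
	if(flags & STATS_LIBBPF_STATS)
	{
		offset = fill_libbpf_stats(buf, offset);     /* bpf_obj_get_info_by_fd() calls, opt-in */
	}
	return offset;
}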

@gnosek
Contributor

gnosek commented Apr 3, 2023

Plus we need gates in place to tell which stats category to look up

Just a thought, but the engines have a ->configure method; we can add new settings there easily.

@incertum
Contributor Author

incertum commented Apr 3, 2023

Great, let me move this PR to WIP since we need to refactor it quite a bit before it's ready! The more feedback the better; I'll start a refactor trying to take everyone's input into consideration.

@incertum incertum changed the title from "new(libscap, libsinsp): add libbpf bpftool prog show like scap_libbpf_stats (new metrics 3/n)" to "wip: new(libscap, libsinsp): add libbpf bpftool prog show like scap_libbpf_stats (new metrics 3/n)" on Apr 3, 2023
@incertum
Contributor Author

incertum commented Apr 4, 2023

Pushed a staging commit for check-in. Not yet as elegant as @gnosek's suggestion, because Grzegorz is like light years ahead of me in C coding skills 🤣

Furthermore, could we ease the transition with an interim scap_stats_new approach that would include the new libbpf metrics and the existing counters? Otherwise I will break everything, including all clients besides Falco.

Flattening out the schema works pretty well; any more ideas around the schema? The chosen metadata field names are generic enough to cover many metrics use cases.

typedef struct scap_stats_new
{
	/* Metadata */
	char name[NAME_MAX];
	uint32_t type;
	uint32_t id;
	int valid;
	/* Stats (key, value) */
	char key[NAME_MAX];
	uint64_t value;
	// todo: add unit enum
}scap_stats_new;

Current new scap-open test output:

Stats name [sys_enter] key [run_cnt] and value [4382]
Stats name [sys_enter] key [run_time_ns] and value [13841362]
Stats name [sys_enter] key [avg_time_ns] and value [3158]
Stats name [sys_exit] key [run_cnt] and value [4383]
Stats name [sys_exit] key [run_time_ns] and value [16531416]
Stats name [sys_exit] key [avg_time_ns] and value [3771]
Stats name [sched_process_e] key [run_cnt] and value [1]
Stats name [sched_process_e] key [run_time_ns] and value [6843]
Stats name [sched_process_e] key [avg_time_ns] and value [6843]
Stats name [sched_switch] key [run_cnt] and value [1984]
Stats name [sched_switch] key [run_time_ns] and value [13562477]
Stats name [sched_switch] key [avg_time_ns] and value [6835]
Stats name [page_fault_user] key [run_cnt] and value [275]
Stats name [page_fault_user] key [run_time_ns] and value [515259]
Stats name [page_fault_user] key [avg_time_ns] and value [1873]
Stats name [page_fault_kern] key [run_cnt] and value [11]
Stats name [page_fault_kern] key [run_time_ns] and value [3279]
Stats name [page_fault_kern] key [avg_time_ns] and value [298]
Stats name [signal_deliver] key [run_cnt] and value [2]
Stats name [signal_deliver] key [run_time_ns] and value [6953]
Stats name [signal_deliver] key [avg_time_ns] and value [3476]

If we agree that this is the right first iteration, I would go ahead and remove all the libbpf_stats references and clean up more. I also wasn't sure where to init the stats buffer, including how to define the buffer size; feedback appreciated :)
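
For illustration only, here is roughly how the per-(name, key) rows in the output above could be produced from the libbpf-derived values. The struct is repeated from above for self-containment, and append_prog_rows is a hypothetical helper (it assumes buf was zero-initialized), not code from this PR:

#include <limits.h>   /* NAME_MAX */
#include <stdint.h>
#include <string.h>

typedef struct scap_stats_new
{
	/* Metadata */
	char name[NAME_MAX];   /* e.g. the program name, "sys_enter" */
	uint32_t type;
	uint32_t id;
	int valid;
	/* Stats (key, value) */
	char key[NAME_MAX];    /* e.g. "run_cnt", "run_time_ns", "avg_time_ns" */
	uint64_t value;
}scap_stats_new;

/* Hypothetical helper: append the three libbpf-derived rows for one program,
 * matching the flattened (name, key, value) rows shown in the output above. */
static uint32_t append_prog_rows(scap_stats_new *buf, uint32_t offset,
				 const char *prog_name,
				 uint64_t run_cnt, uint64_t run_time_ns)
{
	const char *keys[] = {"run_cnt", "run_time_ns", "avg_time_ns"};
	uint64_t vals[] = {run_cnt, run_time_ns, run_cnt ? run_time_ns / run_cnt : 0};
	for(int i = 0; i < 3; i++)
	{
		strncpy(buf[offset].name, prog_name, sizeof(buf[offset].name) - 1);
		strncpy(buf[offset].key, keys[i], sizeof(buf[offset].key) - 1);
		buf[offset].value = vals[i];
		buf[offset].valid = 1;
		offset++;
	}
	return offset;
}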

incertum and others added 11 commits April 13, 2023 23:29
…tering

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
* apply reviewers feedback
* stats static const sized allocated in each engine for now
* general cleanup, such as re-audit returns on possible failures
* more comments

Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Comment on lines +1753 to +1756
if (offset > nstats_allocated - 1)
{
break;
}
Contributor

It seems it's a bug if we ended up in here, right?

Contributor Author

Yes, I would say so; we want some error message here.

Member

yes probably here we need to set an error message and return immediately with SCAP_FAILURE since we should never reach this point

Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@poiana
Contributor

poiana commented Apr 17, 2023

LGTM label has been added.

Git tree hash: 5e266b79eed23903660fde26efa7a8ce017f3e86

Member

@Andreagit97 Andreagit97 left a comment

I will approve it so we can go on with the next steps, but I would like to see these comments addressed before releasing the final version :)
/approve

snprintf(error_message, MAX_ERROR_MESSAGE_LEN, "unable to get the counter map for CPU %d", index);
pman_print_error((const char *)error_message);
close(counter_maps_fd);
return -ret;
Member

Suggested change
return -ret;
return SCAP_FAILURE;

uint32_t nstats;
int32_t rc;
const scap_stats_v2* stats_v2;
stats_v2 = scap_get_stats_v2(h, flags, &nstats, &rc);
Member

for other PRs: we could also test failure cases like a NULL stats vector or stuff like that

const struct scap_stats_v2* scap_modern_bpf__get_stats_v2(struct scap_engine_handle engine, uint32_t flags, OUT uint32_t* nstats, OUT int32_t* rc)
{
*rc = SCAP_SUCCESS;
if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, (void*)nstats))
Member

Suggested change
if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, (void*)nstats))
if(pman_get_scap_stats_v2((void*)engine.m_handle->m_stats, flags, nstats))

{
struct scap_stats_v2 *stats = (struct scap_stats_v2 *)scap_stats_v2_struct;
*nstats = 0;
int ret;
Member

if we use SCAP_FAILURE this should no longer be necessary

Suggested change
int ret;

RUN_TIME_NS,
AVG_TIME_NS,
BPF_MAX_LIBBPF_STATS,
};
Member

we can add the typedef also here like with bpf_kernel_counters_stats

const struct scap_stats_v2* engine::get_stats_v2(uint32_t flags, uint32_t* nstats, int32_t* rc)
{
*nstats = scap_gvisor::stats::MAX_GVISOR_COUNTERS_STATS;
scap_stats_v2* stats = engine::m_stats;
Member

Suggested change
scap_stats_v2* stats = engine::m_stats;
scap_stats_v2* stats = this->m_stats;

@@ -29,6 +29,13 @@ limitations under the License.
#include <sys/utsname.h>
#include "ringbuffer/ringbuffer.h"

const char * const modern_bpf_kernel_counters_stats_names[] = {
Member

capture.c in libpman seems the right place for this... we don't use it here

MODERN_BPF_MAX_KERNEL_COUNTERS_STATS
}modern_bpf_kernel_counters_stats;

extern const char * const modern_bpf_kernel_counters_stats_names[];
Member

if we move the declaration in capture.c we don't need this

//
// machine_info flags
//
#define PPM_BPF_STATS_ENABLED (1 << 0)
Member

we could remove the PPM_ prefix and use just SCAP_ WDYT?

userspace/libsinsp/sinsp.cpp (resolved review thread)
@poiana
Contributor

poiana commented Apr 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, incertum, jasondellaluce

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,incertum,jasondellaluce]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 16580de into falcosecurity:master Apr 17, 2023
@incertum incertum deleted the libbpf-stats branch December 8, 2023 20:40