Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new(bpf-driver): Introduce bpf_probe_read_user and bpf_probe_read_kernel variants #809

Merged
merged 27 commits into from
Feb 6, 2023

Conversation

hbrueckner
Copy link
Contributor

@hbrueckner hbrueckner commented Dec 21, 2022

What type of PR is this?

/kind feature
/kind bug

Any specific area of the project related to this PR?

/area driver-bpf

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

tbd

What this PR does / why we need it:

This PR introduces the bpf_probe_read_user and bpf_probe_read_kernel variants to correctly read memory either from kernel or user space. This PR would also be the base to enable the BPF driver for the s390x architecture as well. Please note that for s390x there is yet another issue to be solved (here and in modern-bpf as well): new(bpf, modern-bpf): Add socketcall support

Which issue(s) this PR fixes:

Fixes #497

Special notes for your reviewer:

This PR evolved over the last couple of weeks and months. And I think it's time to open to obtain feedback and thoughts from you. There is some more testing required along the way and I also look forward towards the shared driver test cases being worked on.

There are basically three areas that where the user and kernel bpf probe read variants are introduced:

  1. Direct calls are replaced.
  2. Helper routines to be adjusted which also affects the ringbuffer routines
  3. _READ() macro
    The commits are structured in the way to introduce and then adjust helper occurrences, e.g., by adding information from where to read memory (similar as in modern-bpf-driver).

cc: @Andreagit97, @FedeDP , @Molter73

Does this PR introduce a user-facing change?:

new(bpf-driver): Introduce `bpf_probe_read_user` and `bpf_probe_read_kernel` variants

@FedeDP
Copy link
Contributor

FedeDP commented Dec 21, 2022

This is tremendous work @hbrueckner ! Thank you very much!
On a quick glance, it seems like you did a great job reducing this work to the bare minimum, i really appreciate that.
Everything seems pretty well reasonated; will need to give it an in-depth review in the upcoming days (or well, first days of the new year :D )

@hbrueckner
Copy link
Contributor Author

Hi @FedeDP

This is tremendous work @hbrueckner ! Thank you very much! On a quick glance, it seems like you did a great job reducing this work to the bare minimum, i really appreciate that. Everything seems pretty well reasonated; will need to give it an in-depth review in the upcoming days (or well, first days of the new year :D )

Thank you so much! And, sure, take your time to review. Also, I liked to have it out before leaving for holidays.

@Andreagit97
Copy link
Member

Wow that's amazing, as @FedeDP correctly said it seems really well resonated and smart, thank you very much for that!
Just 2 quick questions:

  • Do many kernels backport these bpf helpers bpf_probe_read_user/kernel? Or we can take advantage of them only in kernels >= 5.5?
  • Have you ever tested it on old kernel versions (like 4.14)? I'm a little bit worried about old verifiers. There is no so much additional complexity but never say never :/

Copy link
Contributor

@gnosek gnosek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few minor comments inline

driver/bpf/bpf_helpers.h Show resolved Hide resolved
Comment on lines +1099 to +1137
return __bpf_val_to_ring(data, val, 0, param_info->type, -1, false,
param_type_to_mem(param_info->type));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass param_info->type to __bpf_val_to_ring, so we should be able to call param_type_to_mem in there and keep the signature unchanged, right?

Copy link
Contributor Author

@hbrueckner hbrueckner Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a version with unchanged signature. The problem here is that param_type_to_mem makes the decision that for PT_CHARBUF and PT_BYTEBUF user space memory is being copied. Unfortunately, this does not work in all cases.

Here are some places where for char/byte buffers are copied from kernel space (either from kernel structures or already copied into the scratch area):

git grep -n __bpf_val_to_ring driver/bpf/ |grep  'BUF' |grep 'KERNEL'
driver/bpf/fillers.h:709:               return __bpf_val_to_ring(data, 0, size, PT_BYTEBUF, -1, true, KERNEL);
driver/bpf/fillers.h:2378:              res = __bpf_val_to_ring(data, 0, exe_len, PT_CHARBUF, -1, false, KERNEL);
driver/bpf/fillers.h:2386:              res = __bpf_val_to_ring(data, 0, args_len - exe_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:2528:      res = __bpf_val_to_ring(data, (unsigned long)data->tmp_scratch, cgroups_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:2735:              res = __bpf_val_to_ring(data, 0, env_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6208:              res = __bpf_val_to_ring(data, 0, exe_len, PT_CHARBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6216:              res = __bpf_val_to_ring(data, 0, args_len - exe_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6357:      res = __bpf_val_to_ring(data, (unsigned long)data->tmp_scratch, cgroups_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6410:      res = __bpf_val_to_ring(data, 0, env_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6617:              res = __bpf_val_to_ring(data, 0, exe_len, PT_CHARBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6625:              res = __bpf_val_to_ring(data, 0, args_len - exe_len, PT_BYTEBUF, -1, false, KERNEL);
driver/bpf/fillers.h:6767:      res = __bpf_val_to_ring(data, (unsigned long)data->tmp_scratch, cgroups_len, PT_BYTEBUF, -1, false, KERNEL);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so how about we

#define __bpf_val_to_ring(..., param_type, ...) __bpf_val_to_ring_with_explicit_mem(..., param_type, ..., param_type_to_mem(param_type))

(or do the equivalent static inline) and use __bpf_val_to_ring_with_explicit_mem in the few spots where the default guess is wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gnosek ,
let me think a bit about to have an explicit mem version for __bpf_val_to_ring.

# git grep -n __bpf_val_to_ring driver/bpf/  |grep 'KERNEL' |wc -l
17
# git grep -n __bpf_val_to_ring driver/bpf/  |grep 'USER' |wc -l
7

This would at least reduce those 7 occurrences where USER has to be specified directly.

@hbrueckner
Copy link
Contributor Author

Wow that's amazing, as @FedeDP correctly said it seems really well resonated and smart, thank you very much for that!

❤️

Just 2 quick questions:
* Do many kernels backport these bpf helpers bpf_probe_read_user/kernel? Or we can take advantage of them only in kernels >= 5.5?

Let's be smart and start with kernels >= 5.5. For backports/distributions, let's use the feature_gates here to have some control.

* Have you ever tested it on old kernel versions (like `4.14`)? I'm a little bit worried about old verifiers. There is no so much additional complexity but never say never :/

No I have not. Regarding BPF issues, I had more trouble on the socketcall part.

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR LGTM, to be honest i don't like too much all these bpf_val_to_ring flavors but I also understand that there are not so many ways to do that in a clear way. One thing we can try to do in the next future is use helpers like push_charbuf like in the modern probe or something similar, but right now we don't many choices :/

driver/bpf/filler_helpers.h Show resolved Hide resolved
driver/bpf/filler_helpers.h Show resolved Hide resolved
@Andreagit97
Copy link
Member

moreover i've tried it on a 4.14 kernel and it seems to work 🥳

@Andreagit97 Andreagit97 added this to the next-driver milestone Dec 22, 2022
@Andreagit97
Copy link
Member

i've seen both PRs this one and the socketcall one and they look good to me! Probably the best way to go here is: to merge the entire driver testing framework (we miss only the last PR IIRC) so we can try to avoid regressions on BPF, then rebase this one that unfortunately now has some conflicts (sorry for that), and then go with the socketcalls for both BPF and modern BPF (WDYT?)

It would be amazing to use the same approach we use in the modern bpf and also here in the old probe, I need to double-check it but as far as I saw it would be quite difficult to do that, mainly because we extract argument also in nested helpers and this complicates the whole story :(

Sorry if we are not so active but the release consumes almost all our time :/

@hbrueckner
Copy link
Contributor Author

hbrueckner commented Jan 13, 2023

Hi @Andreagit97

i've seen both PRs this one and the socketcall one and they look good to me!

Thanks! ❤️

Probably the best way to go here is: to merge the entire driver testing framework (we miss only the last PR IIRC) so we can try to avoid regressions on BPF, then rebase this one that unfortunately now has some conflicts (sorry for that), and then go with the socketcalls for both BPF and modern BPF (WDYT?)

Make sense to me. The 4 parts of #783 are merged, so if there is no further driver testing framework changes, I would go with rebasing. I think there were few more probe reads to be converted during the rebase. Once the rebase is done, I will go thru above review comments.

It would be amazing to use the same approach we use in the modern bpf and also here in the old probe, I need to double-check it but as far as I saw it would be quite difficult to do that, mainly because we extract argument also in nested helpers and this complicates the whole story :(

It would be indeed nice and, actually, there are conceptually not too far away. Instead of having the arguments on stack variable like in modern bpf , they are in the bpf map ;-) And, yes, there couple of nested helpers to consider in the bpf driver.
Further, the current implementation is based on having BPF_SUPPORTS_RAW_TRACEPOINTS to get the arguments directly from the ctx. Without BPF_SUPPORTS_RAW_TRACEPOINTS, the argument bpf map is required additionally to obtain arguments, especially, in the exit fillers (where there are no arguments in the tracepoint format). Accessing then socketcall arguments would always need to go through argument bpf map first. Let me know if it makes sense to also cover the non-BPF_SUPPORTS_RAW_TRACEPOINTS; it should be easy to extract from the argument bpf map first and, then, fill the socketcall argument map.

@Andreagit97
Copy link
Member

Make sense to me. The 4 parts of #783 are merged, so if there is no further driver testing framework changes, I would go with rebasing. I think there were few more probe reads to be converted during the rebase. Once the rebase is done, I will go thru above review comments.

We have also part 5 but this should be the last one 🤣 we are a little bit blocked by the release, but I will try to do that ASAP

It would be indeed nice and, actually, there are conceptually not too far away. Instead of having the arguments on stack variable like in modern bpf , they are in the bpf map ;-)

That's true I think we cannot do that without an auxiliary BPF map for the socketcall args :/

And, yes, there couple of nested helpers to consider in the bpf driver. Further, the current implementation is based on having BPF_SUPPORTS_RAW_TRACEPOINTS to get the arguments directly from the ctx. Without BPF_SUPPORTS_RAW_TRACEPOINTS, the argument bpf map is required additionally to obtain arguments, especially, in the exit fillers (where there are no arguments in the tracepoint format). Accessing then socketcall arguments would always need to go through argument bpf map first. Let me know if it makes sense to also cover the non-BPF_SUPPORTS_RAW_TRACEPOINTS; it should be easy to extract from the argument bpf map first and, then, fill the socketcall argument map.

This is a good question, on one side I would have our drivers consistent between kernel versions on the other side our BPF code is already a mess and I wouldn't add a feature if we don't really need that... if we don't implement it we will lose support for soscketcall in kernels 4.14 <= x < 4.17. WDYT @FedeDP @gnosek ?

@hbrueckner
Copy link
Contributor Author

hbrueckner commented Jan 16, 2023

Let me know if it makes sense to also cover the non-BPF_SUPPORTS_RAW_TRACEPOINTS; it should be easy to extract from the argument bpf map first and, then, fill the socketcall argument map.

This is a good question, on one side I would have our drivers consistent between kernel versions on the other side our BPF code is already a mess and I wouldn't add a feature if we don't really need that... if we don't implement it we will lose support for soscketcall in kernels 4.14 <= x < 4.17. WDYT @FedeDP @gnosek ?

To enable it, it is just about this additional helper to be called in handle_socketcall (and using stack_ctx instead of ctx) to extract socketcall ID and socketcall arguments:

/* For socketcalls, arguments needs to be accessed early, i.e., before
 * and filler are initialized.  Therefore, the bpf_syscall_get_argument*
 * functions cannot be used.
 */
static __always_inline unsigned long socketcall_get_argument(void *ctx, int idx)
{
#ifdef BPF_SUPPORTS_RAW_TRACEPOINTS
	return bpf_syscall_get_argument_from_ctx(ctx, idx);
#else
	unsigned long arg = 0;
	unsigned long *args = unstash_args();
	if (args)
		arg = bpf_syscall_get_argument_from_args(args, idx);
	return arg;
#endif
}

@Andreagit97
Copy link
Member

Let me know if it makes sense to also cover the non-BPF_SUPPORTS_RAW_TRACEPOINTS; it should be easy to extract from the argument bpf map first and, then, fill the socketcall argument map.

This is a good question, on one side I would have our drivers consistent between kernel versions on the other side our BPF code is already a mess and I wouldn't add a feature if we don't really need that... if we don't implement it we will lose support for soscketcall in kernels 4.14 <= x < 4.17. WDYT @FedeDP @gnosek ?

To enable it, it is just about this additional helper to be called in handle_socketcall (and using stack_ctx instead of ctx) to extract socketcall ID and socketcall arguments:

/* For socketcalls, arguments needs to be accessed early, i.e., before
 * and filler are initialized.  Therefore, the bpf_syscall_get_argument*
 * functions cannot be used.
 */
static __always_inline unsigned long socketcall_get_argument(void *ctx, int idx)
{
#ifdef BPF_SUPPORTS_RAW_TRACEPOINTS
	return bpf_syscall_get_argument_from_ctx(ctx, idx);
#else
	unsigned long arg = 0;
	unsigned long *args = unstash_args();
	if (args)
		arg = bpf_syscall_get_argument_from_args(args, idx);
	return arg;
#endif
}

Ok, so I would say yes, we can support also the bpf without BPF_SUPPORTS_RAW_TRACEPOINTS

@hbrueckner
Copy link
Contributor Author

hbrueckner commented Feb 3, 2023

Rebased the series to latest master and added few more commits regarding driver tests. I happy to say that the driver tests pass on s390x:

[----------] Global test environment tear-down
[==========] 249 tests from 4 test suites ran. (23011 ms total)
[  PASSED  ] 242 tests.
[  SKIPPED ] 7 tests, listed below:
[  SKIPPED ] SyscallExit.open_by_handle_atX_success_mp
[  SKIPPED ] SyscallExit.recvmsgX_no_snaplen
[  SKIPPED ] SyscallExit.recvmsgX_snaplen
[  SKIPPED ] SyscallEnter.connectE_INET
[  SKIPPED ] SyscallEnter.connectE_INET6
[  SKIPPED ] SyscallEnter.connectE_UNIX
[  SKIPPED ] SyscallEnter.connectE_UNIX_max_path

(Note that I need to add socketcall though and to make network tests happy, I had to do socket calibration in libscap directly but should then work once socketcall is ready.)

Next steps for me is to go through the review feedback above and continue with socketcall integration.

queuelen = _READ(sk->sk_ack_backlog);
queuemax = _READ(sk->sk_max_ack_backlog);
if(queuelen && queuemax)
if (fd >= 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-comment: We need to add the same check for kmod driver as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you willing to open another PR about this, before we forget? 🤣
Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the kmod, the respective driver test passes w/o change 🤔 However, I will try to make the change for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done w/ 28650fd

hbrueckner and others added 16 commits February 6, 2023 18:06
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Co-authored-by: Mauro Ezequiel Moltrasio <mmoltras@redhat.com>
…IANTS

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Co-authored-by: Mauro Ezequiel Moltrasio <mmoltras@redhat.com>
Signed-off-by: Mauro Ezequiel Moltrasio <mmoltras@redhat.com>
Signed-off-by: Mauro Ezequiel Moltrasio <mmoltras@redhat.com>
Callers specify the len directly instead of a pointer to read it.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
The sockaddr_un->sun_path is an array and the data resides already
in the scratch (kernel) space. Hence, use read from kernel here.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Use bpf_probe_read_kernel() to read information from kernel's
unix socket structure.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Unify accept4_x behavior for providing socket tuple and queue len
information with modern BPF and driver tests.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
The driver test on `s390x` for fsconfigX_failure fails with:

	/root/git/falcosecurity/libs/test/drivers/event_class/event_class.cpp:799: Failure
	Expected equality of these values:
	  size
	    Which is: 1
	  expected_size
	    Which is: 0
	>>>>> length of the param is not correct. Param id = 6

	[  FAILED  ] SyscallExit.fsconfigX_failure (91 ms)

The related filler implementation uses:
````
                /* Parameter 6: value_charbuf (type: PT_CHARBUF) */
                res = bpf_val_to_ring_mem(data, 0, KERNEL);
````
to push an empty param. However the `__bpf_val_to_ring()` implementation
for PT_CHARBUF does not check for `val == 0` and tries to read a
string from address 0.  The probe read str does not fail and returns
a `\0` terminated string cause above failure.

Note that on `s390x`, there is the concept of which low-core (or prefix
area). It is an area of 2 pages at address zero with lots of critical
state information for system operation. Each CPUs has this information
at the same place and it needs to be set up with prefixing / set prefix
instructions.  That means, a zero string is read which results in an
empty/NUL-terminated string.  Note that this behavior might be specific
to s390x and other architectures might not have memory mapped at address
zero.

Note: Trying to use the `bpf_push_empty_param()` in the fsconfig filler
caused an invalid map access BPF verifier.  That's why changing the
filler helper to ignore if val is zero.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
On s390x, `fork` and `clone` (not `clone3`) generate child events
that lack information from the argv and envp memory.  Obtaining
information from them is not possible because after process creation
those user space memory causes a page fault to become available.
BPF probe reads do not support page faults, no data can be obtained.

To correctly pass the driver tests, drop those incomplete child events
and let the `sched_proc_fork` filler handle them entirely.

Note: Dropping those child events is done only when `sched_proc_fork`
is enabled.

For the modern BPF part, see also:

	commit 81b6d4e
	Author: Andrea Terzolo <andrea.terzolo@polito.it>
	Date:   Mon Oct 24 22:07:17 2022 +0000

	    new(driver): implement `CAPTURE_SCHED_PROC_FORK` logic

	    Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Unify accept4_x behavior for providing socket tuple and queue len
information with modern BPF, BPF, and driver tests.

Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
hbrueckner and others added 2 commits February 6, 2023 18:41
The `bpf_probe_read_kernel` / `bpf_probe_read_user` perform memset
to zero in case they fail.  Avoid duplication and safe few insns.

Co-authored-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Hendrik Brueckner <brueckner@de.ibm.com>
Co-authored-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
@hbrueckner hbrueckner force-pushed the s390x-bpf-driver-2022-12-21 branch from bd9d870 to 9b25675 Compare February 6, 2023 17:51
Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be ready!
/approve

@poiana
Copy link
Contributor

poiana commented Feb 6, 2023

LGTM label has been added.

Git tree hash: 0c49e0e9ce61a05b8e9086f6ccdf050d4cac63a6

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Feb 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 9a98243 into falcosecurity:master Feb 6, 2023
@hbrueckner hbrueckner deleted the s390x-bpf-driver-2022-12-21 branch February 7, 2023 08:46
@FedeDP FedeDP mentioned this pull request Feb 23, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants