Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new(bpf, modern-bpf): Add socketcall support #810

Closed
hbrueckner opened this issue Dec 21, 2022 · 13 comments
Closed

new(bpf, modern-bpf): Add socketcall support #810

hbrueckner opened this issue Dec 21, 2022 · 13 comments
Assignees
Labels
kind/bug Something isn't working kind/feature New feature or request

Comments

@hbrueckner
Copy link
Contributor

Motivation

On the s390x architecture, network calls are routed through the socketcall syscall which in-kernel multiplexes to the different network syscalls, e.g., socket, connect, sendmsg, ...

Direct network syscall support has been added to s390 backend with Linux kernel v4.3. However, glibc sticks with kernel headers v3.2 for non-x86 architectures. You can read more details here: https://sourceware.org/pipermail/libc-alpha/2022-September/142108.html (thanks to @stliibm to kick off this discussion).

With that in mind, here is brief summary:

  • glibc: uses socketcall as mentioned above
  • musl: uses direct syscalls and falls back to socketcall (details here)
  • golang: uses socketcall

Feature

The kmod driver has already socketcall support. This feature is to add respective socketcall handling to the bpf and modern-bpf drivers.

Alternatives

None -- see summary above.

Additional context

None.

@hbrueckner
Copy link
Contributor Author

Here are my thoughts on socketcall for modern-bpf.

Add socketcall BPF program which roughly performs these steps:

  1. Extract socket call id and map to the direct syscall number
  2. Perform event filtering based of interested direct syscall
  3. Extract and store arguments in the BPF map
  4. Perform a tail call to the direct syscall BPF program using the existing syscall_enter_tail_table and syscall_exit_tail_table maps

Direct syscall BPF programs:
5. The function to obtain syscall arguments needs to be extended to obtain sc_args from the BPF map. This is possible as the ctx still contains the __NR_socketcall number to distinguish. This could look like:

+static __always_inline unsigned long extract__net_argument(struct pt_regs *regs, int idx)
+{
+       int syscall_id = syscalls_dispatcher__get_syscall_id(regs);
+
+       if (syscall_id == __NR_socketcall) {
+               unsigned long *args = unstash_sc_args();
+               if (!args || idx > 5)
+                       return 0;
+               return args[idx];
+       }
+
+       return extract__syscall_argument(regs, idx);
+}
+
 SEC("tp_btf/sys_enter")
 int BPF_PROG(socket_e,
             struct pt_regs *regs,
@@ -26,23 +49,25 @@ int BPF_PROG(socket_e,
 
        /* Parameter 1: domain (type: PT_ENUMFLAGS32) */
        /* why to send 32 bits if we need only 8 bits? */
-       u8 domain = (u8)extract__syscall_argument(regs, 0);
+       u8 domain = (u8)extract__net_argument(regs, 0);
        ringbuf__store_u32(&ringbuf, (u32)socket_family_to_scap(domain));

Test programs would need to follow a similar approach to wrap the arguments into an array.

libpman:
To make this as transparent as possible, socketcall is enabled along with any network call of interest. So far I understood this then would also require a PPME_SYSCALL_SOCKETCALL_E/X event. The question generate a socketcall event in case when the something went wrong for the tail call or silently ignore it. For libpman it is also the question how much processing could be done in there and what in the socketcall bpf program, e.g., providing a tail call map for the socket call id (= doing the id to syscall nr in user space).

@Andreagit97 @FedeDP Wdyt?

@Andreagit97
Copy link
Member

Ei @hbrueckner thank you for all these Infos, let me think a little bit about it. I will come back soon with some ideas :)

@hbrueckner
Copy link
Contributor Author

hbrueckner commented Dec 21, 2022

Hi @Andreagit97

Ei @hbrueckner thank you for all these Infos, let me think a little bit about it. I will come back soon with some ideas :)

not to bias you in any direction... here is my current draft outlining above ideas

@Andreagit97
Copy link
Member

I thought a little bit about it, I will propose something similar to your solution but slightly different, I need only some time to put together some code :/ I will come back here ASAP :)

@hbrueckner
Copy link
Contributor Author

HI @Andreagit97

thought a little bit about it, I will propose something similar to your solution but slightly different, I need only some time to put together some code :/ I will come back here ASAP :)

Thanks a lot ;-) I am already curious. One thought that came up was to lazy extract the arguments (w/o pushing them into a map). No time to try it out... so the wrapper could extract (==copy) from the socket call args list using the specific argument index. However, need to try and get some feedback from the BPF verifier ;-)

@Andreagit97
Copy link
Member

This is what I was thinking about https://github.com/Andreagit97/libs/tree/support_socketcall. It is just some scratch code to let you understand what I have in mind (I hope 🤣). Maybe we can't do that at all but right now I don't see any blocker in doing something like this. Unfortunately, right now I have really little time to experiment with it but maybe we can collect some early feedback :)

P.S it doesn't compile I miss some things like the _NR__socketcall definition kernel side, the CAPTURE_SOCKETCALL ifdefs, and maybe other stuff

@hbrueckner
Copy link
Contributor Author

Hi @Andreagit97 ,

many thanks for that fast update! ❤️

This is what I was thinking about https://github.com/Andreagit97/libs/tree/support_socketcall. It is just some scratch code to let you understand what I have in mind (I hope rofl). Maybe we can't do that at all but right now I don't see any blocker in doing something like this.

I understand you way of thinking. I also had a similar approach of providing enter/exit maps in libpman which then have been consumed by the socketcall bpf program. Your idea to hook into sys_enter/sys_exit is more elegant in my eyes and I would build upon that. I will try to pull out a helper (or if BPF verifier complains a macro) to extract the socketcall arguments to keep the changes in the network syscall BPF programs as simple as possible.

Unfortunately, right now I have really little time to experiment with it but maybe we can collect some early feedback :)

No problem... I am also still in "catch-up" phase and will experiment the next days.

P.S it doesn't compile I miss some things like the _NR__socketcall definition kernel side, the CAPTURE_SOCKETCALL ifdefs, and maybe other stuff

No problem... fully sufficient to understand they idea (and not getting distracted by ifdefs 🤣 )

@Andreagit97
Copy link
Member

I will try to pull out a helper (or if BPF verifier complains a macro) to extract the socketcall arguments to keep the changes in the network syscall BPF programs as simple as possible.

Yeah, the noisiest part of this approach is that every network syscall filler should have a little block at the beginning with the syscall args management like this:

#ifdef CAPTURE_SOCKETCALL
	if(id == __NR_socketcall)
	{
		unsigned long args_pointer = extract__syscall_argument(regs, 0);
		bpf_probe_read_user(&args, 3*sizeof(unsigned long), (void*)args_pointer);
	} 
	else
	{
#endif
		args[0] = extract__syscall_argument(regs, 0);
		args[1] = extract__syscall_argument(regs, 1);
		args[2] = extract__syscall_argument(regs, 2);
#ifdef CAPTURE_SOCKETCALL
	}
#endif	

If we are able to do something generic for all syscalls even better, but let's say I would give priority to performance and code clarity, at the end every syscall is different from another so it could be acceptable to have a dedicated way to handle its arguments, so don't worry too much about that, BTW if we can find better argument management I'm onboard :)

No problem... I am also still in "catch-up" phase and will experiment the next days.

Thank you 🙏

@hbrueckner
Copy link
Contributor Author

Hi @Andreagit97 ,

I have some updates here. Below is a summary for the helper function based on your initial commit which is pretty nice! Let me know what you think.

Further, I spot some changes to your initial commit which I would like to merge in a co-authored commit (by me) with a change in the commit message. Let me know if you are fine with that.

Summary on the helper and an example how it looks like for bind

Regarding

Yeah, the noisiest part of this approach is that every network syscall filler should have a little block at the beginning with the syscall args management like this:

[...]

If we are able to do something generic for all syscalls even better, but let's say I would give priority to performance and code clarity, at the end every syscall is different from another so it could be acceptable to have a dedicated way to handle its arguments, so don't worry too much about that, BTW if we can find better argument management I'm onboard :)

Let me know if you like the extract__network_args helper.

/**
 * @brief Extract one ore more arguments related to a network / socket system call.
 *
 * This function takes into consideration whether the network system call has been
 * called directly (e.g. accept4) or through the socketcall system call multiplexer.
 * For the socketcall multiplexer, arguments are extracted from the second argument
 * of the socketcall system call.  See socketcall(2) for more information.
 *
 * @param argv Pointer to store up to @num arguments of size `unsigned long`
 * @param num Number of arguments to extract
 * @param regs Pointer to the struct pt_regs to access arguments and system call ID
 */
static __always_inline void extract__network_args(void *argv, int num, struct pt_regs *regs)
{
#ifdef CAPTURE_SOCKETCALL
        int id = syscalls_dispatcher__get_syscall_id(regs);
        if(id == __NR_socketcall)
        {
                unsigned long args_pointer = extract__syscall_argument(regs, 1);
                bpf_probe_read_user(argv, num * sizeof(unsigned long), (void*)args_pointer);
                return;
        }
#endif  
        for (int i = 0; i < num; i++)
        {
                unsigned long *dst = (unsigned long *)argv;
                dst[i] = extract__syscall_argument(regs, i);
        }
}

The helper needs to extract the system call id, as it is not provided as parameter for the sys_exit BPF program (see below).

In practice, the change, for example, for bind would then look like:

--- a/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/bind.bpf.c
+++ b/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/bind.bpf.c
@@ -25,8 +25,12 @@ int BPF_PROG(bind_e,
 
        /*=============================== COLLECT PARAMETERS  ===========================*/
 
+       /* Collect parameters at the beginning to easily manage socketcalls */
+       unsigned long args[1];
+       extract__network_args(args, 1, regs);
+
        /* Parameter 1: fd (type: PT_FD) */
-       s32 fd = (s32)extract__syscall_argument(regs, 0);
+       s32 fd = (s32)args[0];
        ringbuf__store_s64(&ringbuf, (s64)fd);
 
        /*=============================== COLLECT PARAMETERS  ===========================*/
@@ -54,13 +58,16 @@ int BPF_PROG(bind_x,
        auxmap__preload_event_header(auxmap, PPME_SOCKET_BIND_X);
 
        /*=============================== COLLECT PARAMETERS  ===========================*/
+       /* Collect parameters at the beginning to easily manage socketcalls */
+       unsigned long args[3];
+       extract__network_args(args, 3, regs);
 
        /* Parameter 1: res (type: PT_ERRNO) */
        auxmap__store_s64_param(auxmap, ret);
 
        /* Parameter 2: addr (type: PT_SOCKADDR) */
-       unsigned long sockaddr_ptr = extract__syscall_argument(regs, 1);
-       u16 addrlen = (u16)extract__syscall_argument(regs, 2);
+       unsigned long sockaddr_ptr = args[1];
+       u16 addrlen = (u16)args[2];
        auxmap__store_sockaddr_param(auxmap, sockaddr_ptr, addrlen);
 
        /*=============================== COLLECT PARAMETERS  ===========================*/

Also I think about adding, e.g, SC_BIND_ARGS_NUM, defines to abstract from the direct argument numbers.

With some additional fixup and conditionals, I got it working with a small socket/bind program:

{"container.id":"host","evt.args":"domain=2 type=1 proto=0 ","evt.category":"net","evt.num":99489,"evt.time":1673444147059239908,"evt.type":"socket","fd.name":null,"proc.cmdline":"bind","proc.exe":"./bind","proc.pid":36207,"proc.ppid":7998}

{"container.id":"host","evt.args":"res=0 addr=127.0.0.1:10000 ","evt.category":"net","evt.num":99492,"evt.time":1673444147059420569,"evt.type":"bind","fd.name":"127.0.0.1:10000","proc.cmdline":"bind","proc.exe":"./bind","proc.pid":36207,"proc.ppid":7998}

@Andreagit97
Copy link
Member

Andreagit97 commented Jan 26, 2023

@hbrueckner sorry for the late response :/ I love this solution it seems really smart!
We can have defines like SC_BIND_ARGS_NUM if we want, but probably in that case I would unify the number of parameters needed by the enter and exit event, otherwise, we should have SC_BIND_ARGS_NUM_E and SC_BIND_ARGS_NUM_X that could change over time if we update the event schema. Probably is better to define only SC_BIND_ARGS_NUM==3 (since the syscall bind has 3 arguments), it shouldn't be a relevant performance penalty, WDYT?

Further, I spot some changes to your initial commit which I would like to merge in a co-authored commit (by me) with a change in the commit message. Let me know if you are fine with that.

Sure! Feel free to use it :)

Completely on board with this solution! Thank you for that!

@hbrueckner
Copy link
Contributor Author

Hi @Andreagit97,

@hbrueckner sorry for the late response :/

No problem... I also have some stuff on table right now ... so will catch up soon with that and also the BPF rebase as the last driver test case PR has been merged!

I love this solution it seems really smart!

Thanks a lot! ❤️

We can have defines like SC_BIND_ARGS_NUM if we want, but probably in that case I would unify the number of parameters needed by the enter and exit event, otherwise, we should have SC_BIND_ARGS_NUM_E and SC_BIND_ARGS_NUM_X that could change over time if we update the event schema. Probably is better to define only SC_BIND_ARGS_NUM==3 (since the syscall bind has 3 arguments), it shouldn't be a relevant performance penalty, WDYT?

My thought was indeed to define the exact number of of the respective socket system calls. If there are quite less being used, it could be a case by case decision (with comment). Finally, I think I have the right direction now to proceed with updating the other socket calls. Thanks a lot for your feedback!

@FedeDP
Copy link
Contributor

FedeDP commented Mar 23, 2023

I think this can be closed, right? @hbrueckner :)

@hbrueckner
Copy link
Contributor Author

I think this can be closed, right? @hbrueckner :)

yes, merged w/ PR #811

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants