docs(proposals): proposal for a new bpf probe #268

Merged (1 commit), Apr 19, 2022

Andreagit97
Member

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>

What type of PR is this?

/kind design

Any specific area of the project related to this PR?

/area proposals

What this PR does / why we need it:

Hi all 🖖 With this PR, I want to propose a possible design for a new BPF probe that exploits the latest tracing technologies to improve performance and usability. I spent some time dreaming about this solution, and now it finally seems within reach! Please keep in mind that this is just a proposal meant to collect ideas, so don't be shy: if you have suggestions, don't hesitate to share your thoughts!

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@poiana
Contributor

poiana commented Mar 29, 2022

Hi @Andreagit97. Thanks for your PR.

I'm waiting for a falcosecurity member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@FedeDP
Contributor

FedeDP commented Mar 29, 2022

This is incredibly cool, man! Thank you very much!
I think that leveraging eBPF CO-RE alone is huge, given that lots of potential adopters run into issues downloading a prebuilt probe/kmod.
Moreover, the possible performance gains and the use of the newest features of the eBPF world would be really great, for both users and developers.
Finally, IMHO the possibility to test the new probe is one of the trickiest but most awesome parts.

Thank you, hopefully we'll get our hands dirty on this asap :)

@krisnova
Contributor

This is beautiful. Well done. I have some questions about what exactly the embedded bytecode does. However, at a glance this looks like a very persuasive proposal.

You have my full support for pushing this through. The ring buffer component alone is enough to get this on my radar!


For Falco's sustainability and reliability, I would encourage us to put this behind a feature gate on day one. Ideally, this new probe is something that users can opt in to as needed. I would not like to release a new version of Falco where this code is the only option available.

So maybe it would be better to frame this as an "alternative" instead of a hard "replacement"?

@Andreagit97
Member Author

Hi @kris-nova, thank you very much for your support! I completely agree with you: this probe is just an alternative to the current one. Users would choose according to their kernel versions and specific needs. As for the bytecode, let me try to explain it a bit here.
Today, when we compile with clang, we obtain a probe.o file that we have to supply separately from Falco, just like the kernel module. With the new probe, this bytecode is embedded in a C header file that is compiled directly into libscap. This is possible thanks to the bpftool skeleton feature (bpftool gen skeleton). So, when we have to load the BPF probe, we no longer need to download the probe.o file from our buckets or compile it locally, because the bytecode is already self-contained in our executable. At load time, libbpf relocates this embedded bytecode according to the CO-RE approach. I hope it's a bit clearer now. If you have any other doubts, please let me know.
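A minimal userspace sketch of the difference between shipping probe.o and embedding it: the object bytes live in an array compiled into the executable, so the loader reads them from memory instead of from disk. The array embedded_probe_obj and both helper functions are hypothetical stand-ins for what a generated skeleton header contains; a real array holds the whole ELF object, while this one keeps only the 4-byte ELF magic so the example stays short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the byte array a generated skeleton header
 * embeds. A real one holds the entire probe ELF object; this one keeps
 * only the 4-byte ELF magic number to stay short. */
static const unsigned char embedded_probe_obj[] = { 0x7f, 'E', 'L', 'F' };

/* Hypothetical accessor: returns the embedded object and its size, so the
 * loader never needs a probe.o file on disk. */
static const unsigned char *get_embedded_probe(size_t *size)
{
	*size = sizeof(embedded_probe_obj);
	return embedded_probe_obj;
}

/* Sanity check a loader could run before handing the buffer to libbpf:
 * an ELF object must start with the ELF magic number. */
static int is_elf_object(const unsigned char *buf, size_t size)
{
	static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

	return size >= sizeof(elf_magic) &&
	       memcmp(buf, elf_magic, sizeof(elf_magic)) == 0;
}
```

In a real loader, this buffer would be handed to libbpf, either implicitly through the open/load functions generated in the skeleton or explicitly with bpf_object__open_mem(); libbpf then applies the CO-RE relocations against the running kernel's BTF while loading.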

@Molter73
Contributor

Very cool proposal indeed! I just wanted to add that, when using BTF-enabled probes, there is the option to provide a path to a custom file holding this information, which could also be generated for kernels that were compiled without BTF support.

The implications might go beyond what I understand, but I think this means we could eventually use this new probe with kernels older than 5.8; the raw tracepoint would need to be used in that case, but everything else might just fall into place. Maybe this is not something to address right now, but it could be a way to re-unify the existing and new probe designs if it ever comes to that. It's also important to note that loading incorrect BTF information could result in unwanted behavior, so it needs to be handled with care.

@Andreagit97
Member Author

Hi @Molter73, nice to see you again here 😄 I think we have to split this discussion into three main topics:

1. The target kernel version.

The main goal behind this new probe is to use emerging tracing technologies to improve performance and usability. IMHO, there are 3 key points for reaching this:

  1. BPF ringbuf (introduced in 5.8)
  2. BPF_PROG_TYPE_TRACING programs (introduced in 5.5)
  3. BPF global variables (introduced in 5.5)

As you can see, with kernels older than 5.5 we cannot take advantage of what we could call "modern BPF". For this reason, applying the new probe to older kernels would bring no real performance benefit. Moreover, the new design would rely heavily on these new concepts, so it wouldn't be easy to have a sort of "common architecture" between our probes. I know that this is not your point, and I'm sorry for the digression, but I want to be as clear as possible on this.
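As a rough illustration of how these thresholds could combine with the opt-in approach suggested earlier in the thread, here is a userspace sketch that picks a driver from a kernel release string. The enum values, the function names, and the choice to gate purely on version numbers are assumptions for illustration; only the version minimums come from the list above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical driver choices. */
enum engine { ENGINE_CURRENT_BPF, ENGINE_MODERN_BPF };

/* Parses "major.minor" from a release string in the format printed by
 * uname -r, e.g. "5.15.0-91-generic". Returns false on failure. */
static bool parse_release(const char *release, int *major, int *minor)
{
	return sscanf(release, "%d.%d", major, minor) == 2;
}

/* True if major.minor is at least req_major.req_minor. */
static bool version_at_least(int major, int minor, int req_major, int req_minor)
{
	return major > req_major || (major == req_major && minor >= req_minor);
}

/* The BPF ring buffer (5.8) is the strictest requirement in the list
 * above, so checking it also covers tracing programs and global
 * variables (5.5). */
static bool kernel_supports_modern_probe(const char *release)
{
	int major, minor;

	if (!parse_release(release, &major, &minor))
		return false;
	return version_at_least(major, minor, 5, 8);
}

/* Opt-in: use the modern probe only if the user asked for it and the
 * kernel can run it; otherwise keep the current probe. */
static enum engine select_engine(bool modern_requested, const char *release)
{
	if (modern_requested && kernel_supports_modern_probe(release))
		return ENGINE_MODERN_BPF;
	return ENGINE_CURRENT_BPF;
}
```

Distribution kernels sometimes backport BPF features, so in practice probing each feature directly (for example, by trying to create a BPF_MAP_TYPE_RINGBUF map) is usually more reliable than comparing version numbers.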

2. What we could backport to the current probe

  • The CO-RE approach has no strict kernel requirements, so we could introduce it in our current probe. However, we first need to adopt libbpf as a loader.
  • We could try to slim down the current probe architecture, making it a little bit more linear.
  • Last but not least, there is also the possibility to backport the testing phase to our current fillers!

These are definitely important points, but more from a developer perspective than from a user one. So, as a first step, I would focus on what could bring us huge value: a more performant and modular probe that is easier to understand and to contribute to. After that, I completely agree with you: we can try to reuse and adapt what we can in our current probe. IMHO the rationale here is: as a first step, let's try to reach something working, keeping our current probe as a lifebelt. As soon as we have something satisfying and reasonably stable, we can start the porting phase.

3. What we can do with BTF information.

Right now, we can use BTF information for two main purposes:

  1. CO-RE relocations with libbpf

As you correctly pointed out in your comment, libbpf can perform CO-RE relocations using external BTF information when the running kernel does not expose its own BTF. The verifier is not involved in this process; it only receives the already relocated bytecode.

  2. Direct kernel memory access, exploiting BTF on the verifier side.

The problem here is that this direct access must be granted by the verifier, not by libbpf. The verifier cannot trust external BTF sources, as also pointed out in the link you provided: an external source could be tainted, causing faults at run time.
Today we need BTF for BPF_PROG_TYPE_TRACING programs, and in the near future we could also exploit it with BPF_PROG_TYPE_LSM programs, which are really interesting for our use cases, as explained well here. Nowadays, I see BTF as a must-have feature.
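To make the CO-RE relocation point more concrete, here is a toy userspace model of what a field-offset relocation does. Everything in it is hypothetical and heavily simplified: real BTF is a binary type graph and libbpf matches types by name and structure, while here each "BTF" is just a table of (field name, offset) pairs for one struct. The program records which field it means; before loading, the loader looks that field up in the target kernel's table (whether it comes from the kernel or from an external file) and patches the offset, so the verifier only ever sees the already patched instruction.

```c
#include <stddef.h>
#include <string.h>

/* Toy "BTF": the layout of one struct as (field name, byte offset) pairs. */
struct field_info {
	const char *name;
	size_t offset;
};

struct toy_btf {
	const struct field_info *fields;
	size_t nr_fields;
};

/* Toy "instruction": a load whose offset was computed against the build
 * machine's headers, plus the field name recorded for relocation. */
struct load_insn {
	const char *field; /* relocation record: which field we meant */
	size_t offset;     /* offset baked in at compile time */
};

/* Looks up a field in the target BTF. Returns 0 on success, -1 if the
 * field does not exist on the running kernel. */
static int toy_btf_find(const struct toy_btf *btf, const char *name, size_t *off)
{
	for (size_t i = 0; i < btf->nr_fields; i++) {
		if (strcmp(btf->fields[i].name, name) == 0) {
			*off = btf->fields[i].offset;
			return 0;
		}
	}
	return -1;
}

/* "Relocates" the instruction in place using the target kernel's layout.
 * If the field is missing, the instruction is left untouched and loading
 * should fail. */
static int toy_core_relocate(struct load_insn *insn, const struct toy_btf *target)
{
	size_t off;

	if (toy_btf_find(target, insn->field, &off) != 0)
		return -1;
	insn->offset = off;
	return 0;
}
```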

Let me know if these thoughts sound good to you :)

@Molter73
Contributor

Molter73 commented Apr 1, 2022

Thanks for the detailed explanation @Andreagit97! I must admit I'm not as well versed in eBPF as I'd like to be, and I appreciate any bit of information I can get to learn more.

My main concern with creating a new probe is that it will add extra overhead to the development of new features. Things like supporting new syscalls already require developers to implement them in both the kernel module and the eBPF probe; once this new probe exists, a third implementation might be required as well for the new BTF-enabled probe. So I'm trying to see if there's any way we could cut down on development effort without leaving anyone behind.

@Andreagit97
Member Author

Hi @Molter73, I understand your concerns well; I thought a lot about this question. Unfortunately, we cannot avoid reimplementing our fillers, since the underlying technologies are different. IMHO, having a single shared codebase for the BPF code would be a nightmare to maintain and would also hurt performance, since we would have to add code to manage this duality. At the end of the day, the answer could simply be "yes": for a certain period of time, we would probably need to add each new syscall to all our drivers. But here what we said in the previous comment can come into play: once we have something working with the new probe, we could backport the new structure to the current probe, obtaining a sort of common pattern to manage events. The idea is to reach something like this in the current probe as well:

int BPF_PROG(ppme_syscall_open_x,
	     struct pt_regs *regs,
	     long ret)
{
	/* Get the per-CPU auxiliary map. */
	struct auxiliary_map *aux_map = get_auxiliary_map();

	/* Fill the map with the event header and all the parameters we have to capture. */
	aux_map_store_event_header(PPME_SYSCALL_OPEN_X, aux_map);

	/* Parameter 1: ret (type: PT_FD) */
	aux_map_store_s64_param(ret, aux_map);

	/* Parameter 2: name (type: PT_FSPATH) */
	unsigned long name = get_argument(regs, 0);
	aux_map_store_charbuf_param(name, aux_map);

	/* Parameter 3: flags (type: PT_FLAGS32) */
	unsigned long flags = get_argument(regs, 1);
	aux_map_store_u32_param(open_flags_to_scap(flags), aux_map);

	/* Parameter 4: mode (type: PT_UINT32) */
	unsigned long mode = get_argument(regs, 2);
	aux_map_store_u32_param(open_modes_to_scap(flags, mode), aux_map);

	/* Parameter 5: dev (type: PT_UINT32) */
	unsigned long dev = extract_dev_from_fd(ret);
	aux_map_store_u32_param(dev, aux_map);

	/* Copy the content of the map into the ring buffer and push it to user space. */
	copy_aux_map_into_ringbuf(aux_map);
	return 0;
}
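The helpers in the snippet above are proposed names rather than an existing API. As a hedged illustration of the pattern they describe, the userspace model below uses a scratch buffer standing in for the per-CPU auxiliary map: it collects an event header and the parameters, and the finished event is then copied into a toy ring buffer in a single step. The header layout (event type, parameter count, total length) and the function signatures are assumptions for this sketch, not necessarily the real scap event format.

```c
#include <stdint.h>
#include <string.h>

#define AUX_MAP_SIZE 256 /* hypothetical scratch-space size */

/* Hypothetical event header; real scap headers carry more fields. */
struct event_header {
	uint16_t type;    /* e.g. the PPME_SYSCALL_OPEN_X code */
	uint16_t nparams; /* parameters stored so far */
	uint32_t len;     /* total bytes, header included */
};

/* Userspace stand-in for the per-CPU auxiliary map. */
struct auxiliary_map {
	uint8_t data[AUX_MAP_SIZE];
	uint32_t payload_pos; /* next free byte */
};

static void aux_map_store_event_header(uint16_t type, struct auxiliary_map *m)
{
	struct event_header hdr = { .type = type, .nparams = 0,
				    .len = sizeof(struct event_header) };

	memcpy(m->data, &hdr, sizeof(hdr));
	m->payload_pos = sizeof(hdr);
}

/* Appends raw parameter bytes and updates the header.
 * Returns -1 if the map is full. */
static int aux_map_store_param(const void *val, uint32_t size, struct auxiliary_map *m)
{
	struct event_header hdr;

	if (m->payload_pos + size > AUX_MAP_SIZE)
		return -1;
	memcpy(&m->data[m->payload_pos], val, size);
	m->payload_pos += size;

	memcpy(&hdr, m->data, sizeof(hdr));
	hdr.nparams++;
	hdr.len = m->payload_pos;
	memcpy(m->data, &hdr, sizeof(hdr));
	return 0;
}

static int aux_map_store_s64_param(int64_t v, struct auxiliary_map *m)
{
	return aux_map_store_param(&v, sizeof(v), m);
}

static int aux_map_store_u32_param(uint32_t v, struct auxiliary_map *m)
{
	return aux_map_store_param(&v, sizeof(v), m);
}

/* Toy ring: a flat buffer that finished events are appended to. */
struct toy_ringbuf {
	uint8_t data[1024];
	uint32_t head;
};

/* Copies the finished event into the ring in one step.
 * Returns -1 (event dropped) if the ring is full. */
static int copy_aux_map_into_ringbuf(const struct auxiliary_map *m, struct toy_ringbuf *rb)
{
	if (rb->head + m->payload_pos > sizeof(rb->data))
		return -1;
	memcpy(&rb->data[rb->head], m->data, m->payload_pos);
	rb->head += m->payload_pos;
	return 0;
}
```

One plausible reason to build the event in a scratch map first is that events carry variable-length parameters such as paths, while bpf_ringbuf_reserve() needs a size the verifier can check up front; bpf_ringbuf_output() can instead copy an already built, bounded buffer.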

Obviously, the underlying technologies will be different, but if we are good enough, we could try to share the code between probes, using only different helper functions that wrap the different technologies! However, before doing this, we need CO-RE, libbpf, and a major refactor of our code: the current state of our probe doesn't even let us dream of solutions like this. Anyway, as I said in the previous comment, I see this as a second step to avoid increasing the developer effort.
We should also note that we usually don't touch our BPF codebase unless we need to add a new syscall or fix something that is broken, so the effort here could be quite small.
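One way to read the idea of sharing filler code through different helper functions: the filler logic is written once against a small helper interface, and each probe supplies its own implementation. The sketch below is hypothetical; it uses a struct of function pointers only so that both toy backends can run in one userspace program. BPF programs cannot call through arbitrary function pointers, so a real shared codebase would more likely select the helpers at compile time (for example with macros or per-probe headers).

```c
#include <stdint.h>

/* Hypothetical helper interface that each probe would implement. */
struct filler_helpers {
	/* Reads syscall argument idx from the probe-specific context. */
	unsigned long (*get_argument)(const void *ctx, int idx);
	/* Stores a 32-bit parameter into the probe-specific event buffer. */
	void (*store_u32_param)(void *buf, uint32_t val);
};

/* Shared filler logic, written once for every probe. To keep the sketch
 * short it stores the raw open() flags, with no conversion to scap flags. */
static void fill_open_flags(const struct filler_helpers *h, const void *ctx, void *buf)
{
	unsigned long flags = h->get_argument(ctx, 1);

	h->store_u32_param(buf, (uint32_t)flags);
}

/* Toy backend A: the context is a plain array of saved arguments. */
static unsigned long array_get_argument(const void *ctx, int idx)
{
	return ((const unsigned long *)ctx)[idx];
}

/* Toy backend B: the context is a register set; di, si and dx mirror the
 * registers holding the first three syscall arguments on x86_64. */
struct toy_regs {
	unsigned long di, si, dx;
};

static unsigned long regs_get_argument(const void *ctx, int idx)
{
	const struct toy_regs *r = ctx;

	switch (idx) {
	case 0:
		return r->di;
	case 1:
		return r->si;
	default:
		return r->dx;
	}
}

/* Both toy backends store the value into a single uint32_t slot. */
static void store_u32(void *buf, uint32_t val)
{
	*(uint32_t *)buf = val;
}

static const struct filler_helpers array_helpers = { array_get_argument, store_u32 };
static const struct filler_helpers regs_helpers = { regs_get_argument, store_u32 };
```

The filler body never changes between backends; only the helper implementations do, which is the property that would keep a new syscall from needing a fully separate implementation per probe.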
Let me know if these ideas sound good to you 😄

@krisnova
Contributor

krisnova commented Apr 6, 2022

Future thoughts:

  • How does this integrate with Cilium at runtime on a node? I believe the Cilium project uses perf buffers through the perf package. Would there be a conflict if both Falco and Cilium move to try to use the same kernel buffer in the future?
  • Performance hits. I imagine this is more performant than the traditional buffer mechanism. Do we have plans to benchmark this anywhere? Could make Falco very attractive!
  • The dynamic system to manage new system call integrations in the future seems exciting. Again, I want to see performance here.

If we can get the performance increase and have a migration path laid out, I think it could be possible to move to the new probe altogether. That said, I am wondering what the other maintainers have to say about this.

Approving for now. Happy to help test, develop, and debug on my end. Great work.

/approve

@Andreagit97
Member Author

Hi @kris-nova, thank you very much for your support! I will try to address your points here:

  1. This is a great question. Sharing the same physical buffer is quite difficult right now, since Falco considers itself the only consumer. We would need to implement selective consumption logic to allow mechanisms like this. This is a bit futuristic, but very smart!

  2. Yes, you are right: the ring buffer is more performant than the plain perf buffer. There are also benchmarks from the BPF developers that show it.

  3. Here I will address the performance concerns. My idea is to first implement a meaningful set of syscalls that could give us a real use case to test. These are the syscalls that we catch today in the kernel's simple-consumer mode:

 - open
 - close
 - ioctl
 - pipe
 - dup
 - dup2
 - socket
 - connect
 - accept
 - sendto
 - recvfrom
 - sendmsg
 - recvmsg
 - bind
 - listen
 - socketpair
 - setsockopt
 - clone
 - fork
 - vfork
 - execve
 - kill
 - flock
 - chdir
 - fchdir
 - rename
 - mkdir
 - rmdir
 - creat
 - link
 - unlink
 - symlink
 - chmod
 - fchmod
 - lchown
 - getrlimit
 - ptrace
 - syslog
 - setuid
 - setgid
 - setpgid
 - setsid
 - setresuid
 - setresgid
 - utime
 - uselib
 - sched_setparam
 - setrlimit
 - chroot
 - mount
 - umount2
 - quotactl
 - tkill
 - inotify_init
 - openat
 - mkdirat
 - unlinkat
 - renameat
 - linkat
 - symlinkat
 - fchmodat
 - unshare
 - signalfd
 - timerfd_create
 - eventfd
 - accept4
 - signalfd4
 - eventfd2
 - dup3
 - pipe2
 - inotify_init1
 - prlimit64
 - open_by_handle_at
 - setns
 - process_vm_readv
 - process_vm_writev
 - renameat2
 - seccomp
 - bpf
 - execveat
 - userfaultfd
 - copy_file_range
 - clone3
 - openat2
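In simple-consumer mode, the probe only needs to forward the syscalls listed above. A hedged sketch of how such a filter could work: a bitmap indexed by syscall number, filled from userspace and checked at the start of the syscall tracepoint so that no event is built for the other syscalls. The bitmap size and the function names are hypothetical; syscall numbers are architecture-specific.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SYSCALLS 512 /* hypothetical upper bound on syscall numbers */

/* One bit per syscall number. In a real probe this could live in a BPF
 * map or global variable filled by userspace before attaching. */
static uint64_t interesting[MAX_SYSCALLS / 64];

static void mark_interesting(unsigned int nr)
{
	if (nr < MAX_SYSCALLS)
		interesting[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Checked first on every syscall: uninteresting ones are dropped before
 * any event is built. */
static bool is_interesting(unsigned int nr)
{
	return nr < MAX_SYSCALLS && ((interesting[nr / 64] >> (nr % 64)) & 1);
}
```

For example, on x86_64 open, close, execve and openat are syscalls 2, 3, 59 and 257, so marking those numbers lets them through while getpid (39), which is not in the list, is filtered out.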

Even if we implement only a subset of these syscalls, we would have a reliable sample for our tests.
After this implementation phase, my idea is to reproduce the same tests performed by @Stringy in this document and see how much we have reduced the eBPF instrumentation time. I agree with you: the sooner we have the performance tests, the better we will understand the future of this new probe.
If we all agree on the idea of this new probe, we can start to implement it and see the real gain.
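To make the ring-buffer point (item 2 above) more concrete: one difference from the per-CPU perf buffer is that the BPF ring buffer is shared by all CPUs and offers a reserve/submit style (bpf_ringbuf_reserve() and bpf_ringbuf_submit()), so a program can write an event directly into the buffer instead of building it elsewhere and copying it in with bpf_perf_event_output(). The toy below models only that reserve/submit idea in userspace, with hypothetical names: one producer, one consumer, no wrap-around (it simply fills up) and no memory ordering, so it sketches the API shape rather than the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 256

/* Each record starts with a header; busy stays 1 until submitted. */
struct rec_hdr {
	uint32_t len;  /* payload length in bytes */
	uint32_t busy; /* 1 while the producer is still writing */
};

struct toy_rb {
	_Alignas(8) uint8_t data[RB_SIZE];
	size_t prod; /* next free byte */
	size_t cons; /* next record to consume */
};

/* Header plus payload, rounded up to 8 bytes so every record stays aligned. */
static size_t rec_size(uint32_t len)
{
	return sizeof(struct rec_hdr) + ((len + 7u) & ~7u);
}

/* Reserves len bytes and returns a pointer the producer writes into
 * directly, or NULL if there is not enough room (event dropped). */
static void *rb_reserve(struct toy_rb *rb, uint32_t len)
{
	struct rec_hdr hdr = { .len = len, .busy = 1 };
	size_t pos = rb->prod;

	if (len > RB_SIZE || pos + rec_size(len) > RB_SIZE)
		return NULL;
	memcpy(&rb->data[pos], &hdr, sizeof(hdr));
	rb->prod += rec_size(len);
	return &rb->data[pos + sizeof(hdr)];
}

/* Publishes a reserved record by clearing its busy flag. */
static void rb_submit(void *sample)
{
	uint8_t *hdr_bytes = (uint8_t *)sample - sizeof(struct rec_hdr);
	struct rec_hdr hdr;

	memcpy(&hdr, hdr_bytes, sizeof(hdr));
	hdr.busy = 0;
	memcpy(hdr_bytes, &hdr, sizeof(hdr));
}

/* Consumer side: copies out the next record if it has been submitted.
 * Returns its length, or -1 if nothing is ready. */
static int rb_consume(struct toy_rb *rb, void *out, size_t out_size)
{
	struct rec_hdr hdr;

	if (rb->cons >= rb->prod)
		return -1; /* nothing reserved yet */
	memcpy(&hdr, &rb->data[rb->cons], sizeof(hdr));
	if (hdr.busy || hdr.len > out_size)
		return -1; /* still being written, or output buffer too small */
	memcpy(out, &rb->data[rb->cons + sizeof(hdr)], hdr.len);
	rb->cons += rec_size(hdr.len);
	return (int)hdr.len;
}
```

The consumer cannot see a record until the producer submits it, which is what lets the producer fill the reserved slot in place without an intermediate copy.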

@leogr
Member

leogr commented Apr 19, 2022

/ok-to-test

Member

@leogr leogr left a comment

I can only say that this is very cool and very well presented. Congrats! I really like this proposal, so for me it's time to approve it. 😺

/approve

@poiana
Contributor

poiana commented Apr 19, 2022

LGTM label has been added.

Git tree hash: 9bcb76d70d75173ce778802f0b3ff4111d85b6c8

Contributor

@FedeDP FedeDP left a comment

/approve

As already stated, huge +1 from me! Thanks for this great work!

@poiana
Contributor

poiana commented Apr 19, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, kris-nova, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


6 participants