Introduce conditional kernel-side event filtering #1557
Thanks @stevenbrz. In fact, I've been thinking about pushing more functionality like filtering into the kernel driver ever since I started contributing. However, reality set in quickly. Here are some challenges to be aware of:

Check out our recently added kernel testing framework proposal here. It highlights that anything you do in the kernel driver happens in the application context, so yes, you can slow down all your apps, which SREs won't like. That's why we still have to find the right balance: we're all painfully aware that pushing all events up to userspace won't scale, but you also don't want to go crazy in the driver either. Maintaining our drivers and ensuring they remain compatible across our extensive kernel support matrix is a significant burden, and the eBPF verifier rejecting probes is one of the most frustrating experiences for adopters; this PR, for example, highlights this pain point.
I suppose you tried different configs. Out of curiosity, was monitoring just the fork/execve* syscalls already a problem? Do you use Falco with very fine-tuned rules, or are you looking to use it for generous data collection (aka noisy rules, lots of outputs)? Asking because this could also cause increased back pressure. Do your servers have 96+ CPUs?
Amazing! Looking forward to it! By the way, for such a significant new feature we typically open proposals. It would also be great to quantify performance improvements using an MVP. I'd prioritize filtering
Tangentially, I responded yesterday to this issue falcosecurity/rules#196, which is also around event counting and benign events / anomaly detection.
Hi @incertum, thanks for the quick reply!
Yeah, it's a tricky balance. If this feature is disabled by default and remains an advanced/experimental setting (with these tradeoffs documented), could it be easier to argue for its inclusion?
fork, exec*, and open* comprise the vast majority of relevant syscalls on our systems, so that's why I wanted to target filtering most of the noise from those.
We use the new base set of rules included in the
Yeah, we have Falco deployed on servers with up to 128 CPUs.
We'd be willing to accept some missing process lineage/metadata stemming from filtering the noisier syscalls if it meant lowering our drop rate of potentially useful signal. This is assuming the state engine can handle cleaning out stale data if events are filtered.
Yes, you see throughout the project that newer features are typically disabled by default. However, the maintenance burden and ensuring the eBPF verifier does not complain would still be there. By the way, there are other discussions around allowing attaching custom probes; perhaps that could be a path forward for this as well? Details TBD in a proposal down the road.
If you could try with just the absolute minimum fork/exec*-related ones, nuke all the others, and report back whether that at least works, I would appreciate it 😉 thanks in advance! We had similar discussions in the past (96-CPU machines); for example, check out falcosecurity/falco#2296 (comment). Did you try testing such a matrix? Of course now it's
Huh, OK, here is the problem... I think you are the first adopter I know of who tries Falco on such massive servers. libscap scans each CPU in the
@stevenbrz just occurred to me: could you try using the modern BPF driver instead (you use kernel 5.15 and are therefore eligible for it)? Try increasing the number of CPUs per ring buffer. As per a conversation we had with the kernel eBPF experts, there should be no contention concerns kernel-side. The default is 2 CPUs for each buffer for the modern BPF driver, since the kernel accounts memory wrongly (twice) for the new BPF ring buffer compared to the older perf buffer.
So I tried each set of syscalls, only adding the previous set to the next instead of replacing it outright. Surprisingly, no drops until the third iteration! I also tried raising the CPUs per ring buffer from 2 to 6 and doubled the buffer size in the last round, but that didn't appear to make a difference. As for the rules, the following is the only file I included:
https://gist.github.com/stevenbrz/69391aa71b22d205b6add88ae10fb905
🤯 scap event rates are over 21 million / second!!! The libbpf stats (which you hadn't turned on) tracepoint invocation rates must be through the roof. I hate to break it, but Falco cannot handle this (yet). Falco can probably handle scap event rates of up to 100K / second with acceptable drop rates, and anything below 60-70K / second should likely be no problem / no drops. Your servers have low "process spawn rates", but are very network- and file-open-heavy. Typically file opens are the biggest problem AFAIK.

This may not be the advice you are hoping for, but you may want to consider cutting your losses (for now): at least perform security monitoring around spawned processes, and then crawl and find solutions to add other syscalls. For example, because of the TOCTOU attack vector we expanded the monitoring scope to enter events (see https://github.com/falcosecurity/libs/pull/235/files). You could manually cut that in half if it's an acceptable risk (push empty params or revert those changes altogether). Wrt network, you could also try only pushing TCP up and skip any other traffic. However, with your event rates you may really need very aggressive IP and file-path prefix filtering kernel side (be aware of the socket or bind syscalls' inter-dependencies with some other network-related syscalls). Maybe try to hard-code such an approach to see if there may be hope. If you search for
This is all great information, thank you! I'll look into what you said about enter events.
So, just so I understand correctly: you mean, for example, that I could patch logic into
You should be able to return and drop out early before making the enter tail call. Userspace should be resilient to missing events.
That's what I would try. Or maybe for early testing try the reverse, that is, only push onto the buffer on a match, e.g. the /etc dir. Be aware that in the kernel you don't have

Tagging some of our eBPF experts who may have additional advice: @Andreagit97 @FedeDP 😉 Very curious to see if there is any hope for such beefy servers 🙃 thanks for reaching out!
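Since libc string helpers aren't available in BPF programs and the verifier insists on bounded loops, a kernel-side path filter along these lines usually ends up as a fixed-bound prefix compare. A minimal sketch in plain C (the names and the 32-byte bound are assumptions for illustration, not Falco code):

```c
#include <assert.h>

#define KSF_MAX_PREFIX 32 /* hypothetical fixed bound to keep the verifier happy */

/* Return 1 when `s` starts with `prefix`, comparing at most KSF_MAX_PREFIX
 * bytes. In a real BPF program the constant loop bound is what lets the
 * verifier prove the loop terminates. */
static int ksf_prefix_match(const char *s, const char *prefix)
{
    for (int i = 0; i < KSF_MAX_PREFIX; i++) {
        if (prefix[i] == '\0')
            return 1; /* consumed the whole prefix: match */
        if (s[i] != prefix[i])
            return 0; /* mismatch (also catches `s` ending early) */
    }
    return 0; /* prefix longer than the bound: treat as no match */
}
```

For "push on match" testing as suggested above, the driver would only emit the event when `ksf_prefix_match(path, "/etc/")` returns 1.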
FWIW, we have been experiencing similar issues on our busiest servers. We are in the process of evaluating Tetragon, primarily for its in-kernel filtering capabilities. HTH.
I've thrown together a patch that allows you to specify a set of filters in the config, with each filter consisting of the syscall number, the arg number of the string to filter on, and finally a set of filter prefixes, e.g.
It appears to work well on one of our higher-load systems, reducing memory usage by ~60% and the drop rate by ~30% while only filtering on
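For reference, the filter shape described in that patch (syscall number, arg number, prefix set) might map to something like the following. All names and the decision logic here are my own illustration of the idea, not the actual patch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical config entry: drop a syscall's event when the chosen string
 * argument starts with any of the listed prefixes. */
struct ksf_filter {
    int syscall_nr;          /* e.g. the openat syscall number */
    int arg_idx;             /* which argument carries the string (path, cmdline, ...) */
    const char *prefixes[4]; /* prefix deny-list */
    int n_prefixes;
};

/* Return 1 if the event should be dropped before it reaches the ring buffer. */
static int ksf_should_drop(const struct ksf_filter *f, int syscall_nr, const char *arg)
{
    if (syscall_nr != f->syscall_nr)
        return 0;
    for (int i = 0; i < f->n_prefixes; i++)
        if (strncmp(arg, f->prefixes[i], strlen(f->prefixes[i])) == 0)
            return 1; /* matched a deny prefix: filter it out kernel-side */
    return 0;
}
```

In the actual driver the `strncmp`/`strlen` calls would have to be replaced by verifier-friendly bounded loops, since those helpers aren't available in BPF.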
That's amazing! For which driver? All of them?
Ah sorry, I only implemented it for the |
Just acknowledging that kernel-side (where we only have the raw arg) we can have

Still thinking... not sure yet what we could do about that to support most use cases...
This is extremely valuable feedback, thanks so much for that! Hmmm, same thinking...
Yeah, great catch! So if the concern is for an attacker to abuse the filters to, for example, open

Another attack vector I can think of is that you wouldn't want to filter any directories an attacker has write access to, since then they could either place their payload under that directory or symlink from it. Given that
Looking back at first finding out whether Falco can be usable when attempting to monitor files on your system, what would you think about the following approach? The kernel driver has some sampling logic (not used in Falco! It's used in the original sysdig tool for system-diagnosis purposes). Not sure if this is for you, but I would first like to find out what "kernel event rate" Falco can handle on your system. We could find this out by adjusting the sampling in experiments, but I know it's going to be some work to perform all these tests. WDYT? After that we could come back and see how aggressive any kernel-side filtering would need to be, or we may find out there are more problems elsewhere.
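The sampling experiment described here can be as simple as a counter-based keep-1-in-N gate: multiply the surviving event rate by N to estimate the true kernel event rate. A toy version (names assumed; a real driver would keep the counter per-CPU):

```c
#include <assert.h>

/* Hypothetical keep-1-in-N sampler for rate-estimation experiments. */
struct ksf_sampler {
    unsigned long counter; /* events seen so far */
    unsigned long n;       /* keep 1 event out of every n */
};

/* Return 1 to keep this event, 0 to drop it. */
static int ksf_sample_keep(struct ksf_sampler *s)
{
    return (s->counter++ % s->n) == 0;
}
```

With n = 4, 100 events yield 25 kept; scaling the kept rate back up by n recovers an estimate of the full rate, which is what these experiments would need.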
Hi @incertum, sorry for the late response. I agree it would be incredibly valuable to know the event rate Falco can support. I assume it would be fairly uniform across host types, given the event processor is single-threaded. Further, I think building some sort of benchmark to measure this across Falco driver versions, or even rulesets, would be nice for measuring the effectiveness of the various tweaks we want to test. I'll try to throw something together soon; still working on testing and iterating on my filtering patch at the moment.
Thank you @stevenbrz - we now have a new repo https://github.com/falcosecurity/cncf-green-review-testing for benchmarking purposes. We are still developing it, also on the CNCF side, and would love your contributions to help shape these efforts. We call it "kernel event rate" for now; I just have a suspicion that it's not just the pure event rate, but may also have to do with the nature of events and bursts. Lastly, another opportunity would be helping us shape some of the new guides: https://falco.org/docs/troubleshooting/dropping/
Hi, we were able to test our patch more widely, so here are some of our results: For event rate over the course of ~4 days, here's what we have (aggregating over
Now for drop rate (percentages from Falco version
It looks like we get drastically better worst-case performance at the cost of slightly worse 90-99th percentile performance, which is a win for us. There's another variable in the switch from 0.36.2 to 0.37.0, but it's likely not too significant judging by the changelog.
Thank you very much for sharing these updates @stevenbrz. I wrote it earlier and still believe that kernel-side filtering needs to be part of Falco's future (one way or another), while of course we still have to find the best way(s) of doing it, and it's going to be opt-in only for sure. @falcosecurity/libs-maintainers: proposing to move ahead with a formal proposal under https://github.com/falcosecurity/libs/tree/master/proposals to discuss details and timelines more concretely? WDYT @leogr? Realistically, such a new feature will take at least 2 releases from proposal to first implementation.
Is your patch publicly available?
We can still discuss in this thread, but discussing on a proposal draft would work as well. I'd be very curious to learn about the current PoC first.
@stevenbrz thanks for sharing the patches. Shall we tackle the proposal after the Falco 0.38.0 release (end of May)? This would give all maintainers a bit more time to look at the current patch and comment here a bit more. At the same time, please feel free to go ahead and open the proposal as @leogr already suggested. [Just FYI: I'll be out quite a bit in the upcoming weeks; I'll take a much closer look at the end of May.]
Sure, I can work on opening a proposal summarizing the feature in the coming days. |
Issues go stale after 90d of inactivity. Mark the issue as fresh with
Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. Mark the issue as fresh with
Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale |
Motivation
Hi! We're deploying Falco on large, highly utilized instances. Despite allocating an entire CPU core to Falco, we experience a high percentage of event drops. We see a high volume of nearly identical, benign events on these hosts, which ultimately consume resources by running through the rule evaluation pipeline.
Feature
It would be excellent to be able to specify a set of filters for dropping events in kernel-space before they are even allocated on the ring buffer. For example, a filter could ignore all exec events with a specific `proc.cmdline`, or similarly open events with a given `fd.name`.

We're looking to try and make a patch supporting this, and it would be great if we could do it in such a way that it could ultimately be beneficial to the upstream.
Diving into the code, it looks like it could potentially live here, where we could peek into `ctx` and filter out syscalls that match patterns defined in the config. I'm not sure of the best way to do this generically, but even just supporting `exec*` and `open*` would likely benefit us a lot.

Any thoughts on this approach, or whether there's potentially a better way to do this?
Alternatives
We've tried adjusting `base_syscalls.custom_set` in the config to the minimum set we need, in addition to adjusting the ring buffer parameters, with no perceivable improvement.

Additional context
We're running the latest `0.36.2` release on a mixture of ARM and x86 boxes running CentOS Stream and AlmaLinux 9 with kernel versions `5.15` and later.