[WIP] new(libscap): POC optimized syscalls #788

incertum · 2022-12-13T09:00:23Z

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

POC: Optimize syscalls of interest selection based on Falco's rules. For each syscall encode all syscall dependencies to allow for no ambiguity or negative impacts on sinsp state engine build-up or life-cycle management.

Disclaimer: This is a very early draft. Edited the syscall table with string replacement and only explored few initial use cases through additional manual edits. If this or a similar approach were accepted, it would require careful review of the sinsp state engine. Open to other suggestions to accomplish the same end goal and no feelings hurt should this be completely off.

Pros:

syscall table encodes all dependencies
syscall table now acts as "documentation" of how the sinsp state engine works
no ambiguity
only trace the syscalls that are defined in Falco's rules plus the syscalls that are absolutely necessary
supports all configurations
...

Cons:

cumbersome at the beginning to extend the syscall table
redundancy (since it's small data Big O should not be of concern at agent start-up)
makes syscall table wider / more crowded
...

Examples:

Solely newly spawned processes are monitored -> no need to update threads in state engine as no other syscalls a process can make are monitored.

# execve, execveat
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 324 --ppm_sc 325

On the other hand, below scenarios include all syscalls that change a thread's params like uid, pgid, or dir and for the syscalls that create an fd, close is included to destroy fds.

#open, openat, openat2
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 5 --ppm_sc 322 --ppm_sc 197

#connect
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 243

#accept
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 245

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

…l_table Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

poiana · 2022-12-13T09:00:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: incertum
Once this PR has been reviewed and has the lgtm label, please assign mstemm for approval by writing /assign @mstemm in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Andreagit97 · 2022-12-13T13:27:43Z

this seems an interesting idea, the real question could be is sinsp-example able to correctly detect something without crashing/leakages with just a little set of syscalls 🤔 ?

incertum · 2022-12-13T19:24:27Z

this seems an interesting idea, the real question could be is sinsp-example able to correctly detect something without crashing/leakages with just a little set of syscalls 🤔 ?

On top of my head the initial scap filesystem scan at agent startup also needs to be a bit more configurable, most notably if you only care about spawned processes, no need to populate fd tables.

For the rest of the critical syscalls there will be the distinction: (1) Some dependent syscalls are responsible to add and remove items to the state engine vs (2) update information ensuring correct current view. (1) being most critical to get right, but (2) is critical too. Thread table cleanup is through tp sched_process_exit, for the fds need to look more into it, but close syscall is only one that has EF_DESTROYS_FD flag and hence always needs to be traced for any syscall that is fd related ...

In summary, let's go 1:1 through every syscall and audit sinsp in order to derive a roadmap re what changes would be needed in addition to adjusting the syscall selection mechanisms. The potential impact of these optimizations in terms of savings in compute overhead -> very very large sums of money plus new opportunities of using Falco for more tailored environments with varying monitoring scopes.

dwindsor · 2022-12-14T21:44:46Z

In summary, let's go 1:1 through every syscall and audit sinsp in order to derive a roadmap re what changes would be needed in addition to adjusting the syscall selection mechanisms.

While I like this idea, any syscall dependency list we come up with will be fragile and subject to becoming out of date fairly quickly. The syscall dependency list will likely be different depending on compile options (e.g. whether the libs were built using -DMINIMAL_BUILD, which IIUC has a less compilcated/smaller state engine than the normal build). Also, changes to the libs' stage engine will create new/non-obvious syscall dependencies, so we'd need a way to quickly/automatically rebuild the syscall dependency tree after any such changes..

incertum · 2022-12-15T00:55:25Z

While I like this idea, any syscall dependency list we come up with will be fragile and subject to becoming out of date fairly quickly. The syscall dependency list will likely be different depending on compile options (e.g. whether the libs were built using -DMINIMAL_BUILD, which IIUC has a less complicated/smaller state engine than the normal build). Also, changes to the libs' stage engine will create new/non-obvious syscall dependencies, so we'd need a way to quickly/automatically rebuild the syscall dependency tree after any such changes.

💯 absolutely @dwindsor, thanks for calling this out!

To your point, how to automatically create a syscall dependency tree ... let me think more what options there could be besides manual expert knowledge input.

How would you feel if we extended this POC to sinsp-example soon, start the sinsp audit and then iron out details based on outcomes?

jasondellaluce · 2022-12-16T11:17:47Z

@incertum thanks a lot for driving the initiative on this topic. I support this and I'm looking forward to accomplishing the end goal, but IMO this is not the right path from few perspectives:

Maintainability - Totally endorse @dwindsor's point. It will be very hard to maintain the dependency list, and automating the process might be even harder because it's super dependent on the internal state management logic of libsinsp. Not only, this may vary between different architectures and compilation options (MINIMAL_BUILD is a valid example). The event table is already expected to grow in the future when new events will be added, I don't see making it even heavier to maintain as a long-term sustainable solution.
Separation of concerns - I don't think it's a good idea to push into libscap additional logic that is heavily dependent on the internals of libsinsp. Although it is true that sometimes the responsibility boundaries are broken (e.g. we need to define new event types in libscap to represent libsinsp's meta-evts), we can generally say that the two libraries have different scopes & goals, and should have a good separation layer. I think we need to do better in the future to further improve the differentiation of concerns, and this proposal goes in the opposite direction by representing the sinsp state logic into libscap. IMO, libscap should never be concerned about the notion of "state".
Information modeling - Here we are assuming the existence of a dependency relation between different syscalls (or events, more specifically) in order to compose libsinsp's state. I would argue that this is not an optimal way to model the problem. I think the dependency relation should be defined between subparts of the libsinsp's state and the events that help recostructing them, and not between events and other events. For example, here we're modeling the open syscall dependent to clone and close, whereas I would instead see the "sinsp file descriptor state" to be dependent on open and close (with non-mandatory dependencies on read/write too), the "sinsp process state" to be dependent on clone and execve, and the two "sinsp file descriptor state" and "sinsp process state" being loosely connected too (e.g. a file descriptor should reference its thread).

Let me know what you think. Of course, my assumptions imply that changes will mostly/entirely happen in libsinsp and that a polishing phase must happen in the state management code first. I think that should be the first step to look forward to.

incertum · 2022-12-20T00:56:20Z

Thanks @jasondellaluce overall this is all excellent feedback and enough food for thought to try tackling a POC#2 that truly is efficient (you only pay for what you get and need) and more easily maintainable. However, I think we won't get away without some sort of knowledge base, at least in part, but everything can live in libsinsp, heard and acknowledged 🙏 .

I'll leave this PR open until then, in case more folks want to offer additional thoughts. Expectation is to then attach RIP POC#1 and close it.

incertum · 2023-01-11T08:43:58Z

Closing in favor of POC (take 2) #826.

incertum added 2 commits December 13, 2022 00:32

new(libscap): POC for optimized syscalls option

9fd3e95

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

new(driver, libscap): POC add state_dependency_ppm_sc_array to syscal…

592b51b

…l_table Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

poiana added do-not-merge/work-in-progress release-note-none kind/cleanup dco-signoff: yes kind/design labels Dec 13, 2022

poiana added kind/feature New feature or request area/libscap labels Dec 13, 2022

poiana requested review from hbrueckner and LucaGuerra December 13, 2022 09:00

poiana added the size/XXL label Dec 13, 2022

incertum mentioned this pull request Dec 20, 2022

new(falco): introduce new metrics w/ Falco internal: metrics snapshot option and new metrics config falcosecurity/falco#2333

Merged

incertum mentioned this pull request Jan 11, 2023

new(libsinsp): sinsp_repair_state_sc_set ppm sc API #826

Merged

incertum closed this Jan 11, 2023

incertum deleted the poc-optimized-syscalls branch December 8, 2023 20:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] new(libscap): POC optimized syscalls #788

[WIP] new(libscap): POC optimized syscalls #788

incertum commented Dec 13, 2022

poiana commented Dec 13, 2022

Andreagit97 commented Dec 13, 2022

incertum commented Dec 13, 2022

dwindsor commented Dec 14, 2022 •

edited

Loading

incertum commented Dec 15, 2022

jasondellaluce commented Dec 16, 2022

incertum commented Dec 20, 2022

incertum commented Jan 11, 2023

[WIP] new(libscap): POC optimized syscalls #788

[WIP] new(libscap): POC optimized syscalls #788

Conversation

incertum commented Dec 13, 2022

poiana commented Dec 13, 2022

Andreagit97 commented Dec 13, 2022

incertum commented Dec 13, 2022

dwindsor commented Dec 14, 2022 • edited Loading

incertum commented Dec 15, 2022

jasondellaluce commented Dec 16, 2022

incertum commented Dec 20, 2022

incertum commented Jan 11, 2023

dwindsor commented Dec 14, 2022 •

edited

Loading