Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] new(libscap): POC optimized syscalls #788

Closed

Conversation

incertum
Copy link
Contributor

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

POC: Optimize syscalls of interest selection based on Falco's rules. For each syscall encode all syscall dependencies to allow for no ambiguity or negative impacts on sinsp state engine build-up or life-cycle management.

Disclaimer: This is a very early draft. Edited the syscall table with string replacement and only explored few initial use cases through additional manual edits. If this or a similar approach were accepted, it would require careful review of the sinsp state engine. Open to other suggestions to accomplish the same end goal and no feelings hurt should this be completely off.

Pros:

  • syscall table encodes all dependencies
  • syscall table now acts as "documentation" of how the sinsp state engine works
  • no ambiguity
  • only trace the syscalls that are defined in Falco's rules plus the syscalls that are absolutely necessary
  • supports all configurations
  • ...

Cons:

  • cumbersome at the beginning to extend the syscall table
  • redundancy (since it's small data Big O should not be of concern at agent start-up)
  • makes syscall table wider / more crowded
  • ...

Examples:

Solely newly spawned processes are monitored -> no need to update threads in state engine as no other syscalls a process can make are monitored.

# execve, execveat
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 324 --ppm_sc 325

On the other hand, below scenarios include all syscalls that change a thread's params like uid, pgid, or dir and for the syscalls that create an fd, close is included to destroy fds.

#open, openat, openat2
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 5 --ppm_sc 322 --ppm_sc 197

#connect
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 243

#accept
sudo ./libscap/examples/01-open/scap-open --bpf driver/bpf/probe.o --optimized_syscalls --ppm_sc 245

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
…l_table

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@poiana
Copy link
Contributor

poiana commented Dec 13, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: incertum
Once this PR has been reviewed and has the lgtm label, please assign mstemm for approval by writing /assign @mstemm in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana added kind/feature New feature or request area/libscap labels Dec 13, 2022
@Andreagit97
Copy link
Member

this seems an interesting idea, the real question could be is sinsp-example able to correctly detect something without crashing/leakages with just a little set of syscalls 🤔 ?

@incertum
Copy link
Contributor Author

this seems an interesting idea, the real question could be is sinsp-example able to correctly detect something without crashing/leakages with just a little set of syscalls 🤔 ?

On top of my head the initial scap filesystem scan at agent startup also needs to be a bit more configurable, most notably if you only care about spawned processes, no need to populate fd tables.

For the rest of the critical syscalls there will be the distinction: (1) Some dependent syscalls are responsible to add and remove items to the state engine vs (2) update information ensuring correct current view. (1) being most critical to get right, but (2) is critical too. Thread table cleanup is through tp sched_process_exit, for the fds need to look more into it, but close syscall is only one that has EF_DESTROYS_FD flag and hence always needs to be traced for any syscall that is fd related ...

In summary, let's go 1:1 through every syscall and audit sinsp in order to derive a roadmap re what changes would be needed in addition to adjusting the syscall selection mechanisms. The potential impact of these optimizations in terms of savings in compute overhead -> very very large sums of money plus new opportunities of using Falco for more tailored environments with varying monitoring scopes.

@dwindsor
Copy link
Contributor

dwindsor commented Dec 14, 2022

In summary, let's go 1:1 through every syscall and audit sinsp in order to derive a roadmap re what changes would be needed in addition to adjusting the syscall selection mechanisms.

While I like this idea, any syscall dependency list we come up with will be fragile and subject to becoming out of date fairly quickly. The syscall dependency list will likely be different depending on compile options (e.g. whether the libs were built using -DMINIMAL_BUILD, which IIUC has a less compilcated/smaller state engine than the normal build). Also, changes to the libs' stage engine will create new/non-obvious syscall dependencies, so we'd need a way to quickly/automatically rebuild the syscall dependency tree after any such changes..

@incertum
Copy link
Contributor Author

While I like this idea, any syscall dependency list we come up with will be fragile and subject to becoming out of date fairly quickly. The syscall dependency list will likely be different depending on compile options (e.g. whether the libs were built using -DMINIMAL_BUILD, which IIUC has a less complicated/smaller state engine than the normal build). Also, changes to the libs' stage engine will create new/non-obvious syscall dependencies, so we'd need a way to quickly/automatically rebuild the syscall dependency tree after any such changes.

💯 absolutely @dwindsor, thanks for calling this out!

To your point, how to automatically create a syscall dependency tree ... let me think more what options there could be besides manual expert knowledge input.

How would you feel if we extended this POC to sinsp-example soon, start the sinsp audit and then iron out details based on outcomes?

@jasondellaluce
Copy link
Contributor

@incertum thanks a lot for driving the initiative on this topic. I support this and I'm looking forward to accomplishing the end goal, but IMO this is not the right path from few perspectives:

  • Maintainability - Totally endorse @dwindsor's point. It will be very hard to maintain the dependency list, and automating the process might be even harder because it's super dependent on the internal state management logic of libsinsp. Not only, this may vary between different architectures and compilation options (MINIMAL_BUILD is a valid example). The event table is already expected to grow in the future when new events will be added, I don't see making it even heavier to maintain as a long-term sustainable solution.
  • Separation of concerns - I don't think it's a good idea to push into libscap additional logic that is heavily dependent on the internals of libsinsp. Although it is true that sometimes the responsibility boundaries are broken (e.g. we need to define new event types in libscap to represent libsinsp's meta-evts), we can generally say that the two libraries have different scopes & goals, and should have a good separation layer. I think we need to do better in the future to further improve the differentiation of concerns, and this proposal goes in the opposite direction by representing the sinsp state logic into libscap. IMO, libscap should never be concerned about the notion of "state".
  • Information modeling - Here we are assuming the existence of a dependency relation between different syscalls (or events, more specifically) in order to compose libsinsp's state. I would argue that this is not an optimal way to model the problem. I think the dependency relation should be defined between subparts of the libsinsp's state and the events that help recostructing them, and not between events and other events. For example, here we're modeling the open syscall dependent to clone and close, whereas I would instead see the "sinsp file descriptor state" to be dependent on open and close (with non-mandatory dependencies on read/write too), the "sinsp process state" to be dependent on clone and execve, and the two "sinsp file descriptor state" and "sinsp process state" being loosely connected too (e.g. a file descriptor should reference its thread).

Let me know what you think. Of course, my assumptions imply that changes will mostly/entirely happen in libsinsp and that a polishing phase must happen in the state management code first. I think that should be the first step to look forward to.

@incertum
Copy link
Contributor Author

Thanks @jasondellaluce overall this is all excellent feedback and enough food for thought to try tackling a POC#2 that truly is efficient (you only pay for what you get and need) and more easily maintainable. However, I think we won't get away without some sort of knowledge base, at least in part, but everything can live in libsinsp, heard and acknowledged 🙏 .

I'll leave this PR open until then, in case more folks want to offer additional thoughts. Expectation is to then attach RIP POC#1 and close it.

@incertum
Copy link
Contributor Author

Closing in favor of POC (take 2) #826.

@incertum incertum closed this Jan 11, 2023
@incertum incertum deleted the poc-optimized-syscalls branch December 8, 2023 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants