Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ProcessTree rogue entries #4582

Merged
merged 7 commits into from
Feb 19, 2025

Conversation

geyslan
Copy link
Member

@geyslan geyslan commented Feb 8, 2025

Close: #4580
Close: #3878

1. Explain what the PR does

f2e4130 chore(proctree): mov hash computation to userspace
5d943de chore(proctree): add debug comment
71bb922 fix(proctree): exec event already brings data
0ed6a1a fix(proctree): treat kthread as special case
fa4bc32 fix(proctree): time normalization momentum
0e9065f fix(tests): kill only tracee (-x exact match)
ab4e184 chore(proctree): add exittime to debug

71bb922 fix(proctree): exec event already brings data

Populate Process TaskInfo with data from sched_process_exec event.
Without it, the ProcessTree when set as source=events will contain
entries with no relevant data.

0ed6a1a fix(proctree): treat kthread as special case

kthread was being added only as parent, so its taskinfo was not being
updated.

fa4bc32 fix(proctree): time normalization momentum

ProcessTree was populated with rogue entries because time normalization
was not performed at the correct moment, leading to a mismatch between
hashes from signals and those from events.

2. Explain how to test it

Uncomment https://github.com/aquasecurity/tracee/pull/4582/files#diff-72195925b110a2e9579f4bd9491d18052a9fb1875d2b3757987cfdfc288a1cffR183 and run tracee with --proctree source=both, --proctree source=events, --proctree source=signals for checking the current state of the ProcessTree. One might also wanna run INSTTESTS="PROCTREE_DATA_SOURCE" ./tests/e2e-inst-test.sh changing the source type inside that script before.

One won't see anymore:

  • Rogue entries (hashes without pid, tid or ppid)
  • Different hashes for the same pid, tid and ppid
  • pid 2 as a rogue entry

3. Other comments

@NDStrahilevitz
Copy link
Collaborator

@geyslan updating here our offline discussion:

Instead of changing the normalization ordering, we will align the timestamp arguments between the signal and event. This will:

  1. Give userspace the option to work on the same kind of timestamp, that is, userspace will have access to the kernel determined timestamp from the buffer
  2. Wether the hash will be calculated from the normalized or kernel original timestamp is irrelevant to the current PR, the point is consistency
  3. By submitting the timestamp from the kernel we can reduce the instruction size of the signal ebpf program since we will not hash in kernel time

Please note if i've missed or misrepresented anything by mistake

@geyslan
Copy link
Member Author

geyslan commented Feb 13, 2025

By submitting the timestamp from the kernel we can reduce the instruction size of the signal ebpf program since we will not hash in kernel time

It's a trade-off... we reduce computation time but enlarge the signal buffer with more args. Anyway, I believe the latter is better.

@NDStrahilevitz
Copy link
Collaborator

NDStrahilevitz commented Feb 13, 2025

By submitting the timestamp from the kernel we can reduce the instruction size of the signal ebpf program since we will not hash in kernel time

It's a trade-off... we reduce computation time but enlarge the signal buffer with more args. Anyway, I believe the latter is better.

Hmmm, I was under the assumption we were submitting the pids and tids. Then, this could actually increase instruction size (calculation is: -3hash,+3 new submits, +0(?) switching out hash submit to timestamp submit).

@geyslan
Copy link
Member Author

geyslan commented Feb 13, 2025

Hmmm, I was under the assumption we were submitting the pids and tids. Then, this could actually increase instruction size (calculation is: -3hash,+3 new submits, +0(?) switching out hash submit to timestamp submit).

For each process task level relation we have to submit a pair of id and timestart instead of their hash only.

@geyslan geyslan force-pushed the proctree-the-fourth branch 3 times, most recently from d6bf28c to 2aabbd4 Compare February 14, 2025 15:10
Copy link
Collaborator

@NDStrahilevitz NDStrahilevitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the this commit relevant given the later commit chore(proctree): mov hash computation to userspace?

@geyslan
Copy link
Member Author

geyslan commented Feb 17, 2025

Is the this commit relevant given the later commit chore(proctree): mov hash computation to userspace?

I didn't squash them because the previous fixes are atomic. This last one being atomic as well makes it easier to revert the hash computation relocation if decided for.

@geyslan geyslan force-pushed the proctree-the-fourth branch 2 times, most recently from c6be9b0 to 7a978f9 Compare February 17, 2025 18:36
Copy link
Collaborator

@rscampos rscampos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ProcessTree was populated with rogue entries because time normalization
was not performed at the correct moment, leading to a mismatch between
hashes from signals and those from events.
kthread was being added only as parent, so its taskinfo was not being
updated.
Populate Process TaskInfo with data from sched_process_exec event.
Without it, the ProcessTree when set as source=events will contain
entries with no relevant data.
@geyslan geyslan force-pushed the proctree-the-fourth branch from 7a978f9 to f2e4130 Compare February 19, 2025 11:53
@geyslan
Copy link
Member Author

geyslan commented Feb 19, 2025

/fast-forward

@github-actions github-actions bot merged commit f2e4130 into aquasecurity:main Feb 19, 2025
41 checks passed
@geyslan geyslan deleted the proctree-the-fourth branch February 19, 2025 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants