Improve cpu/memory (proctree wise) #4503

Merged 26 commits on Jan 28, 2025

geyslan
Member

@geyslan geyslan commented Jan 14, 2025

Close: #4546

1. Explain what the PR does

Running a local proctree stressor (deterministic workload) produced these results:

| **Metric**         | **main**  | **new**   | **Difference (%)** |
| ------------------ | --------- | --------- | ------------------ |
| **CPU Peak (%)**   | 3.51%     | 3.27%     | -6.84%             |
| **CPU Avg (%)**    | 2.93%     | 2.72%     | -7.17%             |
| **Malloc Rate**    | 2,143,174 | 1,884,787 | -12.06%            |
| **Heap Peak (MB)** | 747 MB    | 694 MB    | -7.09%             |
| **Heap Avg (MB)**  | 258 MB    | 229 MB    | -11.24%            |
| **Heap Obj Avg**   | 1,102,839 | 1,101,027 | -0.16%             |

Tracee flags:

-e sched_process_exec,sched_process_fork,sched_process_exit --proctree source=both --proctree process-cache=16384 --proctree thread-cache=32768 --proctree disable-procfs -o none

Stressor details:

#!/bin/sh

cleanup() {
  echo "Finishing ..."
  kill -TERM -- $tracee_pid
  wait $tracee_pid

  kill -TERM -- $run_stress_1_pid $run_stress_2_pid $run_stress_3_pid $run_stress_4_pid $run_stress_5_pid $run_stress_6_pid $run_stress_7_pid $run_stress_8_pid 2>/dev/null
  exit 0
}

trap cleanup INT

get_pmap() {
	echo "$1"
	sudo pmap -x $(pgrep tracee) | grep -E "c000000000"
}

run_stress() {
	echo "Running stress $1"
	i=0
	iterations=2000000
	while [ "$i" -ne "$iterations" ]; do
		ls >/dev/null
		if [ "$1" -eq 1 ] && [ $((i % (iterations/5))) -eq 0 ]; then
			get_pmap "pmap after $i iterations"
		fi
		i=$((i+1))
	done
	echo "Finished stress $1"
}

default_proctree_cache_size=32768
# default_proctree_cache_size=16384
# default_proctree_cache_size=8192
process_cache_size=16384
thread_cache_size=32768
if [ -z "$process_cache_size" ]; then
	process_cache_size=$default_proctree_cache_size
fi
if [ -z "$thread_cache_size" ]; then
	thread_cache_size=$default_proctree_cache_size
fi
proctree_cache_flag="--proctree process-cache=${process_cache_size} --proctree thread-cache=${thread_cache_size}"
proctree_source_flag="both"
proctree_flag="--proctree source=$proctree_source_flag $proctree_cache_flag"
# proctree_flag="$proctree_flag --proctree disable-procfs" # disabling procfs
proctree_flag= # disabling proctree

eventsflag="-e sched_process_exec,sched_process_fork,sched_process_exit"
eventsflag= # disabling events (only default events)

gogc=5
tracee_bin="$TRACEE"
if [ -z "$tracee_bin" ]; then
	tracee_bin="/home/gg/code/tracee/dist/tracee"
fi

tracee_cmd="$tracee_bin --metrics --pyroscope --pprof -s tree=$$ $eventsflag $proctree_flag -o none"

echo "Running tracee in background"
echo "$tracee_cmd"
GOGC=$gogc $tracee_cmd &
tracee_pid=$!

sleep 1
get_pmap "pmap right before start"
sleep 5 && get_pmap "pmap after 5 seconds from start"
run_stress 1 &
run_stress_1_pid=$!
run_stress 2 &
run_stress_2_pid=$!
run_stress 3 &
run_stress_3_pid=$!
run_stress 4 &
run_stress_4_pid=$!
run_stress 5 &
run_stress_5_pid=$!
run_stress 6 &
run_stress_6_pid=$!
run_stress 7 &
run_stress_7_pid=$!
run_stress 8 &
run_stress_8_pid=$!
wait $run_stress_1_pid
wait $run_stress_2_pid
wait $run_stress_3_pid
wait $run_stress_4_pid
wait $run_stress_5_pid
wait $run_stress_6_pid
wait $run_stress_7_pid
wait $run_stress_8_pid

get_pmap "pmap right after finished stress"
sleep 20 && get_pmap "pmap after 20s"
# sleep 30 && get_pmap "pmap after 30s"
# sleep 30 && get_pmap "pmap after 30s"
# sleep 120 && get_pmap "pmap after 120s"

cleanup

8 parallel stress loops (2_000_000 iterations each), running on:

cpu: AMD Ryzen 9 7950X 16-Core Processor
MemTotal: 64923992 kB (64GB)


eaf0311 chore(proctree): set new default cache sizes
5eba933 chore(cmd): add proctree disable-procfs
45d4ae8 perf(controlplane): introduce signal pool
2b6d41c perf(proctree): improve Process concurrency ctrl
5ea4bee perf(proctree): change Thread concurrency control
8cc3abf perf(proctree): reduce lock contention
fbfbe99 chore(proctree): remove leftover
4d7fe28 chore/perf(proctree): comment out exit fields
2457a6d perf(proctree): introduce feed pools
1c9f52a perf(proctree): move functions from FeedFromFork
a1ceb10 perf(events): improve ArgVal
617fe40 chore(events): add BenchmarkArgVal
f39eaa1 perf: improve procTreeExitProcessor
70717e9 chore: add Benchmark_procTreeExitProcessor
3156921 perf: remove unused ExecFeed interpreter fields
5c109d6 perf(controlplane): improve procTreeExecProcessor
42e1c5e chore(controlplane): add Benchmark_procTreeExecProcessor
255fec0 perf(ebpf): improve procTreeExecProcessor
c9c8723 chore(ebpf): add Benchmark_procTreeExecProcessor
c2223ff perf(controlplane): improve procTreeForkProcessor
fd9a666 chore(controlplane): add procTreeForkProcessor bench
2584505 perf(ebpf): improve procTreeForkProcessor
7eb6b91 chore(ebpf): add Benchmark_procTreeForkProcessor
0a698fa perf: reduce events.Core lock contention
5f1275c chore(bufferdecoder): set zero from def fields
c83828f chore(bufferdecode): add DecodeArguments benchmark

eaf0311 chore(proctree): set new default cache sizes

processes: 10928
threads: 21856

5eba933 chore(cmd): add proctree disable-procfs

This also removes some leftovers.

45d4ae8 perf(controlplane): introduce signal pool

Pooling signals helps reduce dynamic stack growth and the number of
allocations, which improves performance.
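
As a rough illustration of the pooling idea (not the actual Tracee code), the sketch below reuses objects through `sync.Pool`. The `signal` type, its fields, and the `getSignal`/`putSignal` helpers are hypothetical names made up for this example:

```go
package main

import (
	"fmt"
	"sync"
)

// signal is a hypothetical stand-in for a control-plane signal; the
// real type and its fields are not shown in this PR description.
type signal struct {
	id   uint32
	args []any
}

// signalPool hands out reusable *signal values so the hot path does
// not allocate a fresh one per event.
var signalPool = sync.Pool{
	New: func() any { return &signal{args: make([]any, 0, 8)} },
}

// getSignal takes a signal from the pool and sets its ID.
func getSignal(id uint32) *signal {
	s := signalPool.Get().(*signal)
	s.id = id
	return s
}

// putSignal resets the signal before returning it to the pool, so no
// stale data leaks into the next user.
func putSignal(s *signal) {
	s.id = 0
	for i := range s.args {
		s.args[i] = nil // drop references so the GC can collect them
	}
	s.args = s.args[:0] // keep the backing array for reuse
	signalPool.Put(s)
}

func main() {
	s := getSignal(42)
	s.args = append(s.args, "pid", 1234)
	fmt.Println(s.id, len(s.args))
	putSignal(s)
}
```

The important part is the reset in `putSignal`: a pooled object must not be touched by the caller after it is returned to the pool.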

5ea4bee perf(proctree): change Thread concurrency control

Mutex is a heavy lock, and it's not necessary to use it in the Thread
concurrency control. This change replaces the mutex with atomic
operations to reduce contention, which also reduces the memory footprint.
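
A minimal sketch of the general technique, assuming a simplified `thread` struct with made-up fields (the real Thread type has more state and may be organized differently):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// thread is a hypothetical, simplified stand-in for a proctree Thread.
type thread struct {
	hash       uint32        // written once at creation, read-only afterwards
	parentHash atomic.Uint32 // updated concurrently without a mutex
	leaderHash atomic.Uint32
}

func (t *thread) SetParentHash(h uint32) { t.parentHash.Store(h) }
func (t *thread) GetParentHash() uint32  { return t.parentHash.Load() }
func (t *thread) SetLeaderHash(h uint32) { t.leaderHash.Store(h) }
func (t *thread) GetLeaderHash() uint32  { return t.leaderHash.Load() }

func main() {
	t := &thread{hash: 1}

	// Several goroutines update and read the fields without locking.
	var wg sync.WaitGroup
	for i := uint32(1); i <= 8; i++ {
		wg.Add(1)
		go func(h uint32) {
			defer wg.Done()
			t.SetParentHash(h)
			_ = t.GetParentHash()
		}(i)
	}
	wg.Wait()

	fmt.Println("parent hash set:", t.GetParentHash() != 0)
}
```

Each field here is updated independently. When several fields must change together as one consistent unit, a lock (or an atomic pointer to an immutable snapshot) is still needed.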

8cc3abf perf(proctree): reduce lock contention

Reuse the same TaskInfo reference, avoiding the need to lock to fetch it.

This also reorders the creation of the process and thread.

4d7fe28 chore/perf(proctree): comment out exit fields

The only ExitFeed fields handled by FeedFromExit() are TaskHash and
TimeStamp, so this commit comments out the other fields, which are
not used by the proctree in this context.

2457a6d perf(proctree): introduce feed pools

Pooling feeds helps reduce dynamic stack growth and the number of
allocations, which improves performance.

Changelog fields now hold pointers to the feeds instead of the feeds
themselves. This aligns them with the new feed pointers and avoids
dereferencing.

a1ceb10 perf(events): improve ArgVal

| Sub-Benchmark    | Old (ns/op) | New (ns/op) | Change (%) |
|------------------|-------------|-------------|------------|
| valid_args       | 14.43       | 13.35       | -7.48%     |
| invalid_val_type | 551.7       | 589.8       | +6.90%     |
| not_found_arg    | 499.2       | 586.0       | +17.38%    |

valid_args is the most relevant case, since it traverses the args in a
specific order. The other cases are not deterministic and are used to
measure upcoming changes for the worst case.

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkArgVal$
github.com/aquasecurity/tracee/pkg/events/parse -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/events/parse
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkArgVal/int32/valid_args-32        100000000   13.35 ns/op   0 B/op   0 allocs/op
BenchmarkArgVal/int32/invalid_val_type-32  100000000  589.8 ns/op  584 B/op  10 allocs/op
BenchmarkArgVal/int32/not_found_arg-32     100000000  586.0 ns/op  520 B/op  10 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/events/parse  118.922s

617fe40 chore(events): add BenchmarkArgVal

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkArgVal$
github.com/aquasecurity/tracee/pkg/events/parse -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/events/parse
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkArgVal/int32/valid_args-32        100000000   14.43 ns/op   0 B/op  0 allocs/op
BenchmarkArgVal/int32/invalid_val_type-32  100000000  551.7 ns/op  584 B/op 10 allocs/op
BenchmarkArgVal/int32/not_found_arg-32     100000000  499.2 ns/op  520 B/op 10 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/events/parse  106.538s

f39eaa1 perf: improve procTreeExitProcessor

Improve procTreeExitProcessor for both Tracee and Controller.

-

Tracee

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 159.9      | 95.71       | 40.14%          |
| Bytes allocated (B/op)  | 48         | 0           | 100.00%         |
| Allocations per op      | 2          | 0           | 100.00%         |
| Total runtime (s)       | 16.001     | 9.586       | 40.14%          |

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  95.71 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  9.586s

---

Controller

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 335.5      | 115.4       | 65.60%          |
| Bytes allocated (B/op)  | 240        | 0           | 100.00%         |
| Allocations per op      | 4          | 0           | 100.00%         |
| Total runtime (s)       | 33.558     | 11.553      | 65.60%          |

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  115.4 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  11.553s

70717e9 chore: add Benchmark_procTreeExitProcessor

For both Tracee and Controller.

-

Tracee

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  159.9 ns/op  48 B/op  2 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  16.001s

---

Controller

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000 335.5 ns/op  240 B/op  4 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  33.558s

3156921 perf: remove unused ExecFeed interpreter fields

Disable (comment out) ExecFeed interpreter fields not used by the
feeders. This removal was already started by 4a5bb5d0f.

---

Tracee

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 215.6      | 168.1       | 22.03%          |
| Bytes allocated (B/op)  | 4          | 4           | 0.00%           |
| Allocations per op      | 1          | 1           | 0.00%           |
| Total runtime (s)       | 21.571     | 16.825      | 22.03%          |

-

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  168.1 ns/op  4 B/op 1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  16.825s

---

Controller

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 284.2      | 209.7       | 26.20%          |
| Bytes allocated (B/op)  | 4          | 4           | 0.00%           |
| Allocations per op      | 1          | 1           | 0.00%           |
| Total runtime (s)       | 28.435     | 20.983      | 26.20%          |

-

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  209.7 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  20.983s

5c109d6 perf(controlplane): improve procTreeExecProcessor

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 649.7      | 284.2       | 56.26%          |
| Bytes allocated (B/op)  | 500        | 4           | 99.20%          |
| Allocations per op      | 6          | 1           | 83.33%          |
| Total runtime (s)       | 64.981     | 28.435      | 56.26%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  284.2 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  28.435s

42e1c5e chore(controlplane): add Benchmark_procTreeExecProcessor

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32 100000000  649.7 ns/op  500 B/op  6 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  64.981s

255fec0 perf(ebpf): improve procTreeExecProcessor

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 514.7      | 215.6       | 58.12%          |
| Bytes allocated (B/op)  | 500        | 4           | 99.20%          |
| Allocations per op      | 6          | 1           | 83.33%          |
| Total runtime (s)       | 51.483     | 21.571      | 58.12%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  215.6 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  21.571s

c9c8723 chore(ebpf): add Benchmark_procTreeExecProcessor

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32 100000000  514.7 ns/op  500 B/op  6 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  51.483s

c2223ff perf(controlplane): improve procTreeForkProcessor

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 618.2      | 274.0       | 55.67%          |
| Bytes allocated (B/op)  | 496        | 0           | 100.00%         |
| Allocations per op      | 5          | 0           | 100.00%         |
| Total runtime (s)       | 61.827     | 27.415      | 55.67%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32  100000000  274.0 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  27.415s

fd9a666 chore(controlplane): add procTreeForkProcessor bench

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  618.2 ns/op  496 B/op  5 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  61.827s

2584505 perf(ebpf): improve procTreeForkProcessor

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 547.4      | 267.5       | 51.14%          |
| Bytes allocated (B/op)  | 496        | 0           | 100.00%         |
| Allocations per op      | 5          | 0           | 100.00%         |
| Total runtime (s)       | 54.757     | 26.763      | 51.13%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  267.5 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  26.763s

7eb6b91 chore(ebpf): add Benchmark_procTreeForkProcessor

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  547.4 ns/op  496 B/op  5 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  54.757s

0a698fa perf: reduce events.Core lock contention

When retrieving the event definition, there is no longer a need to call
Core.IsDefined() beforehand. Validation can now be performed directly
using the NotValid() method on the Definition type returned by
GetEventDefinitionID() and GetEventDefinitionName().

Besides the lock contention reduction, this also gets rid of the window
where the event definition could be changed between the check and the
actual use of the definition.
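
The pattern can be sketched as follows. This is a simplified illustration, not the events.Core implementation: the `registry` and `definition` types, the `lookup` method, and the `undefinedID` sentinel are hypothetical. The point is that a single locked lookup returns either a real definition or a sentinel, so callers validate the returned value instead of doing a separate check followed by a second lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// undefinedID marks the sentinel definition returned for unknown IDs.
const undefinedID = -1

// definition is a hypothetical, simplified event definition.
type definition struct {
	id   int
	name string
}

// NotValid reports whether this is the sentinel "undefined" definition.
func (d definition) NotValid() bool { return d.id == undefinedID }

// registry is a hypothetical stand-in for the event definition store.
type registry struct {
	mu   sync.RWMutex
	defs map[int]definition
}

// lookup takes the read lock once and returns a sentinel definition
// when the ID is unknown, so no separate existence check is needed.
func (r *registry) lookup(id int) definition {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if d, ok := r.defs[id]; ok {
		return d
	}
	return definition{id: undefinedID}
}

func main() {
	r := &registry{defs: map[int]definition{
		1: {id: 1, name: "sched_process_exec"},
	}}

	for _, id := range []int{1, 2} {
		d := r.lookup(id)
		if d.NotValid() {
			fmt.Printf("event %d: not defined\n", id)
			continue
		}
		fmt.Printf("event %d: %s\n", id, d.name)
	}
}
```

Compared with a check-then-get sequence, this halves the number of lock acquisitions per lookup and leaves no gap between the check and the use.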

This also fixes incorrect logger usage in the pipeline.

5f1275c chore(bufferdecoder): set zero from def fields

It's a cosmetic change to make the code more readable.

c83828f chore(bufferdecode): add DecodeArguments benchmark

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkDecodeArguments$
github.com/aquasecurity/tracee/pkg/bufferdecoder -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/bufferdecoder
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkDecodeArguments-32 100000000  206.3 ns/op  512 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/bufferdecoder  20.646s

2. Explain how to test it

3. Other comments

@geyslan geyslan force-pushed the improv-cpu-time branch 4 times, most recently from cb59e3e to cde5a80 Compare January 17, 2025 00:38
@geyslan geyslan changed the title Improve cpu time Improve cpu/memory (proctree wise) Jan 17, 2025
@geyslan geyslan marked this pull request as ready for review January 17, 2025 17:31
@rscampos

This comment was marked as resolved.

@geyslan
Member Author

geyslan commented Jan 27, 2025

@rscampos proctree still uses the old help format (tracee-ebpf). The majority was migrated to the tracee man style. For now, I'm leaving it as is.

It's a cosmetic change to make the code more readable.
When retrieving the event definition, there is no longer a need to check
beforehand Core.IsDefined(). Validation can now be performed directly
using the NotValid() method on the Definition type returned by
GetEventDefinitionID() and GetEventDefinitionName().

Besides the lock contention reduction, this also gets rid of the window
where the event definition could be changed between the check and the
actual use of the definition.

This also fixes a wrong logger usage in the pipeline.
Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  547.4 ns/op  496 B/op  5 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  54.757s
| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 547.4      | 267.5       | 51.14%          |
| Bytes allocated (B/op)  | 496        | 0           | 100.00%         |
| Allocations per op      | 5          | 0           | 100.00%         |
| Total runtime (s)       | 54.757     | 26.763      | 51.13%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  267.5 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  26.763s
Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32 100000000  618.2 ns/op  496 B/op  5 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  61.827s
| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 618.2      | 274.0       | 55.67%          |
| Bytes allocated (B/op)  | 496        | 0           | 100.00%         |
| Allocations per op      | 5          | 0           | 100.00%         |
| Total runtime (s)       | 61.827     | 27.415      | 55.67%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeForkProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeForkProcessor-32  100000000  274.0 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  27.415s
Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32 100000000  514.7 ns/op  500 B/op  6 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  51.483s
| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 514.7      | 215.6       | 58.12%          |
| Bytes allocated (B/op)  | 500        | 4           | 99.20%          |
| Allocations per op      | 6          | 1           | 83.33%          |
| Total runtime (s)       | 51.483     | 21.571      | 58.12%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  215.6 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  21.571s
Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane
-benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32 100000000  649.7 ns/op  500 B/op  6 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  64.981s
| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 649.7      | 284.2       | 56.26%          |
| Bytes allocated (B/op)  | 500        | 4           | 99.20%          |
| Allocations per op      | 6          | 1           | 83.33%          |
| Total runtime (s)       | 64.981     | 28.435      | 56.26%          |

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  284.2 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  28.435s
Disable (comment out) ExecFeed interpreter fields not used by the
feeders. This removal was already started by 4a5bb5d.

---

Tracee

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 215.6      | 168.1       | 22.03%          |
| Bytes allocated (B/op)  | 4          | 4           | 0.00%           |
| Allocations per op      | 1          | 1           | 0.00%           |
| Total runtime (s)       | 21.571     | 16.825      | 22.03%          |

-

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  168.1 ns/op  4 B/op 1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  16.825s

---

Controller

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 284.2      | 209.7       | 26.20%          |
| Bytes allocated (B/op)  | 4          | 4           | 0.00%           |
| Allocations per op      | 1          | 1           | 0.00%           |
| Total runtime (s)       | 28.435     | 20.983      | 26.20%          |

-

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExecProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExecProcessor-32  100000000  209.7 ns/op  4 B/op  1 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  20.983s
For both Tracee and Controller.

-

Tracee

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  159.9 ns/op  48 B/op  2 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  16.001s

---

Controller

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000 335.5 ns/op  240 B/op  4 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  33.558s
Improve procTreeExitProcessor for both Tracee and Controller.

-

Tracee

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 159.9      | 95.71       | 40.14%          |
| Bytes allocated (B/op)  | 48         | 0           | 100.00%         |
| Allocations per op      | 2          | 0           | 100.00%         |
| Total runtime (s)       | 16.001     | 9.586       | 40.14%          |

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  95.71 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf  9.586s

---

Controller

| Metric                  | Old Value  | New Value   | Improvement (%) |
|-------------------------|------------|-------------|-----------------|
| Time per operation (ns) | 335.5      | 115.4       | 65.60%          |
| Bytes allocated (B/op)  | 240        | 0           | 100.00%         |
| Allocations per op      | 4          | 0           | 100.00%         |
| Total runtime (s)       | 33.558     | 11.553      | 65.60%          |

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_procTreeExitProcessor$
github.com/aquasecurity/tracee/pkg/ebpf/controlplane -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/ebpf/controlplane
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_procTreeExitProcessor-32  100000000  115.4 ns/op  0 B/op  0 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/ebpf/controlplane  11.553s
Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkArgVal$
github.com/aquasecurity/tracee/pkg/events/parse -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/events/parse
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkArgVal/int32/valid_args-32        100000000   14.43 ns/op   0 B/op  0 allocs/op
BenchmarkArgVal/int32/invalid_val_type-32  100000000  551.7 ns/op  584 B/op 10 allocs/op
BenchmarkArgVal/int32/not_found_arg-32     100000000  499.2 ns/op  520 B/op 10 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/events/parse  106.538s
| Sub-Benchmark    | Old (ns/op) | New (ns/op) | Change (%) |
|------------------|-------------|-------------|------------|
| valid_args       | 14.43       | 13.35       | -7.48%     |
| invalid_val_type | 551.7       | 589.8       | +6.90%     |
| not_found_arg    | 499.2       | 586.0       | +17.38%    |

valid_args is the most relevant case, since it traverses the args in a
specific order. The other cases are not deterministic and are used to
measure the worst case for upcoming changes.

---

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkArgVal$
github.com/aquasecurity/tracee/pkg/events/parse -benchtime=100000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/events/parse
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkArgVal/int32/valid_args-32        100000000   13.35 ns/op   0 B/op   0 allocs/op
BenchmarkArgVal/int32/invalid_val_type-32  100000000  589.8 ns/op  584 B/op  10 allocs/op
BenchmarkArgVal/int32/not_found_arg-32     100000000  586.0 ns/op  520 B/op  10 allocs/op
PASS
ok  github.com/aquasecurity/tracee/pkg/events/parse  118.922s
It helps reduce dynamic stack growth and the number of allocations,
which is good for performance.

Changelog fields now hold pointers to the feeds instead of the feeds
themselves. This aligns them with the new feed pointers and avoids
dereferencing.
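As a rough illustration of the idea, the sketch below stores a pointer in a generic changelog so that setting a value does not copy the struct. The `fileFeed`, `entry`, and `changelog` types and their methods are simplified, hypothetical stand-ins and do not reproduce Tracee's actual changelog package.

```go
package main

import "fmt"

// fileFeed and changelog are hypothetical, simplified types for illustration.
type fileFeed struct {
	path  string
	inode uint64
}

type entry[T any] struct {
	ts  uint64 // timestamp of the change
	val T
}

type changelog[T any] struct {
	entries []entry[T]
}

// Set appends a value recorded at ts.
func (c *changelog[T]) Set(val T, ts uint64) {
	c.entries = append(c.entries, entry[T]{ts: ts, val: val})
}

// Current returns the most recent value, or the zero value if empty.
func (c *changelog[T]) Current() T {
	var zero T
	if len(c.entries) == 0 {
		return zero
	}
	return c.entries[len(c.entries)-1].val
}

func main() {
	feed := &fileFeed{path: "/usr/bin/ls", inode: 42}

	// The changelog is instantiated with *fileFeed, so it keeps the pointer
	// it is given rather than a copy of the struct.
	var cl changelog[*fileFeed]
	cl.Set(feed, 1)

	fmt.Println(cl.Current() == feed) // the stored pointer is the one passed in
}
```

The trade-off in a design like this is that callers share the pointed-to value, so it must not be mutated after being stored.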
The only ExitFeed fields handled by FeedFromExit() are TaskHash and
TimeStamp, so this commit comments out the other fields, which the
proctree does not use in this context.
Reuse the same TaskInfo reference avoiding the need to lock to fetch it.

This also reorders the creation of the process and thread.
Mutex is a heavy lock, and it's not necessary to use it in the Thread
concurrency control. This change replaces the mutex with atomic
operations to reduce contention, which also reduces the memory footprint.
It helps reduce dynamic stack growth and the number of allocations,
which is good for performance.
This also removes some leftovers.
processes: 10928
threads: 21856
@rscampos
Collaborator

@geyslan I've double-checked the performance. I used an AWS t4g.2xlarge (8 cores and 32G RAM) and tested 4 threads with 500_000 ops each. The results are slightly different because I didn't use the same configuration as you, but overall there's a noticeable improvement. Congrats on the work.

CPU Peak (%): Main: 33.4%, New: 32.8%, Difference: -1.80% (-0.6 percentage points)
CPU Avg (%): Main: 30.3%, New: 19.8%, Difference: -34.65%
Malloc Avg Peak: Main: 2,508,459, New: 1,779,362, Difference: -29.05%
Heap Peak (MB): Main: 202 MB, New: 281 MB, Difference: +39.11%
Heap Avg (MB): Main: 132 MB, New: 128 MB, Difference: -3.03%
Heap Obj Avg: Main: 994,136, New: 970,217, Difference: -2.41%


@rscampos rscampos left a comment

LGTM... congrats! @geyslan

@geyslan
Member Author

geyslan commented Jan 28, 2025

/fast-forward

@github-actions github-actions bot merged commit eaf0311 into aquasecurity:main Jan 28, 2025
41 checks passed
Development

Successfully merging this pull request may close these issues.

reduce cpu/memory of proctree pkg