I recently ran Profile-Guided Optimization (PGO) benchmarks on many projects; the results are available here. Based on those results, I think it is worth trying to apply PGO to sd. I have already run some benchmarks and want to share the results here.
Test environment
Fedora 38
Linux kernel 6.5.5
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.73
sd version: the latest master branch at the time of writing, commit efb8198a4268cb6c74468e42ec3446cc1cd5b92c
Benchmark setup
As a test file, I use this large JSON file. sd is tested with this command line: sd -p "(\w+)" "\$1\$1" dump.json > /dev/null. I took these arguments from issue #52. The same arguments and test file were used for PGO profile collection.
All benchmarks were run multiple times on the same hardware/software setup, with the same background "noise" (as far as I can guarantee, of course).
Results
I got the following results:
hyperfine --warmup 10 --min-runs 100 'sd_release -p "(\w+)" "\$1\$1" dump.json > /dev/null' 'sd_pgo_optimized -p "(\w+)" "\$1\$1" dump.json > /dev/null'
Benchmark 1: sd_release -p "(\w+)" "\$1\$1" dump.json > /dev/null
Time (mean ± σ): 916.7 ms ± 21.3 ms [User: 881.4 ms, System: 33.1 ms]
Range (min … max): 875.5 ms … 1032.8 ms 100 runs
Benchmark 2: sd_pgo_optimized -p "(\w+)" "\$1\$1" dump.json > /dev/null
Time (mean ± σ): 745.3 ms ± 9.4 ms [User: 710.3 ms, System: 33.1 ms]
Range (min … max): 713.1 ms … 782.3 ms 100 runs
Summary
sd_pgo_optimized -p "(\w+)" "\$1\$1" dump.json > /dev/null ran
1.23 ± 0.03 times faster than sd_release -p "(\w+)" "\$1\$1" dump.json > /dev/null
Just for reference, sd in instrumentation mode (during PGO profile collection) gives the following result (in time format):
time sd_pgo_instrumented -p "(\w+)" "\$1\$1" dump.json > /dev/null
sd_pgo_instrumented -p "(\w+)" "\$1\$1" dump.json 1,49s user 0,04s system 99% cpu 1,534 total
At least according to the simple benchmark above, PGO has a measurable positive effect on sd performance.
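As a rough sanity check, the figures above can be recomputed from the reported means and standard deviations. This is only a sketch: it assumes first-order error propagation for a ratio of two independent measurements (which appears consistent with the ± 0.03 in the summary), and it compares the single instrumented `time` run against the hyperfine mean of the release build, so the slowdown figure is indicative only. The variable names are illustrative.

```python
import math

# Means and standard deviations (ms) copied from the hyperfine output above.
release_mean, release_sd = 916.7, 21.3
pgo_mean, pgo_sd = 745.3, 9.4

# Wall-clock total (s) of the single instrumented run reported by `time`.
instrumented_total_s = 1.534

# Speedup of the PGO build relative to the plain release build.
speedup = release_mean / pgo_mean

# Assumption: first-order error propagation for a ratio of independent values.
speedup_sd = speedup * math.sqrt(
    (release_sd / release_mean) ** 2 + (pgo_sd / pgo_mean) ** 2
)

# Share of the release build's run time that the PGO build saves.
time_saved_pct = 100.0 * (release_mean - pgo_mean) / release_mean

# Rough cost of instrumentation: one instrumented run vs. the release mean.
instrumented_slowdown = instrumented_total_s * 1000.0 / release_mean

print(f"speedup: {speedup:.2f} ± {speedup_sd:.2f}")
print(f"time saved: {time_saved_pct:.1f}%")
print(f"instrumented slowdown vs release: {instrumented_slowdown:.2f}x")
```

In other words, a 1.23x speedup corresponds to roughly 19% less run time on this workload, while the instrumented binary used for profile collection is noticeably slower than the release build.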
Further steps
I can suggest the following things to do:
Evaluate PGO's applicability to sd in more scenarios.
If PGO helps to achieve better performance, add a note about it to sd's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for sd.
Integrate PGO into the build scripts. This can help users and maintainers easily apply PGO to their own workloads.
Optimize prebuilt binaries with PGO.
Here are some examples of how PGO is already integrated into other projects' build scripts:
The PGO optimization in the benchmarks above was done with cargo-pgo.
After PGO, I can suggest evaluating LLVM BOLT as an additional optimization step.