Hi!
I read your post about the project and became interested in improving its performance a bit further. I have already evaluated Profile-Guided Optimization (PGO) on many projects - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since this compiler optimization works well in many places, especially in parsers, I decided to apply it to this project - here are my benchmark results.
Test environment
Fedora 40
Linux kernel 6.8.10
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.80.0 (nightly)
rustwire version: the latest master branch at the time of testing, commit 0a6e7afce2e05d7bc4108c675b3f14f0f2640e45
Turbo Boost disabled
Benchmark
For benchmarking, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. The Release results come from the taskset -c 0 cargo +nightly bench command. The PGO training phase is run with taskset -c 0 cargo +nightly pgo bench, and the PGO optimization phase with taskset -c 0 cargo +nightly pgo optimize bench. taskset -c 0 pins each run to a single CPU core to reduce the OS scheduler's influence on the results. All measurements were taken on the same machine, with the same background "noise" (as far as I can guarantee).
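The three phases above can be sketched as a small driver script. This is only an illustration: the script itself, its function names, and the dry-run default are assumptions made for this example and are not part of rustwire or cargo-pgo. The commands are exactly the ones quoted above.

```python
import shlex
import subprocess

# Pin every run to CPU core 0, as in the report, to reduce scheduler noise.
PIN = ["taskset", "-c", "0"]

# The three phases, in the order they are run. The cargo arguments are
# copied from the report; the phase names are hypothetical labels.
PHASES = {
    "release": ["cargo", "+nightly", "bench"],
    "pgo-train": ["cargo", "+nightly", "pgo", "bench"],
    "pgo-optimize": ["cargo", "+nightly", "pgo", "optimize", "bench"],
}


def command_for(phase):
    """Return the full argv for one phase, including CPU pinning."""
    return PIN + PHASES[phase]


def run_all(dry_run=True):
    """Print each phase's command and, unless dry_run is set, execute it."""
    for phase in PHASES:
        argv = command_for(phase)
        print(f"[{phase}] {shlex.join(argv)}")
        if not dry_run:
            # check=True stops the sequence if a phase fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which is a cheap way to check the sequence before spending time on the real runs. The training phase has to finish before the optimize phase, because the optimized build consumes the profiles the training run produced.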
Results
I got the following results:
Release:
Running benches/read_11_bytes.rs (target/release/deps/read_11_bytes-8057156ca5325836)
running 2 tests
test bench_prost_extraction ... bench: 36 ns/iter (+/- 1)
test bench_rustwire_extraction ... bench: 27 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.70s
Running benches/read_4k_bytes.rs (target/release/deps/read_4k_bytes-8390424a56aada34)
running 2 tests
test bench_prost_extraction ... bench: 241 ns/iter (+/- 3)
test bench_rustwire_extraction ... bench: 40 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 1.18s
Running benches/read_75_bytes.rs (target/release/deps/read_75_bytes-b03d8a0a7f5dc940)
running 2 tests
test bench_prost_extraction ... bench: 165 ns/iter (+/- 1)
test bench_rustwire_extraction ... bench: 108 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.98s
Running benches/write_replace_nested.rs (target/release/deps/write_replace_nested-f37ae198a15ed235)
running 2 tests
test bench_prost_replace ... bench: 291 ns/iter (+/- 5)
test bench_rustwire_replace ... bench: 27 ns/iter (+/- 0)
PGO optimized compared to Release:
Running benches/read_11_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_11_bytes-b64f3e24fc22b28d)
running 2 tests
test bench_prost_extraction ... bench: 24 ns/iter (+/- 0)
test bench_rustwire_extraction ... bench: 19 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.97s
Running benches/read_4k_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_4k_bytes-8121f31f389b50e9)
running 2 tests
test bench_prost_extraction ... bench: 230 ns/iter (+/- 5)
test bench_rustwire_extraction ... bench: 17 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 2.72s
Running benches/read_75_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_75_bytes-4dbb9d5cd7a3f5d7)
running 2 tests
test bench_prost_extraction ... bench: 128 ns/iter (+/- 1)
test bench_rustwire_extraction ... bench: 32 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.54s
Running benches/write_replace_nested.rs (target/x86_64-unknown-linux-gnu/release/deps/write_replace_nested-79d84e7870a75b14)
running 2 tests
test bench_prost_replace ... bench: 247 ns/iter (+/- 5)
test bench_rustwire_replace ... bench: 12 ns/iter (+/- 1)
(just for reference) PGO instrumentation compared to Release:
Running benches/read_11_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_11_bytes-b64f3e24fc22b28d)
running 2 tests
test bench_prost_extraction ... bench: 46 ns/iter (+/- 0)
test bench_rustwire_extraction ... bench: 39 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 2.27s
Running benches/read_4k_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_4k_bytes-8121f31f389b50e9)
running 2 tests
test bench_prost_extraction ... bench: 311 ns/iter (+/- 2)
test bench_rustwire_extraction ... bench: 55 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.47s
Running benches/read_75_bytes.rs (target/x86_64-unknown-linux-gnu/release/deps/read_75_bytes-4dbb9d5cd7a3f5d7)
running 2 tests
test bench_prost_extraction ... bench: 284 ns/iter (+/- 14)
test bench_rustwire_extraction ... bench: 174 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 1.26s
Running benches/write_replace_nested.rs (target/x86_64-unknown-linux-gnu/release/deps/write_replace_nested-79d84e7870a75b14)
running 2 tests
test bench_prost_replace ... bench: 409 ns/iter (+/- 8)
test bench_rustwire_replace ... bench: 41 ns/iter (+/- 0)
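For anyone who wants to compare runs like the ones above automatically, here is a minimal sketch of a parser for this bench output. The function name and the regular expressions are assumptions written for this example, based only on the line format visible in the logs above. Results are keyed by benchmark file as well as test name, because the same test names repeat across files.

```python
import re

# Matches e.g. "Running benches/read_11_bytes.rs (target/...)".
RUNNING_LINE = re.compile(r"^Running\s+(?P<file>\S+)")

# Matches e.g. "test bench_prost_extraction ... bench: 36 ns/iter (+/- 1)".
BENCH_LINE = re.compile(
    r"^test\s+(?P<name>\S+)\s+\.\.\.\s+bench:\s+"
    r"(?P<ns>[\d,.]+)\s+ns/iter\s+\(\+/-\s+(?P<dev>[\d,.]+)\)"
)


def parse_bench_output(text):
    """Map (bench file, test name) to (ns per iteration, deviation)."""
    results = {}
    current_file = None
    for raw in text.splitlines():
        line = raw.strip()
        running = RUNNING_LINE.match(line)
        if running:
            current_file = running.group("file")
            continue
        bench = BENCH_LINE.match(line)
        if bench:
            # Remove thousands separators in case larger values use them.
            ns = float(bench.group("ns").replace(",", ""))
            dev = float(bench.group("dev").replace(",", ""))
            results[(current_file, bench.group("name"))] = (ns, dev)
    return results


sample = """\
Running benches/read_11_bytes.rs (target/release/deps/read_11_bytes-8057156ca5325836)
running 2 tests
test bench_prost_extraction ... bench: 36 ns/iter (+/- 1)
test bench_rustwire_extraction ... bench: 27 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.70s
"""

for key, value in parse_bench_output(sample).items():
    print(key, value)
```

The "test result: ..." summary line is skipped because it does not contain the "... bench:" part that the pattern requires.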
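According to these results, PGO measurably improves the library's performance: every rustwire benchmark runs roughly 1.4x to 3.4x faster than in the Release build (for example, read_75_bytes drops from 108 to 32 ns/iter). The prost benchmarks improve as well, though less (about 1.05x to 1.5x).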
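To make the comparison concrete, the speedup of each benchmark can be computed as the Release time divided by the PGO-optimized time. The sketch below hard-codes the ns/iter values from the Release and PGO-optimized logs above; the table layout and the speedup helper are assumptions made for this example.

```python
# (Release ns/iter, PGO-optimized ns/iter), copied from the logs above.
RESULTS = {
    ("read_11_bytes", "prost"): (36, 24),
    ("read_11_bytes", "rustwire"): (27, 19),
    ("read_4k_bytes", "prost"): (241, 230),
    ("read_4k_bytes", "rustwire"): (40, 17),
    ("read_75_bytes", "prost"): (165, 128),
    ("read_75_bytes", "rustwire"): (108, 32),
    ("write_replace_nested", "prost"): (291, 247),
    ("write_replace_nested", "rustwire"): (27, 12),
}


def speedup(release_ns, pgo_ns):
    """How many times faster the PGO build is than the Release build."""
    return release_ns / pgo_ns


for (bench, lib), (release_ns, pgo_ns) in RESULTS.items():
    ratio = speedup(release_ns, pgo_ns)
    print(f"{bench:22} {lib:9} {release_ns:4} -> {pgo_ns:4} ns/iter ({ratio:.2f}x)")
```

Because the timings are reported in whole nanoseconds, the ratios for the smallest benchmarks (tens of ns/iter) are coarse and should be read as approximate.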
Further steps
I can suggest the following action points:
Run more PGO benchmarks with other datasets (if you are interested). If they show improvements, add a note to the documentation (the README file, I guess) about the possible performance gains from PGO.
You could also try to learn how the code can be optimized further by studying what the compiler changed with PGO, for example by comparing flamegraphs taken before and after applying PGO.
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO). However, I recommend starting with regular PGO, since it is a much more stable technology with far fewer limitations.
I would be happy to answer your questions about PGO.
P.S. Please do not treat the issue like a bug or something like that - it's just a benchmark report. Since the "Discussions" functionality is disabled in this repo, I created the Issue instead.