Evaluate Profile-Guided Optimization (PGO) usage #1

zamazan4ik · 2024-05-23T10:17:12Z

Hi!

Since the README emphasizes performance, I decided to test a compiler optimization technique, Profile-Guided Optimization (PGO), on genson-rs. I have already tested PGO on various projects with positive results (all benchmarks are collected here: https://github.com/zamazan4ik/awesome-pgo), so here are the benchmark results for genson-rs.

Test environment

Fedora 39
Linux kernel 6.8.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD: Samsung 980 Pro, 2 TiB
Compiler: rustc 1.78
Project version: the latest main branch at the time of testing (commit 67afe6d3ad8d10affb65b251694ca7b52b978769)
Turbo Boost disabled

Benchmark

For benchmarking, I used the project's built-in benchmarks. For PGO optimization, I used the cargo-pgo tool. The Release results were obtained with taskset -c 0 cargo bench. The PGO training phase was run with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench.

All measurements were taken on the same machine with the same background "noise" (as far as I can guarantee). taskset -c 0 pins the benchmarks to a single CPU core to reduce OS scheduler noise.
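The three benchmark phases above can be sketched as a small Bash script. This is a minimal sketch, assuming cargo-pgo is installed (cargo install cargo-pgo) together with the llvm-tools-preview rustup component, and that it is run from the genson-rs repository root. The run helper and the DRY_RUN variable are hypothetical additions: with DRY_RUN=1 the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the benchmark workflow: Release baseline, PGO training, PGO-optimized run.
# ASSUMPTION: cargo-pgo and the llvm-tools-preview component are installed.
set -euo pipefail

# Hypothetical helper: echo each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

pgo_bench_workflow() {
  # Pin every phase to CPU core 0 to reduce OS scheduler noise.
  local pin=(taskset -c 0)
  # 1. Baseline: regular Release benchmarks.
  run "${pin[@]}" cargo bench
  # 2. Training: run the benchmarks on an instrumented build to collect profiles.
  run "${pin[@]}" cargo pgo bench
  # 3. Optimization: rebuild with the collected profiles and benchmark again.
  run "${pin[@]}" cargo pgo optimize bench
}
```

Calling pgo_bench_workflow runs all three phases in order; DRY_RUN=1 pgo_bench_workflow only prints them, which is handy for checking the sequence before spending time on benchmarks.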

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/f2f108acf6c8f232816b1bccd9a0bb26
PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/01699361bfbcfa96fc505d091084a4f2
PGO-instrumented compared to Release (for reference only): https://gist.github.com/zamazan4ik/1b98f888fc596b4fc682ffb4754a0a8d

According to these results, PGO measurably improves genson-rs performance, at least on the benchmarks above.

Further steps

I can suggest the following action points:

Run more PGO benchmarks with other test files. If they also show improvements, add a note to the documentation (the README?) about the performance gains possible with PGO.
If the project ships prebuilt binaries, optimize them with PGO. As a training set, you could gather several real-world files, collect PGO profiles on them, and deliver PGO-optimized binaries to users.
Consider enabling Link-Time Optimization (LTO) for the tool. It can improve performance and reduce binary size.
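The prebuilt-binary and LTO suggestions could be combined in one release build. The following is a minimal sketch, not the project's actual release process, and it rests on several assumptions that should be checked: the binary is named genson-rs and accepts a JSON file path as its only argument, cargo-pgo places the instrumented binary under target/<host-triple>/release/, and the sample files live in a directory you pass in. LTO is switched on through Cargo's CARGO_PROFILE_RELEASE_LTO environment variable, which is equivalent to setting lto = "fat" under [profile.release] in Cargo.toml. The run helper and the DRY_RUN and TARGET_TRIPLE variables are hypothetical additions for previewing the commands.

```shell
#!/usr/bin/env bash
# Sketch: PGO-optimized + LTO release build trained on real-world JSON files.
# ASSUMPTIONS: binary name "genson-rs", CLI usage "genson-rs <file.json>",
# instrumented binary under target/<host-triple>/release/ (check cargo-pgo output).
set -euo pipefail

# Hypothetical helper: echo each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

pgo_lto_release() {
  local samples_dir="$1"
  # Same effect as lto = "fat" under [profile.release]; applied to both the
  # instrumented and the optimized build so they use identical settings.
  local lto=(env CARGO_PROFILE_RELEASE_LTO=fat)
  local triple
  # Hypothetical TARGET_TRIPLE override; defaults to the host triple reported by rustc.
  triple="${TARGET_TRIPLE:-$(rustc -vV | sed -n 's/^host: //p')}"
  local bin="target/${triple}/release/genson-rs"

  local files=("$samples_dir"/*.json)
  if [ ! -e "${files[0]}" ]; then
    echo "no .json files found in $samples_dir" >&2
    return 1
  fi

  # 1. Build an instrumented binary.
  run "${lto[@]}" cargo pgo build
  # 2. Collect profiles by running the instrumented binary on each sample file.
  for f in "${files[@]}"; do
    run "$bin" "$f"
  done
  # 3. Rebuild using the collected profiles.
  run "${lto[@]}" cargo pgo optimize
}
```

For example, pgo_lto_release ./samples would train on every .json file in a hypothetical ./samples directory. The more closely the sample files resemble what users actually feed the tool, the more representative the collected profiles are.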

Testing post-link optimization techniques such as LLVM BOLT would be interesting too (Clang and rustc already use BOLT on top of PGO), but I recommend starting with regular PGO.

I would be happy to answer your questions about PGO.

P.S. I opened this as an issue because Discussions are disabled for the repository. Since this is an improvement idea rather than a bug, Discussions would probably be a better place for it.


junyu-w · 2024-05-24T00:01:14Z

This is a great idea! I will test it out.
