Evaluate using Profile-Guided Optimization (PGO) and LLVM BOLT #588

Open
zamazan4ik opened this issue Oct 26, 2023 · 0 comments


Hi!

I recently evaluated Profile-Guided Optimization (PGO) on a number of projects. The results are here. Since PGO improves performance in many of them, I think it is worth trying to optimize difftastic with PGO as well.

I have already run some benchmarks and want to share the results.

Test environment

  • Fedora 38
  • Linux kernel 6.5.6
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.59
  • Difftastic version: the latest master branch at the time of writing, commit 21ed3ec48b383511b08ffe20cc91697af8f64d78
  • Turbo Boost disabled

Benchmark

For the benchmark, I run difft difftastic/sample_files/dir_before/ difftastic/sample_files/dir_after/, which reflects typical difftastic usage. The PGO training phase uses exactly the same command. The Release binary is built with cargo build --release, and both PGO phases (instrumentation and optimization) are done with cargo-pgo.
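
For anyone who wants to reproduce this, the build steps can be sketched roughly as below. This is a minimal sketch, not the exact script I used: it assumes cargo-pgo is installed (cargo install cargo-pgo, plus rustup component add llvm-tools-preview), that it is run from the root of the difftastic checkout, that the binary is named difft, that the host target is x86_64-unknown-linux-gnu, and that the baseline is a plain cargo build --release. The run helper and the DRY_RUN variable are hypothetical conveniences; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a baseline build plus a PGO build with cargo-pgo.
# Assumptions (not from the issue): run from the difftastic repo root,
# binary name "difft", host target x86_64-unknown-linux-gnu.
set -eu

TARGET="x86_64-unknown-linux-gnu"   # assumed host triple
BEFORE="sample_files/dir_before/"   # training/benchmark inputs
AFTER="sample_files/dir_after/"

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

pgo_build() {
  # 1. Baseline Release binary.
  run cargo build --release
  run cp target/release/difft ./difft_release

  # 2. Instrumented build, then a training run on the benchmark workload.
  run cargo pgo build
  run "target/$TARGET/release/difft" "$BEFORE" "$AFTER"

  # 3. Rebuild using the collected profiles.
  run cargo pgo optimize
  run cp "target/$TARGET/release/difft" ./difft_optimized
}

pgo_build
```

Run it with DRY_RUN=0 to actually execute the steps. The two copied binaries are then the ones compared with hyperfine.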

Results

I got the following results:

hyperfine --warmup 10 --min-runs 50 './difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null' './difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null'
Benchmark 1: ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
  Time (mean ± σ):     384.2 ms ±   5.2 ms    [User: 288.5 ms, System: 126.8 ms]
  Range (min … max):   373.6 ms … 396.9 ms    50 runs

Benchmark 2: ./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
  Time (mean ± σ):     354.7 ms ±   4.1 ms    [User: 257.7 ms, System: 127.3 ms]
  Range (min … max):   347.0 ms … 362.7 ms    50 runs

Summary
  ./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null ran
    1.08 ± 0.02 times faster than ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null

where difft_release is the Release binary and difft_optimized is the Release + PGO binary.
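
As a sanity check on the summary line, the ratio and the relative time savings can be recomputed from the reported means. The helper names speedup and reduction_pct are just illustrative; the numbers are the ones from the hyperfine output above.

```shell
# Ratio of baseline time to optimized time, two decimals.
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

# Time saved relative to the baseline, in percent, one decimal.
reduction_pct() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

speedup 384.2 354.7         # mean wall-clock time -> 1.08
reduction_pct 384.2 354.7   # mean wall-clock time -> 7.7
reduction_pct 288.5 257.7   # user CPU time        -> 10.7
```

System time is practically the same in both runs (126.8 ms vs 127.3 ms), so the gain comes from user-space CPU time.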

Regarding binary sizes:

  • Release: 68 MiB
  • Release + PGO: 68 MiB
  • Instrumented: 75 MiB

At least in the scenario above, PGO improves performance: the PGO build is about 8% faster (1.08 ± 0.02 times), and the binary size is unchanged.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on difftastic. If they show improvements, add a note about the possible performance gains from building difftastic with PGO.
  • Provide an easier way (e.g. a build option) to build difftastic with PGO. This would help end users and maintainers, since they could optimize difftastic for their own workloads.
  • Optimize the pre-built binaries with PGO.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with plain PGO.

Here are some examples of how PGO optimization is integrated in other projects:
