Evaluate using Link-Time Optimization (LTO) and Profile-Guided Optimization (PGO) for the project #147
Replies: 2 comments 1 reply
-
just write up a PR with all of that implemented, i'd be happy to merge it if the only thing it affects is the output size. also if PGO only saves up 0.1 MB, and requires more than a line in |
-
@zamazan4ik Thanks for your research and contribution in this area! I think that LTO can be added right away since it doesn't require any additional work. We should focus more on feature-completeness and stability if the optimisations do not improve Amber's compile time by that much. Great work nonetheless! 🙌
-
Hi!
I checked various compiler optimizations (like Profile-Guided Optimization (PGO)) on many projects (including compilers) - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since such optimizations help with optimizing such projects, I decided to perform some LTO and PGO tests with the Amber compiler. Below are the results.
Test environment
Amber version: the latest for now from the master branch, on commit 0a780488caa8f980a736bc1d1593728a121123d7
Benchmark
I decided to perform PGO benchmarks on a simple scenario: the amber input.ab output.sh command. For PGO optimization I use the cargo-pgo tool. The Release build is done with cargo build --release, the PGO instrumented build with cargo pgo build, and the PGO optimized build with cargo pgo optimize build. The training workload is the same for PGO and PLO: amber input.ab output.sh, where the input.ab script is this one. taskset -c 0 is used to reduce the OS scheduler's influence on the results during the benchmarks. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
LTO is enabled by adding the following lines to the profile.release section in the root Cargo.toml file:
Results
Here are the results:
where:
amber_release - Release build
amber_release_lto - Release + LTO build
amber_lto_optimized - Release + LTO + PGO optimized build
At least in the very simple test above, we don't see performance improvements from enabling LTO. The improvement from enabling PGO is consistent across multiple tests (not only this one) but isn't huge.
Just for reference, the slowdown during the PGO training phase:
where:
amber_release - Release build
amber_lto_instrumented - Release + LTO + PGO instrumented build
For reference, the binary sizes:
Further steps
I can suggest the following action points:
I would be happy to answer your questions about PGO and PLO.
For now, I don't think that there is a huge rush to integrate PGO into the current Amber build. Later, when more features are integrated into the project and compiler performance becomes more critical compared to other tasks, it may be worth it. For now, I recommend at least enabling LTO in the build scripts for the Release builds.
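As a rough sketch of that recommendation, enabling LTO for Release builds usually comes down to a couple of lines in the profile.release section of Cargo.toml. The values below are an assumption for illustration: they are standard Cargo profile keys, but not necessarily the exact settings used in the tests above.

```toml
[profile.release]
# Assumed values, not taken from the benchmark setup above.
# "fat" LTO attempts optimizations across all crates in the dependency graph;
# "thin" is a faster-to-build alternative.
lto = "fat"
# A single codegen unit gives the optimizer a whole-crate view,
# at the cost of longer build times.
codegen-units = 1
```

Since LTO lengthens build times, keeping these settings in profile.release only leaves debug builds unaffected.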