PGO optimization results for LLVM-based projects #63486
Comments
I think similar projects such as Clang-Tidy and the Clang Static Analyzer should also get a performance boost from PGO. |
On the same hardware/software setup I got the following results for optimizing Clang-Tidy (version - current The results are the following:
|
FWIW, I don't think this is a regarding clangd, I believe the indexing is not the most latency-sensitive workflow we have. It's done in the background, on idle cores, and although a full index is quite useful for a good experience, having it as soon as possible is not critical to clangd functionality. The more latency-sensitive workflows are the interactive ones, like code completion/signature help and main-file AST builds (similar to indexing, but with a preamble). So I think profiling based on those interactions would be better, but I guess any profiles are fine as long as they don't clearly regress the performance of the latency-sensitive interactions I mentioned above. |
Right now I am talking about bringing PGO support for other LLVM projects up to the level Clang has now (CMake-specific scripts in the LLVM repo). I am not talking about the builds in different Linux distributions. That's a separate discussion for every distro, since each needs to choose its own balance between "performance improvements" and "maintainability costs".
Anyway, it considerably improves the experience of checking out fairly large codebases on local machines and of switching between fast-evolving branches (switching requires reindexing multiple files, so the UX improves in that case as well).
I didn't perform such benchmarks, but JetBrains did (warning - on Windows): link. |
Another LLVM project - LLD. As a test project, I chose ClickHouse. It has a large binary (2.3 GiB in Release mode, unstripped), so as a benchmark I link ClickHouse in Release mode with ThinLTO. For this test case, I have the following results (
All other flags are the same. Hardware is the same as above. PGO mode - |
One more - The results are shown in
All other flags are the same. Hardware is the same as above. PGO mode: |
Another LLVM project - LLDB. As a benchmark, I used
For Release LLDB I got the result in LLDB was built locally with If you have a better idea for a measurable workload for LLDB - feel free to share it with me. |
@kadircet given my benchmarks, can we update the page https://llvm.org/docs/HowToBuildWithPGO.html with results about PGO benefits for other parts of LLVM, not just Clang and compile time? I think it's a good thing for users to know as well. Another suggestion: add information about the effects of LLVM BOLT on LLVM projects as well (e.g. based on this benchmark). Why am I asking to update the documentation? Because it's much easier for users to find the information there than to search through GitHub issues. |
Would be interesting to see numbers for AutoFDO. |
Agreed! But right now I have no hardware with LBR/BRS support to test it (AutoFDO needs it). According to multiple sources, AutoFDO results should be almost the same as with instrumentation. However, according to the Google papers, AutoFDO is slightly less effective than instrumentation in terms of the optimizations performed.
I would argue with that statement. Feasible or not, it totally depends on the use case. Running instrumented clangd once on some workload, collecting profiles and then PGO-optimizing clangd is totally fine if we update clangd rarely (so we do not need to run instrumented clangd often to refresh the profiles). There are multiple mitigation strategies to reduce the drawbacks of instrumentation in production, like training only on a small but representative-enough subset of the workload. One more point against AutoFDO is the quality of the AutoFDO converter itself. You can find multiple issues in the AutoFDO upstream like google/autofdo#179 or google/autofdo#162 (and others), which could also cause problems. If we are talking about "PGO at scale" (as it's used at Google), there is another problem: lack of tooling. In the Google paper almost all tooling around their PGO approach is closed-source (profile collectors, storage, etc.) and there is no open-source alternative yet. I also believe that the AutoFDO approach is a more practical way to apply PGO in production, but right now it has many limitations that should be carefully considered. That's why I am saying that instrumented PGO is still a completely fine way of doing PGO in practice. |
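The instrument, train, merge, rebuild cycle described in the comment above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: passing the flags through `CMAKE_C_FLAGS`/`CMAKE_CXX_FLAGS`, the build directory names, the `profiles/` paths and the `run_training_workload.sh` script are all hypothetical and not taken from the thread. Only `-fprofile-instr-generate`, `-fprofile-instr-use`, `LLVM_PROFILE_FILE` and `llvm-profdata merge` are existing Clang/LLVM pieces.

```python
import shlex


def pgo_cycle(llvm_src, workload, profile_dir="profiles"):
    """Build the command lines for one instrument -> train -> merge -> use cycle.

    Nothing is executed here; the caller decides how to run each step.
    """
    raw = f"{profile_dir}/clangd.profraw"
    merged = f"{profile_dir}/merged.profdata"

    def configure(build_dir, flag):
        # Assumption: the PGO flag is injected via CMAKE_C_FLAGS/CMAKE_CXX_FLAGS.
        return [
            "cmake", "-G", "Ninja", "-S", llvm_src, "-B", build_dir,
            "-DCMAKE_BUILD_TYPE=Release",
            "-DLLVM_ENABLE_PROJECTS=clang;clang-tools-extra",
            f"-DCMAKE_C_FLAGS={flag}",
            f"-DCMAKE_CXX_FLAGS={flag}",
        ]

    return [
        # 1. Instrumented build.
        configure("build-instr", "-fprofile-instr-generate"),
        ["ninja", "-C", "build-instr", "clangd"],
        # 2. Training run; LLVM_PROFILE_FILE selects where the raw profile goes.
        ["env", f"LLVM_PROFILE_FILE={raw}", *workload],
        # 3. Convert the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", f"-output={merged}", raw],
        # 4. Optimized build that consumes the merged profile.
        configure("build-pgo", f"-fprofile-instr-use={merged}"),
        ["ninja", "-C", "build-pgo", "clangd"],
    ]


# Hypothetical training workload: a script that drives the instrumented clangd.
steps = pgo_cycle("llvm-project/llvm", ["./run_training_workload.sh"])
for step in steps:
    print(shlex.join(step))
```

Keeping the workload as a parameter reflects the point made above: a small but representative subset can be swapped in without touching the rest of the cycle.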
Hi!
LLVM right now supports PGO only for Clang. I want to share PGO results for other LLVM-based projects.
Here I want to share my results from applying PGO to `clangd` (version: current `main` branch). According to my local tests on AMD 5900x/48 GiB RAM/Fedora 38/Clang 16 (for building `clangd`), `clangd` in Release mode without PGO finishes indexing the `llvm-project` sources in ~10 minutes (9m55s), and `clangd` in Release mode with PGO (`-fprofile-instr-generate`/`-fprofile-instr-use`) finishes indexing `llvm-project` in ~8 minutes (7m50s - 7m55s). Tests were performed multiple times, with the `clangd` cache reset between runs, on the latest `main` branch. Compilation options: `CC=clang CXX=clang++ cmake -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/zamazan4ik/open_source/install_clangd -DLLVM_INCLUDE_BENCHMARKS=1 -DCMAKE_POSITION_INDEPENDENT_CODE=1 -DLLVM_USE_LINKER=lld -G "Ninja" ../llvm-project/llvm` (PGO mode just additionally sets the `-fprofile-instr-generate`/`-fprofile-instr-use` options).
I think the results are good enough to justify adding PGO support for `clangd` to the repository (as is already done for `clang` itself). Another article about applying PGO to `clangd` can be found here: JetBrains blog.
Additionally, I want to note that clangd in instrumentation mode runs far too slowly, so I only waited for it to index ~200 files from the LLVM repo. I think that's enough (and the benchmark confirms it).
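To put the clangd indexing timings reported in this post into relative terms, here is a minimal sketch that converts them to seconds and computes the reduction in wall-clock time. The helper names are made up for illustration; the only inputs are the 9m55s baseline and the 7m50s - 7m55s PGO range quoted above.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'XmYs' timing into seconds."""
    return minutes * 60 + seconds


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction in wall-clock time, in percent."""
    return (before - after) / before * 100


baseline = to_seconds(9, 55)   # Release build without PGO
pgo_fast = to_seconds(7, 50)   # fastest reported PGO run
pgo_slow = to_seconds(7, 55)   # slowest reported PGO run

best = reduction_percent(baseline, pgo_fast)
worst = reduction_percent(baseline, pgo_slow)

print(f"Indexing time reduced by {worst:.1f}% to {best:.1f}%")
# prints "Indexing time reduced by 20.2% to 21.0%"
```

So the reported numbers correspond to roughly a fifth less indexing time for this particular workload and machine.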