-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out LLVM PGO step and use clang 13 to compile LLVM #89499
Conversation
r? @kennytm (rust-highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge ff6bd962b9c1f4d5fed6f6b8ae63ae13662ce2e8... |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
Sorry for the driveby question: Is it possible the same can manifest as a |
Not sure. Coverage instrumentation goes through pretty different paths as far as I know, I don't think it should be related to this PR. |
@bors try |
⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge 9d861e7379a238b570cab700aa938a014f881e94... |
☀️ Try build successful - checks-actions |
Queued 9d861e7379a238b570cab700aa938a014f881e94 with parent e737694, future comparison URL. |
Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches? |
Finished benchmarking commit (9d861e7379a238b570cab700aa938a014f881e94): comparison url. Summary: This change led to very large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
Reviewing the performance results, this looks like a massive improvement. |
We rebuild LLVM twice each CI build right now, because mozilla/sccache#952 hasn't been merged/finished. If the docker cache is invalidated, that adds a third build. We can adjust the in-docker build to use the same LLVM as in our fork, I suppose, but I'm not sure if that's necessary - we can probably just apply patches like this PR on future LLVM upgrades. |
One thing I'm concerned about is that the current setup will result in spurious PGO related regressions with future LLVM updates, as we typically update our LLVM before we update the clang used to build it. |
I'm not sure how to check, but my current theory is that the optimization here is pretty much entirely due to better optimized LLVM, not PGO. The PGO PR for LLVM did produce a speedup on merge despite these "version mismatch errors". Our setup is such that the LLVM PGO data is configured to go to a separate location from the rustc data, and we use the right llvm-profdata for each to merge it. So I think the warning/error message is probably spurious, perhaps arising from LLVM's profiling runtime not supporting being dlopen'd while already linked in, or something like that... In other words, at least as far as I can tell, future LLVM upgrades are likely to reintroduce the error message until we bump our c toolchain, but there's no evidence this will cause performance regressions at this time. |
Right, it's hard to distinguish whether the impact is due to Clang 13 doing a better job of optimizing, or something about the handling of profile data being fixed. What's interesting is that if you filter the perf result by check builds only, this is a universal small regression. I would have expected that check builds wouldn't be affected by optimizations applied to LLVM. |
Overlap in profiles from 9dbb26e and its parent commit (last "noop" change I found):
And for this PR's try commit:
So it does look like the version mismatch presumably had a not insignificant effect, though the results are a bit weird. Presumably this relates to the check builds as well. Maybe we're not successfully putting LLVM results in one file and rustc in another (e.g., race which one loads first or something)? Worth figuring it out, probably... though not sure how to investigate that. |
e5f3895
to
86608f1
Compare
Okay, so that try build fixes the "LLVM Profile error" message by doing LLVM PGO collection and rustc PGO collection in separate phases of the build. This just adds ~10 minutes to our build so is pretty cheap -- but it demonstrates that fixing the error doesn't seem to have any major impact. (FWIW, that also only collects information of the LLVM compilation from libcore, and based on these results that seems to be sufficient or better). I've now readded the LLVM 13 bootstrap (i.e., building with clang v13 for all C/C++ compilation we do). Presumably this has less chance of somehow creating interference between the two profiling runtimes now, so maybe we'll see less weird noise in the rustc results. @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 86608f1 with merge db2a4db59506e78eb0110e7b4ebeaff1ecbc497c... |
☀️ Try build successful - checks-actions |
Queued db2a4db59506e78eb0110e7b4ebeaff1ecbc497c with parent e1e9319, future comparison URL. |
Finished benchmarking commit (db2a4db59506e78eb0110e7b4ebeaff1ecbc497c): comparison url. Summary: This change led to very large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never |
r? @nikic OK, I think the new implementation here seems to avoid any performance hit and so seems reasonable to merge. I suspect that having both instrumentations in the same binary, particularly with different versions, was causing us some sort of subtle problem that ended up hurting PGO collection -- the new performance delta is strictly due to an improved baseline C++ compiler I'm pretty sure which better optimizes the LLVM we ship, as the first commit already resolves any potential mixup with version information. I think that should resolve the concern around future LLVM upgrades not benefiting from PGO (and so having a "false" regression) -- we continue to not rely (and in fact seem to rely less) on the C++ profdata version matching up with rustc's. So this PR seems like a pretty clear cut win to me. |
@bors r+ |
📌 Commit 86608f1 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (5e02151): comparison url. Summary: This change led to very large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression |
We're seeing a PGO version mismatch error in CI logs:
which is likely due to the version bumped here differing from that used by
rustc.
This PR fixes this by splitting out the PGO step for LLVM into a separate phase of the pgo.sh script, which nets no change to performance (see these results). Then, it follows that up with an upgrade to LLVM/clang version 13 as our bootstrap compiler, which yields the performance improvements for this PR -- around 5%. This depends on the first step here, because otherwise we end up somehow clobbering or otherwise hurting our ability to effectively collect performance data, yielding reductions in performance for a subset of benchmarks -- it is not clear what the cause here was precisely, but the split only costs ~10 minutes and seems worthwhile.