Split out LLVM PGO step and use clang 13 to compile LLVM #89499

Merged
bors merged 2 commits into rust-lang:master from with-llvm-13 on Oct 18, 2021

Conversation

Mark-Simulacrum
Member

@Mark-Simulacrum Mark-Simulacrum commented Oct 3, 2021

We're seeing a PGO version mismatch error in CI logs:

LLVM Profile Error: Runtime and instrumentation version mismatch : expected 5, but get 7

which is likely due to the version bumped here differing from the one used by rustc.

This PR fixes the error by splitting the LLVM PGO step into a separate phase of the pgo.sh script, which by itself nets no change in performance (see these results). It then upgrades our bootstrap compiler to LLVM/clang version 13, which yields this PR's performance improvement of around 5%. The upgrade depends on the split: without it, we somehow end up clobbering or otherwise hurting our ability to collect performance data effectively, which reduces performance for a subset of benchmarks. It is not clear precisely what the cause was, but the split only costs ~10 minutes and seems worthwhile.
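For anyone scanning CI logs for this message, here is a minimal sketch of how the mismatch could be detected. The pattern is copied from the single log line quoted above; the function name and the sample log are illustrative assumptions, and other LLVM versions may word the message differently.

```python
import re

# Pattern based on the one log line quoted in this PR description.
# The wording ("but get") is copied as-is from that line.
MISMATCH_RE = re.compile(
    r"LLVM Profile Error: Runtime and instrumentation version mismatch\s*:\s*"
    r"expected (\d+), but get (\d+)"
)


def find_version_mismatches(log_text):
    """Return a list of (expected, actual) version pairs found in log_text."""
    return [(int(e), int(a)) for e, a in MISMATCH_RE.findall(log_text)]


# Hypothetical log excerpt containing the quoted message.
sample_log = (
    "some earlier output\n"
    "LLVM Profile Error: Runtime and instrumentation version mismatch : "
    "expected 5, but get 7\n"
    "some later output\n"
)

print(find_version_mismatches(sample_log))
```

A non-empty result would suggest that the profiling runtime and the instrumented code were built by different LLVM versions.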

@rust-highfive
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2021
@Mark-Simulacrum
Member Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2021
@bors
Contributor

bors commented Oct 3, 2021

⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge ff6bd962b9c1f4d5fed6f6b8ae63ae13662ce2e8...

@bors
Contributor

bors commented Oct 3, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 3, 2021
@Swatinem
Contributor

Swatinem commented Oct 3, 2021

We're seeing a PGO version mismatch error in CI logs

Sorry for the driveby question:

Is it possible that the same problem can manifest as a malformed instrumentation profile data error when trying to use llvm-profdata merge? I have seen that -Z instrument-coverage has been broken for some time in nightly, but I haven't yet invested the time to fully diagnose the cause.

@Mark-Simulacrum
Member Author

Not sure. As far as I know, coverage instrumentation goes through pretty different paths, so I don't think it should be related to this PR.

@Mark-Simulacrum
Member Author

@bors try

@bors
Contributor

bors commented Oct 4, 2021

⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge 9d861e7379a238b570cab700aa938a014f881e94...

@bors
Contributor

bors commented Oct 4, 2021

☀️ Try build successful - checks-actions
Build commit: 9d861e7379a238b570cab700aa938a014f881e94 (9d861e7379a238b570cab700aa938a014f881e94)

@rust-timer
Collaborator

Queued 9d861e7379a238b570cab700aa938a014f881e94 with parent e737694, future comparison URL.

@joshtriplett
Member

Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?

@rust-timer
Collaborator

Finished benchmarking commit (9d861e7379a238b570cab700aa938a014f881e94): comparison url.

Summary: This change led to very large relevant mixed results 🤷 in compiler performance.

  • Very large improvement in instruction counts (up to -12.6% on incr-patched: println builds of cargo)
  • Very large regression in instruction counts (up to 2.8% on full builds of token-stream-stress)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 4, 2021
@joshtriplett
Member

Reviewing the performance results, this looks like a massive improvement.

@Mark-Simulacrum
Member Author

Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?

We rebuild LLVM twice each CI build right now, because mozilla/sccache#952 hasn't been merged/finished. If the docker cache is invalidated, that adds a third build.

We can adjust the in-docker build to use the same LLVM as in our fork, I suppose, but I'm not sure if that's necessary - we can probably just apply patches like this PR on future LLVM upgrades.

@nikic
Contributor

nikic commented Oct 4, 2021

Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?

We rebuild LLVM twice each CI build right now, because mozilla/sccache#952 hasn't been merged/finished. If the docker cache is invalidated, that adds a third build.

We can adjust the in-docker build to use the same LLVM as in our fork, I suppose, but I'm not sure if that's necessary - we can probably just apply patches like this PR on future LLVM upgrades.

One thing I'm concerned about is that the current setup will result in spurious PGO related regressions with future LLVM updates, as we typically update our LLVM before we update the clang used to build it.

@Mark-Simulacrum
Member Author

I'm not sure how to check, but my current theory is that the optimization here is pretty much entirely due to better optimized LLVM, not PGO. The PGO PR for LLVM did produce a speedup on merge despite these "version mismatch errors".

Our setup is such that the LLVM PGO data is configured to go to a separate location from the rustc data, and we use the right llvm-profdata for each to merge it. So I think the warning/error message is probably spurious, perhaps arising from LLVM's profiling runtime not supporting being dlopen'd while already linked in, or something like that...

In other words, at least as far as I can tell, future LLVM upgrades are likely to reintroduce the error message until we bump our C toolchain, but there's no evidence at this time that this will cause performance regressions.
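As a rough illustration of the setup described above (separate raw-profile locations, each merged with its own llvm-profdata), here is a sketch that only builds the `llvm-profdata merge` command lines and does not run them. All paths and names below are hypothetical and are not taken from pgo.sh; passing a directory as the merge input is assumed to be supported by the llvm-profdata in use.

```python
def merge_command(llvm_profdata, profraw_dir, output):
    """Build an `llvm-profdata merge` argv list; nothing is executed here."""
    return [llvm_profdata, "merge", "-o", output, profraw_dir]


# Hypothetical layout: each phase writes raw profiles to its own directory
# and is merged by the llvm-profdata that matches its instrumenting compiler.
PHASES = {
    "llvm": {
        "llvm_profdata": "/opt/clang/bin/llvm-profdata",
        "profraw_dir": "/tmp/llvm-pgo",
        "output": "/tmp/llvm-pgo.profdata",
    },
    "rustc": {
        "llvm_profdata": "/opt/rust-llvm/bin/llvm-profdata",
        "profraw_dir": "/tmp/rustc-pgo",
        "output": "/tmp/rustc-pgo.profdata",
    },
}


def plan_merges(phases):
    """Return one merge command per phase, keyed by phase name."""
    return {name: merge_command(**cfg) for name, cfg in phases.items()}


commands = plan_merges(PHASES)
for name, argv in commands.items():
    print(name, " ".join(argv))
```

Keeping the tool, input directory, and output distinct per phase is what lets the two sets of profile data stay independent of each other's format version.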

@nikic
Contributor

nikic commented Oct 4, 2021

Right, it's hard to distinguish whether the impact is due to Clang 13 doing a better job of optimizing, or something about the handling of profile data being fixed. What's interesting is that if you filter the perf result by check builds only, this is a universal small regression. I would have expected that check builds wouldn't be affected by optimizations applied to LLVM.

@Mark-Simulacrum
Member Author

Overlap in profiles from 9dbb26e and its parent commit (last "noop" change I found):

$ llvm-profdata overlap before-rustc-pgo.profdata after-rustc-pgo.profdata
Profile overlap infomation for base_profile: before-rustc-pgo.profdata and test_profile: after-rustc-pgo.profdata
Program level:
  # of functions overlap: 260502
  Edge profile overlap: 99.910%
  Edge profile base count sum: 230867115195
  Edge profile test count sum: 231072749498
  IndirectCall profile overlap: 99.996%
  IndirectCall profile base count sum: 722318628
  IndirectCall profile test count sum: 722335980
  MemOP profile overlap: 99.981%
  MemOP profile base count sum: 488656567
  MemOP profile test count sum: 488639611

$ llvm-profdata overlap before-llvm-pgo.profdata after-llvm-pgo.profdata
Profile overlap infomation for base_profile: before-llvm-pgo.profdata and test_profile: after-llvm-pgo.profdata
Program level:
  # of functions overlap: 96161
  Edge profile overlap: 99.742%
  Edge profile base count sum: 1054888702705
  Edge profile test count sum: 1050720657781
  IndirectCall profile overlap: 99.894%
  IndirectCall profile base count sum: 8906460397
  IndirectCall profile test count sum: 8906009877
  MemOP profile overlap: 99.921%
  MemOP profile base count sum: 5927918788
  MemOP profile test count sum: 5920677320

And for this PR's try commit:

$ llvm-profdata overlap before-rustc-pgo.profdata after-rustc-pgo.profdata
Profile overlap infomation for base_profile: before-rustc-pgo.profdata and test_profile: after-rustc-pgo.profdata
Program level:
  # of functions overlap: 260504
  # of functions only in test_profile: 89905
  Edge profile overlap: 40.433%
  Percentage of Edge profile only in test_profile: 59.566%
  Edge profile base count sum: 230860056427
  Edge profile test count sum: 571155717471
  IndirectCall profile overlap: 18.746%
  Percentage of IndirectCall profile only in test_profile: 81.254%
  IndirectCall profile base count sum: 722242054
  IndirectCall profile test count sum: 3853524645
  MemOP profile overlap: 21.263%
  Percentage of MemOP profile only in test_profile: 78.737%
  MemOP profile base count sum: 488641949
  MemOP profile test count sum: 2298031353

$ llvm-profdata overlap before-llvm-pgo.profdata after-llvm-pgo.profdata
Profile overlap infomation for base_profile: before-llvm-pgo.profdata and test_profile: after-llvm-pgo.profdata
Program level:
  # of functions overlap: 84177
  # of functions mismatch: 6928
  # of functions only in test_profile: 2696
  Edge profile overlap: 46.840%
  Mismatched count percentage (Edge): 52.323%
  Percentage of Edge profile only in test_profile: 0.685%
  Edge profile base count sum: 1047797180915
  Edge profile test count sum: 1153062758787
  IndirectCall profile overlap: 40.414%
  Mismatched count percentage (IndirectCall): 59.122%
  Percentage of IndirectCall profile only in test_profile: 0.310%
  IndirectCall profile base count sum: 8901901446
  IndirectCall profile test count sum: 8877805375
  MemOP profile overlap: 84.344%
  Mismatched count percentage (MemOP): 15.542%
  Percentage of MemOP profile only in test_profile: 0.001%
  MemOP profile base count sum: 5916707088
  MemOP profile test count sum: 5921229195

So it does look like the version mismatch presumably had a not insignificant effect, though the results are a bit weird. Presumably this relates to the check builds as well.

Maybe we're not successfully putting LLVM results in one file and rustc in another (e.g., race which one loads first or something)? Worth figuring it out, probably... though not sure how to investigate that.
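To compare runs like these without reading the numbers by eye, the text output of `llvm-profdata overlap` could be parsed into a dictionary. This is a sketch that assumes the `key: value` layout shown above; the function name is made up, and the sample is an excerpt of the rustc output for this PR's try commit.

```python
import re


def parse_overlap(text):
    """Extract numeric 'key: value' metrics from `llvm-profdata overlap` output."""
    metrics = {}
    for line in text.splitlines():
        # Match lines ending in a number, optionally followed by '%'.
        m = re.match(r"\s*(.+?):\s*([\d.]+)%?\s*$", line)
        if m:
            key, value = m.group(1), m.group(2)
            metrics[key] = float(value) if "." in value else int(value)
    return metrics


# Excerpt copied from the rustc comparison for this PR's try commit.
sample = """
Program level:
  # of functions overlap: 260504
  # of functions only in test_profile: 89905
  Edge profile overlap: 40.433%
  Edge profile base count sum: 230860056427
  Edge profile test count sum: 571155717471
"""

metrics = parse_overlap(sample)
# Ratio of total edge counts in the test profile relative to the base profile.
growth = metrics["Edge profile test count sum"] / metrics["Edge profile base count sum"]
print(metrics["Edge profile overlap"], round(growth, 2))
```

For this excerpt the test profile's edge count sum is roughly 2.5 times the base profile's, which matches the large share of counts reported as present only in test_profile.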

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Oct 14, 2021
@Mark-Simulacrum
Member Author

Okay, so that try build fixes the "LLVM Profile error" message by doing LLVM PGO collection and rustc PGO collection in separate phases of the build. This adds just ~10 minutes to our build, so it is pretty cheap -- but it demonstrates that fixing the error doesn't seem to have any major impact. (FWIW, that build also collects LLVM profile information only from the compilation of libcore, and based on these results that seems to be sufficient or better.)

I've now readded the LLVM 13 bootstrap (i.e., building with clang v13 for all C/C++ compilation we do). Presumably this has less chance of somehow creating interference between the two profiling runtimes now, so maybe we'll see less weird noise in the rustc results.

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 14, 2021
@bors
Contributor

bors commented Oct 14, 2021

⌛ Trying commit 86608f1 with merge db2a4db59506e78eb0110e7b4ebeaff1ecbc497c...

@bors
Contributor

bors commented Oct 14, 2021

☀️ Try build successful - checks-actions
Build commit: db2a4db59506e78eb0110e7b4ebeaff1ecbc497c (db2a4db59506e78eb0110e7b4ebeaff1ecbc497c)

@rust-timer
Collaborator

Queued db2a4db59506e78eb0110e7b4ebeaff1ecbc497c with parent e1e9319, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (db2a4db59506e78eb0110e7b4ebeaff1ecbc497c): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

  • Very large improvement in instruction counts (up to -13.5% on incr-patched: println builds of cargo)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 15, 2021
@Mark-Simulacrum Mark-Simulacrum changed the title Use LLVM/clang v13 to build LLVM Split out LLVM PGO step and use clang 13 to compile LLVM Oct 15, 2021
@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Oct 15, 2021
@Mark-Simulacrum
Member Author

r? @nikic

OK, I think the new implementation avoids any performance hit and so seems reasonable to merge. I suspect that having both instrumentations in the same binary, particularly with different versions, was causing some sort of subtle problem that ended up hurting PGO collection. I'm pretty sure the new performance delta is strictly due to an improved baseline C++ compiler, which better optimizes the LLVM we ship, as the first commit already resolves any potential mixup with version information.

I think that should resolve the concern around future LLVM upgrades not benefiting from PGO (and so having a "false" regression) -- we continue to not rely (and in fact seem to rely less) on the C++ profdata version matching up with rustc's.

So this PR seems like a pretty clear cut win to me.

@rust-highfive rust-highfive assigned nikic and unassigned kennytm Oct 15, 2021
@nikic
Contributor

nikic commented Oct 17, 2021

@bors r+

@bors
Contributor

bors commented Oct 17, 2021

📌 Commit 86608f1 has been approved by nikic

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2021
@bors
Contributor

bors commented Oct 17, 2021

⌛ Testing commit 86608f1 with merge 5e02151...

@bors
Contributor

bors commented Oct 18, 2021

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 5e02151 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 18, 2021
@bors bors merged commit 5e02151 into rust-lang:master Oct 18, 2021
@rustbot rustbot added this to the 1.58.0 milestone Oct 18, 2021
@Mark-Simulacrum Mark-Simulacrum deleted the with-llvm-13 branch October 18, 2021 01:50
@rust-timer
Collaborator

Finished benchmarking commit (5e02151): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

  • Very large improvement in instruction counts (up to -13.6% on incr-patched: println builds of cargo)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.