Split out LLVM PGO step and use clang 13 to compile LLVM #89499

Mark-Simulacrum · 2021-10-03T18:16:51Z

We're seeing a PGO version mismatch error in CI logs:

LLVM Profile Error: Runtime and instrumentation version mismatch : expected 5, but get 7

which is likely due to the version bumped here differing from the one used by rustc.
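
To check whether a CI log contains this message, a minimal sketch along the following lines may help. The function name and the idea of scanning logs line by line are assumptions for illustration, not part of this PR; only the message wording is taken from the log line quoted above.

```python
import re
from typing import Iterable, List, Tuple

# Pattern for the runtime message quoted above. The wording ("but get")
# is copied verbatim from the CI log line.
MISMATCH_RE = re.compile(
    r"LLVM Profile Error: Runtime and instrumentation version mismatch\s*:\s*"
    r"expected (\d+), but get (\d+)"
)


def find_version_mismatches(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return one (expected, got) pair per mismatch line found in `lines`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    found = []
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found
```

Passing an open log file (or any iterable of strings) returns the version pairs, so an empty list means the message did not appear.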

This PR fixes this by splitting out the PGO step for LLVM into a separate phase of the pgo.sh script, which by itself nets no change to performance (see these results). It then upgrades our bootstrap compiler to LLVM/clang version 13, which yields this PR's performance improvement of around 5%. The upgrade depends on the split: without it, we end up somehow clobbering, or otherwise hurting, our ability to effectively collect performance data, which reduces performance for a subset of benchmarks. It is not clear what the precise cause was, but the split only costs ~10 minutes and seems worthwhile.

rust-highfive · 2021-10-03T18:16:54Z

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

Mark-Simulacrum · 2021-10-03T18:16:59Z

@bors try @rust-timer queue

rust-timer · 2021-10-03T18:17:00Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-10-03T18:17:06Z

⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge ff6bd962b9c1f4d5fed6f6b8ae63ae13662ce2e8...

bors · 2021-10-03T18:21:56Z

💔 Test failed - checks-actions

Swatinem · 2021-10-03T19:39:08Z

> We're seeing a PGO version mismatch error in CI logs

Sorry for the drive-by question:

Is it possible that the same problem manifests as a "malformed instrumentation profile data" error when running llvm-profdata merge? I have seen that -Z instrument-coverage has been broken on nightly for some time, but I haven't yet invested the time to fully diagnose the cause.

Mark-Simulacrum · 2021-10-03T19:54:00Z

Not sure. Coverage instrumentation goes through pretty different paths as far as I know, so I don't think it should be related to this PR.

Mark-Simulacrum · 2021-10-04T01:57:14Z

@bors try

bors · 2021-10-04T01:57:21Z

⌛ Trying commit 0e4a8f9fb9123cb3406509e618ac6184e2c3f936 with merge 9d861e7379a238b570cab700aa938a014f881e94...

bors · 2021-10-04T04:03:14Z

☀️ Try build successful - checks-actions
Build commit: 9d861e7379a238b570cab700aa938a014f881e94 (9d861e7379a238b570cab700aa938a014f881e94)

rust-timer · 2021-10-04T04:03:15Z

Queued 9d861e7379a238b570cab700aa938a014f881e94 with parent e737694, future comparison URL.

joshtriplett · 2021-10-04T04:40:46Z

Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?

rust-timer · 2021-10-04T05:39:47Z

Finished benchmarking commit (9d861e7379a238b570cab700aa938a014f881e94): comparison url.

Summary: This change led to very large relevant mixed results 🤷 in compiler performance.

Very large improvement in instruction counts (up to -12.6% on incr-patched: println builds of cargo)
Very large regression in instruction counts (up to 2.8% on full builds of token-stream-stress)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

joshtriplett · 2021-10-04T05:47:51Z

Reviewing the performance results, this looks like a massive improvement.

Mark-Simulacrum · 2021-10-04T11:09:25Z

> Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?

We rebuild LLVM twice each CI build right now, because mozilla/sccache#952 hasn't been merged/finished. If the docker cache is invalidated, that adds a third build.

We can adjust the in-docker build to use the same LLVM as in our fork, I suppose, but I'm not sure if that's necessary - we can probably just apply patches like this PR on future LLVM upgrades.

nikic · 2021-10-04T11:37:44Z

> > Since we currently cache LLVM in our CI and don't rebuild it every time, would it potentially make sense to just always rebuild our LLVM using itself, so that it always matches?
>
> We rebuild LLVM twice each CI build right now, because mozilla/sccache#952 hasn't been merged/finished. If the docker cache is invalidated, that adds a third build.
>
> We can adjust the in-docker build to use the same LLVM as in our fork, I suppose, but I'm not sure if that's necessary - we can probably just apply patches like this PR on future LLVM upgrades.

One thing I'm concerned about is that the current setup will result in spurious PGO related regressions with future LLVM updates, as we typically update our LLVM before we update the clang used to build it.

Mark-Simulacrum · 2021-10-04T11:49:14Z

I'm not sure how to check, but my current theory is that the optimization here is pretty much entirely due to better optimized LLVM, not PGO. The PGO PR for LLVM did produce a speedup on merge despite these "version mismatch errors".

Our setup is such that the LLVM PGO data is configured to go to a separate location from the rustc data, and we use the right llvm-profdata for each to merge it. So I think the warning/error message is probably spurious, perhaps arising from LLVM's profiling runtime not supporting being dlopen'd while already linked in, or something like that...
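
The "separate location, separate llvm-profdata" arrangement described above can be sketched as follows. This is an illustration under assumptions, not the actual pgo.sh logic: the helper names, tool paths, directory layout, and output file names are all hypothetical. The only real CLI usage relied on is `llvm-profdata merge -o <output> <inputs...>`. The sketch only builds the command lines and does not run them.

```python
from pathlib import Path
from typing import List


def merge_command(profdata_tool: str, raw_dir: str, output: str) -> List[str]:
    """Build an `llvm-profdata merge` command for one profile directory.

    Hypothetical helper: collects the *.profraw files in `raw_dir` and
    passes them explicitly to the given llvm-profdata binary.
    """
    raw_files = sorted(str(p) for p in Path(raw_dir).glob("*.profraw"))
    if not raw_files:
        raise ValueError(f"no .profraw files found in {raw_dir}")
    return [profdata_tool, "merge", "-o", output, *raw_files]


def separate_merge_plan(
    llvm_tool: str, llvm_dir: str, rustc_tool: str, rustc_dir: str
) -> List[List[str]]:
    """Return one merge command per component, each with its own tool.

    The output names are assumptions chosen for illustration.
    """
    return [
        merge_command(llvm_tool, llvm_dir, "llvm-pgo.profdata"),
        merge_command(rustc_tool, rustc_dir, "rustc-pgo.profdata"),
    ]
```

Each returned list could be handed to `subprocess.run`. Keeping the two directories and the two tools as separate arguments makes it harder to merge one component's raw profiles with the other component's llvm-profdata by accident.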

In other words, at least as far as I can tell, future LLVM upgrades are likely to reintroduce the error message until we bump our C toolchain, but there's no evidence at this time that this will cause performance regressions.

nikic · 2021-10-04T12:07:24Z

Right, it's hard to distinguish whether the impact is due to Clang 13 doing a better job of optimizing, or something about the handling of profile data being fixed. What's interesting is that if you filter the perf result by check builds only, this is a universal small regression. I would have expected that check builds wouldn't be affected by optimizations applied to LLVM.

Mark-Simulacrum · 2021-10-04T12:48:51Z

Overlap in profiles from 9dbb26e and its parent commit (last "noop" change I found):

$ llvm-profdata overlap before-rustc-pgo.profdata after-rustc-pgo.profdata
Profile overlap infomation for base_profile: before-rustc-pgo.profdata and test_profile: after-rustc-pgo.profdata
Program level:
  # of functions overlap: 260502
  Edge profile overlap: 99.910%
  Edge profile base count sum: 230867115195
  Edge profile test count sum: 231072749498
  IndirectCall profile overlap: 99.996%
  IndirectCall profile base count sum: 722318628
  IndirectCall profile test count sum: 722335980
  MemOP profile overlap: 99.981%
  MemOP profile base count sum: 488656567
  MemOP profile test count sum: 488639611

$ llvm-profdata overlap before-llvm-pgo.profdata after-llvm-pgo.profdata
Profile overlap infomation for base_profile: before-llvm-pgo.profdata and test_profile: after-llvm-pgo.profdata
Program level:
  # of functions overlap: 96161
  Edge profile overlap: 99.742%
  Edge profile base count sum: 1054888702705
  Edge profile test count sum: 1050720657781
  IndirectCall profile overlap: 99.894%
  IndirectCall profile base count sum: 8906460397
  IndirectCall profile test count sum: 8906009877
  MemOP profile overlap: 99.921%
  MemOP profile base count sum: 5927918788
  MemOP profile test count sum: 5920677320

And for this PR's try commit:

$ llvm-profdata overlap before-rustc-pgo.profdata after-rustc-pgo.profdata
Profile overlap infomation for base_profile: before-rustc-pgo.profdata and test_profile: after-rustc-pgo.profdata
Program level:
  # of functions overlap: 260504
  # of functions only in test_profile: 89905
  Edge profile overlap: 40.433%
  Percentage of Edge profile only in test_profile: 59.566%
  Edge profile base count sum: 230860056427
  Edge profile test count sum: 571155717471
  IndirectCall profile overlap: 18.746%
  Percentage of IndirectCall profile only in test_profile: 81.254%
  IndirectCall profile base count sum: 722242054
  IndirectCall profile test count sum: 3853524645
  MemOP profile overlap: 21.263%
  Percentage of MemOP profile only in test_profile: 78.737%
  MemOP profile base count sum: 488641949
  MemOP profile test count sum: 2298031353

$ llvm-profdata overlap before-llvm-pgo.profdata after-llvm-pgo.profdata
Profile overlap infomation for base_profile: before-llvm-pgo.profdata and test_profile: after-llvm-pgo.profdata
Program level:
  # of functions overlap: 84177
  # of functions mismatch: 6928
  # of functions only in test_profile: 2696
  Edge profile overlap: 46.840%
  Mismatched count percentage (Edge): 52.323%
  Percentage of Edge profile only in test_profile: 0.685%
  Edge profile base count sum: 1047797180915
  Edge profile test count sum: 1153062758787
  IndirectCall profile overlap: 40.414%
  Mismatched count percentage (IndirectCall): 59.122%
  Percentage of IndirectCall profile only in test_profile: 0.310%
  IndirectCall profile base count sum: 8901901446
  IndirectCall profile test count sum: 8877805375
  MemOP profile overlap: 84.344%
  Mismatched count percentage (MemOP): 15.542%
  Percentage of MemOP profile only in test_profile: 0.001%
  MemOP profile base count sum: 5916707088
  MemOP profile test count sum: 5921229195

So it does look like the version mismatch presumably had a not insignificant effect, though the results are a bit weird. Presumably this relates to the check builds as well.
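
To compare such reports numerically, a small parser along these lines may help. It is a sketch that assumes the text layout of the `llvm-profdata overlap` output pasted above; the function names are hypothetical, and other LLVM versions may format the report differently.

```python
import re
from typing import Dict

# Matches lines such as "  Edge profile overlap: 40.433%" and
# "  Edge profile base count sum: 230860056427" from the reports above.
# Lines starting with "Percentage of" or "Mismatched count" are skipped.
STAT_RE = re.compile(
    r"^\s*(Edge|IndirectCall|MemOP) profile "
    r"(overlap|base count sum|test count sum):\s*([\d.]+)%?\s*$"
)


def parse_overlap(text: str) -> Dict[str, Dict[str, float]]:
    """Collect overlap percentage and count sums per profile kind."""
    stats: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        match = STAT_RE.match(line)
        if match:
            kind, field, value = match.groups()
            stats.setdefault(kind, {})[field] = float(value)
    return stats


def count_growth(stats: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Ratio of the test count sum to the base count sum, per kind."""
    return {
        kind: fields["test count sum"] / fields["base count sum"]
        for kind, fields in stats.items()
        if fields.get("base count sum") and "test count sum" in fields
    }
```

Applied to the rustc report for this PR's try commit, the edge count ratio comes out at about 2.47 (571155717471 / 230860056427), whereas the earlier no-op comparison stays very close to 1.0.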

Maybe we're not successfully putting LLVM results in one file and rustc results in another (e.g., a race over which one loads first, or something)? Probably worth figuring out... though I'm not sure how to investigate that.

Mark-Simulacrum · 2021-10-14T19:25:57Z

Okay, so that try build fixes the "LLVM Profile Error" message by doing LLVM PGO collection and rustc PGO collection in separate phases of the build. This adds just ~10 minutes to our build, so it is pretty cheap, but it demonstrates that fixing the error doesn't seem to have any major impact. (FWIW, that build also collects LLVM profile data only from compiling libcore, and based on these results that seems to be sufficient or better.)

I've now readded the LLVM 13 bootstrap (i.e., building with clang v13 for all C/C++ compilation we do). Presumably this has less chance of somehow creating interference between the two profiling runtimes now, so maybe we'll see less weird noise in the rustc results.

@bors try @rust-timer queue

rust-timer · 2021-10-14T19:25:58Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-10-14T19:26:09Z

⌛ Trying commit 86608f1 with merge db2a4db59506e78eb0110e7b4ebeaff1ecbc497c...

bors · 2021-10-14T21:36:28Z

☀️ Try build successful - checks-actions
Build commit: db2a4db59506e78eb0110e7b4ebeaff1ecbc497c (db2a4db59506e78eb0110e7b4ebeaff1ecbc497c)

rust-timer · 2021-10-14T21:36:30Z

Queued db2a4db59506e78eb0110e7b4ebeaff1ecbc497c with parent e1e9319, future comparison URL.

rust-timer · 2021-10-15T00:24:28Z

Finished benchmarking commit (db2a4db59506e78eb0110e7b4ebeaff1ecbc497c): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

Very large improvement in instruction counts (up to -13.5% on incr-patched: println builds of cargo)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Mark-Simulacrum · 2021-10-15T00:34:50Z

r? @nikic

OK, I think the new implementation here seems to avoid any performance hit and so seems reasonable to merge. I suspect that having both instrumentations in the same binary, particularly with different versions, was causing us some sort of subtle problem that ended up hurting PGO collection. I'm pretty sure the new performance delta is strictly due to an improved baseline C++ compiler, which better optimizes the LLVM we ship, as the first commit already resolves any potential mixup with version information.

I think that should resolve the concern around future LLVM upgrades not benefiting from PGO (and so having a "false" regression) -- we continue to not rely (and in fact seem to rely less) on the C++ profdata version matching up with rustc's.

So this PR seems like a pretty clear cut win to me.

nikic · 2021-10-17T14:18:55Z

@bors r+

bors · 2021-10-17T14:18:57Z

📌 Commit 86608f1 has been approved by nikic

bors · 2021-10-17T22:29:34Z

⌛ Testing commit 86608f1 with merge 5e02151...

bors · 2021-10-18T01:38:05Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 5e02151 to master...

rust-timer · 2021-10-18T03:09:53Z

Finished benchmarking commit (5e02151): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

Very large improvement in instruction counts (up to -13.6% on incr-patched: println builds of cargo)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

rust-highfive assigned kennytm Oct 3, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2021

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2021

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 3, 2021

rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 4, 2021

rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Oct 14, 2021

Mark-Simulacrum added 2 commits October 14, 2021 15:21

Move LLVM profiling to a separate phase of compilation

f70232f

Switch to clang v13 as the C/C++ compiler used for bootstrap

86608f1

Mark-Simulacrum force-pushed the with-llvm-13 branch from e5f3895 to 86608f1 Compare October 14, 2021 19:21

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 14, 2021

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 15, 2021

Mark-Simulacrum changed the title ~~Use LLVM/clang v13 to build LLVM~~ Split out LLVM PGO step and use clang 13 to compile LLVM Oct 15, 2021

Mark-Simulacrum removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Oct 15, 2021

rust-highfive assigned nikic and unassigned kennytm Oct 15, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 18, 2021

bors merged commit 5e02151 into rust-lang:master Oct 18, 2021

rustbot added this to the 1.58.0 milestone Oct 18, 2021

Mark-Simulacrum deleted the with-llvm-13 branch October 18, 2021 01:50
