Implement basic support for PGO in rustbuild for rustc #80033
91ed9ea to 8ee85b4
So I guess we know that this empty file actively hurts, but at least PGO is having an effect :) @michaelwoerister, can you share what workload you used? I pushed an update to use libcore; we'll see if that works better.
@bors try @rust-timer queue
33765d1 to b293693
Finished benchmarking try commit (9b7eb5a3e3872fa2b9ab043432288b75aa917f4b): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
849ad1c to 1a6402e
@bors try
⌛ Trying commit 1a6402e23a4f3a513af6f9a695bb9393b5f590e2 with merge 4e4f30e50d7c8ee770c0773a1a742cf5e08a7faa...
☀️ Try build successful - checks-actions
Queued 4e4f30e50d7c8ee770c0773a1a742cf5e08a7faa with parent caeb333, future comparison URL. @rustbot label: +S-waiting-on-perf
Finished benchmarking try commit (4e4f30e50d7c8ee770c0773a1a742cf5e08a7faa): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Apart from a ~30% regression in ctfe-stress-4-doc, there are only a few <2% regressions, and overall there is a huge improvement of up to 23% for task-clock. Instruction count does show some regressions, but that is expected, as PGO optimizes for cycle count rather than instruction count.
1a6402e to cdb8746
cdb8746 to c3ed30c
This implements support for applying PGO to the rustc compilation step (not standard library or any tooling, including rustdoc). Expanding PGO to more tools is not terribly difficult but will involve more work and greater CI time commitment.
For the same reason of avoiding greater time commitment, this currently avoids implementing for platforms outside of x86_64-unknown-linux-gnu, though in practice it should be quite simple to extend over time to more platforms. The initial implementation is intentionally minimal here to avoid too much work investment before we start seeing wins for a subset of Rust users.
The choice of workloads to profile here is somewhat arbitrary, but the general rationale was to aim for a small set that largely avoided time regressions on perf.rust-lang.org's full suite of crates. The set chosen is libcore, cargo (and its dependencies), and a few ad-hoc stress tests from perf.rlo. The stress tests are arguably the most controversial, but they benefit those cases (avoiding regressions) and do not really remove wins from other benchmarks.
The primary next step after this PR lands is to implement support for PGO in LLVM. It is unclear whether we can afford a full LLVM rebuild in CI, though, so the approach taken there may need to be more staggered. rustc-only PGO seems well affordable on linux at least, giving us up to 20% wall time wins on some crates for 15 minutes of extra CI time (1 hour up from 45 minutes).
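For readers unfamiliar with the cycle that PGO involves, here is a minimal sketch of the generic instrument, profile, merge, rebuild sequence for an ordinary Rust program. It is not the rustbuild implementation from this PR: the `Step` type, the `pgo_steps` and `render` helpers, and all paths are hypothetical names chosen for illustration. The only real interfaces assumed are rustc's `-Cprofile-generate` and `-Cprofile-use` codegen options and the `llvm-profdata merge` tool. The sketch only lays out the commands; it does not execute them.

```rust
/// One pipeline step: a program and its arguments. Nothing is executed here.
/// (Hypothetical type for illustration; not part of rustbuild.)
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub program: String,
    pub args: Vec<String>,
}

/// Lay out the four phases of a generic PGO cycle for one Rust source file.
/// All paths are supplied by the caller and are assumptions of this sketch.
pub fn pgo_steps(src: &str, binary: &str, profile_dir: &str, merged: &str) -> Vec<Step> {
    vec![
        // 1. Build an instrumented binary that writes raw profiles to `profile_dir`.
        Step {
            program: "rustc".to_string(),
            args: vec![
                "-O".to_string(),
                format!("-Cprofile-generate={}", profile_dir),
                src.to_string(),
            ],
        },
        // 2. Run the instrumented binary on representative workloads.
        //    (For rustc itself, the PR text names libcore, cargo, and a few stress tests.)
        Step {
            program: binary.to_string(),
            args: Vec::new(),
        },
        // 3. Merge the raw profiles into a single .profdata file.
        Step {
            program: "llvm-profdata".to_string(),
            args: vec![
                "merge".to_string(),
                "-o".to_string(),
                merged.to_string(),
                profile_dir.to_string(),
            ],
        },
        // 4. Rebuild, letting the optimizer use the merged profile.
        Step {
            program: "rustc".to_string(),
            args: vec![
                "-O".to_string(),
                format!("-Cprofile-use={}", merged),
                src.to_string(),
            ],
        },
    ]
}

/// Render a step as a single shell-like line, for printing or logging.
pub fn render(step: &Step) -> String {
    let mut parts = vec![step.program.clone()];
    parts.extend(step.args.iter().cloned());
    parts.join(" ")
}
```

Calling `pgo_steps("main.rs", "./main", "/tmp/pgo-data", "/tmp/pgo-data/merged.profdata")` and passing each result through `render` yields the four command lines in order. The extra CI time mentioned above comes from the fact that phases 1 and 4 are both full builds, with the workload runs in between.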
c3ed30c to 88f3a6e
@bors try @rust-timer queue
This is just to re-check that we're still seeing similar behavior, but I expect to close this PR after that and open one with intent to merge (rather than just experiment). I think getting these wins out the door to users quickly makes sense. Given that we can pull off direct PGO in one CI builder for rustc at least, I think we should; it doesn't make sense to use the more complicated scheme of relying on the previous build's artifacts if we can avoid it.
Awaiting bors try build completion.
⌛ Trying commit 88f3a6e with merge 2ce5173d842eb52a0df4069b1cbb74e212510626...
☀️ Try build successful - checks-actions
Queued 2ce5173d842eb52a0df4069b1cbb74e212510626 with parent 50a9097, future comparison URL. @rustbot label: +S-waiting-on-perf
Finished benchmarking try commit (2ce5173d842eb52a0df4069b1cbb74e212510626): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never