-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Utilize PGO for rustc linux dist builds #80262
Conversation
Nice! I'll review asap. |
cargo.rustflag(&format!("-Cprofile-use={}", path)); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably also need the following LLVM flags (in both the -generate
and the -use
case) so that we don't end up with absolute file system paths in profile data:
The PGO parts look good to me (with the comment about absolute paths adressed). Since I don't know too much about the dist/infrastructure part and this affecting one of the most critical dist builds, maybe someone else can give the final go-ahead. Implementing the LLVM part is blocked on sccache supporting cached |
657f5ff
to
3f1b83e
Compare
Since we added the LLVM arg to strip things want to confirm that didn't accidentally hurt our performance: @bors try @rust-timer queue |
Awaiting bors try build completion. |
⌛ Trying commit 3f1b83ed51fd6a8dd08d5aa13f57f76afb70c029 with merge 0c9f42674071faf0a9d86cf01ca48f294eb0797c... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a review for the CI bits.
☀️ Try build successful - checks-actions |
Queued 0c9f42674071faf0a9d86cf01ca48f294eb0797c with parent 75e1acb, future comparison URL. @rustbot label: +S-waiting-on-perf |
3f1b83e
to
67242e2
Compare
r? @pietroalbini - I think I resolved all of your comments |
Looks great! The only thing I'd change is moving the PGO script from r=me after that change |
The job Click to see the possible cause of the failure (guessed by this bot)
|
This implements support for applying PGO to the rustc compilation step (not standard library or any tooling, including rustdoc). Expanding PGO to more tools is not terribly difficult but will involve more work and greater CI time commitment. For the same reason of avoiding greater time commitment, this currently avoids implementing for platforms outside of x86_64-unknown-linux-gnu, though in practice it should be quite simple to extend over time to more platforms. The initial implementation is intentionally minimal here to avoid too much work investment before we start seeing wins for a subset of Rust users. The choice of workloads to profile here is somewhat arbitrary, but the general rationale was to aim for a small set that largely avoided time regressions on perf.rust-lang.org's full suite of crates. The set chosen is libcore, cargo (and its dependencies), and a few ad-hoc stress tests from perf.rlo. The stress tests are arguably the most controversial, but they benefit those cases (avoiding regressions) and do not really remove wins from other benchmarks. The primary next step after this PR lands is to implement support for PGO in LLVM. It is unclear whether we can afford a full LLVM rebuild in CI, though, so the approach taken there may need to be more staggered. rustc-only PGO seems well affordable on linux at least, giving us up to 20% wall time wins on some crates for 15 minutes of extra CI time (1 hour up from 45 minutes). The PGO data is uploaded to allow others to reuse it if attempting to reproduce the CI build or potentially, in the future, on other platforms where an off-by-one strategy is used for dist builds at minimal performance cost.
67242e2
to
a448f88
Compare
Finished benchmarking try commit (0c9f42674071faf0a9d86cf01ca48f294eb0797c): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@bors r=pietroalbini |
📌 Commit a448f88 has been approved by |
⌛ Testing commit a448f88 with merge f91eb622f1f1eec6370afd0a6a7a67a0fef8d735... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
@bors retry network failure |
☀️ Test successful - checks-actions |
This implements support for applying PGO to the rustc compilation step (not
standard library or any tooling, including rustdoc). Expanding PGO to more tools
is not terribly difficult but will involve more work and greater CI time
commitment.
For the same reason of avoiding greater implementation time commitment,
implementing for platforms outside of x86_64-unknown-linux-gnu is skipped.
In practice it should be quite simple to extend over time to more platforms. The
initial implementation is intentionally minimal here to avoid too much work
investment before we start seeing wins for a subset of Rust users.
The choice of workloads to profile here is somewhat arbitrary, but the general
rationale was to aim for a small set that largely avoided time regressions on
perf.rust-lang.org's full suite of crates. The set chosen is libcore, cargo (and
its dependencies), and a few ad-hoc stress tests from perf.rlo. The stress tests
are arguably the most controversial, but they benefit those cases (avoiding
regressions) and do not really remove wins from other benchmarks.
The primary next step after this PR lands is to implement support for PGO in
LLVM. It is unclear whether we can afford a full LLVM rebuild in CI, though, so
the approach taken there may need to be more staggered. rustc-only PGO seems
well affordable on linux at least, giving us up to 20% wall time wins on some
crates for 15 minutes of extra CI time (1 hour with this PR, up from 45 minutes).
The PGO data is uploaded to allow others to reuse it if attempting to reproduce
the CI build or potentially, in the future, on other platforms where an
off-by-one strategy is used for dist builds at minimal performance cost.
r? @michaelwoerister (but tell me if you don't want to / don't feel comfortable approving and we can find others)