Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update logic around PGO staging #101744

Closed
wants to merge 1 commit into from

Conversation

chriswailes
Copy link
Contributor

PGO instrumentation occurs fairly late in the code generation pipeline after many transformations and optimizations are applied. This means that profiles for the same source code can differ if the binary that generated them was compiled with a different compiler or options.

When bootstrapping the Rust compiler this is relevant because it means that profiles that are generated by the Stage 1 compiler may have entries with checksums that don't match the functions in the Stage 2 compiler. This isn't actively harmful, but it may prevent some functions in later stage compilers from receiving profile-guided optimizations.

This commit changes the build system to instrument the Stage 2 and 3 compilers (with one exception discussed below) and to use the provided profile to optimize the compiler at every stage.

Not instrumenting the Stage 1 compiler has two advantages:

  1. It decreases the size of the eventual profile (as there currently isn't a way to provide a separate profile path name to each stage)
  2. It speeds up the bootstrap process; instrumented binaries are slower and if we aren't going to use the profile from the Stage 1 compiler we might as well not generate it

Because the LLVM libraries are compiled only once if they are PGO instrumented then the compiler must be PGO instrumented at every stage, including Stage 1.

CI definitions will need to be modified to use the Stage 2 compiler when generating PGO profiles.

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @jyn514 (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 13, 2022
@rustbot rustbot added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Sep 13, 2022
Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall your description makes sense to me, except this bit:

Because the LLVM libraries are compiled only once if they are PGO instrumented then the compiler must be PGO instrumented at every stage, including Stage 1.

Why does the compiler need to be instrumented if LLVM is instrumented? Isn't instrumenting stage 2 enough?

CI definitions will need to be modified to use the Stage 2 compiler when generating PGO profiles.

Can you please do that as part of this pr? I think src/ci/pgo.sh is the place to start looking.

@@ -678,13 +678,9 @@ impl Step for Rustc {
false
}
} else if let Some(path) = &builder.config.rust_profile_use {
if compiler.stage == 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this condition removed? Shouldn't it be the same as above, !link_shared() || stage >= 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a profile is provided there is no harm in using it. If there are checksum missmatches or missing functions then we have the same result as not using one, but if there are signature matches then the Stage1 compiler will be faster and the total build should complete sooner.

@jyn514 jyn514 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 14, 2022
@Mark-Simulacrum
Copy link
Member

Please make sure we run perf to ensure equivalency of results before merging.

@jyn514 jyn514 added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 14, 2022
@chriswailes
Copy link
Contributor Author

Sorry for taking so long with my response but I wanted to do some additional investigations to see if I could find an alternative way of addressing the issue below.

Overall your description makes sense to me, except this bit:

Because the LLVM libraries are compiled only once if they are PGO instrumented then the compiler must be PGO instrumented at every stage, including Stage 1.

Why does the compiler need to be instrumented if LLVM is instrumented? Isn't instrumenting stage 2 enough?

If the LLVM library is instrumented it needs to link against support libraries for functions related to writing the profile data. When LLVM is linked dynamically that isn't a problem as the profile related symbols can be resolved dynamically as well. However, when LLVM is instrumented and linked statically those symbols need to be resolved at link time.

I have been unable to find a way to get an instrumented libLLVM.a to link against an un-instrumented build of rustc. With the only other alternative being to build a second, uninstrumented, copy of LLVM to link against an un-instrumented Stage1 compiler, building Stage1 with instrumentation as well in this particular case seems like the better choice.

Can you please do that as part of this pr? I think src/ci/pgo.sh is the place to start looking.

Will do.

@chriswailes
Copy link
Contributor Author

Regarding my previous comment, I now realize that several of my points rely on details of how we use the build system and not the build system itself. The bootstrap system doesn't provide functionality for PGO instrumenting the LLVM build and we achieve this by passing LLVM the appropriate flags, so it's entirely possible for others to build an instrumented Rust toolchain with static linkage against a non-instrumented LLVM library.

I think the change is still worthwhile though because with it developers building the toolchain without a PGO instrumented LLVM will get slightly slower builds while without the change those building with PGO instrumented LLVM will have to make a similar change to the bootstrapy system. Providing the profile arguments via RUSTFLAGS won't work as that would use the same profile arguments for all Rust objects, not just the compiler.

@chriswailes
Copy link
Contributor Author

Can you please do that as part of this pr? I think src/ci/pgo.sh is the place to start looking.

I looked at src/ci/pgo.sh and it looks like it's already building the Stage2 compiler. This makes me wonder if the CI is accidentally only collecting the LLVM profiles and not Rust profiles.

It looks like no changes need to be made?

@chriswailes
Copy link
Contributor Author

Please make sure we run perf to ensure equivalency of results before merging.

What's the easiest way to invoke the PGO pipeline locally?

@jyn514
Copy link
Member

jyn514 commented Sep 26, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Collaborator

bors commented Sep 26, 2022

⌛ Trying commit 2071506c1d60d52e72539f183b819f8554023c20 with merge 591c425928b66f3e73ef8a840c109792df09441a...

@bors
Copy link
Collaborator

bors commented Sep 27, 2022

💔 Test failed - checks-actions

@jyn514
Copy link
Member

jyn514 commented Sep 27, 2022

+ cd /checkout/obj
+ RUSTC_PROFILE_MERGED_FILE=/tmp/tmp-pgo/rustc-pgo.profdata
+ /checkout/obj/build/x86_64-unknown-linux-gnu/llvm/bin/llvm-profdata merge -o /tmp/tmp-pgo/rustc-pgo.profdata /tmp/tmp-pgo/rustc-pgo
error: /tmp/tmp-pgo/rustc-pgo: No such file or directory

@Kobzol
Copy link
Contributor

Kobzol commented Sep 27, 2022

@chriswailes You can run

./src/ci/docker/run.sh dist-x86_64-linux

locally to run the dist CI pipeline locally.

On CI, we gather both LLVM and Rust PGO profiles. It's done separately in two runs to avoid the profiling runtimes to clash. I want to revisit this decision soon to see if it's still needed. Removing this two stage approach would shorten dist/perf CI times.

@rust-log-analyzer

This comment has been minimized.

@Kobzol
Copy link
Contributor

Kobzol commented Sep 27, 2022

I wonder what exactly is this PR trying to achieve? 🤔 Currently we only instrument stage 2 compilers with PGO.
The condition

if compiler.stage == 1

means that we only instrument stage 2, since the compiler here is the compiling compiler, not the compiler being compiled.

@JohnCSimon
Copy link
Member

ping from triage:
@chriswailes

Can you please post your status on this PR?

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 16, 2022
@rustbot
Copy link
Collaborator

rustbot commented Nov 16, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Some changes occurred in src/tools/cargo

cc @ehuss

@chriswailes
Copy link
Contributor Author

Sorry for the delay (I got distracted by other projects) and thanks for the additional context on the compiler object. I've updated the CL to remove that bit as well as add additional config.toml support for PGO configuration options.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Collaborator

bors commented Nov 17, 2022

⌛ Trying commit 257ec3b with merge fa338c9ddc5996f71f5ee1cfd9b6bbc5d5318bf4...

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How have you tested this change?

if compiler.stage == 1 {
cargo.rustflag(&format!("-Cprofile-use={}", path));
cargo.rustflag("-Cllvm-args=-pgo-warn-missing-function");
} else if let Some(path) = &builder.config.llvm_profile_generate {
Copy link
Member

@jyn514 jyn514 Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are rust_profile_generate and llvm_profile_generate mutually exclusive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding this function is only responsible for building Rust crates, not the LLVM libraries. If llvm_profile_generate is set, the code that handles building LLVM will take care of that and then this code will compile the crates appropriately.

This code now makes the original check to see if Rust instrumentation was requested. If not, it will still instrument the Rust crates if the LLVM libraries are instrumented and linkage is set to static, as a way of ensuring that the necessary runtime libraries are present.

@bors
Copy link
Collaborator

bors commented Nov 17, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 17, 2022
@rust-log-analyzer
Copy link
Collaborator

The job dist-x86_64-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
+ python3 /checkout/x.py build --target=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --stage 2 library/std --llvm-profile-generate=/tmp/tmp-pgo/llvm-pgo/prof-%p
Building rustbuild
    Finished dev [unoptimized] target(s) in 0.07s

Option 'llvm-profile-generate' does not take an argument
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
  left: `1`,
 right: `2`: x.py should be run with `--stage 2` on CI, but was run with `--stage 1`', config.rs:1381:21
Build completed unsuccessfully in 0:00:00

@chriswailes
Copy link
Contributor Author

After some experimentation I've confirmed that when compiling both LLVM and the Rust toolchain with PGO instrumentation and static LLVM linkage the build will fail if two different profile paths are used. Do people have a preference on how this is handled by the bootstrap system? Should an error be produced if the provided profile paths differ? Should the compiler silently (or with a warning) use Rust's profile path?

@Mark-Simulacrum
Copy link
Member

Do we have some idea why it fails? Is this a bug that should be filed with LLVM, for example?

@chriswailes
Copy link
Contributor Author

I don't think it's a bug. I believe it's due to conflicting entries in the two different profiles. I'll reproduce the exact error and post it here.

@Mark-Simulacrum
Copy link
Member

I guess what it sounds like to me is that when we share a common profile directory, it's likely that we're just clobbering one or both of the pgo data collections which "fixes" the problem but gives us worse performance (less benefit).

I recall that we landed a PR a while back which stopped sharing a profile directory intentionally and it was a significant (several percent) win in terms of performance.

I wonder if it makes sense to "allow" static linkage during pgo collection - presumably we might be able to only do it for the final artifacts produced?

@chriswailes
Copy link
Contributor Author

Looking at the profile here I'm seeing both LLVM and Rust symbols, so I don't think the profiles are clobbering each other. Our entire PGO pipeline is defined here if you'd like to check. It's possible we're making other errors.

@jyn514
Copy link
Member

jyn514 commented Dec 8, 2022

r? @Mark-Simulacrum

@rustbot rustbot assigned Mark-Simulacrum and unassigned jyn514 Dec 8, 2022
@bors
Copy link
Collaborator

bors commented Dec 10, 2022

☔ The latest upstream changes (presumably #105525) made this pull request unmergeable. Please resolve the merge conflicts.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 10, 2022
@Mark-Simulacrum
Copy link
Member

Hm, well, I can't say that just having both types of symbols is fully convincing that nothing is getting clobbered. But, I think our CI doesn't use static linkage for LLVM anywhere we do PGO, so it probably is fine to error on this case with a comment that it didn't work but may not be impossible to fix (or something to that effect).

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 10, 2022
@chriswailes
Copy link
Contributor Author

I'd rather it not error in this configuration as it is one that we currently build and test on occasion. Would a warning be OK?

@Mark-Simulacrum
Copy link
Member

Hm, well, let's include an error on the specific scenario:

when compiling both LLVM and the Rust toolchain with PGO instrumentation and static LLVM linkage the build will fail if two different profile paths are used

But I'm okay leaving the maybe working scenario as a warning. (Even if it's likely to be useless in practice, because no one reads a full log of pgo output...)

@Dylan-DPC
Copy link
Member

@chriswailes any updates on this?

@Dylan-DPC
Copy link
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC closed this May 16, 2023
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.