-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to tune compiler crates' CGUs to bootstrap #107560
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
I think this warrants a perf build @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit b3fd7ad with merge 30a86919c3e1ec444a496583deaf735e856805ad... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (30a86919c3e1ec444a496583deaf735e856805ad): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
There's 3 crates with large regressions. I wonder if this is due to The rustc bootstrap times look good though. |
Some of the regressions there look like derives (used for building the compiler?) Does it make sense to also do this for build dependencies? |
@Swatinem These regress because they can only use a single thread with 1 CGU, and rustc-perf compiles them one at a time (to reduce noise) unlike a regular compilation. |
Let's try again |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 0c55b04 with merge 52030c72a1e46f9e8975475953b74c0cb2441ee4... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (52030c72a1e46f9e8975475953b74c0cb2441ee4): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
if profile == "fast" { | ||
Some(1) | ||
} else { | ||
if crate_name.starts_with("rustc_") { | ||
None | ||
} else { | ||
// Compile crates.io crates with a single CGU for faster compile times | ||
Some(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand - doesn't decreasing the CGUs slow down compilation because we aren't running multiple LLVM threads in parallel? Or do you mean this makes the compiler faster at runtime in exchange for slower bootstrap times? In that case "fast" seems like a misleading name ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the crates are still compiled in parallel even if a single crate is limited to 1 thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but rustc_driver still can't start building until all other crates have finished, and this is still pushing up the overall build time.
I'm not saying it's a bad change, for CI it makes sense the same way that PGO makes sense, I'd just like to find a different name. Maybe "reduce-codegen-units" or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reduces build time with 7 CPU cores, it may increase it for high core count CPUs, so I agree the name is not ideal. Maybe codegen-units-reduce
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be most useful in combination with LTO, rarely by itself right ? So maybe it could be one of the LTO options, since that's already an enum that doesn't exactly match -C lto
's values (e.g. "thin-local"). Something like rust.lto = "thin-1cgu"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. This option is intended to reduce the compilation time of rustc. LTO increases compile time to give better runtime performance and is better paired with the existing codegen-units=1
so all crates have a single CGU.
I guess the extra |
Here's a benchmark of ThinLTO off, vs ThinLTO on, vs (ThinLTO on + codegen-units-fast):
|
I suppose that when you write Thin-LTO off, you really mean Thin-Local LTO, right? |
Yeah. I don't really consider 'Thin-Local LTO' 'real' LTO. |
First |
@mati865 |
CI is far too noisy to draw any conclusions. |
The 2 try builds in this PR built the stage 1 compiler in 2m 55s and 4m 48s. That's at least 37% of noise :) |
Hello @Zoxc! I noticed there's some merge conflicts for this PR. What's the status of it? |
I think it may be a better idea to try to split up |
This adds an option
codegen-units-fast
to compile most compiler crates with a single CGU. This results in a 13.3% reduction in the compile time of the compiler (7m 13.0s to 6m 15.4s), a 6% reduction ofrustc_driver
's code size and 5% reduced runtime of the compiler incheck
builds. This option is turned on for CI.It also defaults to compiling dependencies of compiler crates with a single CGU. Additionally
rustc_query_impl
gets 96 CGUs to extract some extra parallelism.