Add an option to tune compiler crates' CGUs to bootstrap #107560

Closed
wants to merge 2 commits into from

@Zoxc Zoxc commented Feb 1, 2023

This adds an option codegen-units-fast to compile most compiler crates with a single CGU. This results in a 13.3% reduction in the compile time of the compiler (7m 13.0s to 6m 15.4s), a 6% reduction of rustc_driver's code size and 5% reduced runtime of the compiler in check builds. This option is turned on for CI.

It also defaults to compiling dependencies of compiler crates with a single CGU. Additionally, rustc_query_impl gets 96 CGUs to extract some extra parallelism.
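
The selection rule described above could be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: the function name `codegen_units_for` is hypothetical, the `"fast"` profile string mirrors the diff hunk quoted later in the review, and applying the 96-CGU override to rustc_query_impl only when the option is on is an assumption.

```rust
/// Hypothetical helper: choose a per-crate codegen-unit (CGU) override.
/// `None` means "do not override; keep the default CGU count".
fn codegen_units_for(crate_name: &str, profile: &str) -> Option<u32> {
    if profile == "fast" {
        if crate_name == "rustc_query_impl" {
            // Assumption: the extra parallelism for this one large crate
            // is only requested when the option is enabled.
            Some(96)
        } else {
            // Most compiler crates get a single CGU.
            Some(1)
        }
    } else if crate_name.starts_with("rustc_") {
        // Compiler crates keep their default CGU count.
        None
    } else {
        // Dependencies from crates.io default to a single CGU.
        Some(1)
    }
}
```

Under these assumptions, `codegen_units_for("rustc_middle", "fast")` yields `Some(1)`, while the same crate with any other profile yields `None`.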

@rustbot
Collaborator

rustbot commented Feb 1, 2023

r? @albertlarsan68

(rustbot has picked a reviewer for you, use r? to override)

rustbot added the A-testsuite, S-waiting-on-review, T-bootstrap, and T-infra labels Feb 1, 2023
@albertlarsan68
Member

I think this warrants a perf build

@bors try @rust-timer queue

rustbot added the S-waiting-on-perf label Feb 1, 2023
@bors
Contributor

bors commented Feb 1, 2023

⌛ Trying commit b3fd7ad with merge 30a86919c3e1ec444a496583deaf735e856805ad...

@bors
Contributor

bors commented Feb 1, 2023

☀️ Try build successful - checks-actions
Build commit: 30a86919c3e1ec444a496583deaf735e856805ad

@rust-timer
Collaborator

Finished benchmarking commit (30a86919c3e1ec444a496583deaf735e856805ad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     1.0%    [0.4%, 2.8%]     33
Regressions ❌ (secondary)   4.7%    [0.4%, 27.2%]    56
Improvements ✅ (primary)    -0.8%   [-3.2%, -0.2%]   103
Improvements ✅ (secondary)  -1.4%   [-3.0%, -0.2%]   111
All ❌✅ (primary)            -0.4%   [-3.2%, 2.8%]    136

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     -       -                0
Regressions ❌ (secondary)   -       -                0
Improvements ✅ (primary)    -2.2%   [-6.3%, -0.4%]   55
Improvements ✅ (secondary)  -3.7%   [-10.1%, -1.2%]  187
All ❌✅ (primary)            -2.2%   [-6.3%, -0.4%]   55

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     2.0%    [1.6%, 2.8%]     4
Regressions ❌ (secondary)   7.2%    [2.5%, 18.0%]    21
Improvements ✅ (primary)    -2.1%   [-4.8%, -0.8%]   33
Improvements ✅ (secondary)  -2.6%   [-4.8%, -1.2%]   48
All ❌✅ (primary)            -1.6%   [-4.8%, 2.8%]    37

rustbot added the perf-regression label and removed the S-waiting-on-perf label Feb 1, 2023
@Zoxc
Contributor Author

Zoxc commented Feb 1, 2023

There are 3 crates with large regressions. I wonder if this is due to rustc_query_impl's CGU bump. Feel free to start another perf run to find out :)

The rustc bootstrap times look good though.

@Swatinem
Contributor

Swatinem commented Feb 1, 2023

> The rustc bootstrap times look good though.

Some of the regressions there look like derives (used for building the compiler?). Does it make sense to also do this for build dependencies?

@Zoxc
Contributor Author

Zoxc commented Feb 1, 2023

@Swatinem These regress because they can only use a single thread with 1 CGU, and rustc-perf compiles them one at a time (to reduce noise) unlike a regular compilation.
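
A toy model of that effect, as a rough sketch only: it assumes a crate's codegen work splits evenly across its CGUs with no overhead, and that the crates in a full build are independent of each other. Both function names and all numbers are hypothetical and do not come from rustc-perf or bootstrap.

```rust
/// Wall time for one crate built on its own: the codegen work is spread over
/// at most `cgus` threads, capped by the number of available cores.
fn solo_wall_time(work_secs: f64, cgus: u32, cores: u32) -> f64 {
    let threads = cgus.min(cores).max(1);
    work_secs / threads as f64
}

/// Wall time for independent crates built together with one CGU each,
/// greedily placing every crate on the currently least-loaded core.
fn parallel_wall_time(work_secs: &[f64], cores: usize) -> f64 {
    let mut load = vec![0.0_f64; cores.max(1)];
    for &work in work_secs {
        // Index of the core with the smallest accumulated load so far.
        let idx = (0..load.len())
            .min_by(|&a, &b| load[a].partial_cmp(&load[b]).unwrap())
            .unwrap();
        load[idx] += work;
    }
    // The build finishes when the busiest core finishes.
    load.into_iter().fold(0.0, f64::max)
}
```

In this model, on 8 cores a crate with 16 s of codegen work takes 16 s alone with 1 CGU but 2 s with 16 CGUs, which is the situation rustc-perf measures. Eight such 1-CGU crates built together still finish in 16 s, the same as if each had been split across all cores in turn, so the single-CGU limit costs nothing there.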

@albertlarsan68
Member

Let's try again
@bors try @rust-timer queue

rustbot added the S-waiting-on-perf label Feb 1, 2023
@bors
Contributor

bors commented Feb 1, 2023

⌛ Trying commit 0c55b04 with merge 52030c72a1e46f9e8975475953b74c0cb2441ee4...

@bors
Contributor

bors commented Feb 1, 2023

☀️ Try build successful - checks-actions
Build commit: 52030c72a1e46f9e8975475953b74c0cb2441ee4

@rust-timer
Collaborator

Finished benchmarking commit (52030c72a1e46f9e8975475953b74c0cb2441ee4): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     0.9%    [0.2%, 3.0%]     69
Regressions ❌ (secondary)   4.7%    [0.3%, 27.1%]    58
Improvements ✅ (primary)    -1.0%   [-3.9%, -0.2%]   68
Improvements ✅ (secondary)  -1.5%   [-3.4%, -0.3%]   113
All ❌✅ (primary)            -0.0%   [-3.9%, 3.0%]    137

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     -       -                0
Regressions ❌ (secondary)   -       -                0
Improvements ✅ (primary)    -2.3%   [-8.1%, -0.5%]   69
Improvements ✅ (secondary)  -4.1%   [-9.6%, -1.4%]   188
All ❌✅ (primary)            -2.3%   [-8.1%, -0.5%]   69

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                            mean    range            count
Regressions ❌ (primary)     1.2%    [0.9%, 1.6%]     2
Regressions ❌ (secondary)   6.5%    [2.1%, 13.7%]    17
Improvements ✅ (primary)    -2.2%   [-4.6%, -1.0%]   42
Improvements ✅ (secondary)  -2.9%   [-5.2%, -1.1%]   68
All ❌✅ (primary)            -2.1%   [-4.6%, 1.6%]    44

rustbot removed the S-waiting-on-perf label Feb 2, 2023
Comment on lines +81 to +88
if profile == "fast" {
    Some(1)
} else {
    if crate_name.starts_with("rustc_") {
        None
    } else {
        // Compile crates.io crates with a single CGU for faster compile times
        Some(1)
Member

I don't understand - doesn't decreasing the CGUs slow down compilation because we aren't running multiple LLVM threads in parallel? Or do you mean this makes the compiler faster at runtime in exchange for slower bootstrap times? In that case "fast" seems like a misleading name ...

Contributor Author

All the crates are still compiled in parallel even if a single crate is limited to 1 thread.

Member

Right, but rustc_driver still can't start building until all other crates have finished, and this is still pushing up the overall build time.

I'm not saying it's a bad change; for CI it makes sense the same way that PGO makes sense. I'd just like to find a different name. Maybe "reduce-codegen-units" or something like that?

Contributor Author

It reduces build time with 7 CPU cores, but it may increase it for high-core-count CPUs, so I agree the name is not ideal. Maybe codegen-units-reduce?
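
One way to picture that core-count dependence is the usual work/critical-path estimate, sketched below. This is only an illustration under stated assumptions: the function name is hypothetical, and the premise that single-CGU builds do less total work but have a longer critical path (because the last crate, such as rustc_driver, cannot be split across threads) is an assumption drawn from this thread, not a measurement.

```rust
/// Rough build-time estimate: the build is limited either by the total CPU
/// work spread evenly over the cores, or by the longest chain of work that
/// must run sequentially (the critical path), whichever is larger.
fn estimated_build_secs(total_work_secs: f64, critical_path_secs: f64, cores: u32) -> f64 {
    let spread = total_work_secs / cores.max(1) as f64;
    spread.max(critical_path_secs)
}
```

For instance, with hypothetical inputs of 2800 s of work and a 150 s critical path for a many-CGU build, versus 2400 s and 250 s for a single-CGU build, the estimate favors single CGUs at 7 cores (about 343 s versus 400 s) and many CGUs at 32 cores (150 s versus 250 s).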

Member

👍

Member

This will be most useful in combination with LTO, rarely by itself, right? So maybe it could be one of the LTO options, since that's already an enum that doesn't exactly match -C lto's values (e.g. "thin-local"). Something like rust.lto = "thin-1cgu"?

Contributor Author

Not quite. This option is intended to reduce the compilation time of rustc. LTO increases compile time to give better runtime performance and is better paired with the existing codegen-units=1 so all crates have a single CGU.

@Zoxc
Contributor Author

Zoxc commented Feb 2, 2023

I guess the extra rustc_query_impl CGUs were not to blame. match-stress's compile time is reduced by 8% locally. I did try to compile the compiler with ThinLTO, but match-stress's compile time is reduced by 0.50% then.

@Zoxc
Contributor Author

Zoxc commented Feb 2, 2023

Here's a benchmark of ThinLTO off, vs ThinLTO on, vs (ThinLTO on + codegen-units-fast). Percentages are relative to ThinLTO off:

Benchmark                         off       on       change   on+fast change
clap:check                        1.7997s   1.7348s  -3.61%   1.6162s -10.20%
hyper:check                       0.2617s   0.2535s  -3.12%   0.2358s  -9.92%
syntex_syntax:check               6.2346s   5.9775s  -4.12%   5.5946s -10.26%
syn:check                         1.6488s   1.5639s  -5.15%   1.4579s -11.58%
regex:check                       1.0393s   0.9976s  -4.01%   0.9352s -10.01%
match-stress:check                1.1822s   1.0656s  -9.87%   1.0578s -10.53%

Total                            12.1663s  11.5929s  -4.71%  10.8975s -10.43%
Summary                           2.0000s   1.9004s  -4.98%   1.7917s -10.42%
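
As a cross-check, the Total row can be recomputed from the per-benchmark times in the table above. The sketch below copies those times verbatim; the function and constant names are hypothetical, and the percentages are taken relative to the first (ThinLTO off) column.

```rust
/// Percent change of `new` relative to `baseline` (negative means faster).
fn percent_change(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

/// Check-build times in seconds, copied from the table:
/// (benchmark, ThinLTO off, ThinLTO on, ThinLTO on + codegen-units-fast).
const TIMES: [(&str, f64, f64, f64); 6] = [
    ("clap:check", 1.7997, 1.7348, 1.6162),
    ("hyper:check", 0.2617, 0.2535, 0.2358),
    ("syntex_syntax:check", 6.2346, 5.9775, 5.5946),
    ("syn:check", 1.6488, 1.5639, 1.4579),
    ("regex:check", 1.0393, 0.9976, 0.9352),
    ("match-stress:check", 1.1822, 1.0656, 1.0578),
];

/// Sum each configuration's column: (off, on, on + codegen-units-fast).
fn totals() -> (f64, f64, f64) {
    TIMES.iter().fold((0.0, 0.0, 0.0), |acc, &(_, off, on, fast)| {
        (acc.0 + off, acc.1 + on, acc.2 + fast)
    })
}
```

Formatting `percent_change` of the summed columns to two decimals reproduces the -4.71% and -10.43% figures in the Total row.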

@albertlarsan68
Member

I suppose that when you write Thin-LTO off, you really mean Thin-Local LTO, right?

@Zoxc
Contributor Author

Zoxc commented Feb 2, 2023

Yeah. I don't really consider 'Thin-Local LTO' 'real' LTO.

@mati865
Contributor

mati865 commented Feb 2, 2023

> This results in a 13.3% reduction in the compile time of the compiler (7m 13.0s to 6m 15.4s), a 6% reduction of rustc_driver's code size and 5% reduced runtime of the compiler in check builds. This option is turned on for CI.

The first rust-timer build shows a 10.5s (1.5%) regression in Rust build time; the 2nd build shows a 288.0s (40.2%) regression in Rust build time. It doesn't seem like enabling this would be beneficial for CI.

@Zoxc
Contributor Author

Zoxc commented Feb 2, 2023

@mati865 rustc-perf builds a single crate at a time to reduce noise. It's not representative of CI or regular builds.

@jyn514
Member

jyn514 commented Feb 3, 2023

@Zoxc x86_64-gnu-llvm-13 should be representative of CI though, right? Can you find another PR to compare the times to? It looks like e.g. #107615 takes 39 minutes compared to the 49 minutes here.

@Zoxc
Contributor Author

Zoxc commented Feb 3, 2023

CI is far too noisy to draw any conclusions.

@jyn514
Member

jyn514 commented Feb 3, 2023

Ok, but 10 minutes out of 40 total seems pretty significant ... #107614 shows 36 minutes, #107608 shows 33 minutes, #107599 shows 41 minutes. Seems unlikely to be just noise, especially when the stated goal is to decrease CI times.

@Zoxc
Contributor Author

Zoxc commented Feb 3, 2023

The 2 try builds in this PR built the stage 1 compiler in 2m 55s and 4m 48s. That's at least 37% of noise :)

anden3 added the S-waiting-on-author label and removed the S-waiting-on-review label Apr 13, 2023
@anden3
Contributor

anden3 commented Apr 13, 2023

Hello @Zoxc! I noticed there are some merge conflicts for this PR. What's the status of it?

@Zoxc
Contributor Author

Zoxc commented Apr 19, 2023

I think it may be a better idea to try to split up rustc_query_impl, so I'll close this for now.

@Zoxc Zoxc closed this Apr 19, 2023
Labels: A-testsuite, perf-regression, S-waiting-on-author, T-bootstrap, T-infra

10 participants