Add an option to tune compiler crates' CGUs to bootstrap #107560

Zoxc · 2023-02-01T12:44:33Z

This adds an option, codegen-units-fast, that compiles most compiler crates with a single CGU. This results in a 13.3% reduction in the compile time of the compiler (7m 13.0s to 6m 15.4s), a 6% reduction of rustc_driver's code size and 5% reduced runtime of the compiler in check builds. This option is turned on for CI.

It also defaults to compiling dependencies of compiler crates with a single CGU. Additionally rustc_query_impl gets 96 CGUs to extract some extra parallelism.
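As a rough illustration of the selection described above, here is a minimal sketch. The function name `codegen_units_for`, the convention that `None` means "leave the configured default alone", and the choice to check the rustc_query_impl special case first are all assumptions made for this example; they are not taken from the PR's actual code.

```rust
/// Hypothetical helper: choose a codegen-unit override for a single crate.
/// `None` means "do not override; keep whatever the build already uses".
fn codegen_units_for(profile: &str, crate_name: &str) -> Option<u32> {
    // Assumption: rustc_query_impl is special-cased before anything else
    // and gets many CGUs to extract extra parallelism.
    if crate_name == "rustc_query_impl" {
        return Some(96);
    }
    if profile == "fast" {
        // "fast" profile: a single CGU for compiler crates and dependencies.
        Some(1)
    } else if crate_name.starts_with("rustc_") {
        // Compiler crates keep their default CGU count.
        None
    } else {
        // Dependencies of compiler crates default to a single CGU.
        Some(1)
    }
}
```

Under these assumptions, `codegen_units_for("fast", "rustc_middle")` yields `Some(1)`, while a non-"fast" profile leaves `rustc_middle` at its default and still limits a dependency such as `serde` to one CGU.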

rustbot · 2023-02-01T12:44:40Z

r? @albertlarsan68

(rustbot has picked a reviewer for you, use r? to override)

albertlarsan68 · 2023-02-01T13:42:12Z

I think this warrants a perf build

@bors try @rust-timer queue

bors · 2023-02-01T13:42:22Z

⌛ Trying commit b3fd7ad with merge 30a86919c3e1ec444a496583deaf735e856805ad...

bors · 2023-02-01T15:56:17Z

☀️ Try build successful - checks-actions
Build commit: 30a86919c3e1ec444a496583deaf735e856805ad

rust-timer · 2023-02-01T18:31:44Z

Finished benchmarking commit (30a86919c3e1ec444a496583deaf735e856805ad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.4%, 2.8%]	33
Regressions ❌ (secondary)	4.7%	[0.4%, 27.2%]	56
Improvements ✅ (primary)	-0.8%	[-3.2%, -0.2%]	103
Improvements ✅ (secondary)	-1.4%	[-3.0%, -0.2%]	111
All ❌✅ (primary)	-0.4%	[-3.2%, 2.8%]	136

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-6.3%, -0.4%]	55
Improvements ✅ (secondary)	-3.7%	[-10.1%, -1.2%]	187
All ❌✅ (primary)	-2.2%	[-6.3%, -0.4%]	55

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.0%	[1.6%, 2.8%]	4
Regressions ❌ (secondary)	7.2%	[2.5%, 18.0%]	21
Improvements ✅ (primary)	-2.1%	[-4.8%, -0.8%]	33
Improvements ✅ (secondary)	-2.6%	[-4.8%, -1.2%]	48
All ❌✅ (primary)	-1.6%	[-4.8%, 2.8%]	37

Zoxc · 2023-02-01T19:09:12Z

There are 3 crates with large regressions. I wonder if this is due to rustc_query_impl's CGU bump. Feel free to start another perf run to find out :)

The rustc bootstrap times look good though.

Swatinem · 2023-02-01T19:16:16Z

> The rustc bootstrap times look good though.

Some of the regressions there look like derives (used for building the compiler?) Does it make sense to also do this for build dependencies?

Zoxc · 2023-02-01T19:30:44Z

@Swatinem These regress because they can only use a single thread with 1 CGU, and rustc-perf compiles them one at a time (to reduce noise) unlike a regular compilation.
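The difference between the two build modes can be sketched with a toy model. Everything here is an assumption made for illustration: crates are equally sized, CGUs split the work evenly with no overhead, and crate dependencies are ignored. The function `wall_time` and its parameters are hypothetical and are not part of rustc-perf or bootstrap.

```rust
/// Toy model of wall-clock codegen time, in seconds.
/// Assumes `crates` equally sized crates, each needing `work` seconds of
/// single-threaded codegen, split evenly into `cgus` units with no overhead.
fn wall_time(crates: u32, work: f64, cgus: u32, cores: u32, one_at_a_time: bool) -> f64 {
    let cgus = cgus.max(1);
    let cores = cores.max(1);
    if one_at_a_time {
        // Benchmark-style run: each crate is compiled alone, so only its
        // own CGUs can occupy cores.
        let threads = cgus.min(cores);
        f64::from(crates) * work / f64::from(threads)
    } else {
        // Regular build: CGUs from all crates compete for the same cores.
        let tasks = crates * cgus;
        let waves = (tasks + cores - 1) / cores; // ceiling division
        f64::from(waves) * work / f64::from(cgus)
    }
}
```

With 8 crates of 10 s each on 8 cores, this model gives 80 s for 1 CGU when crates are built one at a time, but 10 s for either 1 or 16 CGUs when all crates are built together. Real builds are limited by the dependency graph and by per-CGU overhead, so the numbers are only indicative.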

albertlarsan68 · 2023-02-01T19:35:24Z

Let's try again
@bors try @rust-timer queue

bors · 2023-02-01T19:35:33Z

⌛ Trying commit 0c55b04 with merge 52030c72a1e46f9e8975475953b74c0cb2441ee4...

bors · 2023-02-01T22:36:40Z

☀️ Try build successful - checks-actions
Build commit: 52030c72a1e46f9e8975475953b74c0cb2441ee4

rust-timer · 2023-02-02T01:21:15Z

Finished benchmarking commit (52030c72a1e46f9e8975475953b74c0cb2441ee4): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 3.0%]	69
Regressions ❌ (secondary)	4.7%	[0.3%, 27.1%]	58
Improvements ✅ (primary)	-1.0%	[-3.9%, -0.2%]	68
Improvements ✅ (secondary)	-1.5%	[-3.4%, -0.3%]	113
All ❌✅ (primary)	-0.0%	[-3.9%, 3.0%]	137

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.3%	[-8.1%, -0.5%]	69
Improvements ✅ (secondary)	-4.1%	[-9.6%, -1.4%]	188
All ❌✅ (primary)	-2.3%	[-8.1%, -0.5%]	69

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.9%, 1.6%]	2
Regressions ❌ (secondary)	6.5%	[2.1%, 13.7%]	17
Improvements ✅ (primary)	-2.2%	[-4.6%, -1.0%]	42
Improvements ✅ (secondary)	-2.9%	[-5.2%, -1.1%]	68
All ❌✅ (primary)	-2.1%	[-4.6%, 1.6%]	44

jyn514 · 2023-02-02T05:47:24Z

src/bootstrap/bin/rustc.rs

+                    if profile == "fast" {
+                        Some(1)
+                    } else {
+                        if crate_name.starts_with("rustc_") {
+                            None
+                        } else {
+                            // Compile crates.io crates with a single CGU for faster compile times
+                            Some(1)


I don't understand - doesn't decreasing the CGUs slow down compilation because we aren't running multiple LLVM threads in parallel? Or do you mean this makes the compiler faster at runtime in exchange for slower bootstrap times? In that case "fast" seems like a misleading name ...

All the crates are still compiled in parallel even if a single crate is limited to 1 thread.

Right, but rustc_driver still can't start building until all other crates have finished, and this is still pushing up the overall build time.

I'm not saying it's a bad change, for CI it makes sense the same way that PGO makes sense, I'd just like to find a different name. Maybe "reduce-codegen-units" or something like that?

It reduces build time with 7 CPU cores, it may increase it for high core count CPUs, so I agree the name is not ideal. Maybe codegen-units-reduce?

This will be most useful in combination with LTO, rarely by itself, right? So maybe it could be one of the LTO options, since that's already an enum that doesn't exactly match -C lto's values (e.g. "thin-local"). Something like rust.lto = "thin-1cgu"?

Not quite. This option is intended to reduce the compilation time of rustc. LTO increases compile time to give better runtime performance and is better paired with the existing codegen-units=1 so all crates have a single CGU.

Zoxc · 2023-02-02T05:58:17Z

I guess the extra rustc_query_impl CGUs were not to blame. match-stress's compile time is reduced by 8% locally. I also tried compiling the compiler with ThinLTO, but then match-stress's compile time is only reduced by 0.50%.

Zoxc · 2023-02-02T07:06:56Z

Here's a benchmark of ThinLTO off vs. ThinLTO on vs. ThinLTO on + codegen-units-fast ("fast" below); percentages are relative to ThinLTO off:

Benchmark               ThinLTO off  ThinLTO on  (vs off)  ThinLTO on + fast  (vs off)
clap:check                  1.7997s     1.7348s    -3.61%            1.6162s   -10.20%
hyper:check                 0.2617s     0.2535s    -3.12%            0.2358s    -9.92%
syntex_syntax:check         6.2346s     5.9775s    -4.12%            5.5946s   -10.26%
syn:check                   1.6488s     1.5639s    -5.15%            1.4579s   -11.58%
regex:check                 1.0393s     0.9976s    -4.01%            0.9352s   -10.01%
match-stress:check          1.1822s     1.0656s    -9.87%            1.0578s   -10.53%

Total                      12.1663s    11.5929s    -4.71%           10.8975s   -10.43%
Summary                     2.0000s     1.9004s    -4.98%            1.7917s   -10.42%
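For reference, the percentages in the benchmark are consistent with a plain relative change against the ThinLTO-off time. A minimal sketch, with `percent_change` being a hypothetical helper name rather than anything from the benchmarking tool:

```rust
/// Relative change from `baseline` to `new`, in percent.
/// Negative values mean `new` is faster than `baseline`.
fn percent_change(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}
```

For clap:check, `percent_change(1.7997, 1.7348)` rounds to -3.61 and `percent_change(1.7997, 1.6162)` rounds to -10.20, matching the listed figures. Using the ThinLTO-on time as the baseline instead, `percent_change(1.7348, 1.6162)`, gives roughly -6.8%, which is the incremental effect of codegen-units-fast on top of ThinLTO for that benchmark.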

albertlarsan68 · 2023-02-02T07:17:49Z

I suppose that when you write Thin-LTO off, you really mean Thin-Local LTO, right?

Zoxc · 2023-02-02T07:26:03Z

Yeah. I don't really consider 'Thin-Local LTO' 'real' LTO.

mati865 · 2023-02-02T15:55:48Z

> This results in a 13.3% reduction in the compile time of the compiler (7m 13.0s to 6m 15.4s), a 6% reduction of rustc_driver's code size and 5% reduced runtime of the compiler in check builds. This option is turned on for CI.

The first rust-timer build shows a 10.5s (1.5%) regression in Rust build time; the second build shows a 288.0s (40.2%) regression. It doesn't seem like enabling this would be beneficial for CI.

Zoxc · 2023-02-02T22:56:59Z

@mati865 rustc-perf builds a single crate at a time to reduce noise. It's not representative of CI or regular builds.

jyn514 · 2023-02-03T00:08:34Z

@Zoxc x86_64-gnu-llvm-13 should be representative of CI though, right? Can you find another PR to compare the times to - it looks like e.g. #107615 takes 39 minutes compared to the 49 minutes here.

Zoxc · 2023-02-03T00:22:33Z

CI is far too noisy to draw any conclusions.

jyn514 · 2023-02-03T00:29:56Z

Ok, but 10 minutes out of 40 total seems pretty significant ... #107614 shows 36 minutes, #107608 shows 33 minutes, #107599 shows 41 minutes. Seems unlikely to be just noise, especially when the stated goal is to decrease CI times.

Zoxc · 2023-02-03T00:40:01Z

The 2 try builds in this PR built the stage 1 compiler in 2m 55s and 4m 48s. That's at least 37% of noise :)

anden3 · 2023-04-13T19:05:52Z

Hello @Zoxc! I noticed there are some merge conflicts for this PR. What's the status of it?

Zoxc · 2023-04-19T20:16:55Z

I think it may be a better idea to try to split up rustc_query_impl, so I'll close this for now.

rustbot assigned albertlarsan68 Feb 1, 2023

Zoxc force-pushed the cgu-tune2 branch from 7767c21 to 31496fa Compare February 1, 2023 12:53

Add an option to tune compiler crates' CGUs to bootstrap

b3fd7ad

Zoxc force-pushed the cgu-tune2 branch from 31496fa to b3fd7ad Compare February 1, 2023 13:02

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 1, 2023

WIP performance test

0c55b04

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 2, 2023

jyn514 reviewed Feb 2, 2023

Zoxc mentioned this pull request Feb 4, 2023

[WIP] Build rustc with a single CGU on x64 Linux #107651

Merged

anden3 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 13, 2023

Zoxc closed this Apr 19, 2023
