rustc: Default 32 codegen units at O0 #44853
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
Force-pushed from a6c4f73 to 8e09bfb
I feel this is a Cargo thing.
@ishitatsuyuki Cargo can definitely control this, but we want a sane default when running raw `rustc`. This will make it so that, if no other configuration happens (either via rustc parameters or cargo configuration), rustc defaults to a very parallel build of a single crate. Beginner projects which don't use cargo shouldn't have to have a slow build just because they aren't using cargo. This makes a sane default for all uses of `rustc`.
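For context, both rustc and Cargo already expose a knob for this; a minimal sketch with illustrative values (not part of this PR) looks like:

```sh
# Raw rustc: explicitly request a codegen-unit count for a debug build.
rustc -C opt-level=0 -C codegen-units=32 main.rs
```

```toml
# Cargo equivalent: pin the dev profile back to a single unit in Cargo.toml.
[profile.dev]
codegen-units = 1
```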
This is where the jobserver (for CPU resource management) and async-llvm (for peak memory consumption) really pay off. r=me with the tests fixed.
On a separate note: I don't like how we often duplicate things between …
What's the situation with perf.rlo? Are we still limiting benchmarking to a single core there?
To my knowledge, all benchmarks on perf.rlo are currently using 8 threads of parallelism. @alexcrichton may be able to correct me if I recall incorrectly.
This commit changes the default of rustc to use 32 codegen units when compiling in debug mode, typically an opt-level=0 compilation. Since their inception codegen units have matured quite a bit, gaining features such as:

* Parallel translation and codegen, enabling codegen units to get worked on even more quickly.
* Deterministic and reliable partitioning through the same infrastructure as incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the system.

The largest benefit of codegen units has forever been faster compilation through parallel processing of modules on the LLVM side of things, using all the cores available on build machines that typically have many available. Some downsides have been fixed through the features above, but the major downside remaining is that using codegen units reduces opportunities for inlining and optimization. This, however, doesn't matter much during debug builds!

In this commit the default number of codegen units for debug builds has been raised from 1 to 32. This should enable most `cargo build` compiles that are bottlenecked on translation and/or code generation to immediately see speedups through parallelization on available cores.

Work is being done to *always* enable multiple codegen units (and therefore parallel codegen) but it requires rust-lang#44841 at least to be landed and stabilized, but stay tuned if you're interested in that aspect!
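The "global rate limiting through the `jobserver` crate" point is roughly the idea sketched below; this is a hedged illustration only (it assumes the `jobserver` crate as a dependency), not rustc's actual integration:

```rust
// Sketch: a jobserver-style token limit keeps many codegen "jobs" from
// oversubscribing the machine. Assumes the `jobserver` crate is a dependency.
use std::thread;

fn main() -> std::io::Result<()> {
    // A local jobserver with as many tokens as desired concurrent jobs.
    let client = jobserver::Client::new(4)?;

    let mut handles = Vec::new();
    for cgu in 0..32 {
        // Block until a token is free; dropping the token releases it.
        let token = client.acquire()?;
        handles.push(thread::spawn(move || {
            // Stand-in for optimizing one codegen unit with LLVM.
            println!("optimizing codegen unit {}", cgu);
            drop(token);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}
```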
Force-pushed from 8e09bfb to 9e35b79
@bors: r=michaelwoerister
📌 Commit 9e35b79 has been approved by michaelwoerister
Agreed that the duplication is unfortunate! I'd hope that one day we could just use functions to access these rather than accessing fields, but agreed that this is probably best left for a future PR.
Forwarding here a comment that I accidentally left in #44841: I've just run some build-time benchmarks on my project, which uses a lot of popular Rust libraries with heavy codegen (diesel, hyper, serde, tokio, futures, reqwest), on my Intel Core i5 laptop (Skylake, 2c/4t), and got these results:
Cargo profile:
rustc 1.22.0-nightly (17f56c5 2017-09-21)

As expected, the best result is with the number of codegen units equal to the number of CPUs; 32 is way too much for an average machine. Did you consider an option to select the number of codegen units depending on the number of CPUs? Thank you for working on compile times!
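A purely hypothetical sketch of that suggestion (not something rustc or Cargo actually does; assumes the `num_cpus` crate) could look like:

```rust
// Hypothetical helper: derive a codegen-unit count from the CPU count,
// capped at the 32-unit default discussed in this PR.
fn suggested_codegen_units() -> usize {
    let cpus = num_cpus::get(); // logical CPU count
    cpus.max(1).min(32)
}

fn main() {
    println!("-C codegen-units={}", suggested_codegen_units());
}
```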
@mersinvald fascinating! First up though, can you clarify what you were measuring? Was it a …? Locally I ran:
Oddly though, the build times were sometimes quite variable...
I've got an 8-core machine locally, but the number of cores vs. the number of codegen units should have little effect on compile time (in theory). The codegen units are chosen to be explicitly high here to hopefully make sure that no codegen unit takes too long in the optimizer, ideally allowing for optimal use of all available cores throughout compilation. Additionally, more cgus should mean a lower peak memory of rustc itself due to async translation/codegen. Are you sure you didn't have anything else running in the background when you were collecting that timing? And were the timings you got reproducible?
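For anyone who wants to reproduce this kind of comparison, one hedged way to time clean debug builds at several codegen-unit counts (the counts are illustrative; RUSTFLAGS applies the flag to every crate in the build) is:

```sh
# Time a clean debug build at several codegen-unit counts.
for n in 1 2 4 8 16 32; do
    cargo clean
    export RUSTFLAGS="-C codegen-units=$n"
    echo "timing codegen-units=$n"
    time cargo build
done
```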
I'm building with … Good point about background tasks, though. I disabled everything that could eat up CPU to get steadier results and re-ran each build three times:
Results seem to be quite steady and reproducible; 2-4 units are optimal for my setup. Btw, I've updated rustc to the latest nightly before running the new tests, so now it is:
@mersinvald hm, so if you're on Linux, mind poking around with `perf`? Otherwise though this is indeed curious! It may be worth drilling into specific crates as well, maybe going one at a time running rustc by hand. If one crate takes way longer with 32 codegen units than with 2, then that's something to investigate. Overall builds tend to be hard to drill into :(
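A rough sketch of what that could look like on Linux, assuming perf is installed and using an illustrative source path and flags:

```sh
# Record a profile of one rustc invocation, then inspect where time is spent.
perf record -g rustc -C opt-level=0 -C codegen-units=32 src/lib.rs
perf report
```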
@alexcrichton ok, I'll do that.
@alexcrichton I've collected statistics for clean builds: https://drive.google.com/drive/folders/0B28cL71oGfpOTVRKMUZjTlFQU2M?usp=sharing
Hope it will help. I don't think I can interpret this data myself, but if you need me to run perf on some specific crates, feel free to ask; I'm happy to help.
…rister rustc: Default 32 codegen units at O0
☀️ Test successful - status-appveyor, status-travis
rust-lang/rust#44853 changed the default number of codegen units from 1 to 32 for the dev profile. Unfortunately this broke our dev builds so we are reverting the change in the Cargo.toml.