Parallelize rustc via multi-process approach #47518
Comments
I still think compiling dependent crates in parallel with codegen is better.
I'm thinking of cases like the
Another benefit of this approach is that we can split a crate into
Some crates are rather heavy on parsing and expansion, such as winapi, which spends 17% of its time on just that. Duplicating that work across multiple processes might not be the best idea.
I wonder if cargo had a way to recompile the current crate multiple times with different flags and automatically update
We could also scale the compiler to multiple machines by distributing codegen units compiled to LLVM bitcode and running optimizations on multiple machines. This would be very effective for release builds, given how LLVM dominates the build time and is already parallel. My plan to parallelize the compiler using Rayon would ensure we could generate and send LLVM bitcode even faster, making this more effective. This has a number of advantages:
The disadvantage is that only LLVM optimization and code generation can be distributed, though that is a large portion of the compile time. This seems like a good idea to me, especially if we could make it easy to set up. Distributing work across multiple machines also seems like an effective way to speed up bors. Does @rust-lang/infra have any opinions on this?
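To make the distribution idea above a little more concrete, here is a minimal sketch of how codegen units could be assigned to machines. It is only an illustration under stated assumptions: `CodegenUnit`, `bitcode_bytes`, and `assign_units` are hypothetical names, bitcode size is assumed to be a usable proxy for optimization cost, and nothing here reflects an existing rustc or cargo interface. The strategy is a plain greedy one (largest unit first, always onto the least-loaded machine).

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical description of one codegen unit that has already been
/// lowered to LLVM bitcode. The size is used as a rough cost estimate.
#[derive(Debug, Clone, PartialEq)]
struct CodegenUnit {
    name: String,
    bitcode_bytes: u64,
}

/// Assigns units to `machines` workers: largest unit first, each one going
/// to the machine with the smallest load so far. Returns one list per machine.
fn assign_units(units: &[CodegenUnit], machines: usize) -> Vec<Vec<CodegenUnit>> {
    assert!(machines > 0, "need at least one machine");

    // Sort by descending size; break ties by name so the plan is deterministic.
    let mut sorted: Vec<&CodegenUnit> = units.iter().collect();
    sorted.sort_by(|a, b| {
        b.bitcode_bytes
            .cmp(&a.bitcode_bytes)
            .then_with(|| a.name.cmp(&b.name))
    });

    // Min-heap of (current load, machine index).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        (0..machines).map(|m| Reverse((0u64, m))).collect();

    let mut plan: Vec<Vec<CodegenUnit>> = vec![Vec::new(); machines];
    for unit in sorted {
        let Reverse((load, m)) = heap.pop().expect("heap holds one entry per machine");
        plan[m].push(unit.clone());
        heap.push(Reverse((load + unit.bitcode_bytes, m)));
    }
    plan
}
```

A real scheduler would also have to account for the cost of shipping bitcode over the network and for machines of different speeds, neither of which this sketch models.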
#44675 (comment) indicated that tweaking codegen-units decreased bootstrap time but increased time taken to run tests. So even if we distributed and sped up compilation of rustc itself, that's only one part of the story for bors times. I can see us trying it if it was available, just wanted to note it may not be an easy win.
@aidanhs I expect that ThinLTO will bring performance with multiple codegen units on par with a single one. We may have to wait a bit for that though. Updating LLVM would be a good start.
@Zoxc's variant would mesh well with MIR-only RLIBs.
Another way to distribute work across multiple machines is to send the whole crate as source code to all the machines. Each machine could then run a single This scheme isn't as efficient as the one I proposed above, since parsing and other things would be done per machine, but other things like type checking could scale better.
That's roughly what I proposed here originally (+work stealing, maybe?).
@michaelwoerister And it would use a single parallel rustc instance per machine, instead of multiple rustc instances per machine, like you proposed.
If we're compiling across multiple machines, https://github.com/distcc/distcc might be a good reference point.
I am referring this issue to WG-compiler-parallel for assessment by them, given that they are focusing on parallelism (distinctly: via the multithreaded approach) and I believe they should accept/reject this.
For what it's worth, I mostly opened this issue to explore the design space a bit. I don't think there's a reason to keep it open.
With the parallel front-end just being shipped to nightly, I think we can close this. |
For big crates, the Rust compiler can be stuck in single-threaded execution for quite some time because only the last phase of compilation is properly parallelized. This issue describes one particular approach for making most of compilation parallel.
Basic Concept: Spawn multiple rustc processes that compile "vertical slices" of a crate
The compiler's internal architecture has become rather flexible and demand-driven over the last couple of years, and one could imagine implementing a compiler option that allows it to compile just part of a crate. Given a deterministic partitioning for a crate, one could then run multiple compilation processes that compile disjoint parts of the crate in parallel and stitch those parts together in a final step. This is very similar to a traditional compiler & linker setup.
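As a rough illustration of what "deterministic partitioning" could mean here, the sketch below assigns each item to a slice by hashing its path with a fixed hash function, so that independently started processes all arrive at the same assignment without communicating. Everything in it is an assumption made for illustration: `fnv1a`, `slice_of`, and `partition` are hypothetical helpers, items are represented as plain path strings, and rustc's actual partitioning logic is not modeled.

```rust
/// 64-bit FNV-1a. A fixed, dependency-free hash is used so that the result
/// does not vary between processes or runs.
fn fnv1a(s: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in s.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Index of the slice that owns `item_path`. Every process can compute this
/// on its own and decide whether the item belongs to its slice.
fn slice_of(item_path: &str, num_slices: usize) -> usize {
    assert!(num_slices > 0, "need at least one slice");
    (fnv1a(item_path) % num_slices as u64) as usize
}

/// Splits `items` into `num_slices` disjoint groups that together cover
/// every item exactly once.
fn partition<'a>(items: &[&'a str], num_slices: usize) -> Vec<Vec<&'a str>> {
    let mut slices: Vec<Vec<&'a str>> = vec![Vec::new(); num_slices];
    for &item in items {
        slices[slice_of(item, num_slices)].push(item);
    }
    slices
}
```

With this scheme, process `i` out of `n` would only handle the items for which `slice_of(path, n) == i`. Hashing does not balance the work by cost, so a practical partitioning would likely need a size or cost estimate per item as well.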
Advantages
Disadvantages
Conclusion
I am not particularly advocating for following this approach. This issue is meant to provide input for a wider discussion on how to bring more parallelism to the compilation process. This approach is kind of brute-force. However, I have to say, after thinking about it a little I am surprised to actually find it viable :)
cc @rust-lang/compiler