| 1 | +# Optimized build of the compiler |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | + |
| 5 | +There are several additional build configuration options and techniques that can be used to compile a build of `rustc` |
| 6 | +that is as optimized as possible (for example, when building `rustc` for a Linux distribution). The status of these configuration |
| 7 | +options for various Rust targets is tracked [here]. This page describes how you can use these approaches when |
| 8 | +building `rustc` yourself. |
| 9 | + |
| 10 | +[here]: https://github.com/rust-lang/rust/issues/103595 |
| 11 | + |
| 12 | +## Link-time optimization |
| 13 | +Link-time optimization is a powerful compiler technique that can increase program performance. To enable (Thin-)LTO when |
| 14 | +building `rustc`, set the `rust.lto` config option to `"thin"` in `config.toml`: |
| 15 | + |
| 16 | +```toml |
| 17 | +[rust] |
| 18 | +lto = "thin" |
| 19 | +``` |
| 20 | + |
| 21 | +> Note that LTO for `rustc` is currently supported and tested only for the `x86_64-unknown-linux-gnu` target. Other |
| 22 | +> targets *may* work, but no guarantees are provided. Notably, LTO-optimized `rustc` currently produces |
| 23 | +> [miscompilations] on Windows. |
| 24 | + |
| 25 | +[miscompilations]: https://github.com/rust-lang/rust/issues/109114 |
| 26 | + |
| 27 | +Enabling LTO on Linux has [produced] speed-ups of up to 10%. |
| 28 | + |
| 29 | +[produced]: https://github.com/rust-lang/rust/pull/101403#issuecomment-1288190019 |
| 30 | + |
| 31 | +## Memory allocator |
| 32 | +Using a different memory allocator for `rustc` can provide significant performance benefits. If you want to enable |
| 33 | +the `jemalloc` allocator, you can set the `rust.jemalloc` option to `true` in `config.toml`: |
| 34 | + |
| 35 | +```toml |
| 36 | +[rust] |
| 37 | +jemalloc = true |
| 38 | +``` |
| 39 | + |
| 40 | +> Note that this option is currently only supported for Linux and macOS targets. |
| 41 | + |
| 42 | +## Codegen units |
| 43 | +Reducing the number of codegen units per `rustc` crate can produce a faster compiler, although the build itself takes longer. You can modify the |
| 44 | +number of codegen units for `rustc` and `libstd` in `config.toml` with the following options: |
| 45 | + |
| 46 | +```toml |
| 47 | +[rust] |
| 48 | +codegen-units = 1 |
| 49 | +codegen-units-std = 1 |
| 50 | +``` |
| 51 | + |
| 52 | +## Instruction set |
| 53 | +By default, `rustc` is compiled for a generic (and conservative) instruction set architecture (depending on the selected |
| 54 | +target), to make it support as many CPUs as possible. If you want to compile `rustc` for a specific instruction |
| 55 | +set architecture, you can set the `target_cpu` compiler option in `RUSTFLAGS`: |
| 56 | + |
| 57 | +```bash |
| 58 | +$ RUSTFLAGS="-C target_cpu=x86-64-v3" x.py build ... |
| 59 | +``` |
| 60 | + |
| 61 | +If you also want to compile LLVM for a specific instruction set, you can set `llvm` flags in `config.toml`: |
| 62 | + |
| 63 | +```toml |
| 64 | +[llvm] |
| 65 | +cxxflags = "-march=x86-64-v3" |
| 66 | +cflags = "-march=x86-64-v3" |
| 67 | +``` |
| 68 | + |
| 69 | +## Profile-guided optimization |
| 70 | +Applying profile-guided optimizations (or more generally, feedback-directed optimizations) can produce a large increase |
| 71 | +in `rustc` performance, by up to 25%. However, these techniques are not simply enabled by a configuration option; |
| 72 | +they require a complex build workflow that compiles `rustc` multiple times and profiles it on selected benchmarks. |
| 73 | + |
| 74 | +There is a tool called `opt-dist` that is used to optimize `rustc` with [PGO] (profile-guided optimizations) and [BOLT] |
| 75 | +(a post-link binary optimizer) for builds distributed to end users. You can examine the tool, which is located |
| 76 | +in `src/tools/opt-dist`, and build a custom PGO build workflow based on it, or try to use it directly. Note that the tool |
| 77 | +is currently tightly coupled to the way it is used in Rust's continuous integration workflows, and it might require some |
| 78 | +custom changes to make it work in a different environment. |
| 79 | + |
| 80 | +[PGO]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html |
| 81 | +[BOLT]: https://github.com/llvm/llvm-project/blob/main/bolt/README.md |
| 82 | + |
| 83 | +To use the tool, you will need to provide some external dependencies: |
| 84 | +- A Python 3 interpreter (for executing `x.py`). |
| 85 | +- A compiled LLVM toolchain that includes the `llvm-profdata` binary. Optionally, if you want to use BOLT, the `llvm-bolt` and |
| 86 | +`merge-fdata` binaries have to be available in the toolchain. |
| 87 | +- A downloaded checkout of the [Rust benchmark suite]. (You can also let the tool download it itself if you implement a custom environment; |
| 88 | +see below.) |
| 89 | + |
| 90 | +These dependencies are provided to `opt-dist` by an implementation of the [`Environment`] trait. You can either implement |
| 91 | +the trait for your custom environment, by providing paths to these dependencies in its methods, or reuse one of the existing |
| 92 | +implementations (currently, there are implementations for Linux and Windows). If you want your environment to support |
| 93 | +BOLT, return `true` from the `supports_bolt` method. |
| 94 | + |
| 95 | +Here is an example of how `opt-dist` can be used with the default Linux environment (it assumes that you execute the |
| 96 | +following commands on a Linux system): |
| 97 | + |
| 98 | +1. Build the tool with the following command: |
| 99 | + ```bash |
| 100 | + $ python3 x.py build tools/opt-dist |
| 101 | + ``` |
| 102 | +2. Run the tool with the `PGO_HOST` environment variable set to the 64-bit Linux target: |
| 103 | + ```bash |
| 104 | + $ PGO_HOST=x86_64-unknown-linux-gnu ./build/host/stage0-tools-bin/opt-dist |
| 105 | + ``` |
| 106 | + Note that the default Linux environment expects several hardcoded paths to exist: |
| 107 | + - `/checkout` should contain a checkout of the Rust compiler repository that will be compiled. |
| 108 | + - `/rustroot` should contain the compiled LLVM toolchain (containing BOLT). |
| 109 | + - A Python 3 interpreter should be available under the `python3` binary. |
| 110 | + - `/tmp/rustc-perf` should contain a downloaded checkout of the Rust benchmark suite. |
| 111 | + |
| 112 | +You can modify `LinuxEnvironment` (or implement your own) to override these paths. |
| 113 | + |
| 114 | +[`Environment`]: https://github.com/rust-lang/rust/blob/65e468f9c259749c210b1ae8972bfe14781f72f1/src/tools/opt-dist/src/environment/mod.rs#L8-L7 |
| 115 | +[Rust benchmark suite]: https://github.com/rust-lang/rustc-perf |
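
Because the default Linux environment relies on the hardcoded paths listed above, it can help to check them before launching `opt-dist`. The following is a minimal sketch of such a preflight check. The helper name `check_opt_dist_env` and its optional path-prefix argument are hypothetical (they are not part of `opt-dist`); only the paths and the `python3` binary name come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of opt-dist.
# Checks the locations that the default Linux environment expects.
# The optional first argument is a path prefix, which is an assumption
# made here only so that the check can be tried out in a scratch directory.
check_opt_dist_env() {
    local root="${1:-}"
    local missing=0
    local dir

    # Directories expected by the default Linux environment.
    for dir in /checkout /rustroot /tmp/rustc-perf; do
        if [ ! -d "${root}${dir}" ]; then
            echo "missing directory: ${root}${dir}" >&2
            missing=1
        fi
    done

    # A Python 3 interpreter must be reachable as `python3`.
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found in PATH" >&2
        missing=1
    fi

    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

With this function sourced, running `check_opt_dist_env && PGO_HOST=x86_64-unknown-linux-gnu ./build/host/stage0-tools-bin/opt-dist` would start the tool only if all expected locations exist.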