Bootstrapping the Compiler

Bootstrapping is the process of using a compiler to compile itself. More accurately, it means using an older compiler to compile a newer version of the same compiler.

This raises a chicken-and-egg paradox: where did the first compiler come from? It must have been written in a different language. In Rust's case it was written in OCaml. However, that compiler was abandoned long ago, and the only way to build a modern version of rustc is with a slightly less modern version.

This is exactly how x.py works: it downloads the current beta release of rustc, then uses it to compile the new compiler.

Stages of bootstrapping

Compiling rustc is done in stages. Here's a diagram, adapted from Joshua Nelson's talk on bootstrapping at RustConf 2022, with detailed explanations below.

The A, B, C, and D show the ordering of the stages of bootstrapping. Blue nodes are downloaded, yellow nodes are built with the stage0 compiler, and green nodes are built with the stage1 compiler.

graph TD
    s0c["stage0 compiler (1.63)"]:::downloaded -->|A| s0l("stage0 std (1.64)"):::with-s0c;
    s0c & s0l --- stepb[ ]:::empty;
    stepb -->|B| s0ca["stage0 compiler artifacts (1.64)"]:::with-s0c;
    s0ca -->|copy| s1c["stage1 compiler (1.64)"]:::with-s0c;
    s1c -->|C| s1l("stage1 std (1.64)"):::with-s1c;
    s1c & s1l --- stepd[ ]:::empty;
    stepd -->|D| s1ca["stage1 compiler artifacts (1.64)"]:::with-s1c;
    s1ca -->|copy| s2c["stage2 compiler"]:::with-s1c;

    classDef empty width:0px,height:0px;
    classDef downloaded fill: lightblue;
    classDef with-s0c fill: yellow;
    classDef with-s1c fill: lightgreen;

Stage 0

The stage0 compiler is usually the current beta rustc compiler and its associated dynamic libraries, which x.py will download for you. (You can also configure x.py to use something else.)

The stage0 compiler is then used only to compile src/bootstrap, std, and rustc. When compiling rustc, the stage0 compiler uses the freshly compiled std. There are two concepts at play here: a compiler (with its set of dependencies) and its 'target' or 'object' libraries (std and rustc). Both are staged, but in a staggered manner.

Stage 1

The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.

Stage 2

We then rebuild our stage1 compiler with itself to produce the stage2 compiler.

In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory. This means that the ABI generated by the stage0 compiler may not match the ABI that would have been made by the stage1 compiler, which can cause problems for dynamic libraries, tests, and tools using rustc_private.

Note that the proc_macro crate avoids this issue with a C FFI layer called proc_macro::bridge, allowing it to be used with stage 1.

The stage2 compiler is the one distributed with rustup and all other install methods. However, it takes a very long time to build because one must first build the new compiler with an older compiler and then use that to build the new compiler with itself. For development, you usually only want the stage1 compiler, which you can build with ./x.py build library. See Building the Compiler.

Stage 3

Stage 3 is optional. To sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.

Building the stages

x.py tries to be helpful and pick the stage you most likely meant for each subcommand. These defaults are as follows:

  • check: --stage 0
  • doc: --stage 0
  • build: --stage 1
  • test: --stage 1
  • dist: --stage 2
  • install: --stage 2
  • bench: --stage 2

You can always override the stage by passing --stage N explicitly.
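
The defaults above amount to a simple lookup. The following is a minimal sketch of that lookup, not the actual logic in src/bootstrap; the function names default_stage and effective_stage are hypothetical and exist only for illustration.

```rust
/// Default stage for an x.py subcommand, mirroring the list above.
/// Returns `None` for subcommands the list does not mention.
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "check" | "doc" => Some(0),
        "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        _ => None,
    }
}

/// An explicit `--stage N` always wins over the default.
fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
    explicit.or_else(|| default_stage(subcommand))
}

fn main() {
    for cmd in ["check", "doc", "build", "test", "dist", "install", "bench"] {
        println!("{cmd}: --stage {}", default_stage(cmd).unwrap());
    }
    // `./x.py test --stage 2` overrides the default of 1.
    println!("test with override: {:?}", effective_stage("test", Some(2)));
}
```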

For more information about stages, see below.

Complications of bootstrapping

Since the build system uses the current beta compiler to build the stage-1 bootstrapping compiler, the compiler source code can't use some features until they reach beta (because otherwise the beta compiler doesn't support them). On the other hand, for compiler intrinsics and internal features, the features have to be used. Additionally, the compiler makes heavy use of nightly features (#![feature(...)]). How can we resolve this problem?

There are two methods used:

  1. The build system sets --cfg bootstrap when building with stage0, so we can use cfg(not(bootstrap)) to only use features when built with stage1. This is useful for e.g. features that were just stabilized, which require #![feature(...)] when built with stage0, but not for stage1.
  2. The build system sets RUSTC_BOOTSTRAP=1. This special variable breaks the stability guarantees of Rust: it allows using #![feature(...)] with a compiler that's not nightly. This should never be used except when bootstrapping the compiler.
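
To illustrate the first method, here is a minimal sketch of code that branches on the bootstrap cfg. The function name built_with is hypothetical. With an ordinary toolchain, where nothing passes --cfg bootstrap, only the second branch is compiled in.

```rust
/// Reports which branch was compiled in.
#[allow(unexpected_cfgs)] // `bootstrap` is a custom cfg, unknown to ordinary builds
fn built_with() -> &'static str {
    if cfg!(bootstrap) {
        // Taken only when the build system passes `--cfg bootstrap` (stage0).
        "stage0 compiler"
    } else {
        // Taken when building with stage1 or later, and in ordinary builds.
        "stage1 or later"
    }
}

fn main() {
    println!("compiled with: {}", built_with());
}
```

In real compiler or library code, the same condition usually appears in attribute form, as #[cfg(bootstrap)] and #[cfg(not(bootstrap))] on items.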

Contributing to bootstrap

When you use the bootstrap system, you'll call it through x.py. However, most of the code lives in src/bootstrap. bootstrap has a difficult problem: it is written in Rust, yet it is run before the Rust compiler is built! To work around this, there are two components of bootstrap: the main one written in Rust, and bootstrap.py. bootstrap.py is what gets run by x.py. It takes care of downloading the stage0 compiler, which will then build the bootstrap binary written in Rust.

Because there are two separate codebases behind x.py, they need to be kept in sync. In particular, both bootstrap.py and the bootstrap binary parse config.toml and read the same command line arguments. bootstrap.py keeps these in sync by setting various environment variables, and the programs sometimes have to add arguments that are explicitly ignored, to be read by the other.

Adding a setting to config.toml

This section is a work in progress. In the meantime, you can see an example contribution here.

Understanding stages of bootstrap

Overview

This is a detailed look into the separate bootstrap stages.

The convention x.py uses is that:

  • A --stage N flag means to run the stage N compiler (stageN/rustc).
  • A "stage N artifact" is a build artifact that is produced by the stage N compiler.
  • The stage N+1 compiler is assembled from stage N artifacts. This process is called uplifting.

Build artifacts

Anything you can build with x.py is a build artifact. Build artifacts include, but are not limited to:

  • binaries, like stage0-rustc/rustc-main
  • shared objects, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so
  • rlib files, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib
  • HTML files generated by rustdoc, like doc/std

Examples

  • ./x.py build --stage 0 means to build with the beta rustc.
  • ./x.py doc --stage 0 means to document using the beta rustdoc.
  • ./x.py test --stage 0 library/std means to run tests on the standard library without building rustc from source ('build with stage 0, then test the artifacts'). If you're working on the standard library, this is normally the test command you want.
  • ./x.py test src/test/ui means to build the stage 1 compiler and run compiletest on it. If you're working on the compiler, this is normally the test command you want.

Examples of what not to do

  • ./x.py test --stage 0 src/test/ui is not useful: it runs tests on the beta compiler and doesn't build rustc from source. Use test src/test/ui instead, which builds stage 1 from source.
  • ./x.py test --stage 0 compiler/rustc builds the compiler but runs no tests: it's running cargo test -p rustc, but cargo doesn't understand Rust's tests. You shouldn't need to use this; use test instead (without arguments).
  • ./x.py build --stage 0 compiler/rustc builds the compiler, but does not build libstd or even libcore. Most of the time, you'll want ./x.py build library instead, which allows compiling programs without needing to define lang items.

Building vs. running

Note that build --stage N compiler/rustc does not build the stage N compiler: instead it builds the stage N+1 compiler using the stage N compiler.

In short, stage 0 uses the stage0 compiler to create stage0 artifacts which will later be uplifted to be the stage1 compiler.

In each stage, two major steps are performed:

  1. std is compiled by the stage N compiler.
  2. That std is linked to programs built by the stage N compiler, including the stage N artifacts (stage N+1 compiler).

This is somewhat intuitive if one thinks of the stage N artifacts as "just" another program we are building with the stage N compiler: build --stage N compiler/rustc is linking the stage N artifacts to the std built by the stage N compiler.
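
The numbering can be made concrete with a small model. This is only a sketch of the convention described above, not code from bootstrap; the BuildResult type and build_compiler function are hypothetical names.

```rust
/// What `x.py build --stage N compiler/rustc` produces under the convention
/// that stage N artifacts are uplifted into the stage N+1 compiler.
#[derive(Debug, PartialEq)]
struct BuildResult {
    /// Stage of the compiler that does the compiling.
    built_by: u32,
    /// Stage of the std that the resulting artifacts are linked to.
    linked_std: u32,
    /// Stage of the compiler assembled (uplifted) from those artifacts.
    uplifted_to: u32,
}

fn build_compiler(stage: u32) -> BuildResult {
    BuildResult {
        built_by: stage,
        linked_std: stage,
        uplifted_to: stage + 1,
    }
}

fn main() {
    // `build --stage 0 compiler/rustc` yields the stage1 compiler,
    // `build --stage 1 compiler/rustc` yields the stage2 compiler.
    for stage in 0..2 {
        println!("--stage {stage}: {:?}", build_compiler(stage));
    }
}
```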

Stages and std

Note that there are two std libraries in play here:

  1. The library linked to stageN/rustc, which was built by stage N-1 (stage N-1 std)
  2. The library used to compile programs with stageN/rustc, which was built by stage N (stage N std).

Stage N std is pretty much necessary for any useful work with the stage N compiler. Without it, you can only compile programs with #![no_core] -- not terribly useful!

These need to be different because they aren't necessarily ABI-compatible: there could be new layout optimizations, changes to MIR, or other changes to Rust metadata on nightly that aren't present in beta.

This is also where --keep-stage 1 library/std comes into play. Since most changes to the compiler don't actually change the ABI, once you've produced a std in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, no need to spend time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.

Cross-compiling rustc

Cross-compiling is the process of compiling code that will run on another architecture. For instance, you might want to build an ARM version of rustc using an x86 machine. Building stage2 std is different when you are cross-compiling.

This is because x.py uses a trick: if HOST and TARGET are the same, it will reuse stage1 std for stage2! This is sound because stage1 std was compiled with the stage1 compiler, i.e. a compiler using the source code you currently have checked out. So it should be identical (and therefore ABI-compatible) to the std that stage2/rustc would compile.

However, when cross-compiling, stage1 std will only run on the host. So the stage2 compiler has to recompile std for the target.

(See the Stage 2 table under "Directories and artifacts generated by bootstrap" for how stage2 only builds non-host std targets.)
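
The decision described above can be sketched as a single comparison. This is an illustrative model only; the Stage2Std enum and stage2_std function are hypothetical names, and the target triples are just example inputs.

```rust
/// Where the std in the stage2 sysroot comes from for a given target.
#[derive(Debug, PartialEq)]
enum Stage2Std {
    /// HOST == TARGET: the std built by the stage1 compiler is reused.
    ReusedFromStage1,
    /// Cross-compiling: the stage2 compiler has to build std for the target.
    RebuiltByStage2,
}

fn stage2_std(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReusedFromStage1
    } else {
        Stage2Std::RebuiltByStage2
    }
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    for target in [host, "aarch64-unknown-linux-gnu"] {
        println!("{target}: {:?}", stage2_std(host, target));
    }
}
```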

Why does only libstd use cfg(bootstrap)?

The rustc generated by the stage0 compiler is linked to the freshly-built std, which means that for the most part only std needs to be cfg-gated, so that rustc can use features added to std immediately after their addition, without need for them to get into the downloaded beta.

Note this is different from any other Rust program: stage1 rustc is built by the beta compiler, but using the master version of libstd!

The only time rustc uses cfg(bootstrap) is when it adds internal lints that use diagnostic items. This happens very rarely.

What is a 'sysroot'?

When you build a project with cargo, the build artifacts for dependencies are normally stored in target/debug/deps. This only contains dependencies cargo knows about; in particular, it doesn't have the standard library. Where do std or proc_macro come from? They come from the sysroot, the root of a number of directories where the compiler loads build artifacts at runtime. The sysroot doesn't just store the standard library, though - it includes anything that needs to be loaded at runtime. That includes (but is not limited to):

  • libstd/libtest/libproc_macro
  • The compiler crates themselves, when using rustc_private. In-tree these are always present; out of tree, you need to install rustc-dev with rustup.
  • libLLVM.so, the shared object file for the LLVM project. In-tree this is either built from source or downloaded from CI; out-of-tree, you need to install llvm-tools-preview with rustup.

All the artifacts listed so far are compiler runtime dependencies. You can see them with rustc --print sysroot:

$ ls $(rustc --print sysroot)/lib
libchalk_derive-0685d79833dc9b2b.so  libstd-25c6acf8063a3802.so
libLLVM-11-rust-1.50.0-nightly.so    libtest-57470d2aa8f7aa83.so
librustc_driver-4f0cc9f50e53f0ba.so  libtracing_attributes-e4be92c35ab2a33b.so
librustc_macros-5f0ec4a119c6ac86.so  rustlib

There are also runtime dependencies for the standard library! These are in lib/rustlib, not lib/ directly.

$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
libaddr2line-6c8e02b8fedc1e5f.rlib
libadler-9ef2480568df55af.rlib
liballoc-9c4002b5f79ba0e1.rlib
libcfg_if-512eb53291f6de7e.rlib
libcompiler_builtins-ef2408da76957905.rlib

rustlib includes libraries like hashbrown and cfg_if, which are not part of the public API of the standard library, but are used to implement it. rustlib is part of the search path for linkers, but lib will never be part of the search path.
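
The two directory listings above follow a fixed layout relative to the sysroot. The sketch below builds both paths from a sysroot and a target triple. The function names are hypothetical, and the sysroot in main is a made-up example; in practice you would use the output of rustc --print sysroot.

```rust
use std::path::{Path, PathBuf};

/// Directory holding the compiler's own runtime dependencies
/// (librustc_driver, libLLVM, and so on).
fn compiler_lib_dir(sysroot: &Path) -> PathBuf {
    sysroot.join("lib")
}

/// Directory holding the standard library and its dependencies for `target`.
fn target_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

fn main() {
    // Hypothetical sysroot path, used only for illustration.
    let sysroot = Path::new("/opt/rust/nightly");
    println!("{}", compiler_lib_dir(sysroot).display());
    println!("{}", target_lib_dir(sysroot, "x86_64-unknown-linux-gnu").display());
}
```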

-Z force-unstable-if-unmarked

Since rustlib is part of the search path, we have to be careful about which crates are included in it. In particular, all crates except for the standard library are built with the flag -Z force-unstable-if-unmarked, which means that you have to use #![feature(rustc_private)] in order to load them (as opposed to the standard library, which is always available).

The -Z force-unstable-if-unmarked flag has a variety of purposes to help enforce that the correct crates are marked as unstable. It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on crates.io which do not themselves use staged_api. rustc also relies on this flag to mark all of its crates as unstable with the rustc_private feature so that each crate does not need to be carefully marked with unstable.

This flag is automatically applied to all of rustc and the standard library by the bootstrap scripts. This is needed because the compiler and all of its dependencies are shipped in the sysroot to all users.

This flag has the following effects:

  • Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
  • Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Z force-unstable-if-unmarked.

Code which does not use -Z force-unstable-if-unmarked should include the #![feature(rustc_private)] crate attribute to access these force-unstable crates. This is needed for things that link rustc, such as miri or clippy.

You can find more discussion about sysroots in:

Passing flags to commands invoked by bootstrap

x.py allows you to pass stage-specific flags to rustc and cargo when bootstrapping. The RUSTFLAGS_BOOTSTRAP environment variable is passed as RUSTFLAGS to the bootstrap stage (stage0), and RUSTFLAGS_NOT_BOOTSTRAP is passed when building artifacts for later stages. RUSTFLAGS will work, but also affects the build of bootstrap itself, so it will be rare to want to use it. Finally, MAGIC_EXTRA_RUSTFLAGS bypasses the cargo cache to pass flags to rustc without recompiling all dependencies.

RUSTDOCFLAGS, RUSTDOCFLAGS_BOOTSTRAP, and RUSTDOCFLAGS_NOT_BOOTSTRAP are analogous to RUSTFLAGS, but for rustdoc.

CARGOFLAGS will pass arguments to cargo itself (e.g. --timings). CARGOFLAGS_BOOTSTRAP and CARGOFLAGS_NOT_BOOTSTRAP work analogously to RUSTFLAGS_BOOTSTRAP.
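
The naming scheme for the stage-specific variables can be summarized in a few lines. This is a sketch of the scheme as described above, not of how bootstrap actually reads its environment; the function stage_specific_var is a hypothetical name.

```rust
/// Name of the stage-specific variable for `base` ("RUSTFLAGS",
/// "RUSTDOCFLAGS" or "CARGOFLAGS") when building with the given stage.
fn stage_specific_var(base: &str, stage: u32) -> String {
    if stage == 0 {
        // The bootstrap stage (stage0).
        format!("{base}_BOOTSTRAP")
    } else {
        // Artifacts for all later stages.
        format!("{base}_NOT_BOOTSTRAP")
    }
}

fn main() {
    for base in ["RUSTFLAGS", "RUSTDOCFLAGS", "CARGOFLAGS"] {
        for stage in 0..3 {
            println!("stage {stage}: {}", stage_specific_var(base, stage));
        }
    }
}
```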

--test-args will pass arguments through to the test runner. For src/test/ui, this is compiletest; for unit tests and doctests this is the libtest runner. Most test runners accept --help, which you can use to find out the options accepted by the runner.

Environment Variables

During bootstrapping, there are a bunch of compiler-internal environment variables that are used. If you are trying to run an intermediate version of rustc, sometimes you may need to set some of these environment variables manually. Otherwise, you get an error like the following:

thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5

If ./stageN/bin/rustc gives an error about environment variables, that usually means something is quite wrong -- or you're trying to compile e.g. rustc or std or something that depends on environment variables. In the unlikely case that you actually need to invoke rustc in such a situation, you can tell the bootstrap shim to print all env variables by adding -vvv to your x.py command.
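
The panic message shown above has the shape you get from calling expect on a failed std::env::var lookup, whose error for a missing variable is VarError::NotPresent. The sketch below reads the same variable without panicking. It is an assumption-laden illustration rather than the shim's actual code, and parse_stage is a hypothetical helper.

```rust
use std::env::VarError;

/// Interprets the result of looking up `RUSTC_STAGE`.
fn parse_stage(value: Result<String, VarError>) -> Result<u32, String> {
    match value {
        Ok(s) => s
            .parse::<u32>()
            .map_err(|e| format!("RUSTC_STAGE is not a number: {e}")),
        // `{e:?}` prints the error variant, e.g. `NotPresent`.
        Err(e) => Err(format!("RUSTC_STAGE was not set: {e:?}")),
    }
}

fn main() {
    match parse_stage(std::env::var("RUSTC_STAGE")) {
        Ok(stage) => println!("running as stage {stage}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```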

Directories and artifacts generated by bootstrap

This is an incomplete reference for the outputs generated by bootstrap:

| Stage 0 Action | Output |
|---|---|
| beta extracted | build/HOST/stage0 |
| stage0 builds bootstrap | build/bootstrap |
| stage0 builds test/std | build/HOST/stage0-std/TARGET |
| copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
| stage0 builds rustc with stage0-sysroot | build/HOST/stage0-rustc/HOST |
| copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
| build llvm | build/HOST/llvm |
| stage0 builds codegen with stage0-sysroot | build/HOST/stage0-codegen/HOST |
| stage0 builds rustdoc, clippy, miri, with stage0-sysroot | build/HOST/stage0-tools/HOST |

--stage=0 stops here.

| Stage 1 Action | Output |
|---|---|
| copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin |
| copy (uplift) stage0-codegen to stage1 | build/HOST/stage1/lib |
| copy (uplift) stage0-sysroot to stage1 | build/HOST/stage1/lib |
| stage1 builds test/std | build/HOST/stage1-std/TARGET |
| copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
| stage1 builds rustc | build/HOST/stage1-rustc/HOST |
| copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST |
| stage1 builds codegen | build/HOST/stage1-codegen/HOST |

--stage=1 stops here.

| Stage 2 Action | Output |
|---|---|
| copy (uplift) stage1-rustc executable | build/HOST/stage2/bin |
| copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST |
| stage2 builds test/std (not HOST targets) | build/HOST/stage2-std/TARGET |
| copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
| stage2 builds rustdoc, clippy, miri | build/HOST/stage2-tools/HOST |
| copy rustdoc | build/HOST/stage2/bin |

--stage=2 stops here.
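
The std and rustc output directories in these tables follow a regular pattern, which the sketch below reproduces. It covers only those two rows per stage and is not taken from bootstrap itself; the function names are hypothetical and the triples are example inputs.

```rust
use std::path::PathBuf;

/// Output directory for std built by the stage `stage` compiler:
/// build/HOST/stageN-std/TARGET.
fn std_out_dir(host: &str, stage: u32, target: &str) -> PathBuf {
    PathBuf::from("build")
        .join(host)
        .join(format!("stage{stage}-std"))
        .join(target)
}

/// Output directory for rustc built by the stage `stage` compiler:
/// build/HOST/stageN-rustc/HOST.
fn rustc_out_dir(host: &str, stage: u32) -> PathBuf {
    PathBuf::from("build")
        .join(host)
        .join(format!("stage{stage}-rustc"))
        .join(host)
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    for stage in 0..2 {
        println!("{}", std_out_dir(host, stage, host).display());
        println!("{}", rustc_out_dir(host, stage).display());
    }
}
```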