Add chapter on libs and metadata. #1044
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
# Libraries and Metadata | ||
|
||
When the compiler sees a reference to an external crate, it needs to load some | ||
information about that crate. This chapter gives an overview of that process, | ||
and the supported file formats for crate libraries. | ||
|
||
## Libraries | ||
|
||
A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A | ||
key point of these file formats is that they contain `rustc`-specific | ||
[*metadata*](#metadata). This metadata allows the compiler to discover enough | ||
information about the external crate to understand the items it contains, | ||
which macros it exports, and *much* more. | ||
|
||
### rlib | ||
|
||
An `rlib` is an [archive file], which is similar to a tar file. This file | ||
format is specific to `rustc`, and may change over time. This file contains: | ||
|
||
* Object code, which is the result of code generation. This is used during | ||
regular linking. There is a separate `.o` file for each [codegen unit]. The | ||
codegen step can be skipped with the [`-C | ||
linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o` | ||
Review comment: Why does rustc reuse .o as the extension for bitcode files? I would expect .ll or something.
Review comment: What do you mean? My understanding is this is talking about object code (not LLVM bitcode), so
Review comment: See the line below, this is reused for bitcode too: https://github.com/rust-lang/rustc-dev-guide/pull/1044/files#diff-3a3e84f49881b6db90d4538e80369c04098f7a6fe6d20cca0ccad9e46fead011R24
Review comment: Ah, it confused me that your comment only extended until the line before :)
Review comment: In the past, the bitcode was stored as a separate, compressed file in the archive (with the
Review comment: Kind of like |
||
file will only contain LLVM bitcode. | ||
* [LLVM bitcode], which is a binary representation of LLVM's intermediate | ||
representation, which is embedded as a section in the `.o` files. This can | ||
be used for [Link Time Optimization] (LTO). This can be removed with the | ||
[`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times | ||
and reduce disk space if LTO is not needed. | ||
* `rustc` [metadata], in a file named `lib.rmeta`. | ||
* A symbol table, which is generally a list of symbols with offsets to the | ||
object files that contain them. This is pretty standard for archive | ||
files. | ||
|
||
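Because an `rlib` is an archive, its members can be listed by walking the archive headers. The following is a minimal sketch, assuming the common Unix `ar` layout (an 8-byte magic string followed by 60-byte member headers). `Member` and `list_members` are hypothetical names, not part of `rustc`. The sketch returns raw header names and does not resolve the long-name table that some `ar` variants use.

```rust
/// One member of an `ar` archive: its raw header name and data size in bytes.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Member {
    name: String,
    size: usize,
}

/// Lists the members of an `ar` archive held in memory.
/// Returns `None` if the bytes do not look like a well-formed archive.
fn list_members(bytes: &[u8]) -> Option<Vec<Member>> {
    const MAGIC: &[u8] = b"!<arch>\n";
    const HEADER_LEN: usize = 60;

    let mut rest = bytes.strip_prefix(MAGIC)?;
    let mut members = Vec::new();
    while !rest.is_empty() {
        if rest.len() < HEADER_LEN {
            return None;
        }
        let (header, after) = rest.split_at(HEADER_LEN);
        // Every member header ends with the two bytes "`\n".
        if &header[58..60] != b"`\n" {
            return None;
        }
        // Bytes 0..16 hold the space-padded name, bytes 48..58 the decimal size.
        let name = std::str::from_utf8(&header[0..16]).ok()?.trim_end().to_string();
        let size: usize = std::str::from_utf8(&header[48..58]).ok()?.trim().parse().ok()?;
        if after.len() < size {
            return None;
        }
        members.push(Member { name, size });
        // Member data is padded to an even number of bytes.
        let padded = size + (size % 2);
        rest = after.get(padded..).unwrap_or(&[]);
    }
    Some(members)
}
```

To try this on a real file, the bytes returned by `std::fs::read` for an `rlib` path can be passed to `list_members`. The exact member names that appear depend on the compiler version.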
[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix) | ||
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html | ||
[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html | ||
[codegen unit]: ../backend/codegen.md | ||
[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode | ||
[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto | ||
|
||
### dylib | ||
|
||
A `dylib` is a platform-specific shared library. It includes the `rustc` | ||
[metadata] in a special link section called `.rustc` in a compressed format. | ||
|
||
### rmeta | ||
|
||
An `rmeta` file is a custom binary format that contains the [metadata] for the | ||
crate. This file can be used for fast "checks" of a project by skipping all | ||
code generation (as is done with `cargo check`), collecting enough information | ||
for documentation (as is done with `cargo doc`), or for | ||
[pipelining](#pipelining). This file is created if the | ||
[`--emit=metadata`][emit] CLI option is used. | ||
|
||
`rmeta` files do not support linking, since they do not contain compiled | ||
object files. | ||
|
||
[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit | ||
|
||
## Metadata | ||
|
||
The metadata contains a wide swath of different elements. This guide will not | ||
go into detail on every field it contains. You are encouraged to browse the | ||
[`CrateRoot`] definition to get a sense of the different elements it contains. | ||
Everything about metadata encoding and decoding is in the [`rustc_metadata`] | ||
crate. | ||
|
||
Here are a few highlights of things it contains: | ||
|
||
* The version of the `rustc` compiler. The compiler will refuse to load files | ||
from any other version. | ||
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the | ||
correct dependency is loaded. | ||
* The [Crate Disambiguator](#crate-disambiguator). This is a hash used | ||
to disambiguate between different crates of the same name. | ||
* Information about all the source files in the library. This can be used for | ||
a variety of things, such as diagnostics pointing to sources in a | ||
dependency. | ||
* Information about exported macros, traits, types, and items. Generally, | ||
anything that needs to be known when a path references something inside a | ||
crate dependency. | ||
* Encoded [MIR]. This is optional, and only encoded if needed for code | ||
generation. `cargo check` skips this for performance reasons. | ||
|
||
[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html | ||
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html | ||
[MIR]: ../mir/index.md | ||
|
||
### Strict Version Hash | ||
|
||
The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit | ||
hash that is used to ensure that the correct crate dependencies are loaded. It | ||
is possible for a directory to contain multiple copies of the same dependency | ||
built with different settings, or built from different sources. The crate | ||
loader will skip any crates that have the wrong SVH. | ||
|
||
The SVH is also used for the [incremental compilation] session filename, | ||
though that usage is mostly historic. | ||
Review comment: What's used instead? I'm not sure what you mean by historic here.
Review comment: In the past, The incremental naming convention is described at https://doc.rust-lang.org/nightly/nightly-rustc/rustc_incremental/persist/fs/index.html. |
||
|
||
The hash includes a variety of elements: | ||
|
||
* Hashes of the HIR nodes. | ||
* All of the upstream crate hashes. | ||
* All of the source filenames. | ||
* Hashes of certain command-line flags (like `-C metadata` via the [Crate | ||
Disambiguator](#crate-disambiguator), and all CLI options marked with | ||
`[TRACKED]`). | ||
|
||
See [`finalize_and_compute_crate_hash`] for where the hash is actually | ||
computed. | ||
|
||
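As a rough illustration of how inputs like those listed above could be folded into a single 64-bit value, here is a minimal sketch. `CrateHashInputs` and `sketch_crate_hash` are hypothetical names, and the standard library's `DefaultHasher` is swapped in purely for illustration. It is not the hasher `rustc` uses, and the real computation is the one in `finalize_and_compute_crate_hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the kinds of inputs that feed the crate hash.
struct CrateHashInputs<'a> {
    hir_node_hashes: &'a [u64],
    upstream_crate_hashes: &'a [u64],
    source_file_names: &'a [&'a str],
    tracked_flags: &'a [&'a str],
}

/// Folds every input into one 64-bit hash.
/// Illustration only: this does not reproduce rustc's actual hashing scheme.
fn sketch_crate_hash(inputs: &CrateHashInputs<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hir_node_hashes.hash(&mut hasher);
    inputs.upstream_crate_hashes.hash(&mut hasher);
    inputs.source_file_names.hash(&mut hasher);
    inputs.tracked_flags.hash(&mut hasher);
    hasher.finish()
}
```

The point of the sketch is that a change to any one input (a different upstream crate hash, a renamed source file, a different tracked flag) yields a different result, which is what lets the loader reject a mismatched copy of a dependency.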
[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html | ||
[incremental compilation]: ../queries/incremental-compilation.md | ||
[`finalize_and_compute_crate_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/collector/struct.NodeCollector.html#method.finalize_and_compute_crate_hash | ||
|
||
### Crate Disambiguator | ||
|
||
The [`CrateDisambiguator`] is a 128-bit hash used to distinguish between | ||
different crates of the same name. It is a hash of all the [`-C metadata`] CLI | ||
options computed in [`compute_crate_disambiguator`]. It is used in a variety | ||
of places, such as symbol name mangling, crate loading, and much more. | ||
|
||
By default, all Rust symbols are mangled and incorporate the disambiguator | ||
hash. This allows multiple versions of the same crate to be included together. | ||
Cargo automatically generates `-C metadata` hashes based on a variety of | ||
factors, like the package version, source, and the target kind (a lib and bin | ||
can have the same crate name, so they need to be disambiguated). | ||
|
||
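The role of the disambiguator in symbol names can be sketched as follows. Everything here is hypothetical: `sketch_disambiguator` and `sketch_symbol` are illustrative names, `DefaultHasher` stands in for the real hasher, the sort-and-dedup step is an assumption of this sketch, and the output format is not a real mangling scheme. The actual logic is in `compute_crate_disambiguator`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a 128-bit value from a set of `-C metadata` strings.
/// Illustration only; assumes the result should not depend on flag order.
fn sketch_disambiguator(metadata_flags: &[&str]) -> u128 {
    let mut values: Vec<&str> = metadata_flags.to_vec();
    values.sort_unstable();
    values.dedup();

    // Two 64-bit hashes with different tags are combined into 128 bits.
    let half = |tag: u8| -> u64 {
        let mut hasher = DefaultHasher::new();
        tag.hash(&mut hasher);
        values.hash(&mut hasher);
        hasher.finish()
    };
    ((half(0) as u128) << 64) | (half(1) as u128)
}

/// Builds a made-up symbol string that embeds the disambiguator.
fn sketch_symbol(crate_name: &str, disambiguator: u128, item: &str) -> String {
    format!("{}[{:032x}]::{}", crate_name, disambiguator, item)
}
```

With this, two builds of a crate named `foo` that receive different `-C metadata` values produce different strings for the same item, so both can coexist in one final artifact.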
[`CrateDisambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/crate_disambiguator/struct.CrateDisambiguator.html | ||
[`compute_crate_disambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/util/fn.compute_crate_disambiguator.html | ||
[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata | ||
|
||
## Crate loading | ||
|
||
Crate loading can have quite a few subtle complexities. During [name | ||
resolution], when an external crate is referenced (via an `extern crate` or | ||
path), the resolver uses the [`CrateLoader`] which is responsible for finding | ||
the crate libraries and loading the [metadata] for them. After the dependency | ||
is loaded, the `CrateLoader` will provide the information the resolver needs | ||
to perform its job (such as expanding macros, resolving paths, etc.). | ||
|
||
To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to | ||
actually find the correct files for one specific crate. There is some great | ||
documentation in the [`locator`] module that goes into detail on how loading | ||
works, and reading it is strongly recommended to get the full picture. | ||
|
||
The location of a dependency can come from several different places. Direct | ||
dependencies are usually passed with `--extern` flags, and the loader can look | ||
at those directly. Direct dependencies often have references to their own | ||
dependencies, which need to be loaded, too. These are usually found by | ||
scanning the directories passed with the `-L` flag for any file whose metadata | ||
contains a matching crate name and [SVH](#strict-version-hash). The loader | ||
will also look at the [sysroot] to find dependencies. | ||
|
||
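The two lookup paths described above (direct dependencies named by `--extern`, and transitive dependencies matched by crate name and SVH among files found in `-L` directories) can be modeled with a small sketch. `Candidate`, `SketchLoader`, and its methods are hypothetical and much simpler than the real `CrateLoader` and `CrateLocator`. In particular, the sketch assumes the metadata of each candidate file has already been read.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// A library file found in a search directory, with the fields read from its metadata.
struct Candidate {
    path: PathBuf,
    crate_name: String,
    svh: u64,
}

/// Hypothetical, simplified model of the loader's inputs.
struct SketchLoader {
    /// Crate name to path, as given by `--extern name=path`.
    externs: HashMap<String, PathBuf>,
    /// Files discovered by scanning the `-L` directories.
    search_dir_candidates: Vec<Candidate>,
}

impl SketchLoader {
    /// Direct dependencies: the path was passed explicitly.
    fn locate_direct(&self, name: &str) -> Option<&PathBuf> {
        self.externs.get(name)
    }

    /// Transitive dependencies: both the crate name and the SVH must match.
    fn locate_transitive(&self, name: &str, svh: u64) -> Option<&PathBuf> {
        self.search_dir_candidates
            .iter()
            .find(|c| c.crate_name == name && c.svh == svh)
            .map(|c| &c.path)
    }
}
```

Matching on the SVH as well as the name is what lets a search directory hold several builds of the same crate without the wrong one being picked up.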
As crates are loaded, they are kept in the [`CStore`] with the crate metadata | ||
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the | ||
`CStore` will make its way into the [`GlobalCtxt`] for the rest of | ||
compilation. | ||
|
||
[name resolution]: ../name-resolution.md | ||
[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html | ||
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html | ||
[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html | ||
[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html | ||
[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html | ||
[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html | ||
[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot | ||
|
||
## Pipelining | ||
|
||
One trick to improve compile times is to start building a crate as soon as the | ||
metadata for its dependencies is available. For a library, there is no need to | ||
wait for the code generation of dependencies to finish. Cargo implements this | ||
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each | ||
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will | ||
save the `rmeta` file to disk before it continues to the code generation | ||
phase. The compiler sends a JSON message to let the build tool know that it | ||
can start building the next crate if possible. | ||
|
||
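The ordering that makes pipelining work can be sketched with a thread and a channel. This is only a toy model: `BuildEvent` and `sketch_pipelined_build` are hypothetical names, the channel stands in for the JSON message mentioned above, and no real compilation happens.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages a dependency's build sends to the build tool (hypothetical).
#[derive(Debug)]
enum BuildEvent {
    MetadataReady,
    CodegenDone,
}

/// Simulates one dependency build and records what the build tool may do
/// after each message arrives.
fn sketch_pipelined_build() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();

    let dependency = thread::spawn(move || {
        // The metadata is written first, and the build tool is notified.
        tx.send(BuildEvent::MetadataReady).unwrap();
        // Code generation would run here, after the notification.
        tx.send(BuildEvent::CodegenDone).unwrap();
    });

    let mut log = Vec::new();
    // The loop ends once the sender is dropped at the end of the thread.
    for event in rx {
        match event {
            BuildEvent::MetadataReady => {
                log.push("dep rmeta ready: dependent library may start")
            }
            BuildEvent::CodegenDone => {
                log.push("dep rlib ready: linking a binary may start")
            }
        }
    }
    dependency.join().unwrap();
    log
}
```

A dependent library can begin as soon as the first message arrives, while anything that needs to link has to wait for the second.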
The [crate loading](#crate-loading) system is smart enough to use an `rmeta` | ||
file when it sees one if the `rlib` is not there (or has only been | ||
partially written). | ||
|
||
This pipelining isn't possible for binaries, because the linking phase will | ||
require the code generation of all its dependencies. In the future, it may be | ||
possible to further improve this scenario by splitting linking into a separate | ||
command (see [#64191]). | ||
|
||
[#64191]: https://github.com/rust-lang/rust/issues/64191 | ||
|
||
[metadata]: #metadata |
Review comment: Perhaps instead put this under "High-level compiler architecture"? That's where queries and serialization are discussed, which are somewhat related.