rework the README.md for rustc and add other readmes #44505
Merged
8 commits
44e45d9 rework the README.md for rustc and add other readmes (nikomatsakis)
73a4e8d apply various nits (nikomatsakis)
76eac36 promote maps into its own directory (nikomatsakis)
70db841 split maps into submodules, document (nikomatsakis)
f130e7d revamp the Compiler Process section to be more up to date (nikomatsakis)
032fdef define span (nikomatsakis)
38813cf start writing some typeck docs (incomplete) (nikomatsakis)
638958b incorporate suggestions from arielb1 (nikomatsakis)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
# Introduction to the HIR | ||
|
||
The HIR -- "High-level IR" -- is the primary IR used in most of | ||
rustc. It is a desugared version of the "abstract syntax tree" (AST) | ||
that is generated after parsing, macro expansion, and name resolution | ||
have completed. Many parts of HIR resemble Rust surface syntax quite | ||
closely, with the exception that some of Rust's expression forms have | ||
been desugared away (as an example, `for` loops are converted into a | ||
`loop` and do not appear in the HIR). | ||
|
||
This README covers the main concepts of the HIR. | ||
|
||
### Out-of-band storage and the `Crate` type | ||
|
||
The top-level data-structure in the HIR is the `Crate`, which stores | ||
the contents of the crate currently being compiled (we only ever | ||
construct HIR for the current crate). Whereas in the AST the crate | ||
data structure basically just contains the root module, the HIR | ||
`Crate` structure contains a number of maps and other things that | ||
serve to organize the content of the crate for easier access. | ||
|
||
For example, the contents of individual items (e.g., modules, | ||
functions, traits, impls, etc.) in the HIR are not immediately | ||
accessible in the parents. So, for example, if you had a module item `foo` | ||
containing a function `bar()`: | ||
|
||
``` | ||
mod foo { | ||
fn bar() { } | ||
} | ||
``` | ||
|
||
Then in the HIR the representation of module `foo` (the `Mod` | ||
struct) would have only the **`ItemId`** `I` of `bar()`. To get the | ||
details of the function `bar()`, we would look up `I` in the | ||
`items` map. | ||
|
||
One nice result from this representation is that one can iterate | ||
over all items in the crate by iterating over the key-value pairs | ||
in these maps (without the need to trawl through the IR in total). | ||
There are similar maps for things like trait items and impl items, | ||
as well as "bodies" (explained below). | ||
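
To make the out-of-band layout concrete, here is a minimal, self-contained sketch. The types below (`Crate`, `Mod`, `Item`, `ItemKind`, `ItemId`) are simplified stand-ins invented for illustration, not the real rustc definitions, and the use of a `BTreeMap` is an assumption made only to get deterministic iteration order.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an item id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub u32);

/// A module stores only the ids of its children, never the children themselves.
#[derive(Debug)]
pub struct Mod {
    pub item_ids: Vec<ItemId>,
}

#[derive(Debug)]
pub enum ItemKind {
    Mod(Mod),
    Fn,
}

#[derive(Debug)]
pub struct Item {
    pub name: &'static str,
    pub kind: ItemKind,
}

/// Toy `Crate`: the root module plus an out-of-band map holding every item.
pub struct Crate {
    pub module: Mod,
    pub items: BTreeMap<ItemId, Item>,
}

/// Builds the toy crate for `mod foo { fn bar() { } }`.
pub fn example_crate() -> Crate {
    let foo = ItemId(0);
    let bar = ItemId(1);
    let mut items = BTreeMap::new();
    items.insert(foo, Item { name: "foo", kind: ItemKind::Mod(Mod { item_ids: vec![bar] }) });
    items.insert(bar, Item { name: "bar", kind: ItemKind::Fn });
    Crate { module: Mod { item_ids: vec![foo] }, items }
}

/// Visits every item by walking the map, without descending through the tree.
pub fn all_item_names(krate: &Crate) -> Vec<&'static str> {
    krate.items.values().map(|item| item.name).collect()
}

/// Resolves the children of a module item by looking each id up in the map.
pub fn child_names(krate: &Crate, id: ItemId) -> Vec<&'static str> {
    match krate.items.get(&id).map(|item| &item.kind) {
        Some(ItemKind::Mod(m)) => m.item_ids.iter().map(|i| krate.items[i].name).collect(),
        _ => Vec::new(),
    }
}
```

In this sketch, `all_item_names` never touches `Mod::item_ids`, which mirrors the point above: the map alone is enough to enumerate everything in the crate.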
|
||
The other reason to set up the representation this way is for better | ||
integration with incremental compilation. This way, if you gain access | ||
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately | ||
gain access to the contents of the function `bar()`. Instead, you only | ||
gain access to the **id** for `bar()`, and you must invoke some | ||
function to look up the contents of `bar()` given its id; this gives us | ||
a chance to observe that you accessed the data for `bar()` and record | ||
the dependency. | ||
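
The idea of observing accesses can be sketched in a few lines. This is only an illustration of the pattern, not how rustc's dependency tracking is implemented; `TrackedItems`, `item`, and `recorded_reads` are hypothetical names, and item contents are reduced to plain strings.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for an item id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ItemId(pub u32);

/// Hypothetical wrapper: every lookup goes through `item`, which logs the read.
pub struct TrackedItems {
    items: HashMap<ItemId, String>,
    reads: RefCell<Vec<ItemId>>,
}

impl TrackedItems {
    pub fn new(entries: &[(ItemId, &str)]) -> Self {
        TrackedItems {
            items: entries.iter().map(|&(id, text)| (id, text.to_string())).collect(),
            reads: RefCell::new(Vec::new()),
        }
    }

    /// Returns the item's contents and records that the caller depends on it.
    pub fn item(&self, id: ItemId) -> Option<&str> {
        self.reads.borrow_mut().push(id);
        self.items.get(&id).map(|text| text.as_str())
    }

    /// The ids read so far, in order of access.
    pub fn recorded_reads(&self) -> Vec<ItemId> {
        self.reads.borrow().clone()
    }
}
```

Because holding an `ItemId` gives no access to the data by itself, the only way to reach the contents in this sketch is through `item`, so no read can go unrecorded.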
|
||
### Identifiers in the HIR | ||
|
||
Most of the code that has to deal with things in HIR tends not to | ||
carry around references into the HIR, but rather to carry around | ||
*identifier numbers* (or just "ids"). Right now, you will find four | ||
sorts of identifiers in active use: | ||
|
||
- `DefId`, which primarily names "definitions" or top-level items. | ||
- You can think of a `DefId` as being shorthand for a very explicit | ||
and complete path, like `std::collections::HashMap`. However, | ||
these paths are able to name things that are not nameable in | ||
normal Rust (e.g., impls), and they also include extra information | ||
about the crate (such as its version number, as two versions of | ||
the same crate can co-exist). | ||
- A `DefId` really consists of two parts, a `CrateNum` (which | ||
identifies the crate) and a `DefIndex` (which indexes into a list | ||
of items that is maintained per crate). | ||
- `HirId`, which combines the index of a particular item with an | ||
offset within that item. | ||
- The key point of a `HirId` is that it is *relative* to some item (which is named | ||
via a `DefId`). | ||
- `BodyId`, which is an absolute identifier that refers to a specific | ||
body (definition of a function or constant) in the crate. It is currently | ||
effectively a "newtype'd" `NodeId`. | ||
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree. | ||
- While these are still in common use, **they are being slowly phased out**. | ||
- Since they are absolute within the crate, adding a new node | ||
anywhere in the tree causes the node-ids of all subsequent code in | ||
the crate to change. This is terrible for incremental compilation, | ||
as you can perhaps imagine. | ||
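
The difference between absolute and item-relative numbering can be shown with a small model. The structs below are simplified miniatures of the id types described above; their field names and the two numbering helpers (`absolute_ids`, `relative_ids`) are assumptions made for this sketch, not rustc APIs.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CrateNum(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefIndex(pub u32);

/// A `DefId` pairs a crate with an index into that crate's list of definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

/// A `HirId` is an owning item plus an offset *within* that item.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct HirId {
    pub owner: DefIndex,
    pub local_id: u32,
}

/// Absolute numbering: nodes are numbered consecutively across all items.
/// `nodes_per_item[i]` is the number of nodes in item `i`.
pub fn absolute_ids(nodes_per_item: &[u32]) -> Vec<Vec<u32>> {
    let mut next = 0;
    nodes_per_item
        .iter()
        .map(|&count| {
            let ids: Vec<u32> = (next..next + count).collect();
            next += count;
            ids
        })
        .collect()
}

/// Relative numbering: each item restarts at offset zero.
pub fn relative_ids(nodes_per_item: &[u32]) -> Vec<Vec<HirId>> {
    nodes_per_item
        .iter()
        .enumerate()
        .map(|(item, &count)| {
            (0..count)
                .map(|local_id| HirId { owner: DefIndex(item as u32), local_id })
                .collect()
        })
        .collect()
}
```

Growing the first item from two nodes to three shifts every absolute id in the second item, while the second item's relative ids stay exactly the same.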
|
||
### HIR Map | ||
|
||
Most of the time when you are working with the HIR, you will do so via | ||
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in | ||
the `hir::map` module). The HIR map contains a number of methods to | ||
convert between ids of various kinds and to look up data associated | ||
with a HIR node. | ||
|
||
For example, if you have a `DefId`, and you would like to convert it | ||
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This | ||
returns an `Option<NodeId>` -- this will be `None` if the def-id | ||
refers to something outside of the current crate (since then it has no | ||
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of | ||
the definition. | ||
|
||
Similarly, you can use `tcx.hir.find(n)` to look up the node for a | ||
`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum | ||
defined in the map; by matching on this you can find out what sort of | ||
node the node-id referred to and also get a pointer to the data | ||
itself. Often, you know what sort of node `n` is -- e.g., if you know | ||
that `n` must be some HIR expression, you can do | ||
`tcx.hir.expect_expr(n)`, which will extract and return the | ||
`&hir::Expr`, panicking if `n` is not in fact an expression. | ||
|
||
Finally, you can use the HIR map to find the parents of nodes, via | ||
calls like `tcx.hir.get_parent_node(n)`. | ||
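
The shape of these lookups can be modelled without the compiler. The `Map` below is a toy written for this sketch: it borrows the method names mentioned above, but its fields, the `define` and `insert` helpers, the convention that crate number 0 means "the current crate", and the choice to return the node itself when it has no parent are all assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Toy `DefId`; crate number 0 stands for the current crate in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Item(String),
    Expr(String),
}

#[derive(Default)]
pub struct Map {
    def_to_node: HashMap<u32, NodeId>,
    nodes: HashMap<NodeId, Node>,
    parents: HashMap<NodeId, NodeId>,
}

impl Map {
    pub fn insert(&mut self, id: NodeId, parent: Option<NodeId>, node: Node) {
        self.nodes.insert(id, node);
        if let Some(p) = parent {
            self.parents.insert(id, p);
        }
    }

    pub fn define(&mut self, def_index: u32, id: NodeId) {
        self.def_to_node.insert(def_index, id);
    }

    /// `None` for definitions from other crates, which have no HIR node.
    pub fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate == 0 { self.def_to_node.get(&def_id.index).copied() } else { None }
    }

    pub fn find(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(&id)
    }

    /// Panics if `id` does not refer to an expression.
    pub fn expect_expr(&self, id: NodeId) -> &str {
        match self.find(id) {
            Some(Node::Expr(e)) => e.as_str(),
            other => panic!("expected an expression, found {:?}", other),
        }
    }

    /// In this sketch a node without a recorded parent is its own parent.
    pub fn get_parent_node(&self, id: NodeId) -> NodeId {
        self.parents.get(&id).copied().unwrap_or(id)
    }
}

/// A function item (node 10, def index 7) containing one expression (node 11).
pub fn example_map() -> Map {
    let mut map = Map::default();
    map.insert(NodeId(10), None, Node::Item("fn bar".to_string()));
    map.insert(NodeId(11), Some(NodeId(10)), Node::Expr("x + y".to_string()));
    map.define(7, NodeId(10));
    map
}
```

The `Option` return types carry the same meaning as in the prose: a lookup can legitimately fail, whereas `expect_expr` is for callers that already know what kind of node they hold.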
|
||
### HIR Bodies | ||
|
||
A **body** represents some kind of executable code, such as the body | ||
of a function/closure or the definition of a constant. Bodies are | ||
associated with an **owner**, which is typically some kind of item | ||
(e.g., a `fn()` or `const`), but could also be a closure expression | ||
(e.g., `|x, y| x + y`). You can use the HIR map to find the body | ||
associated with a given def-id (`maybe_body_owned_by()`) or to find | ||
the owner of a body (`body_owner_def_id()`). |
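
The owner/body relationship is a two-way mapping, which the following sketch models. `BodyIndex`, `OwnerId`, `record`, and `body_owner` are hypothetical names for illustration; only `maybe_body_owned_by` echoes a method named above, and its real signature may differ.

```rust
use std::collections::HashMap;

/// Stand-in for the id of a body owner (an item or a closure expression).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OwnerId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BodyId(pub u32);

/// Hypothetical two-way index between owners and their bodies.
#[derive(Default)]
pub struct BodyIndex {
    body_of: HashMap<OwnerId, BodyId>,
    owner_of: HashMap<BodyId, OwnerId>,
}

impl BodyIndex {
    pub fn record(&mut self, owner: OwnerId, body: BodyId) {
        self.body_of.insert(owner, body);
        self.owner_of.insert(body, owner);
    }

    /// `None` when the owner has no executable code attached.
    pub fn maybe_body_owned_by(&self, owner: OwnerId) -> Option<BodyId> {
        self.body_of.get(&owner).copied()
    }

    /// Every body has exactly one owner, so this lookup panics on an unknown body.
    pub fn body_owner(&self, body: BodyId) -> OwnerId {
        self.owner_of[&body]
    }
}

/// Owner 0 is a `fn`, owner 1 a closure; owner 2 (say, a trait method
/// declared without a default) has no body.
pub fn example_index() -> BodyIndex {
    let mut index = BodyIndex::default();
    index.record(OwnerId(0), BodyId(100));
    index.record(OwnerId(1), BodyId(101));
    index
}
```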
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
The HIR map, accessible via `tcx.hir`, allows you to quickly navigate the | ||
HIR and convert between various forms of identifiers. See [the HIR README] for more information. | ||
|
||
[the HIR README]: ../README.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -413,6 +413,10 @@ pub struct WhereEqPredicate { | |
|
||
pub type CrateConfig = HirVec<P<MetaItem>>; | ||
|
||
/// The top-level data structure that stores the entire contents of | ||
/// the crate currently being compiled. | ||
/// | ||
Does this newline serve a purpose?
||
/// For more details, see [the module-level README](README.md). | ||
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Debug)] | ||
pub struct Crate { | ||
pub module: Mod, | ||
|
@@ -927,7 +931,27 @@ pub struct BodyId { | |
pub node_id: NodeId, | ||
} | ||
|
||
/// The body of a function or constant value. | ||
/// The body of a function, closure, or constant value. In the case of | ||
/// a function, the body contains not only the function body itself | ||
/// (which is an expression), but also the argument patterns, since | ||
/// those are something that the caller doesn't really care about. | ||
/// | ||
/// # Examples | ||
/// | ||
/// ``` | ||
/// fn foo((x, y): (u32, u32)) -> u32 { | ||
/// x + y | ||
/// } | ||
/// ``` | ||
/// | ||
/// Here, the `Body` associated with `foo()` would contain: | ||
/// | ||
/// - an `arguments` array containing the `(x, y)` pattern | ||
/// - a `value` containing the `x + y` expression (maybe wrapped in a block) | ||
/// - `is_generator` would be false | ||
/// | ||
/// All bodies have an **owner**, which can be accessed via the HIR | ||
/// map using `body_owner_def_id()`. | ||
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)] | ||
pub struct Body { | ||
pub arguments: HirVec<Arg>, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
# Types and the Type Context | ||
|
||
The `ty` module defines how the Rust compiler represents types | ||
internally. It also defines the *typing context* (`tcx` or `TyCtxt`), | ||
which is the central data structure in the compiler. | ||
|
||
## The tcx and how it uses lifetimes | ||
|
||
The `tcx` ("typing context") is the central data structure in the | ||
compiler. It is the context that you use to perform all manner of | ||
queries. The struct `TyCtxt` defines a reference to this shared context: | ||
|
||
```rust | ||
tcx: TyCtxt<'a, 'gcx, 'tcx> | ||
// -- ---- ---- | ||
// | | | | ||
// | | innermost arena lifetime (if any) | ||
// | "global arena" lifetime | ||
// lifetime of this reference | ||
``` | ||
|
||
As you can see, the `TyCtxt` type takes three lifetime parameters. | ||
These lifetimes are perhaps the most complex thing to understand about | ||
the tcx. During Rust compilation, we allocate most of our memory in | ||
**arenas**, which are basically pools of memory that get freed all at | ||
once. When you see a reference with a lifetime like `'tcx` or `'gcx`, | ||
you know that it refers to arena-allocated data (or data that lives as | ||
long as the arenas, anyhow). | ||
|
||
We use two distinct levels of arenas. The outer level is the "global | ||
arena". This arena lasts for the entire compilation: so anything you | ||
allocate in there is only freed once compilation is basically over | ||
(actually, when we shift to executing LLVM). | ||
|
||
To reduce peak memory usage, when we do type inference, we also use an | ||
inner level of arena. These arenas get thrown away once type inference | ||
is over. This is done because type inference generates a lot of | ||
"throw-away" types that are not particularly interesting after type | ||
inference completes, so keeping around those allocations would be | ||
wasteful. | ||
|
||
Often, we wish to write code that explicitly asserts that it is not | ||
taking place during inference. In that case, there is no "local" | ||
arena, and all the types that you can access are allocated in the | ||
global arena. To express this, the idea is to use the same lifetime | ||
for the `'gcx` and `'tcx` parameters of `TyCtxt`. Just to be a touch | ||
confusing, we tend to use the name `'tcx` in such contexts. Here is an | ||
example: | ||
|
||
```rust | ||
fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) { | ||
// ---- ---- | ||
// Using the same lifetime here asserts | ||
// that the innermost arena accessible through | ||
// this reference *is* the global arena. | ||
} | ||
``` | ||
|
||
In contrast, if we want code that is usable during type inference, then we | ||
need to declare distinct `'gcx` and `'tcx` lifetime parameters: | ||
|
||
```rust | ||
fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) { | ||
// ---- ---- | ||
// Using different lifetimes here means that | ||
// the innermost arena *may* be distinct | ||
// from the global arena (but doesn't have to be). | ||
} | ||
``` | ||
|
||
### Allocating and working with types | ||
|
||
Rust types are represented using the `Ty<'tcx>` defined in the `ty` | ||
module (not to be confused with the `Ty` struct from [the HIR]). This | ||
is in fact a simple type alias for a reference with `'tcx` lifetime: | ||
|
||
```rust | ||
pub type Ty<'tcx> = &'tcx TyS<'tcx>; | ||
``` | ||
|
||
[the HIR]: ../hir/README.md | ||
|
||
You can mostly ignore the `TyS` struct -- you will almost never | ||
access it explicitly. We always pass it by reference using the | ||
`Ty<'tcx>` alias -- the only exception I think is to define inherent | ||
methods on types. Instances of `TyS` are only ever allocated in one of | ||
the rustc arenas (never e.g. on the stack). | ||
|
||
One common operation on types is to **match** and see what kinds of | ||
types they are. This is done by doing `match ty.sty`, sort of like this: | ||
|
||
```rust | ||
fn test_type<'tcx>(ty: Ty<'tcx>) { | ||
match ty.sty { | ||
ty::TyArray(elem_ty, len) => { ... } | ||
... | ||
} | ||
} | ||
``` | ||
|
||
The `sty` field (the origin of this name is unclear to me; perhaps | ||
structural type?) is of type `TypeVariants<'tcx>`, which is an enum | ||
defining all of the different kinds of types in the compiler. | ||
|
||
> NB: inspecting the `sty` field on types during type inference can be | ||
> risky, as there may be inference variables and other things to | ||
> consider, and sometimes a type is not yet known but will become | ||
> known later. | ||
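
A standalone analogue of this kind of matching, including the inference-variable caveat, might look like the sketch below. `TypeVariants` here is a heavily reduced, hypothetical enum written for illustration (the real one is parameterized by `'tcx` and has many more variants), and `describe` and `is_array` are made-up helpers.

```rust
/// Hypothetical, heavily reduced analogue of the compiler's type-kind enum.
#[derive(Debug, PartialEq)]
pub enum TypeVariants {
    Bool,
    Uint,
    Array(Box<TypeVariants>, u64),
    /// An inference variable: the type is not known yet.
    Infer(u32),
}

/// Renders a type by matching on its variant, recursing into element types.
pub fn describe(ty: &TypeVariants) -> String {
    match ty {
        TypeVariants::Bool => "bool".to_string(),
        TypeVariants::Uint => "u32".to_string(),
        TypeVariants::Array(elem, len) => format!("[{}; {}]", describe(elem), len),
        TypeVariants::Infer(var) => format!("?T{}", var),
    }
}

/// Returns `None` for an unresolved inference variable: answering
/// "not an array" there would be wrong if it later resolves to one.
pub fn is_array(ty: &TypeVariants) -> Option<bool> {
    match ty {
        TypeVariants::Array(..) => Some(true),
        TypeVariants::Infer(_) => None,
        _ => Some(false),
    }
}
```

The `Infer` arm is the part that is easy to forget: a catch-all `_ => false` would silently treat a not-yet-known type as a known non-array.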
|
||
To allocate a new type, you can use the various `mk_` methods defined | ||
on the `tcx`. These have names that correspond mostly to the various kinds | ||
of type variants. For example: | ||
|
||
```rust | ||
let array_ty = tcx.mk_array(elem_ty, len * 2); | ||
``` | ||
|
||
These methods all return a `Ty<'tcx>` -- note that the lifetime you | ||
get back is the lifetime of the innermost arena that this `tcx` has | ||
access to. In fact, types are always canonicalized and interned (so we | ||
never allocate exactly the same type twice) and are always allocated | ||
in the outermost arena where they can be (so, if they do not contain | ||
any inference variables or other "temporary" types, they will be | ||
allocated in the global arena). However, the lifetime `'tcx` is always | ||
a safe approximation, so that is what you get back. | ||
|
||
> NB. Because types are interned, it is possible to compare them for | ||
> equality efficiently using `==` -- however, this is almost never what | ||
> you want to do unless you happen to be hashing and looking for | ||
> duplicates. This is because often in Rust there are multiple ways to | ||
> represent the same type, particularly once inference is involved. If | ||
> you are going to be testing for type equality, you probably need to | ||
> start looking into the inference code to do it right. | ||
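
Interning itself is easy to demonstrate in isolation. The following is a minimal sketch that uses `Rc` and a `HashSet` in place of arenas, which is an assumption made to keep the example safe and self-contained; `Interner`, `TyKind`, and `same_allocation` are hypothetical names, and only the `mk_` naming follows the convention described above.

```rust
use std::collections::HashSet;
use std::rc::Rc;

#[derive(Debug, PartialEq, Eq, Hash)]
pub enum TyKind {
    Bool,
    Char,
    Array(Ty, u64),
}

/// Toy counterpart of `Ty<'tcx>`: a shared pointer to the interned kind.
pub type Ty = Rc<TyKind>;

#[derive(Default)]
pub struct Interner {
    set: HashSet<Ty>,
}

impl Interner {
    /// Returns the existing allocation for `kind` if there is one,
    /// and allocates only when the kind has not been seen before.
    pub fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(existing) = self.set.get(&kind) {
            return Rc::clone(existing);
        }
        let ty = Rc::new(kind);
        self.set.insert(Rc::clone(&ty));
        ty
    }

    pub fn mk_bool(&mut self) -> Ty {
        self.intern(TyKind::Bool)
    }

    pub fn mk_array(&mut self, elem: Ty, len: u64) -> Ty {
        self.intern(TyKind::Array(elem, len))
    }

    /// Number of distinct types allocated so far.
    pub fn len(&self) -> usize {
        self.set.len()
    }
}

/// Pointer comparison: cheap, and sufficient once every type is interned.
pub fn same_allocation(a: &Ty, b: &Ty) -> bool {
    Rc::ptr_eq(a, b)
}
```

Requesting `[bool; 4]` twice yields the same allocation in this sketch, so a pointer comparison is enough to detect exact duplicates. As the note above warns, that is a check for identical representation, not for type equality in the presence of inference.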
|
||
You can also find various common types in the tcx itself by accessing | ||
`tcx.types.bool`, `tcx.types.char`, etc. (see `CommonTypes` for more). | ||
|
||
### Beyond types: Other kinds of arena-allocated data structures | ||
|
||
In addition to types, there are a number of other arena-allocated data | ||
structures that you can allocate, and which are found in this | ||
module. Here are a few examples: | ||
|
||
- `Substs`, allocated with `mk_substs` -- this will intern a slice of types, often used to | ||
specify the values to be substituted for generics (e.g., `HashMap<i32, u32>` | ||
would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`). | ||
- `TraitRef`, typically passed by value -- a **trait reference** | ||
consists of a reference to a trait along with its various type | ||
parameters (including `Self`), like `i32: Display` (here, the def-id | ||
would reference the `Display` trait, and the substs would contain | ||
`i32`). | ||
- `Predicate` defines something the trait system has to prove (see `traits` module). | ||
|
||
### Import conventions | ||
|
||
Although there is no hard and fast rule, the `ty` module tends to be used like so: | ||
|
||
```rust | ||
use ty::{self, Ty, TyCtxt}; | ||
``` | ||
|
||
In particular, since they are so common, the `Ty` and `TyCtxt` types | ||
are imported directly. Other types are often referenced with an | ||
explicit `ty::` prefix (e.g., `ty::TraitRef<'tcx>`). But some modules | ||
choose to import a larger or smaller set of names explicitly. |
typo: find find