Commit c442e27

rework the MIR intro section, breaking out passes and visitors
1 parent 9a9c8c3 commit c442e27

9 files changed

+1121
-79
lines changed

src/SUMMARY.md

+4
Original file line number | Diff line number | Diff line change
@@ -24,11 +24,15 @@
2424
- [Type checking](./type-checking.md)
2525
- [The MIR (Mid-level IR)](./mir.md)
2626
- [MIR construction](./mir-construction.md)
27+
- [MIR visitor](./mir-visitor.md)
28+
- [MIR passes: getting the MIR for a function](./mir-passes.md)
2729
- [MIR borrowck](./mir-borrowck.md)
30+
- [MIR-based region checking (NLL)](./mir-regionck.md)
2831
- [MIR optimizations](./mir-optimizations.md)
2932
- [Constant evaluation](./const-eval.md)
3033
- [miri const evaluator](./miri.md)
3134
- [Parameter Environments](./param_env.md)
3235
- [Generating LLVM IR](./trans.md)
36+
- [Background material](./background.md)
3337
- [Glossary](./glossary.md)
3438
- [Code Index](./code-index.md)

src/background.md

+122
@@ -0,0 +1,122 @@
1+
# Background topics
2+
3+
This section covers a number of common compiler terms that arise in
4+
this guide. We try to give the general definition while providing some
5+
Rust-specific context.
6+
7+
<a name=cfg>
8+
9+
## What is a control-flow graph?
10+
11+
A control-flow graph is a common term from compilers. If you've ever
12+
used a flow-chart, then the concept of a control-flow graph will be
13+
pretty familiar to you. It's a representation of your program that
14+
exposes the underlying control flow in a very clear way.
15+
16+
A control-flow graph is structured as a set of **basic blocks**
17+
connected by edges. The key idea of a basic block is that it is a set
18+
of statements that execute "together" -- that is, whenever you branch
19+
to a basic block, you start at the first statement and then execute
20+
all the remainder. Only at the end of the block is there the possibility of
21+
branching to more than one place (in MIR, we call that final statement
22+
the **terminator**):
23+
24+
```
25+
bb0: {
26+
statement0;
27+
statement1;
28+
statement2;
29+
...
30+
terminator;
31+
}
32+
```
33+
34+
Many expressions that you are used to in Rust compile down to multiple
35+
basic blocks. For example, consider an if statement:
36+
37+
```rust
38+
a = 1;
39+
if some_variable {
40+
b = 1;
41+
} else {
42+
c = 1;
43+
}
44+
d = 1;
45+
```
46+
47+
This would compile into four basic blocks:
48+
49+
```
50+
BB0: {
51+
a = 1;
52+
if some_variable { goto BB1 } else { goto BB2 }
53+
}
54+
55+
BB1: {
56+
b = 1;
57+
goto BB3;
58+
}
59+
60+
BB2: {
61+
c = 1;
62+
goto BB3;
63+
}
64+
65+
BB3: {
66+
d = 1;
67+
...;
68+
}
69+
```
70+
71+
When using a control-flow graph, a loop simply appears as a cycle in
72+
the graph, and the `break` keyword translates into a path out of that
73+
cycle.
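
The four-block `if` example above can be modelled with a very small data structure. This is only a sketch to make the terminology concrete: the names `BlockId`, `Terminator`, `BasicBlock`, and `if_example` are made up for illustration and are not the compiler's actual MIR types, and ending the last block with `Return` is an assumption, since the original example elides what follows `d = 1`.

```rust
/// Index of a basic block in the graph (hypothetical name, not a rustc type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockId(pub usize);

/// The final statement of a block: the only place where control may branch.
#[derive(Debug)]
pub enum Terminator {
    Goto(BlockId),
    If { then_block: BlockId, else_block: BlockId },
    Return,
}

/// A run of statements that execute "together", followed by one terminator.
#[derive(Debug)]
pub struct BasicBlock {
    pub statements: Vec<&'static str>,
    pub terminator: Terminator,
}

impl BasicBlock {
    /// The outgoing edges of this block, read off its terminator.
    pub fn successors(&self) -> Vec<BlockId> {
        match &self.terminator {
            Terminator::Goto(target) => vec![*target],
            Terminator::If { then_block, else_block } => vec![*then_block, *else_block],
            Terminator::Return => vec![],
        }
    }
}

/// Builds BB0..BB3 from the `if` example; BB3 ending in `Return` is assumed.
pub fn if_example() -> Vec<BasicBlock> {
    vec![
        BasicBlock {
            statements: vec!["a = 1"],
            terminator: Terminator::If { then_block: BlockId(1), else_block: BlockId(2) },
        },
        BasicBlock { statements: vec!["b = 1"], terminator: Terminator::Goto(BlockId(3)) },
        BasicBlock { statements: vec!["c = 1"], terminator: Terminator::Goto(BlockId(3)) },
        BasicBlock { statements: vec!["d = 1"], terminator: Terminator::Return },
    ]
}
```

Calling `successors()` on the first block yields the two branch targets, while the two middle blocks each have the join block as their single successor.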
74+
75+
<a name=dataflow>
76+
77+
## What is a dataflow analysis?
78+
79+
*to be written*
80+
81+
<a name=quantified>
82+
83+
## What is "universally quantified"? What about "existentially quantified"?
84+
85+
*to be written*
86+
87+
<a name=variance>
88+
89+
## What is co- and contra-variance?
90+
91+
*to be written*
92+
93+
<a name=free-vs-bound>
94+
95+
## What is a "free region" or a "free variable"? What about "bound region"?
96+
97+
Let's describe the concepts of free vs bound in terms of program
98+
variables, since that's the thing we're most familiar with.
99+
100+
- Consider this expression: `a + b`. In this expression, `a` and `b`
101+
refer to local variables that are defined *outside* of the
102+
expression. We say that those variables **appear free** in the
103+
expression. To see why this term makes sense, consider the next
104+
example.
105+
- In contrast, consider this expression, which creates a closure: `|a,
106+
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
107+
that the closure will be given when it is called. We say that the
108+
`a` and `b` there are **bound** to the closure, and that the closure
109+
signature `|a, b|` is a **binder** for the names `a` and `b`
110+
(because any references to `a` or `b` within refer to the variables
111+
that it introduces).
112+
113+
So there you have it: a variable "appears free" in some
114+
expression/statement/whatever if it refers to something defined
115+
outside of that expression/statement/whatever. Equivalently, we can
116+
then refer to the "free variables" of an expression -- which is just
117+
the set of variables that "appear free".
118+
119+
So what does this have to do with regions? Well, we can apply the
120+
analogous concept to types and regions. For example, in the type `&'a
121+
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
122+
does not.
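
To make "the set of variables that appear free" concrete, here is a minimal sketch over a toy expression type. The `Expr` enum and the `free_vars` and `var` functions are invented for this illustration and have nothing to do with the compiler's real data structures; the sketch only covers variables, `+`, and closures.

```rust
use std::collections::BTreeSet;

/// A toy expression language (hypothetical, for illustration only).
pub enum Expr {
    /// A reference to a variable, such as `a`.
    Var(String),
    /// `lhs + rhs`
    Add(Box<Expr>, Box<Expr>),
    /// `|params| body`: the parameter list acts as a binder.
    Closure(Vec<String>, Box<Expr>),
}

/// Convenience constructor for a variable reference.
pub fn var(name: &str) -> Expr {
    Expr::Var(name.to_string())
}

/// Returns the names that appear free in `expr`.
pub fn free_vars(expr: &Expr) -> BTreeSet<String> {
    match expr {
        Expr::Var(name) => BTreeSet::from([name.clone()]),
        Expr::Add(lhs, rhs) => {
            let mut set = free_vars(lhs);
            set.extend(free_vars(rhs));
            set
        }
        Expr::Closure(params, body) => {
            // Names introduced by the binder are bound, so they are
            // removed from the free variables of the body.
            let mut set = free_vars(body);
            for param in params {
                set.remove(param);
            }
            set
        }
    }
}
```

With this definition, `a + b` has the free variables `a` and `b`, while `|a, b| a + b` has none.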

src/glossary.md

+2
@@ -18,12 +18,14 @@ HIR Map | The HIR map, accessible via tcx.hir, allows you to qu
1818
generics | the set of generic type parameters defined on a type or item
1919
ICE | internal compiler error. When the compiler crashes.
2020
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
21+
inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents the value you are trying to find. Think of `X` in algebra.
2122
infcx | the inference context (see `librustc/infer`)
2223
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
2324
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
2425
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
2526
local crate | the crate currently being compiled.
2627
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
28+
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
2729
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
2830
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
2931
provider | the function that executes a query ([see more](query.html))
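
As a small illustration of the "newtype" entry above, the sketch below wraps `usize` in two distinct index types so that they cannot be mixed up. All names here (`BlockIdx`, `LocalIdx`, `Body`) are hypothetical and chosen only for this example; they are not the compiler's own index types.

```rust
/// Index into the list of blocks (hypothetical newtype over `usize`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockIdx(pub usize);

/// Index into the list of locals (a different newtype over the same `usize`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LocalIdx(pub usize);

pub struct Body {
    pub block_names: Vec<&'static str>,
    pub local_names: Vec<&'static str>,
}

impl Body {
    /// Only accepts a `BlockIdx`; passing a `LocalIdx` is a type error.
    pub fn block_name(&self, idx: BlockIdx) -> &'static str {
        self.block_names[idx.0]
    }

    /// Only accepts a `LocalIdx`.
    pub fn local_name(&self, idx: LocalIdx) -> &'static str {
        self.local_names[idx.0]
    }
}
```

Because the two wrappers are separate types, a call such as `body.block_name(LocalIdx(0))` is rejected at compile time, which is the "stronger type for indices" the entry refers to.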

src/mir-background.md

+122
@@ -0,0 +1,122 @@
1+
# MIR Background topics
2+
3+
This section covers a number of common compiler terms that arise when
4+
talking about MIR and optimizations. We try to give the general
5+
definition while providing some Rust-specific context.
6+
7+
<a name=cfg>
8+
9+
## What is a control-flow graph?
10+
11+
A control-flow graph is a common term from compilers. If you've ever
12+
used a flow-chart, then the concept of a control-flow graph will be
13+
pretty familiar to you. It's a representation of your program that
14+
exposes the underlying control flow in a very clear way.
15+
16+
A control-flow graph is structured as a set of **basic blocks**
17+
connected by edges. The key idea of a basic block is that it is a set
18+
of statements that execute "together" -- that is, whenever you branch
19+
to a basic block, you start at the first statement and then execute
20+
all the remainder. Only at the end of the block is there the possibility of
21+
branching to more than one place (in MIR, we call that final statement
22+
the **terminator**):
23+
24+
```
25+
bb0: {
26+
statement0;
27+
statement1;
28+
statement2;
29+
...
30+
terminator;
31+
}
32+
```
33+
34+
Many expressions that you are used to in Rust compile down to multiple
35+
basic blocks. For example, consider an if statement:
36+
37+
```rust
38+
a = 1;
39+
if some_variable {
40+
b = 1;
41+
} else {
42+
c = 1;
43+
}
44+
d = 1;
45+
```
46+
47+
This would compile into four basic blocks:
48+
49+
```
50+
BB0: {
51+
a = 1;
52+
if some_variable { goto BB1 } else { goto BB2 }
53+
}
54+
55+
BB1: {
56+
b = 1;
57+
goto BB3;
58+
}
59+
60+
BB2: {
61+
c = 1;
62+
goto BB3;
63+
}
64+
65+
BB3: {
66+
d = 1;
67+
...;
68+
}
69+
```
70+
71+
When using a control-flow graph, a loop simply appears as a cycle in
72+
the graph, and the `break` keyword translates into a path out of that
73+
cycle.
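
The remark about loops can be checked mechanically. The sketch below represents a graph as a list of successor lists, one per block, and tests whether a block can reach itself. The function names and the three-block layout are assumptions made for this example: the layout is one plausible shape for `loop { if done { break; } work; }` followed by more code, not actual compiler output.

```rust
/// Returns true if `start` can reach itself by following one or more edges.
/// `successors[i]` lists the blocks that block `i` may branch to.
pub fn is_on_cycle(successors: &[Vec<usize>], start: usize) -> bool {
    let mut visited = vec![false; successors.len()];
    // Depth-first search, seeded with the direct successors of `start`.
    let mut stack: Vec<usize> = successors[start].clone();
    while let Some(block) = stack.pop() {
        if block == start {
            return true;
        }
        if !visited[block] {
            visited[block] = true;
            stack.extend(successors[block].iter().copied());
        }
    }
    false
}

/// An assumed graph for `loop { if done { break; } work; }`.
pub fn loop_example() -> Vec<Vec<usize>> {
    vec![
        vec![1, 2], // bb0: loop header, goes to bb1 (body) or bb2 (the `break` path)
        vec![0],    // bb1: loop body, jumps back to the header
        vec![],     // bb2: code after the loop
    ]
}
```

Here the header and the body lie on a cycle, while the block reached through `break` does not: it is the path out of the cycle.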
74+
75+
<a name=dataflow>
76+
77+
## What is a dataflow analysis?
78+
79+
*to be written*
80+
81+
<a name=quantified>
82+
83+
## What is "universally quantified"? What about "existentially quantified"?
84+
85+
*to be written*
86+
87+
<a name=variance>
88+
89+
## What is co- and contra-variance?
90+
91+
*to be written*
92+
93+
<a name=free-vs-bound>
94+
95+
## What is a "free region" or a "free variable"? What about "bound region"?
96+
97+
Let's describe the concepts of free vs bound in terms of program
98+
variables, since that's the thing we're most familiar with.
99+
100+
- Consider this expression: `a + b`. In this expression, `a` and `b`
101+
refer to local variables that are defined *outside* of the
102+
expression. We say that those variables **appear free** in the
103+
expression. To see why this term makes sense, consider the next
104+
example.
105+
- In contrast, consider this expression, which creates a closure: `|a,
106+
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
107+
that the closure will be given when it is called. We say that the
108+
`a` and `b` there are **bound** to the closure, and that the closure
109+
signature `|a, b|` is a **binder** for the names `a` and `b`
110+
(because any references to `a` or `b` within refer to the variables
111+
that it introduces).
112+
113+
So there you have it: a variable "appears free" in some
114+
expression/statement/whatever if it refers to something defined
115+
outside of that expression/statement/whatever. Equivalently, we can
116+
then refer to the "free variables" of an expression -- which is just
117+
the set of variables that "appear free".
118+
119+
So what does this have to do with regions? Well, we can apply the
120+
analogous concept to types and regions. For example, in the type `&'a
121+
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
122+
does not.
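
The two types mentioned above can be seen side by side in ordinary Rust. This is a small sketch with made-up function names (`read`, `call_with_local`, `add_one`); a return type of `u32` is added to the function pointer type so that the example has something to check.

```rust
/// In the parameter type `&'a u32`, the region `'a` appears free: it is
/// introduced outside the type, by the generics of `read`.
pub fn read<'a>(x: &'a u32) -> &'a u32 {
    x
}

/// In `for<'a> fn(&'a u32) -> u32`, the `for<'a>` is a binder, so `'a` is
/// bound within the type. The callee must accept a reference with *any*
/// region, including a borrow of a variable local to this function.
pub fn call_with_local(f: for<'a> fn(&'a u32) -> u32) -> u32 {
    let local = 41;
    f(&local)
}

/// A function whose signature works for every region, so it coerces to the
/// function pointer type above.
pub fn add_one(x: &u32) -> u32 {
    *x + 1
}
```

`call_with_local(add_one)` is accepted precisely because the region in the function pointer type is bound rather than free: the caller never has to name the lifetime of `local`.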

src/mir-borrowck.md

+56-1
@@ -1 +1,56 @@
1-
# MIR borrowck
1+
# MIR borrow check
2+
3+
The borrow check is Rust's "secret sauce" -- it is tasked with
4+
enforcing a number of properties:
5+
6+
- That all variables are initialized before they are used.
7+
- That you can't move the same value twice.
8+
- That you can't move a value while it is borrowed.
9+
- That you can't access a place while it is mutably borrowed (except through the reference).
10+
- That you can't mutate a place while it is shared borrowed.
11+
- etc
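
Each property in the list above corresponds to a kind of program the compiler rejects. The sketch below compiles as written; every rejected variant is left in a comment next to the accepted code. The function name `borrow_check_demo` is made up for this example.

```rust
pub fn borrow_check_demo() -> (String, Vec<i32>, usize) {
    // Variables must be initialized before use.
    let x: i32;
    // let y = x + 1; // rejected here: `x` is not yet initialized
    x = 5;

    // A value cannot be moved twice.
    let s = String::from("hello");
    let moved = s;
    // let again = s; // rejected: `s` was already moved

    // A value cannot be moved while it is borrowed.
    let owner = String::from("data");
    let r = &owner;
    // let stolen = owner; // rejected: `owner` is borrowed and `r` is used below
    let len = r.len();

    let mut v = vec![1, 2, 3];
    {
        // A place cannot be mutated while it is shared borrowed.
        let first = &v[0];
        // v.push(4); // rejected: `first` is used below
        assert_eq!(*first, 1);
    }
    {
        // A place cannot be accessed while mutably borrowed,
        // except through the reference itself.
        let m = &mut v;
        // let n = v.len(); // rejected: `m` is used below
        m.push(x);
    }

    (moved, v, len)
}
```

Uncommenting any one of the marked lines turns the function into a borrow check error.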
12+
13+
At the time of this writing, the code is in a state of transition. The
14+
"main" borrow checker still works by processing [the HIR](hir.html),
15+
but that is being phased out in favor of the MIR-based borrow checker.
16+
Doing borrow checking on MIR has two key advantages:
17+
18+
- The MIR is *far* less complex than the HIR; the radical desugaring
19+
helps prevent bugs in the borrow checker. (If you're curious, you
20+
can see
21+
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
22+
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
23+
which are regions derived from the control-flow graph.
24+
25+
[47366]: https://github.com/rust-lang/rust/issues/47366
26+
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
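
As a quick illustration of what "regions derived from the control-flow graph" buys you, consider the following sketch (the function name `nll_demo` is made up). With non-lexical lifetimes the borrow held by `first` ends at its last use, so the later mutation is accepted; under the older lexical rules the borrow would last until the end of the enclosing block and the `push` would be rejected.

```rust
pub fn nll_demo() -> Vec<i32> {
    let mut data = vec![1, 2, 3];
    let first = &data[0]; // shared borrow of `data` starts here
    let copy = *first;    // last use of `first`: the borrow is no longer live after this
    data.push(copy + 3);  // accepted with NLL, because `first` is never used again
    data
}
```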
27+
28+
### Major phases of the borrow checker
29+
30+
The borrow checker source is found in
31+
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
32+
the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate
33+
in several modes, but this text will describe only the mode when NLL is enabled
34+
(what you get with `#![feature(nll)]`).
35+
36+
[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check
37+
38+
The overall flow of the borrow checker is as follows:
39+
40+
- We first create a **local copy** C of the MIR. We will be modifying
41+
this copy in place so that its types and other data include
42+
references to the new regions that we are computing.
43+
- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
44+
Among other things, this function will replace all of the regions in
45+
the MIR with fresh [inference variables](glossary.html).
46+
- (More details can be found in [the regionck section](./mir-regionck.html).)
47+
- Next, we perform a number of [dataflow analyses](./background.html#dataflow)
48+
that compute what data is moved and when. The results of these analyses
49+
are needed to do both borrow checking and region inference.
50+
- Using the move data, we can then compute the values of all the regions in the MIR.
51+
- (More details can be found in [the NLL section](./mir-regionck.html).)
52+
- Finally, the borrow checker itself runs, taking as input (a) the
53+
results of move analysis and (b) the regions computed by the region
54+
checker. This allows us to figure out which loans are still in scope
55+
at any particular point.
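
The last step can be pictured with a deliberately simplified model. In the sketch below, a region is just a set of numbered program points and a loan is "in scope" at a point if its region contains that point. Everything here (`Loan`, `loans_in_scope`, `example_loans`, and the use of plain integers as points) is an assumption made for illustration; the real borrow checker works on the MIR and the results of the analyses described above, not on this structure.

```rust
use std::collections::BTreeSet;

/// A toy loan: a borrow plus the set of program points where it is live.
pub struct Loan {
    pub description: &'static str,
    pub region: BTreeSet<usize>,
}

/// Returns the loans whose region contains `point`.
pub fn loans_in_scope<'l>(loans: &'l [Loan], point: usize) -> Vec<&'l Loan> {
    loans
        .iter()
        .filter(|loan| loan.region.contains(&point))
        .collect()
}

/// Two loans with overlapping regions, as region inference might report them.
pub fn example_loans() -> Vec<Loan> {
    vec![
        Loan { description: "&x", region: BTreeSet::from([0, 1, 2]) },
        Loan { description: "&mut y", region: BTreeSet::from([2, 3]) },
    ]
}
```

In this model both loans are in scope at point 2, only `&mut y` is in scope at point 3, and nothing is in scope at point 4, so an access at point 4 could not conflict with either loan.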
56+
