An informal guide to reading and working on the rustc compiler
==============================================================

If you wish to expand on this document, or have a more experienced
Rust contributor add anything else to it, please get in touch:

* http://internals.rust-lang.org/
* https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust

or file a bug:

https://github.com/rust-lang/rust/issues

Your concerns are probably the same as someone else's.

The crates of rustc
===================

Rustc consists of a number of crates, including `libsyntax`,
`librustc`, `librustc_back`, `librustc_trans`, and `librustc_driver`
(the names and divisions are not set in stone and may change;
in general, a finer-grained division of crates is preferable):

- `libsyntax` contains everything concerned purely with syntax: the
  AST, parser, pretty-printer, lexer, macro expander, and utilities
  for traversing ASTs. It is a separate crate called "syntax", whose
  files are in `./../libsyntax`, where `.` is the current directory
  (that is, the parent directory of front/, middle/, back/, and so
  on).

- `librustc` (the current directory) contains the high-level analysis
  passes, such as the type checker, borrow checker, and so forth.
  It is the heart of the compiler.

- `librustc_back` contains some very low-level details that are
  specific to different LLVM targets and so forth.

- `librustc_trans` contains the code to convert from Rust IR into LLVM
  IR, and then from LLVM IR into machine code, along with various
  other bits of miscellany. In general it contains code that runs
  towards the end of the compilation process.

- `librustc_driver` invokes the parser and macro expander from
  `libsyntax`, then the analysis phases from `librustc`, and finally
  the lowering and codegen passes from `librustc_trans`.

Roughly speaking, the "order" of the crates is as follows:

    libsyntax -> librustc -> librustc_trans
        |                          |
        +------------+-------------+
                     |
              librustc_driver

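A toy program can make this layering concrete. The sketch below is hypothetical throughout: each crate is modelled as a module, and `parse`, `analyze`, `translate`, `compile`, `Crate`, and `Analysis` are invented stand-ins, not rustc APIs. What it shows is the direction of the dependencies: `rustc` uses only `syntax`, `trans` uses the outputs of both, and only `driver` calls all three.

```rust
// Toy model of the crate layering: one module per crate. All names are
// hypothetical; only the direction of the dependencies mirrors rustc.

mod syntax {
    // Stand-in for `libsyntax`: turns source text into an "AST".
    pub struct Crate {
        pub items: Vec<String>,
    }

    pub fn parse(src: &str) -> Crate {
        // Treat each whitespace-separated word as one item.
        Crate {
            items: src.split_whitespace().map(String::from).collect(),
        }
    }
}

mod rustc {
    // Stand-in for `librustc`: depends only on `syntax`.
    use super::syntax::Crate;

    pub struct Analysis {
        pub item_count: usize,
    }

    pub fn analyze(krate: &Crate) -> Analysis {
        Analysis { item_count: krate.items.len() }
    }
}

mod trans {
    // Stand-in for `librustc_trans`: consumes the AST plus the analysis.
    use super::rustc::Analysis;
    use super::syntax::Crate;

    pub fn translate(krate: &Crate, analysis: &Analysis) -> Vec<String> {
        assert_eq!(krate.items.len(), analysis.item_count);
        krate.items.iter().map(|item| format!("define {}", item)).collect()
    }
}

mod driver {
    // Stand-in for `librustc_driver`: the only module that calls all three.
    pub fn compile(src: &str) -> Vec<String> {
        let krate = super::syntax::parse(src);
        let analysis = super::rustc::analyze(&krate);
        super::trans::translate(&krate, &analysis)
    }
}

fn main() {
    for line in driver::compile("foo bar") {
        println!("{}", line);
    }
}
```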

Modules in the rustc crate
==========================

The rustc crate itself consists of the following submodules
(mostly, but not entirely, in their own directories):

- session: options and data that pertain to the compilation session as
  a whole
- middle: middle end: name resolution, typechecking, and other
  semantic analyses
- metadata: encoder and decoder for data required by separate
  compilation
- plugin: infrastructure for compiler plugins
- lint: infrastructure for compiler warnings
- util: ubiquitous types and helper functions
- lib: bindings to LLVM

The entry point for the compiler is `main()` in the `librustc_driver`
crate.

The three central data structures:
----------------------------------

1. `./../libsyntax/ast.rs` defines the AST. The AST is treated as
   immutable after parsing, but it depends on mutable context data
   structures (mainly hash maps) to give it meaning.

   - Many, though not all, nodes within this data structure are
     wrapped in the type `spanned<T>`, meaning that the front end has
     recorded the input coordinates of that node. The member `node` is
     the data itself; the member `span` is the input location (file,
     line, and column of both the start and the end of the node).

   - Many other nodes within this data structure carry a `def_id`.
     These nodes are the possible 'targets' of name references
     elsewhere in the tree. When `middle/resolve.rs` resolves the AST,
     every name ends up associated with the definition (def) that it
     points to, so anything that a name can point to carries a
     `def_id`.

2. `middle/ty.rs` defines the datatype `sty`. This is the type that
   represents types after they have been resolved and normalized by
   the middle end. The typeck phase converts every AST type to a
   `ty::sty`, and the latter drives the later phases of compilation.
   Most variants of the `ast::ty` enum have a corresponding variant in
   the `ty::sty` enum.

3. `./../librustc_llvm/lib.rs` defines the exported types
   `ValueRef`, `TypeRef`, `BasicBlockRef`, and several others.
   Each of these is an opaque pointer to an LLVM object (a value, a
   type, a basic block, and so on), manipulated through the
   `lib::llvm` interface.

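To make the relationship between the first two structures concrete, here is a heavily simplified sketch. It is not rustc's code: `Span`, `Spanned<T>`, `DefId`, `AstTy`, `Sty`, and `ast_ty_to_sty` are hypothetical stand-ins loosely modelled on `spanned<T>`, `def_id`, `ast::ty`, and `ty::sty`, and representing a def id as a crate number plus a node id is an assumption of the sketch. It shows an AST node carrying a span, the resolver's results kept in a separate hash map instead of being written into the AST, and the lowering of an AST type to a resolved type. The LLVM handles of item 3 are left out because they need the LLVM C API.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-ins for the structures above.

/// A source range as low and high byte offsets; a real compiler maps these
/// back to file, line, and column through a table of loaded files.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// A node tagged with the input coordinates it was parsed from.
#[derive(Debug)]
struct Spanned<T> {
    node: T,
    span: Span,
}

/// Identifier the parser assigns to each AST node.
type NodeId = u32;

/// What a name resolves to (assumed here: crate number + node id).
#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId {
    krate: u32,
    node: NodeId,
}

/// A type as written in the source.
#[derive(Debug)]
enum AstTy {
    Path { id: NodeId, name: String },
    Ptr(Box<AstTy>),
}

/// A resolved, normalized type, in the role of `ty::sty`.
#[derive(Debug, PartialEq)]
enum Sty {
    Int,
    Bool,
    Struct(DefId),
    Ptr(Box<Sty>),
}

/// Lowers an AST type. The AST is only read; the meaning of user-defined
/// names comes from the side table `defs` filled in by resolution.
fn ast_ty_to_sty(ty: &AstTy, defs: &HashMap<NodeId, DefId>) -> Option<Sty> {
    match ty {
        AstTy::Path { name, .. } if name.as_str() == "int" => Some(Sty::Int),
        AstTy::Path { name, .. } if name.as_str() == "bool" => Some(Sty::Bool),
        AstTy::Path { id, .. } => defs.get(id).map(|&def| Sty::Struct(def)),
        AstTy::Ptr(inner) => ast_ty_to_sty(inner, defs).map(|t| Sty::Ptr(Box::new(t))),
    }
}

fn main() {
    // A pointer to `Point`, occupying bytes 10..16 of the input.
    let ty = Spanned {
        node: AstTy::Ptr(Box::new(AstTy::Path { id: 7, name: "Point".to_string() })),
        span: Span { lo: 10, hi: 16 },
    };

    // The "resolver" records that node 7 names the item defined at node 3.
    let mut defs = HashMap::new();
    defs.insert(7, DefId { krate: 0, node: 3 });

    println!("{:?} at {}..{}", ast_ty_to_sty(&ty.node, &defs), ty.span.lo, ty.span.hi);
}
```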

Control and information flow within the compiler:
-------------------------------------------------

- `main()` in `librustc_driver/lib.rs` takes control on startup. It
  parses the command-line options, detects the platform, and so on.

- `./../libsyntax/parse/parser.rs` parses the input files and produces
  an AST that represents the input crate.

- Multiple middle-end passes (`middle/resolve.rs`, `middle/typeck.rs`)
  analyze the semantics of the resulting AST. Each pass generates new
  information about the AST and stores it in various environment data
  structures. The driver passes these environments to each compiler
  pass that needs to refer to them.

- Finally, the `trans` module in `librustc_trans` translates the Rust
  AST to LLVM IR in a type-directed way. When it has finished
  synthesizing LLVM values, rustc asks LLVM to write them out in some
  form (`.bc`, `.o`) and then possibly runs the system linker.

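The flow above can be illustrated with a toy compiler for a language of `name = value` bindings. This is a hypothetical sketch, not rustc's code: `parse`, `resolve`, `typeck`, `trans`, and `compile` only borrow the roles of the real passes, and the output is pseudo-LLVM rather than valid IR. What it does show is the pattern described in the list: the AST is produced once and then only read, each analysis pass returns a new side table (a `HashMap` of resolutions, a `Vec` of types), and the driver hands those tables to the later passes that need them, ending with a type-directed translation step.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the pass structure: the AST is built once and then
// only read; each pass returns a new table, and the driver hands each pass
// the tables it needs. None of these names are rustc's real APIs.

enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
}

struct Binding {
    name: String,
    value: Expr,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
}

/// "Parser": `x = 1; y = x` becomes a list of bindings.
fn parse(src: &str) -> Vec<Binding> {
    src.split(';')
        .filter_map(|stmt| stmt.split_once('='))
        .map(|(lhs, rhs)| {
            let rhs = rhs.trim();
            let value = match rhs {
                "true" | "false" => Expr::Bool(rhs == "true"),
                _ => match rhs.parse() {
                    Ok(n) => Expr::Int(n),
                    Err(_) => Expr::Var(rhs.to_string()),
                },
            };
            Binding { name: lhs.trim().to_string(), value }
        })
        .collect()
}

/// "Resolve": for each binding whose value is a name, record the index of
/// the most recent earlier binding with that name.
fn resolve(ast: &[Binding]) -> Result<HashMap<usize, usize>, String> {
    let mut defs = HashMap::new();
    for (i, b) in ast.iter().enumerate() {
        if let Expr::Var(name) = &b.value {
            let target = ast[..i]
                .iter()
                .rposition(|earlier| &earlier.name == name)
                .ok_or_else(|| format!("unresolved name `{}`", name))?;
            defs.insert(i, target);
        }
    }
    Ok(defs)
}

/// "Typeck": assign a type to every binding, consulting the resolutions.
fn typeck(ast: &[Binding], defs: &HashMap<usize, usize>) -> Vec<Ty> {
    let mut types: Vec<Ty> = Vec::with_capacity(ast.len());
    for (i, b) in ast.iter().enumerate() {
        let ty = match b.value {
            Expr::Int(_) => Ty::Int,
            Expr::Bool(_) => Ty::Bool,
            // `resolve` guarantees the target is an earlier binding.
            Expr::Var(_) => types[defs[&i]],
        };
        types.push(ty);
    }
    types
}

/// "Trans": type-directed output, one pseudo-LLVM global per binding.
fn trans(ast: &[Binding], types: &[Ty]) -> Vec<String> {
    ast.iter()
        .zip(types)
        .map(|(b, ty)| match ty {
            Ty::Int => format!("@{} = global i64", b.name),
            Ty::Bool => format!("@{} = global i1", b.name),
        })
        .collect()
}

/// "Driver": run the passes in order, threading the tables through.
fn compile(src: &str) -> Result<Vec<String>, String> {
    let ast = parse(src);
    let defs = resolve(&ast)?;
    let types = typeck(&ast, &defs);
    Ok(trans(&ast, &types))
}

fn main() {
    match compile("x = 1; y = x; z = true") {
        Ok(lines) => lines.iter().for_each(|l| println!("{}", l)),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Running it prints `@x = global i64`, `@y = global i64`, and `@z = global i1`, one per line. A source such as `y = x`, with no earlier `x`, is rejected by `resolve` before `typeck` ever runs, which is why the driver checks each pass's result before handing its table to the next one.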