| 1 | +# Backend Agnostic Codegen |
| 2 | + |
| 3 | +In the future, it would be nice to allow other codegen backends (e.g. |
| 4 | +[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an |
| 5 | +abstract interface for all backends to implement. |
| 6 | + |
| 7 | +> The following is a copy/paste of a README from the rust-lang/rust repo. |
| 8 | +> Please submit a PR if it needs updating. |
| 9 | + |
| 10 | +# Refactoring of `rustc_codegen_llvm` |
| 11 | +by Denis Merigoux, October 23rd 2018 |
| 12 | + |
| 13 | +## State of the code before the refactoring |
| 14 | + |
| 15 | +All the code related to the compilation of MIR into LLVM IR was contained |
| 16 | +inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most |
| 17 | +important elements: |
| 18 | +* the `back` folder (7,800 LOC) implements the mechanisms for creating the |
| 19 | + different object files and archives through LLVM, but also the communication |
| 20 | + mechanisms for parallel code generation; |
| 21 | +* the `debuginfo` (3,200 LOC) folder contains all code that passes debug |
| 22 | + information down to LLVM; |
| 23 | +* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with |
| 24 | + LLVM using the C++ API; |
| 25 | +* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM |
| 26 | + IR; |
| 27 | +* the `base.rs` (1,300 LOC) file contains some helper functions but also the |
| 28 | + high-level code that launches the code generation and distributes the work; |
| 29 | +* the `builder.rs` (1,200 LOC) file contains all the functions generating |
| 30 | + individual LLVM IR instructions inside a basic block; |
| 31 | +* the `common.rs` (450 LOC) contains various helper functions and all the |
| 32 | + functions generating LLVM static values; |
| 33 | +* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR. |
| 34 | + |
| 35 | +The goal of this refactoring is to separate inside this crate code that is |
| 36 | +specific to LLVM from code that can be reused for other rustc backends. For |
| 37 | +instance, the `mir` folder is almost entirely backend-agnostic but it relies |
| 38 | +heavily on other parts of the crate. The separation of the code must not affect |
| 39 | +the logic of the code nor its performance. |
| 40 | + |
| 41 | +For these reasons, the separation process involves two transformations that |
| 42 | +have to be done at the same time for the resulting code to compile: |
| 43 | + |
| 44 | +1. replace all the LLVM-specific types with generics inside function signatures |
| 45 | + and structure definitions; |
| 46 | +2. encapsulate all functions calling the LLVM FFI inside a set of traits that |
| 47 | + will define the interface between backend-agnostic code and the backend. |
| 48 | + |
| 49 | +While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new |
| 50 | +traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name |
| 51 | +suggestion by @eddyb). |
| 52 | + |
| 53 | +## Generic types and structures |
| 54 | + |
| 55 | +@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a |
| 56 | +generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This |
| 57 | +work has been extended to all structures inside the `mir` folder and elsewhere, |
| 58 | +as well as for LLVM's `BasicBlock` and `Type` types. |
| 59 | + |
| 60 | +The two most important structures for the LLVM codegen are `CodegenCx` and |
| 61 | +`Builder`. They are parametrized by multiple lifetime parameters and the type |
| 62 | +for `Value`. |
| 63 | + |
| 64 | +```rust,ignore |
| 65 | +struct CodegenCx<'ll, 'tcx> { |
| 66 | + /* ... */ |
| 67 | +} |
| 68 | + |
| 69 | +struct Builder<'a, 'll, 'tcx> { |
| 70 | + cx: &'a CodegenCx<'ll, 'tcx>, |
| 71 | + /* ... */ |
| 72 | +} |
| 73 | +``` |
| 74 | + |
| 75 | +`CodegenCx` is used to compile one codegen-unit that can contain multiple |
| 76 | +functions, whereas `Builder` is created to compile one basic block. |
| 77 | + |
| 78 | +The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime |
| 79 | +parameters, which correspond to the following: |
| 80 | +* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt` |
| 81 | + containing the program's information; |
| 82 | +* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another |
| 83 | + object inside a struct; |
| 84 | +* `'ll` is the lifetime of references to LLVM objects such as `Value` or |
| 85 | + `Type`. |
| 86 | + |
| 87 | +Although there are already many lifetime parameters in the code, making it |
| 88 | +generic uncovered situations where the borrow-checker was passing only due to |
| 89 | +the special nature of the LLVM objects manipulated (they are extern pointers). |
| 90 | +For instance, an additional lifetime parameter had to be added to |
| 91 | +`LocalAnalyzer` in `analyze.rs`, leading to the definition: |
| 92 | + |
| 93 | +```rust,ignore |
| 94 | +struct LocalAnalyzer<'mir, 'a, 'tcx> { |
| 95 | + /* ... */ |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +However, the two most important structures `CodegenCx` and `Builder` are not |
| 100 | +defined in the backend-agnostic code. Indeed, their content is highly specific |
| 101 | +to the backend and it makes more sense to leave their definition to the backend |
| 102 | +implementor than to allow just a narrow spot via a generic field for the |
| 103 | +backend's context. |
| 104 | + |
| 105 | +## Traits and interface |
| 106 | + |
| 107 | +Because they have to be defined by the backend, `CodegenCx` and `Builder` will |
| 108 | +be the structures implementing all the traits defining the backend's interface. |
| 109 | +These traits are defined in the folder `rustc_codegen_ssa/traits` and all the |
| 110 | +backend-agnostic code is parametrized by them. For instance, let us explain how |
| 111 | +a function in `base.rs` is parametrized: |
| 112 | + |
| 113 | +```rust,ignore |
| 114 | +pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>( |
| 115 | + cx: &'a Bx::CodegenCx, |
| 116 | + instance: Instance<'tcx> |
| 117 | +) { |
| 118 | + /* ... */ |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +In this signature, we have the two lifetime parameters explained earlier and |
| 123 | +the master type `Bx` which satisfies the trait `BuilderMethods` corresponding |
| 124 | +to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait |
| 125 | +defines an associated type `Bx::CodegenCx` that itself satisfies the |
| 126 | +`CodegenMethods` trait implemented by the struct `CodegenCx`. |
| 127 | + |
| 128 | +On the trait side, here is an example with part of the definition of |
| 129 | +`BuilderMethods` in `traits/builder.rs`: |
| 130 | + |
| 131 | +```rust,ignore |
| 132 | +pub trait BuilderMethods<'a, 'tcx>: |
| 133 | + HasCodegen<'tcx> |
| 134 | + + DebugInfoBuilderMethods<'tcx> |
| 135 | + + ArgTypeMethods<'tcx> |
| 136 | + + AbiBuilderMethods<'tcx> |
| 137 | + + IntrinsicCallMethods<'tcx> |
| 138 | + + AsmBuilderMethods<'tcx> |
| 139 | +{ |
| 140 | + fn new_block<'b>( |
| 141 | + cx: &'a Self::CodegenCx, |
| 142 | + llfn: Self::Function, |
| 143 | + name: &'b str |
| 144 | + ) -> Self; |
| 145 | + /* ... */ |
| 146 | + fn cond_br( |
| 147 | + &mut self, |
| 148 | + cond: Self::Value, |
| 149 | + then_llbb: Self::BasicBlock, |
| 150 | + else_llbb: Self::BasicBlock, |
| 151 | + ); |
| 152 | + /* ... */ |
| 153 | +} |
| 154 | +``` |
| 155 | + |
| 156 | +Finally, a master structure implementing the `ExtraBackendMethods` trait is |
| 157 | +used for high-level codegen-driving functions like `codegen_crate` in |
| 158 | +`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. |
| 159 | +`ExtraBackendMethods` should be implemented by the same structure that |
| 160 | +implements the `CodegenBackend` defined in |
| 161 | +`rustc_codegen_utils/codegen_backend.rs`. |
| 162 | + |
| 163 | +During the traitification process, certain functions have been converted from |
| 164 | +methods of a local structure to methods of `CodegenCx` or `Builder` and a |
| 165 | +corresponding `self` parameter has been added. Indeed, LLVM stores information |
| 166 | +internally that it can access when called through its API. This information |
| 167 | +does not show up in a Rust data structure carried around when these methods are |
| 168 | +called. However, when implementing a Rust backend for `rustc`, these methods |
| 169 | +will need information from `CodegenCx`, hence the additional parameter (unused |
| 170 | +in the LLVM implementation of the trait). |
| 171 | + |
| 172 | +## State of the code after the refactoring |
| 173 | + |
| 174 | +The traits offer an API which is very similar to the API of LLVM. This is not |
| 175 | +the best solution since LLVM has a very special way of doing things: when |
| 176 | +adding another backend, the trait definitions might have to change in order to |
| 177 | +offer more flexibility. |
| 178 | + |
| 179 | +However, the current separation between backend-agnostic and LLVM-specific code |
| 180 | +has allowed the reuse of a significant part of the old `rustc_codegen_llvm`. |
| 181 | +Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the |
| 182 | +most important elements: |
| 183 | + |
| 184 | +* `back` folder: 3,800 (BA) vs 4,100 (LLVM); |
| 185 | +* `mir` folder: 4,400 (BA) vs 0 (LLVM); |
| 186 | +* `base.rs`: 1,100 (BA) vs 250 (LLVM); |
| 187 | +* `builder.rs`: 1,400 (BA) vs 0 (LLVM); |
| 188 | +* `common.rs`: 350 (BA) vs 350 (LLVM). |
| 189 | + |
| 190 | +The `debuginfo` folder has been left almost untouched by the splitting and is |
| 191 | +specific to LLVM. Only its high-level features have been traitified. |
| 192 | + |
| 193 | +The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the |
| 194 | +27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the |
| 195 | +18,500 LOC-sized new `rustc_codegen_llvm` and the 12,000 LOC-sized |
| 196 | +`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of |
| 197 | +approximately 10,000 LOC that would otherwise have had to be duplicated between |
| 198 | +the multiple backends of `rustc`. |
| 199 | + |
| 200 | +The refactored version of `rustc`'s backend introduced no regressions in the |
| 201 | +test suite or in performance benchmarks, which is consistent with the nature |
| 202 | +of the refactoring, which used only compile-time parametricity (no trait |
| 203 | +objects). |
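
To make the parametrization described in "Traits and interface" concrete, here is a minimal, self-contained sketch. It is not the real `rustc_codegen_ssa` API: the traits are heavily reduced stand-ins that only borrow the names `CodegenMethods` and `BuilderMethods`, and `TextCx`, `TextBuilder`, `TextValue`, `const_i32`, `add`, `ret`, and `codegen_sum` are hypothetical names invented for this illustration. It assumes a toy backend that emits text instead of LLVM IR.

```rust
use std::cell::Cell;

// Reduced stand-in for the context interface (one per codegen unit).
pub trait CodegenMethods {
    type Value: Copy;
    fn const_i32(&self, v: i32) -> Self::Value;
}

// Reduced stand-in for the builder interface (one per basic block).
pub trait BuilderMethods<'a>: Sized {
    type Value: Copy;
    type CodegenCx: CodegenMethods<Value = Self::Value>;

    fn new_block(cx: &'a Self::CodegenCx, name: &str) -> Self;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn ret(&mut self, v: Self::Value);
}

// Backend-agnostic code: it only talks to the traits, never to a backend.
pub fn codegen_sum<'a, Bx: BuilderMethods<'a>>(cx: &'a Bx::CodegenCx, a: i32, b: i32) -> Bx {
    let mut bx = Bx::new_block(cx, "entry");
    let lhs = cx.const_i32(a);
    let rhs = cx.const_i32(b);
    let sum = bx.add(lhs, rhs);
    bx.ret(sum);
    bx
}

// Hypothetical toy backend: values are constants or numbered temporaries.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TextValue {
    Const(i32),
    Temp(u32),
}

// The backend owns its context; here it hands out unit-wide temporary ids.
pub struct TextCx {
    next_temp: Cell<u32>,
}

impl TextCx {
    pub fn new() -> Self {
        TextCx { next_temp: Cell::new(0) }
    }

    fn fresh_temp(&self) -> TextValue {
        let id = self.next_temp.get();
        self.next_temp.set(id + 1);
        TextValue::Temp(id)
    }
}

impl CodegenMethods for TextCx {
    type Value = TextValue;
    fn const_i32(&self, v: i32) -> TextValue {
        TextValue::Const(v)
    }
}

// The backend also owns its builder, which borrows the context.
pub struct TextBuilder<'a> {
    cx: &'a TextCx,
    pub lines: Vec<String>,
}

fn show(v: TextValue) -> String {
    match v {
        TextValue::Const(c) => c.to_string(),
        TextValue::Temp(t) => format!("%{}", t),
    }
}

impl<'a> BuilderMethods<'a> for TextBuilder<'a> {
    type Value = TextValue;
    type CodegenCx = TextCx;

    fn new_block(cx: &'a TextCx, name: &str) -> Self {
        TextBuilder { cx, lines: vec![format!("{}:", name)] }
    }

    fn add(&mut self, lhs: TextValue, rhs: TextValue) -> TextValue {
        let dest = self.cx.fresh_temp();
        self.lines.push(format!("  {} = add {}, {}", show(dest), show(lhs), show(rhs)));
        dest
    }

    fn ret(&mut self, v: TextValue) {
        self.lines.push(format!("  ret {}", show(v)));
    }
}

fn main() {
    let cx = TextCx::new();
    // The backend is selected at compile time through the type parameter.
    let bx = codegen_sum::<TextBuilder<'_>>(&cx, 2, 3);
    for line in &bx.lines {
        println!("{}", line);
    }
}
```

As in the refactoring, the context and builder structures are defined by the backend, and the shared function is generic over the builder type rather than taking a trait object, so each backend gets its own monomorphized copy with static dispatch.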