rust-lang · spastorino · Oct 24, 2019 · Oct 24, 2019
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -91,6 +91,7 @@
     - [Code Generation](./codegen.md)
         - [Updating LLVM](./codegen/updating-llvm.md)
         - [Debugging LLVM](./codegen/debugging.md)
+        - [Backend Agnostic Codegen](./codegen/backend-agnostic.md)
     - [Profile-guided Optimization](./profile-guided-optimization.md)
     - [Debugging Support in Rust Compiler](./debugging-support-in-rustc.md)
 

diff --git a/src/codegen/backend-agnostic.md b/src/codegen/backend-agnostic.md
@@ -0,0 +1,203 @@
+# Backend Agnostic Codegen
+
+In the future, it would be nice to allow other codegen backends (e.g.
+[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an
+abstract interface for all backends to implement.
+
+> The following is a copy/paste of a README from the rust-lang/rust repo.
+> Please submit a PR if it needs updating.
+
+# Refactoring of `rustc_codegen_llvm`
+by Denis Merigoux, October 23rd 2018
+
+## State of the code before the refactoring
+
+All the code related to the compilation of MIR into LLVM IR was contained
+inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
+important elements:
+* the `back` folder (7,800 LOC) implements the mechanisms for creating the
+  different object files and archives through LLVM, but also the communication
+  mechanisms for parallel code generation;
+* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
+  information down to LLVM;
+* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
+  LLVM using the C++ API;
+* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
+  IR;
+* the `base.rs` (1,300 LOC) file contains some helper functions but also the
+  high-level code that launches the code generation and distributes the work;
+* the `builder.rs` (1,200 LOC) file contains all the functions generating
+  individual LLVM IR instructions inside a basic block;
+* the `common.rs` (450 LOC) contains various helper functions and all the
+  functions generating LLVM static values;
+* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR.
+
+The goal of this refactoring is to separate inside this crate code that is
+specific to LLVM from code that can be reused for other rustc backends. For
+instance, the `mir` folder is almost entirely backend-agnostic but it relies
+heavily on other parts of the crate. The separation of the code must not affect
+the logic of the code nor its performance.
+
+For these reasons, the separation process involves two transformations that
+have to be done at the same time for the resulting code to compile:
+
+1. replace all the LLVM-specific types by generics inside function signatures
+   and structure definitions;
+2. encapsulate all functions calling the LLVM FFI inside a set of traits that
+   will define the interface between backend-agnostic code and the backend.
+
+While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
+traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
+suggestion by @eddyb).
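+
+As a minimal sketch of these two transformations, assuming a toy backend: the
+names `Backend`, `ToyBackend`, `const_int`, `add` and `codegen_sum` below are
+hypothetical and do not appear in `rustc_codegen_ssa`. The backend-specific
+value type becomes an associated type, and every call into the backend goes
+through a trait method:
+
+```rust
+/// Hypothetical interface between backend-agnostic code and a backend.
+trait Backend {
+    /// Stands in for a backend-specific type such as LLVM's `Value`.
+    type Value: Copy;
+
+    fn const_int(&mut self, n: i64) -> Self::Value;
+    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
+}
+
+/// Backend-agnostic code: generic over the backend, no concrete types.
+fn codegen_sum<B: Backend>(backend: &mut B, operands: &[i64]) -> B::Value {
+    let mut acc = backend.const_int(0);
+    for &n in operands {
+        let rhs = backend.const_int(n);
+        acc = backend.add(acc, rhs);
+    }
+    acc
+}
+
+/// A toy backend that evaluates the operations instead of emitting IR.
+struct ToyBackend {
+    emitted: usize,
+}
+
+impl Backend for ToyBackend {
+    type Value = i64;
+
+    fn const_int(&mut self, n: i64) -> i64 {
+        n
+    }
+
+    fn add(&mut self, lhs: i64, rhs: i64) -> i64 {
+        self.emitted += 1; // count the "instructions" emitted
+        lhs + rhs
+    }
+}
+
+fn main() {
+    let mut backend = ToyBackend { emitted: 0 };
+    let result = codegen_sum(&mut backend, &[1, 2, 3]);
+    println!("result = {}, adds emitted = {}", result, backend.emitted);
+}
+```
+
+Only `ToyBackend` knows what a `Value` is; `codegen_sum` would compile
+unchanged against any other type implementing the hypothetical `Backend` trait.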
+
+## Generic types and structures
+
+@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
+generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
+work has been extended to all structures inside the `mir` folder and elsewhere,
+as well as for LLVM's `BasicBlock` and `Type` types.
+
+The two most important structures for the LLVM codegen are `CodegenCx` and
+`Builder`. They are parametrized by multiple lifetime parameters and the type
+for `Value`.
+
+```rust,ignore
+struct CodegenCx<'ll, 'tcx> {
+  /* ... */
+}
+
+struct Builder<'a, 'll, 'tcx> {
+  cx: &'a CodegenCx<'ll, 'tcx>,
+  /* ... */
+}
+```
+
+`CodegenCx` is used to compile one codegen-unit that can contain multiple
+functions, whereas `Builder` is created to compile one basic block.
+
+The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
+parameters, which correspond to the following:
+* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
+  containing the program's information;
+* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
+  object inside a struct;
+* `'ll` is the lifetime of references to LLVM objects such as `Value` or
+  `Type`.
+
+Although there are already many lifetime parameters in the code, making it
+generic uncovered situations where the borrow-checker was passing only due to
+the special nature of the LLVM objects manipulated (they are extern pointers).
+For instance, an additional lifetime parameter had to be added to
+`LocalAnalyzer` in `analyze.rs`, leading to the definition:
+
+```rust,ignore
+struct LocalAnalyzer<'mir, 'a, 'tcx> {
+  /* ... */
+}
+```
+
+However, the two most important structures, `CodegenCx` and `Builder`, are not
+defined in the backend-agnostic code. Indeed, their content is highly specific
+to the backend, and it makes more sense to leave their definition to the
+backend implementor than to allow just a narrow spot via a generic field for
+the backend's context.
+
+## Traits and interface
+
+Because they have to be defined by the backend, `CodegenCx` and `Builder` will
+be the structures implementing all the traits defining the backend's interface.
+These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
+backend-agnostic code is parametrized by them. For instance, let us explain how
+a function in `base.rs` is parametrized:
+
+```rust,ignore
+pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
+    cx: &'a Bx::CodegenCx,
+    instance: Instance<'tcx>
+) {
+    /* ... */
+}
+```
+
+In this signature, we have the two lifetime parameters explained earlier and
+the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding
+to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
+defines an associated type `Bx::CodegenCx` that itself satisfies the
+`CodegenMethods` trait implemented by the struct `CodegenCx`.
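+
+The relationship between the builder trait and its associated context type
+can be sketched in isolation. Everything here is a simplified, hypothetical
+stand-in (`ToyCodegenMethods`, `ToyBuilderMethods`, `ToyCx`, `ToyBuilder`);
+the real traits carry far more methods and a `'tcx` lifetime:
+
+```rust
+/// Hypothetical counterpart of `CodegenMethods`: the per-unit context.
+trait ToyCodegenMethods {
+    fn unit_name(&self) -> &str;
+}
+
+/// Hypothetical counterpart of `BuilderMethods`: its associated `CodegenCx`
+/// type must itself satisfy the context trait.
+trait ToyBuilderMethods<'a> {
+    type CodegenCx: ToyCodegenMethods + 'a;
+
+    fn new_block(cx: &'a Self::CodegenCx, name: &str) -> Self;
+    fn describe(&self) -> String;
+}
+
+/// Backend-agnostic function, parametrized only by the builder type `Bx`.
+fn codegen_instance<'a, Bx: ToyBuilderMethods<'a>>(cx: &'a Bx::CodegenCx) -> String {
+    let bx = Bx::new_block(cx, "entry");
+    bx.describe()
+}
+
+// A toy backend supplies the concrete structs.
+struct ToyCx {
+    name: String,
+}
+
+struct ToyBuilder<'a> {
+    cx: &'a ToyCx,
+    block: String,
+}
+
+impl ToyCodegenMethods for ToyCx {
+    fn unit_name(&self) -> &str {
+        &self.name
+    }
+}
+
+impl<'a> ToyBuilderMethods<'a> for ToyBuilder<'a> {
+    type CodegenCx = ToyCx;
+
+    fn new_block(cx: &'a ToyCx, name: &str) -> Self {
+        ToyBuilder { cx, block: name.to_string() }
+    }
+
+    fn describe(&self) -> String {
+        format!("{}::{}", self.cx.unit_name(), self.block)
+    }
+}
+
+fn main() {
+    let cx = ToyCx { name: "cgu0".to_string() };
+    println!("{}", codegen_instance::<ToyBuilder<'_>>(&cx));
+}
+```
+
+The caller names only the builder type; the context type follows from the
+associated type, which is why the real signature can take `&'a Bx::CodegenCx`
+without a second type parameter.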
+
+On the trait side, here is an example with part of the definition of
+`BuilderMethods` in `traits/builder.rs`:
+
+```rust,ignore
+pub trait BuilderMethods<'a, 'tcx>:
+    HasCodegen<'tcx>
+    + DebugInfoBuilderMethods<'tcx>
+    + ArgTypeMethods<'tcx>
+    + AbiBuilderMethods<'tcx>
+    + IntrinsicCallMethods<'tcx>
+    + AsmBuilderMethods<'tcx>
+{
+    fn new_block<'b>(
+        cx: &'a Self::CodegenCx,
+        llfn: Self::Function,
+        name: &'b str
+    ) -> Self;
+    /* ... */
+    fn cond_br(
+        &mut self,
+        cond: Self::Value,
+        then_llbb: Self::BasicBlock,
+        else_llbb: Self::BasicBlock,
+    );
+    /* ... */
+}
+```
+
+Finally, a master structure implementing the `ExtraBackendMethods` trait is
+used for high-level codegen-driving functions like `codegen_crate` in
+`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
+`ExtraBackendMethods` should be implemented by the same structure that
+implements the `CodegenBackend` defined in
+`rustc_codegen_utils/codegen_backend.rs`.
+
+During the traitification process, certain functions have been converted from
+methods of a local structure to methods of `CodegenCx` or `Builder` and a
+corresponding `self` parameter has been added. Indeed, LLVM stores information
+internally that it can access when called through its API. This information
+does not show up in a Rust data structure carried around when these methods are
+called. However, when implementing a Rust backend for `rustc`, these methods
+will need information from `CodegenCx`, hence the additional parameter (unused
+in the LLVM implementation of the trait).
+
+## State of the code after the refactoring
+
+The traits offer an API which is very similar to the API of LLVM. This is not
+the best solution since LLVM has a very special way of doing things: when
+adding another backend, the trait definitions might be changed in order to
+offer more flexibility.
+
+However, the current separation between backend-agnostic and LLVM-specific code
+has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
+Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
+most important elements:
+
+* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
+* `mir` folder: 4,400 (BA) vs 0 (LLVM);
+* `base.rs`: 1,100 (BA) vs 250 (LLVM);
+* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
+* `common.rs`: 350 (BA) vs 350 (LLVM).
+
+The `debuginfo` folder has been left almost untouched by the splitting and is
+specific to LLVM. Only its high-level features have been traitified.
+
+The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
+27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
+18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
+`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
+approximately 10,000 LOC that would otherwise have had to be duplicated between
+the multiple backends of `rustc`.
+
+The refactored version of `rustc`'s backend introduced no regressions in the
+test suite or in performance benchmarks, which is consistent with the nature
+of the refactoring, which used only compile-time parametricity (no trait
+objects).
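+
+To illustrate the difference between compile-time parametricity and trait
+objects, here is a small sketch. The trait `Emit`, the struct `ToyLlvm` and
+both functions are hypothetical and only serve to contrast the two forms of
+dispatch:
+
+```rust
+/// Hypothetical backend operation.
+trait Emit {
+    fn emit(&self, op: &str) -> String;
+}
+
+struct ToyLlvm;
+
+impl Emit for ToyLlvm {
+    fn emit(&self, op: &str) -> String {
+        format!("llvm:{}", op)
+    }
+}
+
+/// Compile-time parametricity: the function is monomorphized per backend
+/// type, so the call to `emit` is resolved statically.
+fn lower_static<B: Emit>(backend: &B) -> String {
+    backend.emit("ret")
+}
+
+/// Trait object: one compiled function whose call to `emit` goes through a
+/// vtable at run time. The refactoring did not use this form.
+fn lower_dynamic(backend: &dyn Emit) -> String {
+    backend.emit("ret")
+}
+
+fn main() {
+    let backend = ToyLlvm;
+    // Both forms compute the same result; only the dispatch differs.
+    assert_eq!(lower_static(&backend), lower_dynamic(&backend));
+    println!("{}", lower_static(&backend));
+}
+```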