From fe08a5f930b7a7114af2b3de846d7b77c83bdb33 Mon Sep 17 00:00:00 2001
From: Mark Mansi
Date: Thu, 24 Oct 2019 13:33:44 -0500
Subject: [PATCH] move readme to guide

---
 src/SUMMARY.md                  |   1 +
 src/codegen/backend-agnostic.md | 203 ++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+)
 create mode 100644 src/codegen/backend-agnostic.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index fc866f7b0..fdec4ffde 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -91,6 +91,7 @@
 - [Code Generation](./codegen.md)
     - [Updating LLVM](./codegen/updating-llvm.md)
     - [Debugging LLVM](./codegen/debugging.md)
+    - [Backend Agnostic Codegen](./codegen/backend-agnostic.md)
 - [Profile-guided Optimization](./profile-guided-optimization.md)
 - [Debugging Support in Rust Compiler](./debugging-support-in-rustc.md)

diff --git a/src/codegen/backend-agnostic.md b/src/codegen/backend-agnostic.md
new file mode 100644
index 000000000..b6436b679
--- /dev/null
+++ b/src/codegen/backend-agnostic.md
@@ -0,0 +1,203 @@
+# Backend Agnostic Codegen
+
+In the future, it would be nice to allow other codegen backends (e.g.
+[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an
+abstract interface for all backends to implement.
+
+> The following is a copy/paste of a README from the rust-lang/rust repo.
+> Please submit a PR if it needs updating.
+
+# Refactoring of `rustc_codegen_llvm`
+by Denis Merigoux, October 23rd 2018
+
+## State of the code before the refactoring
+
+All the code related to the compilation of MIR into LLVM IR was contained
+inside the `rustc_codegen_llvm` crate.
+Here is the breakdown of the most important elements:
+* the `back` folder (7,800 LOC) implements the mechanisms for creating the
+  different object files and archives through LLVM, but also the communication
+  mechanisms for parallel code generation;
+* the `debuginfo` folder (3,200 LOC) contains all code that passes debug
+  information down to LLVM;
+* the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with
+  LLVM using the C++ API;
+* the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM
+  IR;
+* the `base.rs` file (1,300 LOC) contains some helper functions but also the
+  high-level code that launches the code generation and distributes the work;
+* the `builder.rs` file (1,200 LOC) contains all the functions generating
+  individual LLVM IR instructions inside a basic block;
+* the `common.rs` file (450 LOC) contains various helper functions and all the
+  functions generating LLVM static values;
+* the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.
+
+The goal of this refactoring is to separate, inside this crate, code that is
+specific to LLVM from code that can be reused for other rustc backends. For
+instance, the `mir` folder is almost entirely backend-agnostic but it relies
+heavily on other parts of the crate. The separation of the code must not affect
+the logic of the code nor its performance.
+
+For these reasons, the separation process involves two transformations that
+have to be done at the same time for the resulting code to compile:
+
+1. replace all the LLVM-specific types by generics inside function signatures
+   and structure definitions;
+2. encapsulate all functions calling the LLVM FFI inside a set of traits that
+   will define the interface between backend-agnostic code and the backend.
+
+While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
+traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
+suggestion by @eddyb).
+
+## Generic types and structures
+
+@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
+generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
+work has been extended to all structures inside the `mir` folder and elsewhere,
+as well as to LLVM's `BasicBlock` and `Type` types.
+
+The two most important structures for the LLVM codegen are `CodegenCx` and
+`Builder`. They are parametrized by multiple lifetime parameters and the type
+for `Value`.
+
+```rust,ignore
+struct CodegenCx<'ll, 'tcx> {
+    /* ... */
+}
+
+struct Builder<'a, 'll, 'tcx> {
+    cx: &'a CodegenCx<'ll, 'tcx>,
+    /* ... */
+}
+```
+
+`CodegenCx` is used to compile one codegen-unit that can contain multiple
+functions, whereas `Builder` is created to compile one basic block.
+
+The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
+parameters that correspond to the following:
+* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
+  containing the program's information;
+* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
+  object inside a struct;
+* `'ll` is the lifetime of references to LLVM objects such as `Value` or
+  `Type`.
+
+Although there are already many lifetime parameters in the code, making it
+generic uncovered situations where the borrow-checker was passing only due to
+the special nature of the LLVM objects manipulated (they are extern pointers).
+For instance, an additional lifetime parameter had to be added to
+`LocalAnalyzer` in `analyze.rs`, leading to the definition:
+
+```rust,ignore
+struct LocalAnalyzer<'mir, 'a, 'tcx> {
+    /* ... */
+}
+```
+
+However, the two most important structures `CodegenCx` and `Builder` are not
+defined in the backend-agnostic code. Indeed, their content is highly specific
+to the backend and it makes more sense to leave their definition to the backend
+implementor than to allow just a narrow spot via a generic field for the
+backend's context.
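The relationship between the three lifetimes can be illustrated with a small, self-contained sketch. It is not the real compiler code: `TyCtxt` and `LlvmModule` here are hypothetical toy stand-ins, and the `describe` method and the `describe_entry_block` function are invented purely for illustration. It only shows how a short-lived `Builder` borrows a `CodegenCx` that itself borrows longer-lived data.

```rust
// Toy stand-ins (assumptions): these are NOT the real rustc or LLVM types.
struct TyCtxt {
    crate_name: String, // data that lives for 'tcx
}

struct LlvmModule {
    name: String, // data that lives for 'll
}

// One context per codegen unit; borrows the long-lived data.
struct CodegenCx<'ll, 'tcx> {
    tcx: &'tcx TyCtxt,
    llmod: &'ll LlvmModule,
}

// One builder per basic block; borrows the context for the short lifetime 'a.
struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    block_name: String,
}

impl<'a, 'll, 'tcx> Builder<'a, 'll, 'tcx> {
    fn new_block(cx: &'a CodegenCx<'ll, 'tcx>, name: &str) -> Self {
        Builder { cx, block_name: name.to_string() }
    }

    // Hypothetical helper: reads through all three borrows.
    fn describe(&self) -> String {
        format!(
            "{}::{}::{}",
            self.cx.tcx.crate_name, self.cx.llmod.name, self.block_name
        )
    }
}

// The values are created from longest-lived to shortest-lived.
fn describe_entry_block() -> String {
    let tcx = TyCtxt { crate_name: "demo".to_string() };
    let llmod = LlvmModule { name: "cgu0".to_string() };
    let cx = CodegenCx { tcx: &tcx, llmod: &llmod };
    let bx = Builder::new_block(&cx, "entry");
    bx.describe()
}
```

In this sketch the builder is dropped before the context, and the context before the data it borrows, which mirrors the ordering `'a`, then `'ll`, then `'tcx` described in the list above.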
+
+## Traits and interface
+
+Because they have to be defined by the backend, `CodegenCx` and `Builder` will
+be the structures implementing all the traits defining the backend's interface.
+These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
+backend-agnostic code is parametrized by them. For instance, let us explain how
+a function in `base.rs` is parametrized:
+
+```rust,ignore
+pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
+    cx: &'a Bx::CodegenCx,
+    instance: Instance<'tcx>
+) {
+    /* ... */
+}
+```
+
+In this signature, we have the two lifetime parameters explained earlier and
+the master type `Bx` which satisfies the trait `BuilderMethods` corresponding
+to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
+defines an associated type `Bx::CodegenCx` that itself satisfies the
+`CodegenMethods` trait implemented by the struct `CodegenCx`.
+
+On the trait side, here is an example with part of the definition of
+`BuilderMethods` in `traits/builder.rs`:
+
+```rust,ignore
+pub trait BuilderMethods<'a, 'tcx>:
+    HasCodegen<'tcx>
+    + DebugInfoBuilderMethods<'tcx>
+    + ArgTypeMethods<'tcx>
+    + AbiBuilderMethods<'tcx>
+    + IntrinsicCallMethods<'tcx>
+    + AsmBuilderMethods<'tcx>
+{
+    fn new_block<'b>(
+        cx: &'a Self::CodegenCx,
+        llfn: Self::Function,
+        name: &'b str
+    ) -> Self;
+    /* ... */
+    fn cond_br(
+        &mut self,
+        cond: Self::Value,
+        then_llbb: Self::BasicBlock,
+        else_llbb: Self::BasicBlock,
+    );
+    /* ... */
+}
+```
+
+Finally, a master structure implementing the `ExtraBackendMethods` trait is
+used for high-level codegen-driving functions like `codegen_crate` in
+`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
+`ExtraBackendMethods` should be implemented by the same structure that
+implements the `CodegenBackend` defined in
+`rustc_codegen_utils/codegen_backend.rs`.
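To make the shape of this interface more concrete, here is a minimal sketch of the same pattern with a toy backend. The traits below are heavily simplified and hypothetical: they reuse the names `BuilderMethods` and `CodegenMethods` for readability but are not the real definitions, and `TextCx`, `TextBuilder`, `codegen_add`, `const_i32`, `add` and `ret` are invented for this example. The point is only that the backend-agnostic function is written against the traits, while the backend supplies the concrete context, builder and value types.

```rust
// Simplified, hypothetical interface (not the real rustc_codegen_ssa traits).
trait CodegenMethods {
    fn unit_name(&self) -> &str;
}

trait BuilderMethods<'a>: Sized {
    // The backend chooses its own context and value representation.
    type CodegenCx: CodegenMethods + 'a;
    type Value: Copy;

    fn new_block(cx: &'a Self::CodegenCx, name: &str) -> Self;
    fn const_i32(&mut self, v: i32) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn ret(self, v: Self::Value) -> String;
}

// Backend-agnostic code: it only knows about the traits.
fn codegen_add<'a, Bx: BuilderMethods<'a>>(cx: &'a Bx::CodegenCx, a: i32, b: i32) -> String {
    let mut bx = Bx::new_block(cx, "entry");
    let lhs = bx.const_i32(a);
    let rhs = bx.const_i32(b);
    let sum = bx.add(lhs, rhs);
    bx.ret(sum)
}

// A toy backend that emits textual pseudo-instructions.
struct TextCx {
    name: String,
}

impl CodegenMethods for TextCx {
    fn unit_name(&self) -> &str {
        &self.name
    }
}

struct TextBuilder<'a> {
    cx: &'a TextCx,
    lines: Vec<String>,
    next_reg: usize,
}

impl<'a> BuilderMethods<'a> for TextBuilder<'a> {
    type CodegenCx = TextCx;
    type Value = usize; // a virtual register number

    fn new_block(cx: &'a TextCx, name: &str) -> Self {
        TextBuilder { cx, lines: vec![format!("{}:", name)], next_reg: 0 }
    }

    fn const_i32(&mut self, v: i32) -> usize {
        let reg = self.next_reg;
        self.next_reg += 1;
        self.lines.push(format!("%{} = const {}", reg, v));
        reg
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        let reg = self.next_reg;
        self.next_reg += 1;
        self.lines.push(format!("%{} = add %{}, %{}", reg, lhs, rhs));
        reg
    }

    fn ret(mut self, v: usize) -> String {
        self.lines.push(format!("ret %{}", v));
        format!("; unit {}\n{}", self.cx.unit_name(), self.lines.join("\n"))
    }
}
```

A caller picks the backend through the type parameter, for example `codegen_add::<TextBuilder<'_>>(&cx, 2, 3)`. A second backend would implement the same two traits with its own `Value` type, and `codegen_add` would not need to change.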
+
+During the traitification process, certain functions have been converted from
+methods of a local structure to methods of `CodegenCx` or `Builder` and a
+corresponding `self` parameter has been added. Indeed, LLVM stores information
+internally that it can access when called through its API. This information
+does not show up in a Rust data structure carried around when these methods are
+called. However, when implementing a Rust backend for `rustc`, these methods
+will need information from `CodegenCx`, hence the additional parameter (unused
+in the LLVM implementation of the trait).
+
+## State of the code after the refactoring
+
+The traits offer an API which is very similar to the API of LLVM. This is not
+the best solution since LLVM has a very special way of doing things: when
+adding another backend, the trait definitions might be changed in order to
+offer more flexibility.
+
+However, the current separation between backend-agnostic and LLVM-specific code
+has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
+Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
+most important elements:
+
+* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
+* `mir` folder: 4,400 (BA) vs 0 (LLVM);
+* `base.rs`: 1,100 (BA) vs 250 (LLVM);
+* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
+* `common.rs`: 350 (BA) vs 350 (LLVM).
+
+The `debuginfo` folder has been left almost untouched by the splitting and is
+specific to LLVM. Only its high-level features have been traitified.
+
+The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
+27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the
+18,500 LOC-sized new `rustc_codegen_llvm` and the 12,000 LOC-sized
+`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
+approximately 10,000 LOC that would otherwise have had to be duplicated between
+the multiple backends of `rustc`.
+
+The refactored version of `rustc`'s backend introduced no regressions in the
+test suite or in performance benchmarks, which is consistent with the nature
+of the refactoring, which used only compile-time parametricity (no trait
+objects).
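The difference between compile-time parametricity and trait objects can be sketched as follows. The `Backend` trait, `CountingBackend`, and the two `emit_*` functions are hypothetical names made up for this illustration; they are not part of `rustc`. The generic function is monomorphized for each concrete backend, so its calls are statically dispatched, whereas the `dyn` version goes through a vtable at run time.

```rust
// Hypothetical minimal backend interface, for illustration only.
trait Backend {
    fn emit_nop(&mut self);
    fn emitted(&self) -> usize;
}

struct CountingBackend {
    count: usize,
}

impl Backend for CountingBackend {
    fn emit_nop(&mut self) {
        self.count += 1;
    }

    fn emitted(&self) -> usize {
        self.count
    }
}

// Compile-time parametricity: one copy of this function is generated per
// concrete `B`, and `emit_nop` is called directly.
fn emit_static<B: Backend>(backend: &mut B, n: usize) {
    for _ in 0..n {
        backend.emit_nop();
    }
}

// Trait-object alternative: a single copy of the function, with each call
// to `emit_nop` dispatched dynamically through a vtable.
fn emit_dynamic(backend: &mut dyn Backend, n: usize) {
    for _ in 0..n {
        backend.emit_nop();
    }
}

// Both styles produce the same result; only the dispatch mechanism differs.
fn emitted_counts(n: usize) -> (usize, usize) {
    let mut a = CountingBackend { count: 0 };
    emit_static(&mut a, n);
    let mut b = CountingBackend { count: 0 };
    emit_dynamic(&mut b, n);
    (a.emitted(), b.emitted())
}
```

The refactoring described here corresponds to the `emit_static` style: the generic code is specialized for the LLVM backend at compile time.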