Commit 51a4a72

mark-i-m authored and spastorino committed
move readme to guide (#481)
1 parent 6fd627d commit 51a4a72

2 files changed

+204
-0
lines changed

src/SUMMARY.md

+1
@@ -91,6 +91,7 @@
 - [Code Generation](./codegen.md)
 - [Updating LLVM](./codegen/updating-llvm.md)
 - [Debugging LLVM](./codegen/debugging.md)
+- [Backend Agnostic Codegen](./codegen/backend-agnostic.md)
 - [Profile-guided Optimization](./profile-guided-optimization.md)
 - [Debugging Support in Rust Compiler](./debugging-support-in-rustc.md)

src/codegen/backend-agnostic.md

+203
@@ -0,0 +1,203 @@
# Backend Agnostic Codegen

In the future, it would be nice to allow other codegen backends (e.g.
[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an
abstract interface for all backends to implement.

> The following is a copy/paste of a README from the rust-lang/rust repo.
> Please submit a PR if it needs updating.

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring
14+
15+
All the code related to the compilation of MIR into LLVM IR was contained
16+
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
17+
important elements:
18+
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
19+
different object files and archive through LLVM, but also the communication
20+
mechanisms for parallel code generation;
21+
* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
22+
information down to LLVM;
23+
* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
24+
LLVM using the C++ API;
25+
* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
26+
IR;
27+
* the `base.rs` (1,300 LOC) file contains some helper functions but also the
28+
high-level code that launches the code generation and distributes the work.
29+
* the `builder.rs` (1,200 LOC) file contains all the functions generating
30+
individual LLVM IR instructions inside a basic block;
31+
* the `common.rs` (450 LOC) contains various helper functions and all the
32+
functions generating LLVM static values;
33+
* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR.
34+
35+
The goal of this refactoring is to separate inside this crate code that is
36+
specific to the LLVM from code that can be reused for other rustc backends. For
37+
instance, the `mir` folder is almost entirely backend-specific but it relies
38+
heavily on other parts of the crate. The separation of the code must not affect
39+
the logic of the code nor its performance.
40+
41+
For these reasons, the separation process involves two transformations that
42+
have to be done at the same time for the resulting code to compile :
43+
44+
1. replace all the LLVM-specific types by generics inside function signatures
45+
and structure definitions;
46+
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
47+
will define the interface between backend-agnostic code and the backend.
48+
49+
While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
50+
traits and backend-agnostic code will be moved in `rustc_codegen_ssa` (name
51+
suggestion by @eddyb).
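
The two transformations listed above can be illustrated with a minimal sketch.
Everything in it (`ToyValue`, `ToyOps`, `ToyBackend`, `emit_sum`) is a
hypothetical stand-in invented for this example and is not part of
`rustc_codegen_ssa`; it only shows the general shape of replacing a concrete
value type by an associated type and hiding backend calls behind a trait.

```rust
// Hypothetical stand-in for a backend value handle (not the real `&'ll Value`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ToyValue(pub i64);

// Transformation 2: backend-specific operations live behind a trait.
pub trait ToyOps {
    // Transformation 1: the concrete value type becomes an associated type.
    type Value: Copy;

    fn const_int(&mut self, n: i64) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

// Backend-agnostic code: it only names `Bx::Value`, never a concrete type.
pub fn emit_sum<Bx: ToyOps>(bx: &mut Bx, terms: &[i64]) -> Bx::Value {
    let mut acc = bx.const_int(0);
    for &t in terms {
        let v = bx.const_int(t);
        acc = bx.add(acc, v);
    }
    acc
}

// A toy backend that evaluates eagerly and logs the instructions it "emits".
#[derive(Default)]
pub struct ToyBackend {
    pub log: Vec<String>,
}

impl ToyOps for ToyBackend {
    type Value = ToyValue;

    fn const_int(&mut self, n: i64) -> ToyValue {
        ToyValue(n)
    }

    fn add(&mut self, lhs: ToyValue, rhs: ToyValue) -> ToyValue {
        self.log.push(format!("add {} {}", lhs.0, rhs.0));
        ToyValue(lhs.0 + rhs.0)
    }
}
```

In this sketch, `emit_sum` plays the role of the code that would move to the
backend-agnostic crate, while `ToyBackend` and its `impl` play the role of the
code that stays with the backend.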

## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as for LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is a short-lived reference of a `CodegenCx` or another object inside a
  struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.
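
As a rough illustration of how these three lifetimes nest, here is a small
self-contained sketch. The `Toy*` types and `describe_blocks` are assumptions
made up for this example; the real `CodegenCx` and `Builder` hold far more
state.

```rust
// All types here are hypothetical stand-ins, not rustc's real definitions.
pub struct ToyTyCtxt {
    pub crate_name: String,
}

pub struct ToyLlvmModule {
    pub name: String,
}

// One context per codegen unit: it borrows compiler data for `'tcx`
// and backend-owned data for `'ll`.
pub struct ToyCodegenCx<'ll, 'tcx> {
    pub tcx: &'tcx ToyTyCtxt,
    pub llmod: &'ll ToyLlvmModule,
}

// One builder per basic block: it borrows the context only briefly (`'a`).
pub struct ToyBuilder<'a, 'll, 'tcx> {
    pub cx: &'a ToyCodegenCx<'ll, 'tcx>,
    pub block_name: String,
}

impl<'a, 'll, 'tcx> ToyBuilder<'a, 'll, 'tcx> {
    pub fn new_block(cx: &'a ToyCodegenCx<'ll, 'tcx>, name: &str) -> Self {
        ToyBuilder { cx, block_name: name.to_string() }
    }

    pub fn describe(&self) -> String {
        format!(
            "{}::{}::{}",
            self.cx.tcx.crate_name, self.cx.llmod.name, self.block_name
        )
    }
}

// The context outlives every builder created from it within this function.
pub fn describe_blocks(
    tcx: &ToyTyCtxt,
    llmod: &ToyLlvmModule,
    names: &[&str],
) -> Vec<String> {
    let cx = ToyCodegenCx { tcx, llmod };
    names
        .iter()
        .map(|n| ToyBuilder::new_block(&cx, n).describe())
        .collect()
}
```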

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow-checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.
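
To make the relationship between `Bx` and `Bx::CodegenCx` more concrete, here
is a reduced sketch. The traits `ToyCodegenMethods` and `ToyBuilderMethods`,
and the logging backend `LogCx`/`LogBuilder`, are hypothetical names invented
for this example; the `'tcx` parameter and `Instance` are left out to keep it
self-contained.

```rust
use std::cell::RefCell;

// Hypothetical, heavily reduced counterparts of the traits named above.
pub trait ToyCodegenMethods {
    fn declare_fn(&self, name: &str);
}

pub trait ToyBuilderMethods<'a> {
    // The builder names its context type; generic code reaches it as
    // `Bx::CodegenCx`.
    type CodegenCx: ToyCodegenMethods + 'a;

    fn new_block(cx: &'a Self::CodegenCx, name: &str) -> Self;
    fn ret_void(&mut self);
}

// Backend-agnostic driver, parametrized only by the builder type.
pub fn codegen_instance<'a, Bx: ToyBuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    name: &str,
) {
    cx.declare_fn(name);
    let mut bx = Bx::new_block(cx, "entry");
    bx.ret_void();
}

// A toy backend that records what it is asked to emit.
#[derive(Default)]
pub struct LogCx {
    pub log: RefCell<Vec<String>>,
}

pub struct LogBuilder<'a> {
    cx: &'a LogCx,
}

impl ToyCodegenMethods for LogCx {
    fn declare_fn(&self, name: &str) {
        self.log.borrow_mut().push(format!("declare {}", name));
    }
}

impl<'a> ToyBuilderMethods<'a> for LogBuilder<'a> {
    type CodegenCx = LogCx;

    fn new_block(cx: &'a LogCx, name: &str) -> Self {
        cx.log.borrow_mut().push(format!("block {}", name));
        LogBuilder { cx }
    }

    fn ret_void(&mut self) {
        self.cx.log.borrow_mut().push("ret void".to_string());
    }
}
```

A caller would pick the backend by naming the builder type, for example
`codegen_instance::<LogBuilder<'_>>(&cx, "main")` with `cx` a `LogCx`; the
context type then follows from the associated type.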

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` trait defined in
`rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder`, and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a Rust backend for `rustc`, these methods
will need information from `CodegenCx`, hence the additional parameter (unused
in the LLVM implementation of the trait).
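
The role of this extra `self` parameter can be sketched as follows. The trait
`ToyTypeMethods`, both context structs, and `toy_ffi_pointer_width` are
hypothetical and exist only for this illustration: one implementation ignores
`self` because the state is assumed to live inside an external library, while
the other reads it from its own context.

```rust
// Hypothetical trait: the `&self` receiver exists so that a backend can
// consult its own context, even if another backend ignores it.
pub trait ToyTypeMethods {
    fn pointer_width_bits(&self) -> u32;
}

// Pretend call into a library that keeps this state internally (hypothetical).
fn toy_ffi_pointer_width() -> u32 {
    64
}

// Stand-in for a backend whose library holds the state: `self` is unused.
pub struct OpaqueBackendCx;

impl ToyTypeMethods for OpaqueBackendCx {
    fn pointer_width_bits(&self) -> u32 {
        toy_ffi_pointer_width()
    }
}

// Stand-in for a pure-Rust backend: the information lives in the context.
pub struct RustBackendCx {
    pub target_pointer_width: u32,
}

impl ToyTypeMethods for RustBackendCx {
    fn pointer_width_bits(&self) -> u32 {
        self.target_pointer_width
    }
}

// Backend-agnostic helper that works with either implementation.
pub fn pointer_size_bytes<Cx: ToyTypeMethods>(cx: &Cx) -> u32 {
    cx.pointer_width_bits() / 8
}
```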

## State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might be changed in order to
offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
approximately 10,000 LOC that would otherwise have had to be duplicated between
the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regression in the
test suite or in the performance benchmarks, which is consistent with the
nature of the refactoring, which used only compile-time parametricity (no trait
objects).
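
For readers unfamiliar with the distinction, the following sketch contrasts
compile-time parametricity with trait objects. The names (`ToyBackend`,
`BackendA`, `BackendB`, `lower_generic`, `lower_dyn`) are hypothetical. The
generic function is monomorphized for each backend type, so its calls are
resolved statically; the `dyn` version, shown only for contrast, dispatches
through a vtable at run time.

```rust
pub trait ToyBackend {
    fn emit(&self, op: &str) -> String;
}

pub struct BackendA;
pub struct BackendB;

impl ToyBackend for BackendA {
    fn emit(&self, op: &str) -> String {
        format!("A:{}", op)
    }
}

impl ToyBackend for BackendB {
    fn emit(&self, op: &str) -> String {
        format!("B:{}", op)
    }
}

// Compile-time parametricity: one specialized copy per backend type.
pub fn lower_generic<B: ToyBackend>(backend: &B, ops: &[&str]) -> Vec<String> {
    ops.iter().map(|op| backend.emit(op)).collect()
}

// For contrast only: a trait object is dispatched dynamically.
pub fn lower_dyn(backend: &dyn ToyBackend, ops: &[&str]) -> Vec<String> {
    ops.iter().map(|op| backend.emit(op)).collect()
}
```

Both functions produce the same output for a given backend; the difference
lies only in how the call to `emit` is resolved.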
