introduce another IR for machine code instructions #9514

andrewrk · 2021-08-03T19:33:02Z

Problem statement:

- Inline assembly is not fully implemented yet.
- codegen.zig uses only one pass, and therefore generates worse code in many cases. For example, on x86 it always emits the long form of a jump, because at the point of emission it cannot be sure that the jump target will be within 127 bytes.
- The -femit-asm flag does not work in self-hosted backends.
  - godbolt wants this!

Proposal to address the problems:

Instead of the self-hosted backend going straight from AIR => machine code, it would emit another IR. Let's call it MIR for Machine IR. The instructions would correspond almost one-to-one with machine code instructions, but MIR could then be lowered into either machine code or assembly code. This would help with debugging and working on the self-hosted codegen backends. Additionally, there could be another pass on the MIR to convert instructions into smaller encodings based on offset calculations, generating better code.
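The "smaller encodings based on offset calculations" pass can be sketched as follows. This is a minimal illustration in Python, not the compiler's actual MIR: the `Label`, `Jmp`, and `Raw` types and the `relax` and `encode` functions are hypothetical names invented here. It assumes only the two x86 near-jump forms, `EB rel8` (2 bytes) and `E9 rel32` (5 bytes). Every jump starts out short and is widened only if its displacement does not fit in a signed byte.

```python
from dataclasses import dataclass

# Hypothetical MIR for illustration only; not the Zig compiler's data layout.
@dataclass
class Label:
    name: str

@dataclass
class Jmp:
    target: str
    short: bool = True  # optimistic start: assume the rel8 form fits

@dataclass
class Raw:
    data: bytes  # any already-encoded, fixed-size instruction


def size(inst):
    if isinstance(inst, Label):
        return 0
    if isinstance(inst, Jmp):
        return 2 if inst.short else 5  # EB rel8 vs. E9 rel32
    return len(inst.data)


def layout(mir):
    """Return (label name -> offset, start offset of each instruction)."""
    labels, starts, off = {}, [], 0
    for inst in mir:
        starts.append(off)
        if isinstance(inst, Label):
            labels[inst.name] = off
        off += size(inst)
    return labels, starts


def relax(mir):
    """Widen short jumps until every displacement fits.

    Jumps only ever grow, so the loop terminates.
    """
    changed = True
    while changed:
        changed = False
        labels, starts = layout(mir)
        for inst, start in zip(mir, starts):
            if isinstance(inst, Jmp) and inst.short:
                # Displacement is relative to the end of the 2-byte jump.
                disp = labels[inst.target] - (start + 2)
                if not -128 <= disp <= 127:
                    inst.short = False
                    changed = True


def encode(mir):
    """Lower the (already relaxed) MIR to machine code bytes."""
    labels, starts = layout(mir)
    out = bytearray()
    for inst, start in zip(mir, starts):
        if isinstance(inst, Jmp):
            disp = labels[inst.target] - (start + size(inst))
            width = 1 if inst.short else 4
            opcode = b"\xEB" if inst.short else b"\xE9"
            out += opcode + disp.to_bytes(width, "little", signed=True)
        elif isinstance(inst, Raw):
            out += inst.data
    return bytes(out)
```

With this sketch, a jump over 10 bytes of padding stays in the 2-byte form, while a jump over 200 bytes is widened to the 5-byte form. A single-pass emitter would have to use the 5-byte form in both cases.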

This also helps with inline assembly, which would emit MIR. In the LLVM backend, inline assembly would be lowered to MIR, which would then be lowered to LLVM-flavored inline assembly. This may seem convoluted, but consider that we want Intel syntax for our x86 inline assembly, yet LLVM only supports AT&T (there are too many bugs in the Intel dialect to say that it is supported). So this would let us have our own nice syntax and then lower it to what LLVM expects.
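To make the "own syntax in, AT&T out" idea concrete, here is a rough Python sketch. The `Reg`, `Imm`, and `Inst` types and the `render` function are hypothetical and exist only for this illustration. It covers just register and immediate operands, omits AT&T size suffixes, and ignores memory operands and whatever escaping LLVM's inline assembly templates require.

```python
from dataclasses import dataclass

# Hypothetical operand and instruction types, for illustration only.
@dataclass
class Reg:
    name: str

@dataclass
class Imm:
    value: int

@dataclass
class Inst:
    mnemonic: str
    operands: tuple  # destination first, as in Intel syntax


def _fmt(op, att):
    # AT&T prefixes registers with '%' and immediates with '$'.
    if isinstance(op, Reg):
        return ("%" if att else "") + op.name
    return ("$" if att else "") + str(op.value)


def render(inst, att=False):
    """Print one instruction in Intel order, or in AT&T order (source first)."""
    ops = reversed(inst.operands) if att else inst.operands
    text = ", ".join(_fmt(op, att) for op in ops)
    return f"{inst.mnemonic} {text}".rstrip()
```

For example, `render(Inst("mov", (Reg("rax"), Imm(1))))` gives `mov rax, 1`, and the same instruction with `att=True` gives `mov $1, %rax`. The point is only that a single in-memory form can be printed in either dialect.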

Compilation Speed Performance Concerns

Things are only one pass currently because I wanted to optimize for compilation speed. However, I think this is the wrong way to look at the problem. Consider:

- Generating smaller machine code could improve compilation performance: the extra bytes of machine code put more pressure on memory allocation and, more importantly, have to be written to disk.
- The bottleneck of compilation speed is expected to be semantic analysis. Every time a function finishes being analyzed, a parallel task can be sent to the backend to lower the function into machine code and link it into the output file. As long as semantic analysis remains the bottleneck, doing more computations in the backend is "free" in the sense that the CPU cores would otherwise just be sitting there idling.
- AIR=>MIR and MIR=>machine code are also independently parallelizable if necessary.
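The task structure described above might look roughly like this. It is a Python sketch with made-up stand-ins: `analyze`, `lower_to_mir`, `emit`, and `compile_all` are hypothetical names, and the "AIR" is just a list of integers. It shows only how work is handed off per function; CPython threads would not actually run CPU-bound backend work in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(decls):
    """Stand-in for semantic analysis: yields one analyzed function at a time."""
    for name, air in decls:
        yield name, air


def lower_to_mir(air):
    """Stand-in for AIR => MIR."""
    return [("byte", value) for value in air]


def emit(mir):
    """Stand-in for MIR => machine code."""
    return bytes(value for _, value in mir)


def backend_task(air):
    return emit(lower_to_mir(air))


def compile_all(decls, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {}
        # The (serial) analysis loop keeps going while workers lower
        # the functions it has already finished.
        for name, air in analyze(decls):
            futures[name] = pool.submit(backend_task, air)
        return {name: fut.result() for name, fut in futures.items()}
```

Because `backend_task` is split into two independent steps, the two stages could also be scheduled as separate tasks if one of them turned out to dominate.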

Design of MIR

There would be a different MIR dialect for each Instruction Set Architecture. For example there would be an x86 MIR which has all the x86 instructions and an ARM MIR which has all the ARM instructions.
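As a sketch of what "one dialect per ISA, two lowerings" could mean, the Python below defines two tiny, hypothetical instruction tables and two emitters over them. The `X86` and `AArch64` enums and the `emit_asm` and `emit_bin` functions are invented for this example. AArch64 stands in for the ARM dialect, and only operand-free instructions are covered.

```python
from enum import Enum


# Each dialect is its own table of (assembly text, encoding).
class X86(Enum):
    NOP = ("nop", bytes([0x90]))
    RET = ("ret", bytes([0xC3]))
    INT3 = ("int3", bytes([0xCC]))


class AArch64(Enum):
    # Fixed-width 32-bit instructions, stored little-endian.
    NOP = ("nop", (0xD503201F).to_bytes(4, "little"))
    RET = ("ret", (0xD65F03C0).to_bytes(4, "little"))


def emit_asm(mir):
    """Lower a list of MIR instructions to assembly text."""
    return "\n".join(inst.value[0] for inst in mir)


def emit_bin(mir):
    """Lower the same list to machine code bytes."""
    return b"".join(inst.value[1] for inst in mir)
```

The same `[X86.NOP, X86.RET]` list yields either the text `nop` / `ret` or the bytes `90 C3`, which is the property that would make something like -femit-asm fall out of the design.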

The LLVM backend, WebAssembly backend, C backend, and SPIR-V backend have no need for MIR. The "machine code" in those cases is already high enough level that no MIR is needed.

Fully implementing inline assembly and the full MIR instruction sets for each supported ISA will likely be done with large .zig files which are essentially data. I suspect this will be prohibitively slow and memory intensive for stage1 to handle, so I suggest we do a proof-of-concept with MIR in stage2 until we are fully self-hosted, and then after that we can complete the MIR instruction set listings.

andrewrk · 2021-11-17T02:17:43Z

This is done, thanks to @joachimschmidt557, @kubkon, and @Luukdegram. Issues related to MIR improvements can be follow-up issues.

andrewrk added the proposal, frontend, and backend-self-hosted labels Aug 3, 2021

andrewrk added this to the 0.9.0 milestone Aug 3, 2021

andrewrk added the accepted label Aug 3, 2021

andrewrk closed this as completed Nov 17, 2021

metroidchild mentioned this issue Sep 11, 2023

parse inline assembly syntax according to a set of dialects; integrate inline assembly more closely with the zig language #10761

Open
