introduce another IR for machine code instructions #9514

Closed
andrewrk opened this issue Aug 3, 2021 · 1 comment
Labels
accepted This proposal is planned. backend-self-hosted frontend Tokenization, parsing, AstGen, Sema, and Liveness. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

andrewrk commented Aug 3, 2021

Problem statement:

  • inline assembly is not fully implemented yet
  • codegen.zig uses only one pass, and therefore generates worse code in many cases. For example, on x86 it always emits the long jump encoding, because at the point where it emits the jump it cannot be sure that the target will be within the 127-byte reach of a short jump.
  • -femit-asm does not work in self-hosted backends
    • godbolt wants this!

Proposal to address the problems:

Instead of the self-hosted backend going straight from AIR => machine code, it would emit another IR. Let's call it MIR for Machine IR. The instructions would correspond almost one-to-one with machine code instructions. However, MIR could then be lowered into machine code, or into assembly code. This would help with debugging and working on the self-hosted codegen backends. Additionally, there could be another pass on the MIR to convert instructions into smaller encodings based on offset calculations, generating better code.
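
To make the idea concrete, here is a minimal sketch in Python (not the compiler's actual code) of a MIR for a tiny x86 subset. The `Inst` type and the `relax`, `emit_asm`, and `emit_code` functions are hypothetical names invented for this illustration. The same instruction list is lowered to assembly text and to machine code, and a separate pass shrinks jumps to the short encoding once offsets are known.

```python
from dataclasses import dataclass

# Hypothetical MIR for a tiny x86 subset: "nop", "jmp <label>", and label markers.
@dataclass
class Inst:
    op: str              # "nop", "jmp", or "label"
    target: str = ""     # label name for "jmp" and "label"
    short: bool = False  # set by the relaxation pass for "jmp"

def size(inst):
    if inst.op == "label":
        return 0
    if inst.op == "nop":
        return 1
    return 2 if inst.short else 5  # jmp rel8 (EB cb) vs jmp rel32 (E9 cd)

def layout(mir):
    """Return ({label: offset}, [offset of each instruction])."""
    labels, offsets, pc = {}, [], 0
    for inst in mir:
        offsets.append(pc)
        if inst.op == "label":
            labels[inst.target] = pc
        pc += size(inst)
    return labels, offsets

def relax(mir):
    """Shrink jumps to rel8 when the displacement fits; repeat until stable."""
    changed = True
    while changed:
        changed = False
        labels, offsets = layout(mir)
        for inst, off in zip(mir, offsets):
            if inst.op == "jmp" and not inst.short:
                target = labels[inst.target]
                # Forward target: it moves together with the end of this jump,
                # so measure from the current (long) end. Backward target:
                # measure from where the short jump would end.
                end = off + (5 if target > off else 2)
                if -128 <= target - end <= 127:
                    inst.short = True
                    changed = True

def emit_asm(mir):
    """Lower MIR to assembly text."""
    lines = []
    for inst in mir:
        if inst.op == "label":
            lines.append(f"{inst.target}:")
        elif inst.op == "jmp":
            lines.append(f"    jmp {inst.target}")
        else:
            lines.append("    nop")
    return "\n".join(lines)

def emit_code(mir):
    """Lower MIR to machine code bytes."""
    labels, offsets = layout(mir)
    out = bytearray()
    for inst, off in zip(mir, offsets):
        if inst.op == "nop":
            out += b"\x90"
        elif inst.op == "jmp":
            disp = labels[inst.target] - (off + size(inst))
            width = 1 if inst.short else 4
            opcode = b"\xEB" if inst.short else b"\xE9"
            out += opcode + disp.to_bytes(width, "little", signed=True)
    return bytes(out)

mir = [Inst("jmp", "done"), Inst("nop"), Inst("label", "done"), Inst("nop")]
before = emit_code(mir)  # long jump: 7 bytes in total
relax(mir)
after = emit_code(mir)   # short jump: 4 bytes in total
print(emit_asm(mir))
```

The relaxation pass only ever shrinks instructions, so a jump that fits once keeps fitting as the code around it gets smaller. A one-pass emitter cannot do this for forward jumps, because the target offset is not known yet when the jump is written.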

This also helps with inline assembly, which would emit MIR. In the LLVM backend, inline assembly would be lowered to MIR, which would then be lowered to LLVM-flavored inline assembly. This may seem convoluted, but consider that we want Intel syntax for our x86 inline assembly, yet LLVM effectively only supports AT&T (there are too many bugs in the Intel dialect to say that it is supported). So this would let us have our own nice syntax and then lower it to what LLVM expects.
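
As a rough illustration of that last step, the Python sketch below prints one instruction in both dialects. The `Reg`, `Imm`, and `Inst` types and the two printer functions are hypothetical, and the sketch assumes 64-bit register and immediate operands only. A real lowering would also have to handle memory operands, operand sizes, and LLVM's constraint and template syntax, all of which are ignored here.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Reg:
    name: str  # e.g. "rax"

@dataclass(frozen=True)
class Imm:
    value: int

@dataclass(frozen=True)
class Inst:
    mnemonic: str          # e.g. "mov", "add"
    dst: Reg
    src: Union[Reg, Imm]

def to_intel(inst):
    """Intel syntax: destination first, bare register names and immediates."""
    src = inst.src.name if isinstance(inst.src, Reg) else str(inst.src.value)
    return f"{inst.mnemonic} {inst.dst.name}, {src}"

def to_att(inst):
    """AT&T syntax: source first, % on registers, $ on immediates."""
    src = f"%{inst.src.name}" if isinstance(inst.src, Reg) else f"${inst.src.value}"
    # Assumption: 64-bit operands only, hence the fixed "q" size suffix.
    return f"{inst.mnemonic}q {src}, %{inst.dst.name}"

inst = Inst("mov", Reg("rax"), Imm(42))
print(to_intel(inst))  # mov rax, 42
print(to_att(inst))    # movq $42, %rax
```

Because both printers read the same structured instruction, the user-facing syntax and the syntax handed to LLVM can differ without either one being parsed from the other.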

Compilation Speed Performance Concerns

Things are only one pass currently because I wanted to optimize for compilation speed. However, I think this is the wrong way to look at the problem. Consider:

  • Generating smaller machine code could improve performance: the extra bytes of machine code put more pressure on memory allocation and, more importantly, have to be written to disk.
  • The bottleneck of compilation speed is expected to be semantic analysis. Every time a function finishes being analyzed, a parallel task can be sent to the backend to lower the function into machine code and link it into the output file. As long as semantic analysis remains the bottleneck, doing more computation in the backend is "free" in the sense that the CPU cores would otherwise just be sitting there idling.
  • AIR=>MIR and MIR=>machine code are also independently parallelizable if necessary.
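
The task structure described in these points can be sketched as follows. This is a toy Python model, not the compiler's design: `analyze`, `air_to_mir`, `mir_to_code`, and `compile_all` are hypothetical stand-ins, and the "AIR" and "MIR" here are just lists of strings. In CPython the threads will not actually run this CPU-bound work in parallel; the point is only how work is handed off.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(source_functions):
    """Stand-in for semantic analysis: yields one 'AIR' per function, in order."""
    for name, body in source_functions:
        yield name, body.split()

def air_to_mir(air):
    """Stand-in for AIR => MIR."""
    return [op.upper() for op in air]

def mir_to_code(mir):
    """Stand-in for MIR => machine code."""
    return " ".join(mir).encode()

def lower(name, air):
    """Backend task for one function: AIR => MIR => machine code."""
    return name, mir_to_code(air_to_mir(air))

def compile_all(source_functions, workers=4):
    output = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each function is handed to the backend as soon as analysis yields it,
        # while analysis of the next function continues on this thread.
        futures = [pool.submit(lower, name, air)
                   for name, air in analyze(source_functions)]
        for fut in futures:
            name, code = fut.result()
            output[name] = code  # stand-in for linking into the output file
    return output

result = compile_all([("main", "push call ret"), ("f", "ret")])
```

Since `air_to_mir` and `mir_to_code` are separate functions, they could also be submitted as separate tasks if one stage turned out to dominate.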

Design of MIR

There would be a different MIR dialect for each Instruction Set Architecture. For example there would be an x86 MIR which has all the x86 instructions and an ARM MIR which has all the ARM instructions.

The LLVM, WebAssembly, C, and SPIR-V backends have no need for MIR. The "machine code" in those cases is already high-level enough that no MIR is needed.

Fully implementing inline assembly and the full MIR instruction sets for each supported ISA will likely be done with large .zig files which are essentially data. I suspect this will be prohibitively slow and memory intensive for stage1 to handle, so I suggest we do a proof-of-concept with MIR in stage2 until we are fully self-hosted, and then after that we can complete the MIR instruction set listings.

@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. frontend Tokenization, parsing, AstGen, Sema, and Liveness. backend-self-hosted labels Aug 3, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Aug 3, 2021
@andrewrk andrewrk added the accepted This proposal is planned. label Aug 3, 2021
@andrewrk
Member Author

This is done, thanks to @joachimschmidt557, @kubkon, and @Luukdegram. Issues related to MIR improvements can be follow-up issues.
