RFC: A proposal for a better compiler IR #357

bobbinth · 2024-08-22T15:07:17Z

Discussed in #342

^{Originally posted by bitwalker July 21, 2023}
While doing the refactoring work early this summer on the compiler, I made some compromises to keep large structural changes relatively contained, i.e. I didn't want downstream crates to change much, if at all possible. I wasn't super successful in that regard; while the codegen crates were largely unaffected, the IR crate was completely reworked, and knowing what I know now, I would have come at things a different way.

The main compromise I made was re-using the AST as the IR on which the compiler does the bulk of its work. What is currently in our ir crate is not actually the compiler IR, but rather the algebraic graph (AIR), i.e. the representation we want for codegen. However, most of the heavy lifting is done outside of the algebraic graph, on the AST.

There are a few issues with re-using the AST for our IR:

The inlining phase expands all abstractions, and due to the tree structure of the AST, is forced to duplicate nodes in many places. It is also forced to traverse that tree recursively in multiple places in order to apply certain optimizations, and the tree can get very deep, largely due to comprehensions. This performs badly in terms of both space and time, and could cause stack overflows depending on the depth of the AST.
Some optimizations such as global value numbering (for performing common subexpression elimination (CSE)) are impractical to apply, and would be awkward to write, since the AST is not dataflow-oriented. As re-using computations is important for performance, GVN+CSE is an optimization we would very much like to perform.

I'd like to propose that we plan to do the following:

Make the algebraic graph the IR of the codegen backend, keeping the AIR moniker
Introduce a new graph-structured IR for the middle-end which is dataflow-oriented, called MIR
Move all transformations/optimizations from the AST to MIR, along with more complex validations
Lower directly to MIR from the AST as soon as parsing is complete, with only minimal validations done at this stage

This would significantly reduce the complexity of the code currently in the middle tier (e.g. inlining), and would provide a much cleaner interface for writing additional optimizations/transformations. Much of the complexity in that code today exists because of the awkward way in which we have to traverse and modify the AST to perform analysis and effect desired changes. This would also have the effect of making the compiler snappier, and scale better (e.g. it fixes the potential for stack overflows during traversal).

We can do this in a few individual steps parallel to other AirScript development (with rough estimates on how long each would take):

Move the algebraic graph (AIR) code out of the ir crate to a shared codegen crate which all of the codegen backend crates reference. (est. 1-2h)
Implement MIR in the ir crate, but not yet used anywhere (est. 3d)
Implement the lowering from AST to MIR, at this point the compiler still goes from AST to AIR directly (est. 2d)
For each pass currently applied to the AST which performs some kind of optimization or transformation, implement the equivalent pass for MIR, along with corresponding tests. Each of these can be done in a separate PR (est. 1w)
Implement the lowering from MIR to AIR (est. 1-2d)
Update the compiler pipeline from AST->AIR to AST->MIR->AIR (est. 1h)
Remove all of the now-unused AST code that handles optimization/transformation/etc. (est. 1h)

Since the level of abstraction in MIR would be the same as the AST, just structured differently, translating AST->MIR and MIR->AIR is trivial. Most of the development time required will be in handling the mechanical work of rewriting passes and tests in terms of the new structure, rather than implementing anything novel.

This proposal isn't necessary to implement right now, but I think we should start planning on it, and perhaps get steps 1-3 done so that we have a foundation on which to implement and test the remaining steps.

If anyone has any questions, I'm happy to answer them!

The text was updated successfully, but these errors were encountered:

bobbinth added this to the AirScript v0.4 milestone Aug 22, 2024

Leo-Besancon mentioned this issue Oct 17, 2024

MassaLabs: Implement a Intermediate Representation to improve the compilation process #359

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: A proposal for a better compiler IR #357

RFC: A proposal for a better compiler IR #357

bobbinth commented Aug 22, 2024

RFC: A proposal for a better compiler IR #357

RFC: A proposal for a better compiler IR #357

Comments

bobbinth commented Aug 22, 2024

Discussed in #342