Skip to content

A simple C compiler for learning, based on "Engineering a Compiler (the 2nd Edition)".

License

Notifications You must be signed in to change notification settings

pastchick3/eac-compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EAC Compiler

A simple C compiler for learning, based on Engineering a Compiler (the 2nd Edition).

Installation

Precompiled eac-compiler binary can be downloaded from the GitHub "Releases" page. As eac-compiler uses MASM for x64 as its assembler and linker, you need to install the C++ workload from the Visual Studio. Also, you should make sure all environment variables are properly set by running vcvars64.bat or using the x64 Native Tools Command Prompt (both of them are shipped with the Visual Studio).

eac-compiler.zip contains three files. Among them, antlr4-runtime.dll is the runtime of the ANTLR parser generator. driver.asm will call the main function and print the return value of it in decimal to the standard output. driver.asm will be automatically compiled and linked against our main assembly file, and it should be kept in the same directory as eac-compiler.exe.

Quick Start

Suppose there is a simple C file named fib.c that computes the n-th Fibonacci numbers:

int fib(int n) {
    if (n <= 2) {
        return n - 1;
    } else {
        return fib(n - 1) + fib(n - 2);
    }
}

int main() {
    return fib(10);  // 34
}

To compile this file, simply run the following command.

> eac-compiler.exe fib.c

eac-compiler will compile it into main.asm and combine it with driver.asm to produce main.exe. Y.u can run main.exe to see its output:

> main.exe
> 34

You can also make eac-compiler to print intermediate results by passing command line flags --ast, --ssa, --cfg, --vasm, or --asm, which will print the AST, the SSA IR, the control flow graph (after destructing all Phi functions), pseudo-x64 assembly using virtual registers, and x64 assembly using physical registers.

Module Overview

The Front End

eac-compiler's parser is generated by ANTLR and only supports a small subset of the C langauge, the formal grammar is given in the end of this document, which basically covers only arithmetics over the integer type. If you are interested in the implementation of the pareser and other other language features, there is another project Agile-C in which I built a constraint-based type inference C transpiler that covers most frequently-used C features (structs, arrays, pointers, and more) with a hand-written recursive descent parser.

There are two extra points that worth mentioning. First, the FFI code that binds the generated C++ parser to Rust make use of global variables, so when you are running tests, you should disable the multithreading test runner, i.e. always use the command cargo test -- --test-threads=1 instead of the plain cargo test. Second, compound statments (curly braces), except the one used to delimit the function body, do not introduce new scopes, so we can construct continuous data flow across the whole function to make the data-flow analysis meaningful even for simple test programs.

The Intermediate Representation

eac-compiler performs the data-flow analysis over the original AST and transforms it into the Static Single-Assignment Form (SSA form) IR. The construction and destruction processes follows Chapter 9.3 of the book.

The Back End

eac-compiler emits x64 assembly and follows the Windows x64 calling convention. There are good introduction materials for the x64 assembly from Intel and for the Windows x64 calling convention from Microsoft. Among them, the most important parts eac-compiler uses is listed as below.

  • The return value (if any) is located in rax.
  • Arguments are stored in rcx, rdx, r8, r9, and then stack, in that order from left to right. There is 32 bytes "shadow space" allocated on the stack to store the first four arguments.
  • rsp is the stack pointer. rbp is the stack frame pointer (callee-saved). rcx, rdx, r8:r11 are caller-saved general-purpose registers. rbx, rsi, rdi, r12:r15 are callee-saved general-purpose registers.
  • It is caller's responsibility to clean the stack.

The instruction selection process of eac-compiler follows Chapter 11.4 (Insturction Selection via Tree-Pattern Matching), but because currently each AST/SSA node is mapped into an unique sequence of assembly code, the process looks pretty like the naive treewalk scheme.

The register allocation process follows the bottom-up local register allocation in Chapter 13.3.2. Because eac-compiler does not compute the live range information for variables, registers will not be actively freed but just spilt to the stack if all physical registers are occupied.

Grammar

<non-digit> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
                | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                | "O" | "P" | "Q" | "R" | "S" | "T"
                | "U" | "V" | "W" | "X" | "Y" | "Z"
                | "a" | "b" | "c" | "d" | "e" | "f" | "g"
                | "h" | "i" | "j" | "k" | "l" | "m" | "n"
                | "o" | "p" | "q" | "r" | "s" | "t"
                | "u" | "v" | "w" | "x" | "y" | "z"
                | "_";
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
<identifier> ::= <non-digit> (<digit> | <non-digit>)*;
<number> ::= ["+" | "-"] <digit>+;


<primary-expression> ::= <identifier> | <number> | "(" <expression> ")";
<postfix-expression> ::= <primary-expression> | <postfix-expression> "(" <argument-list> ")";
<argument-list> ::= <expression> | <argument-list> "," <expression>;
<prefix-expression> ::= <postfix-expression> | "!" <postfix-expression> | "-" <postfix-expression>;
<multiplicative-expression> ::= <prefix-expression>
                            | <multiplicative-expression> "*" <prefix-expression>
                            | <multiplicative-expression> "/" <prefix-expression>;
<additive-expression> ::= <multiplicative-expression>
                            | <additive-expression> "+" <multiplicative-expression>
                            | <additive-expression> "-" <multiplicative-expression>;
<relational-expression> ::= <additive-expression>
                            | <relational-expression> "<" <additive-expression>
                            | <relational-expression> ">" <additive-expression>
                            | <relational-expression> "<=" <additive-expression>
                            | <relational-expression> ">=" <additive-expression>;
<equality-expression> ::= <relational-expression>
                            | <equality-expression> "==" <relational-expression>
                            | <equality-expression> "!=" <relational-expression>;
<logical-AND-expression> ::= <equality-expression>
                            | <logical-AND-expression> "&&" <equality-expression>;
<logical-OR-expression> ::= <logical-AND-expression>
                            | <logical-OR-expression> "||" <logical-AND-expression>;
<assignment-expression> ::= <identifier> "=" <logical-OR-expression>;
<expression> ::= <assignment-expression>;


<statement> ::= <declaration-statement>
                | <compound-statement>
                | <expression-statement>
                | <selection-statement>
                | <iteration-statement>
                | <jump-statement>;
<declaration-statement> ::= "int" <identifier>;
<compound-statement> ::= "{" <statement>* "}";
<expression-statement> ::= <expression> ";";
<selection-statement> ::= "if" "(" <expression> ")" <statement> ["else" <statement>];
<iteration-statement> ::= "while" "(" <expression> ")" <statement>;
<jump-statement> ::= "return" [<expression>] ";";


<function> ::= "void" | "int" <identifier> "(" <parameter-list> ")" <compound-statement>;
<parameter-list> ::= ["int" <identifier>] | <parameter-list> "," ["int" <identifier>];


<program> ::= <function>*;

About

A simple C compiler for learning, based on "Engineering a Compiler (the 2nd Edition)".

Resources

License

Stars

Watchers

Forks

Packages

No packages published