As with PA0, you will develop and submit this assignment through Github. Here is the link to accept the assignment https://classroom.github.com/a/6hRQ_pnv.
Your job in this assignment is to implement an assembler for Grumpy assembly code. That is, you'll be implementing a program assem
that reads assembly code for the Grumpy virtual machine (GrumpyVM) and outputs corresponding bytecode.
Assembly language program Equivalent bytecode program
| |
v /-------------\ v
<filename.s> => | assem | => <filename.o>
\-------------/
^| This assignment
The assignment template is a workspace containing two crates:
- a library crate called
grumpy
- a binary crate called
assem
.
The grumpy
library crate is a collection of modules implementing the various components of the Grumpy compiler. For, now it has only three files:
- lib.rs -- the crate root file
- isa.rs -- Grumpy instruction set types
- assemble.rs -- the Grumpy assembler.
To help familiarize yourself with the grumpy
crate and its modules, you can run the following command to generate and open its documentation in a browser:
cargo doc --open
For now there are only two modules within grumpy
: assemble
and isa
. In future assignments we'll be adding more.
The template files as provided are not complete. There are three main problems to solve:
- Translating Grumpy assembly into an in-memory data structure
- Laying out the parsed program in memory in order to resolve labels
- Printing the resulting bytecode.
We discuss each in turn.
The input to your assembler is a file <filename>.s
containing text like:
setframe 0
push Lmain
call
halt
Lmain:
push 3
push 12
binary /
ret
This program returns the result of the operation 12 / 3
.
Line by line, the program does the following:
Instruction | Effect |
---|---|
setframe 0 | Set the frame pointer to 0 |
push Lmain | Push the address corresponding to label Lmain onto the stack |
call | Call the address corresponding to Lmain |
halt | Halt the machine |
Lmain: | Declare label Lmain as the address of the following instruction |
push 3 | Push 3 onto the stack |
push 12 | Push 12 onto the stack |
binary / | Pop 12, pop 3, push (12 / 3) |
ret | Return to caller |
Your first step is to write functions for reading such programs from a file, storing them in an internal data structure (described below).
The file grumpy/src/isa.rs
contains the type of machine instructions that are natively supported by the Grumpy virtual machine:
pub enum Instr {
/// Push(v): Push value v onto the stack.
Push(Val),
/// Pop a value from the stack, discarding it.
Pop,
...
The pub
in this type declaration makes Instr
a type accessible in files that import the isa module.
If you look closely, you will see that labels don't appear anywhere in the native instruction set. This is because all jumps in GrumpyVM programs must be to statically known instruction locations (indices into the instruction cache). However, Grumpy assembly programs can contain labels, so we define a type of "pseudo-instructions" that augments native instructions with support for labels.
pub enum PInstr {
/// Label the next instruction.
PLabel(Label),
/// Push a label onto the stack.
PPush(Label),
/// Native machine instruction.
PI(Instr),
}
You must write parsing functions for reading Grumpy assembly programs from a file and storing them as arrays of PInstrs
(i.e., Vec<PInstr>
). The template contains starter code for implementing the FromStr
trait for PInstr
and related types, which is our recommended approach.
Here's a grammar that defines the set of programs your assembler should accept:
Unary Operations
u ::= neg
Binary Operations
b ::= + | * | - | / | < | ==
Values
v ::= tt // The unit value
| n // 32-bit two's complement signed integer
| true // Boolean literals
| false
| undef // The undefined value
Labels
l ::= L[a-zA-Z0-9]+ // A label is the character 'L' followed by one or more alphanumeric characters
| _L[a-zA-Z0-9]+ // or "_L" followed by one or more alphanumeric characters ("_L" is typically reserved
// for fresh labels generated by a compiler producing GrumpyVM assembly).
Instructions
i ::= push v // Push a value
| push l // Push a label
| pop
| peek u32 // Where u32 stands for a 32-bit unsigned integer
| unary u
| binary b
| swap
| alloc
| set
| get
| var u32
| store u32
| setframe u32
| call
| ret
| branch
| halt
Instructions or Labels
il ::= i // An "il" is either an instruction
| l: // or a label "l" followed by a colon ":", as in "Lmain:".
Grumpy Assembly Programs
p ::= il_1
il_2
...
il_N
Your assembler should be able to handle any assembly program that is valid according to this grammar. If you want additional examples of valid Grumpy assembly programs, take a look at the .s
files in the tests
directory.
A program such as
setframe 0
push Lmain
call
halt
Lmain:
...
contains labels like Lmain
that stand in for the addresses of instructions in the program. To produce a program that can be executed on the GrumpyVM, the assembler (in grumpy/src/assemble.rs
) needs to resolve the labels to these instruction addresses.
For the example program above, the labels could be resolved as:
Address | Instruction |
---|---|
0 | setframe u32(0) |
1 | push loc(4) |
2 | call |
3 | halt |
4 | push i32(3) |
5 | push i32(12) |
6 | binary / |
7 | ret |
Some uses of labels, like push Lmain
, point to declarations (Lmain:
) that appear only later in the program. One way to deal with this is to make two passes through the program, the first to resolve the addresses of labels, the second to replace the labels with their resolved addresses.
A Note on Instruction Addresses: When addressing instructions, your assembler should use addresses that correspond to the index of the instruction as it appears in the program. That is, the first instruction (above, setframe 0
) is at address 0, the second (push loc(4)
) is at address 1, and so on. In the bytecode representation of GrumpyVM programs (about which more below), different instructions are not necessarily the same size. Instruction addresses therefore abstract from the physical layout of GrumpyVM programs in bytecode; the GrumpyVM (PA2) will resolve this discrepancy when it decodes GrumpyVM bytecode files.
As a library crate, the grumpy
crate is not executable by itself, but it provides the functionality of the Grumpy compiler for use by executable (binary) crates. The executable for this assignment is in the assem
binary crate, where we perform all I/O and use functions and traits imported from the grumpy
library crate to parse GrumpyVM assembly programs and assemble them to bytecode.
The output of your assembler should be a new binary file <filename>.o
containing the bytecode for <filename>.s
with all labels resolved. For the program above, this bytecode is:
000000 0000 0008 0b00 0000 0000 0400 0000 040c
000010 0f00 0100 0000 0300 0100 0000 0c04 030d
000020
as displayed in hexadecimal by the linux command od --endian big -A x -x <filename.o>
. The first 6 characters of each line, e.g., 000000
, are the addresses in hexadecimal of the beginning of the line. These addresses are output automatically by od
; your assembler does not need to generate them.
How do you figure out the byte-level encodings of GrumpyVM's instructions? Your first point of reference should be the GrumpyVM documentation, which describes in detail the encodings of all the instructions (in particular, the section you should focus on is Instruction Bytecode Format). Stepping through the encodings to reverse engineer the .o
output above, we get:
Bytes | Meaning |
---|---|
00 00 00 08 | N , the number of instructions, equals 8 |
0b 00 00 00 00 | Instruction is setframe (opcode = 11 ), argument is 0 |
00 04 00 00 00 04 | Instruction is push (opcode = 0 ), argument is a location (prefix 04 ), location is 00 00 00 04 |
0c | Instruction is call (opcode = 12 ) |
0f | Instruction is halt (opcode = 15 ) |
00 01 00 00 00 03 | Instruction is push (opcode = 0 ), argument is an i32 (prefix 01 ), integer is 00 00 00 03 |
00 01 00 00 00 0c | Instruction is push (opcode = 0 ), argument is an i32 (prefix 01 ), integer is 00 00 00 0c (= 12 ) |
04 03 | Instruction is binary (opcode = 4 ), argument is operation / (code 03 ) |
0d | Instruction is ret (opcode = 13 ) |
-
GrumpyVM object files begin with a binary encoding of the unsigned 32-bit big-endian integer
N
, the number of instructions (not the number of bytes!) in the program. If you forget to outputN
, the GrumpyVM won't be able to execute your.o
files. -
Make sure to test your assembler by running the GrumpyVM on the object files you generate, for a variety of test programs. We will test your assembler against both student tests and against a test suite that we have designed.
-
Remember: all integers, etc., should be encoded in big-endian format. Take a look at the Rust documentation, e.g., for u32, to see how to convert integers to byte arrays in big-endian format.
One way to test your code is to run it on a variety of GrumpyVM assembly (.s
) files, ensuring that for each program you output bytecode (a .o
file) that's runnable by the GrumpyVM (and on which the VM produces the right result!).
To assist you, we've collected a number of sample programs in the tests/
directory. At a high level:
- Files
<file>.s
are GrumpyVM assembly programs. - Files
<file>.o
are GrumpyVM bytecode files.
To minimally test your assembler, ensure that for each <file>
in tests
, the .o
file your assembler produces when run on <file>.s
is exactly the same as the reference <file>.o
that we give you.
Be sure to test your program on the Ubuntu Prime machines before submitting to work out any bugs.
-
Include at least one novel test in your code in the form of a GrumpyVM assembly program. This will be worth 10% of your grade.
-
Implement a program
assem
that reads Grumpy assembly programs from<filename>.s
and outputs the resulting bytecodes to<filename>.o
. You can assume that<filename>.s
is the first argument toassem
(the zeroth being your assembler's path). You may implement the assembler however you wish (although most everything except I/O and basic plumbing should be in thegrumpy
crate), but the starter code provided ingrumpy/src/
suggests a reasonable strategy. -
Make sure you update your Cargo.toml files (
assem/Cargo.toml
andgrumpy/Cargo.toml
). The authors section should be
authors = ["YOUR NAME <YOUR EMAIL>"]
-
On valid Grumpy assembly programs,
assem
should return exit code 0. On invalid programs, your assembler should exit with a nonzero exit code. -
You must include at least 3 non-trivial unit tests in your code.
-
Submit your assignment through Github on or by the due date. Make sure you've pushed all your code to the Github repository (master branch). Please include your name and the word "submission" somewhere in the final commit message.
Submit the URL of your github repo to Blackboard, which should include your Github ID. If not, make sure to identify your Github ID in submission comments.