This document describes the binary form of the Mu intermediate representation. For the text form, see uvm-ir.rest.
DEPRECATED: The binary format is deprecated. As mentioned in this ticket, we have come to the conclusion that the interface between the client and the micro VM should be a functional interface, i.e. constructing IR nodes by invoking API functions. This binary IR form is still a serialised data format that needs to be parsed. The text form, however, is still useful for debugging and for use in statically compiled implementations.
The Mu IR Binary Form is similar to the Text Form in structure, but has notable differences.
Numerical IDs are used exclusively instead of textual names. The binary form also provides a special "name binding" pseudo-top-level definition which associates IDs with names.
A bundle in the binary form consists of many numbers encoded in bytes. All numbers are encoded in little-endian byte order and are tightly packed, meaning there are no padding bytes between two adjacent numbers. A floating point number is encoded as if its bits were reinterpreted as an integer of the same length and that integer were then encoded in little-endian.
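The encoding rules above can be sketched with Python's standard `struct` module. The helper names are illustrative and not part of the Mu specification, and treating integers as unsigned is an assumption, since signedness is not stated here.

```python
import struct

def encode_i32(value: int) -> bytes:
    """Encode a 32-bit integer as 4 little-endian bytes (assumed unsigned)."""
    return struct.pack("<I", value)

def encode_float(value: float) -> bytes:
    """Encode a 32-bit float as 4 little-endian bytes."""
    return struct.pack("<f", value)

def float_bits(value: float) -> int:
    """Reinterpret a 32-bit float bit-by-bit as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

# Tightly packed: adjacent numbers are simply concatenated, with no padding.
packed = encode_i32(0x12345678) + struct.pack("<B", 0x01)
```

Encoding a float directly and encoding its reinterpreted integer bits yield the same bytes, which is the equivalence the paragraph describes.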
A sequence of bytes has a binary type which maps the bytes to the value they represent. Possible binary types are:
- i8, i16, i32, i64: Integer types of the respective lengths.
- float, double: Floating point types of 32 bits and 64 bits, respectively.
- idt: Alias of i32. Used for IDs.
- lent: Alias of i16. Used for lengths of variable-length structures, including the number of fields in a struct and the number of items in a parameter list.
- aryszt: Alias of i64. Used for the length of arrays.
- opct: Alias of i8. Used for instruction opcodes, operations or flags.
- other structures: One structure can contain other structures defined separately.
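As a rough illustration, the primitive binary types above could be mapped to `struct` format characters as follows. The `BINARY_TYPES` table and `size_of` helper are hypothetical names, and the unsigned format codes are an assumption.

```python
import struct

# Hypothetical mapping from binary type names to `struct` format characters.
# Unsigned codes are an assumption; the text does not state signedness.
BINARY_TYPES = {
    "i8": "B", "i16": "H", "i32": "I", "i64": "Q",
    "float": "f", "double": "d",
    "idt": "I",     # alias of i32
    "lent": "H",    # alias of i16
    "aryszt": "Q",  # alias of i64
    "opct": "B",    # alias of i8
}

def size_of(type_name: str) -> int:
    """Return the encoded size in bytes of a primitive binary type."""
    # The "<" prefix selects little-endian with standard sizes and no padding.
    return struct.calcsize("<" + BINARY_TYPES[type_name])
```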
A table is used to represent a contiguous structure. The first row is a list of binary types specifying the type of each column and the second row specifies for each column either a symbolic name for that field or the exact binary content expected. Such a structure consists of a sequence of numbers of the types of the first row.
type1 | type2 |
---|---|
num | or symbolic name |
Some structures recur within multiple other structures.
An ID list, denoted as idList, is a list of IDs. It has the general form:
lent | idt | idt | ... |
---|---|---|---|
nIDs | id1 | id2 | ... |
nIDs specifies the number of IDs and there are nIDs IDs following it.
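A minimal sketch of reading and writing an idList in Python follows. The function names are illustrative, and IDs and lengths are assumed to be unsigned.

```python
import struct

def encode_id_list(ids):
    """Encode IDs as a lent (i16) count followed by that many idt (i32) values."""
    return struct.pack("<H", len(ids)) + b"".join(struct.pack("<I", i) for i in ids)

def decode_id_list(data, offset=0):
    """Decode an idList at `offset`; return (ids, offset just past the list)."""
    (n_ids,) = struct.unpack_from("<H", data, offset)
    offset += 2
    ids = list(struct.unpack_from(f"<{n_ids}I", data, offset))
    return ids, offset + 4 * n_ids
```

Returning the next offset lets a caller decode consecutive structures without tracking sizes separately.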
A bundle starts with a 4-byte magic '\x7F' 'U' 'I' 'R', or 0x7F 0x55 0x49 0x52. The magic is followed by top-level definitions until the end of the bundle.
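A reader might validate the magic before parsing any definitions. The sketch below uses a hypothetical `check_magic` helper; the choice of `ValueError` for a mismatch is an assumption, not something the format prescribes.

```python
MAGIC = b"\x7fUIR"  # 0x7F 0x55 0x49 0x52

def check_magic(bundle: bytes) -> int:
    """Verify the 4-byte magic; return the offset of the first top-level definition."""
    if bundle[:4] != MAGIC:
        raise ValueError("not a Mu IR binary bundle")
    return 4
```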
Type definition has the following form:
opct | idt | type constructor |
---|---|---|
0x01 | id | cons |
id is the identifier of the defined type. A type constructor follows the opcode 0x01 and the ID. See type-system.rest for a complete list of type constructors.
NOTE: this is equivalent to: .typedef id = cons
Function signature definition has the following form:
opct | idt | idList | idList |
---|---|---|---|
0x02 | id | paramtys | rettys |
id is the identifier of the defined function signature. paramtys is a list of IDs of its parameter types. rettys is a list of IDs of the return types.
NOTE: this is equivalent to: .funcsig id = (paramtys) -> (rettys)
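As an illustration, a function signature definition could be serialised as below. The helper names and the ID values in the usage note are arbitrary, and unsigned encoding is assumed.

```python
import struct

def encode_id_list(ids):
    """Encode an idList: a lent (i16) count followed by idt (i32) values."""
    return struct.pack("<H", len(ids)) + b"".join(struct.pack("<I", i) for i in ids)

def encode_funcsig(sig_id, param_tys, ret_tys):
    """Encode opcode 0x02, the signature ID, then the two ID lists."""
    return (struct.pack("<BI", 0x02, sig_id)
            + encode_id_list(param_tys)
            + encode_id_list(ret_tys))
```

For example, `encode_funcsig(0x10001, [0x10002, 0x10003], [0x10002])` produces 21 bytes: 5 for the opcode and ID, 10 for the parameter list and 6 for the return list.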
Constant definition has the following form:
opct | idt | idt | constant constructor |
---|---|---|---|
0x03 | id | type | cons |
id is the identifier of the defined constant. type is the type of the constant and must match the constant constructor. A constant constructor follows the type.
NOTE: this is equivalent to: .const id <type> = cons
An integer constant constructor has the following form:
opct | i64 |
---|---|
0x01 | number |
number is the integer constant number. If the integer constant has a type with fewer bits, only the least significant bits are valid. The binary form cannot encode integer constants larger than 64 bits.
NOTE: this is equivalent to an integer literal in the text form.
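The following sketch encodes a constant definition whose constructor is an integer. The function name is hypothetical, and representing negative values by their 64-bit two's complement pattern is an assumption that the text does not confirm.

```python
import struct

def encode_int_const(const_id, type_id, number):
    """Encode a constant definition (0x03) with an integer constructor (0x01)."""
    header = struct.pack("<BII", 0x03, const_id, type_id)
    # Assumption: mask to 64 bits so negative Python ints become their
    # two's complement bit pattern.
    cons = struct.pack("<BQ", 0x01, number & 0xFFFFFFFFFFFFFFFF)
    return header + cons
```

The number always occupies 8 bytes regardless of the constant's type; per the text, only the least significant bits matter for narrower types.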
A float constant constructor has the following form:
opct | float |
---|---|
0x02 | number |
number is the float constant number.
NOTE: this is equivalent to a float literal in the text form.
A double constant constructor has the following form:
opct | double |
---|---|
0x03 | number |
number is the double constant number.
NOTE: this is equivalent to a double literal in the text form.
A list constant constructor has the following form:
opct | idList |
---|---|
0x04 | elems |
elems is a list of IDs, each of which refers to another constant which is the value of the corresponding field of the struct or element of array/vector.
NOTE: this is equivalent to the struct literal {elems} in the text form.
A NULL constant constructor has the following form:
opct |
---|
0x05 |
NOTE: this is equivalent to the NULL keyword in the text form.
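The five constant constructors can be told apart by their leading opcode. Below is a sketch of a decoder that dispatches on it; the function name, the returned tuple shape and the unsigned reading of the integer are all assumptions made for illustration.

```python
import struct

def decode_const_cons(data, offset=0):
    """Decode one constant constructor; return (kind, value, next_offset)."""
    opcode = data[offset]
    offset += 1
    if opcode == 0x01:  # integer: one i64
        (number,) = struct.unpack_from("<Q", data, offset)
        return "int", number, offset + 8
    if opcode == 0x02:  # float: 32 bits
        (number,) = struct.unpack_from("<f", data, offset)
        return "float", number, offset + 4
    if opcode == 0x03:  # double: 64 bits
        (number,) = struct.unpack_from("<d", data, offset)
        return "double", number, offset + 8
    if opcode == 0x04:  # list: an idList of constant IDs
        (n,) = struct.unpack_from("<H", data, offset)
        elems = list(struct.unpack_from(f"<{n}I", data, offset + 2))
        return "list", elems, offset + 2 + 4 * n
    if opcode == 0x05:  # NULL: no payload
        return "null", None, offset
    raise ValueError(f"unknown constant constructor opcode {opcode:#04x}")
```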
Global cell definition has the following form:
opct | idt | idt |
---|---|---|
0x04 | id | type |
id is the ID of the defined global cell. type is the type of the global cell.
NOTE: this is equivalent to: .global id <type>
Function declaration has the following form:
opct | idt | idt |
---|---|---|
0x05 | id | sig |
id is the ID of the declared function. sig is the function signature of it.
NOTE: this is equivalent to: .funcdecl id <sig>
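Global cell definitions and function declarations share the same fixed 9-byte shape: an opcode followed by two IDs. A minimal sketch, with illustrative helper names and assumed unsigned IDs:

```python
import struct

def encode_global(cell_id, type_id):
    """Encode a global cell definition: opcode 0x04, cell ID, type ID."""
    return struct.pack("<BII", 0x04, cell_id, type_id)

def encode_funcdecl(func_id, sig_id):
    """Encode a function declaration: opcode 0x05, function ID, signature ID."""
    return struct.pack("<BII", 0x05, func_id, sig_id)
```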
Function definition has the following form:
opct | idt | idt | idt | function body |
---|---|---|---|---|
0x06 | id | verid | sig | body |
id is the ID of the defined function. verid is the ID of the version of the function. sig is the function signature of it. body is the function body.
NOTE: this is equivalent to: .funcdef id VERSION verid <sig> { body }
A function body has the following form:
lent | basic block | basic block | ... |
---|---|---|---|
nbbs | bb1 | bb2 | ... |
nbbs is the number of basic blocks. bbx are basic blocks.
A basic block has the following form:
idt | lent | idPairs | idt | lent | instruction | instruction | ... |
---|---|---|---|---|---|---|---|
id | nparams | params | exc | ninsts | inst1 | inst2 | ... |
id is the ID of the basic block. Every basic block must have an ID, even the entry block. nparams is the number of parameters in params, which is a list of pairs of IDs, each of which is:
idt | idt |
---|---|
type | param |
where type is the ID of the type of the parameter, and param is a parameter to the basic block.
exc is the ID of the exceptional parameter. An ID of 0 indicates that the exceptional parameter is omitted.
ninsts is the number of instructions in the current basic block. There are ninsts instructions following the header.
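The basic block header can be decoded as sketched below. The function name and dictionary layout are illustrative. The sketch assumes that the exc field is always present in the byte stream and that a value of 0 means the block has no exceptional parameter. It stops before the instructions, whose bodies are instruction-specific.

```python
import struct

def decode_block_header(data, offset=0):
    """Decode a basic block up to, but not including, its instructions."""
    block_id, n_params = struct.unpack_from("<IH", data, offset)
    offset += 6
    params = []
    for _ in range(n_params):
        # Each parameter is a (type ID, parameter ID) pair of idt values.
        type_id, param_id = struct.unpack_from("<II", data, offset)
        params.append((type_id, param_id))
        offset += 8
    exc, n_insts = struct.unpack_from("<IH", data, offset)
    offset += 6
    header = {
        "id": block_id,
        "params": params,
        "exc": exc or None,  # assumption: 0 means no exceptional parameter
        "ninsts": n_insts,
    }
    return header, offset
```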
An instruction has the following form:
idList | idt | instruction body |
---|---|---|
resIDs | id | instbody |
resIDs is a list of IDs for the results. id is the ID of the instruction. instbody is the instruction body, which is specific to each instruction. See instruction-set.rest for an exhaustive list.
Function exposing definition has the following form:
opct | idt | idt | opct | idt |
---|---|---|---|---|
0x07 | id | func | callConv | cookie |
id is the ID of the exposed value. func is the ID of the function to expose. callConv is the calling convention flag. cookie is the cookie, the ID of an int<64> constant.
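A sketch of serialising the 0x07 definition above follows. The function name is hypothetical, and the calling convention value used in the usage note is a placeholder, because the valid flag values are not listed in this document.

```python
import struct

def encode_expose(expose_id, func_id, call_conv, cookie_id):
    """Encode opcode 0x07, exposed value ID, function ID, callConv flag, cookie ID."""
    return struct.pack("<BIIBI", 0x07, expose_id, func_id, call_conv, cookie_id)
```

For example, `encode_expose(1, 2, 0, 3)` yields 14 bytes, with the one-byte callConv flag sitting between the two pairs of 4-byte IDs and no padding around it.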
Name binding is a definition specific to the binary form. It binds a name to an ID. It is designed for debugging purposes and is optional. The name must be a valid textual global identifier (including the prefix '@').
A name binding has the following form:
opct | idt | lent | i8 | i8 | ... |
---|---|---|---|---|---|
0x08 | id | nbytes | byte1 | byte2 | ... |
id is the ID to bind. nbytes is the number of bytes in the name and bytex is the value of each byte.
The name is encoded in ASCII and must follow the rules of global names, local names and allowed characters as defined in uvm-ir.rest.
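Name bindings can be written and read as sketched below. The function names are illustrative. The sketch checks only the ASCII encoding and the '@' prefix, not the full naming rules of uvm-ir.rest.

```python
import struct

def encode_name_binding(the_id, name):
    """Encode opcode 0x08, the ID, the byte count (lent), then the ASCII bytes."""
    raw = name.encode("ascii")  # raises UnicodeEncodeError for non-ASCII names
    if not raw.startswith(b"@"):
        raise ValueError("global names must start with '@'")
    return struct.pack("<BIH", 0x08, the_id, len(raw)) + raw

def decode_name_binding(data, offset=0):
    """Decode a name binding; return (id, name, next_offset)."""
    opcode, the_id, n_bytes = struct.unpack_from("<BIH", data, offset)
    if opcode != 0x08:
        raise ValueError("not a name binding")
    offset += 7
    name = data[offset:offset + n_bytes].decode("ascii")
    return the_id, name, offset + n_bytes
```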