Skip to content
This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Latest commit

 

History

History
352 lines (255 loc) · 10.7 KB

uvm-ir-binary.rest

File metadata and controls

352 lines (255 loc) · 10.7 KB

Intermediate Representation (Binary Form)

This document describes the binary form of the Mu intermediate representation. For the text form, see uvm-ir.rest.

DEPRECATED: The binary format is deprecated. As mentioned in this ticket, we have come to the conclusion that the interface between the client and the micro VM should be a functional interface, i.e. constructing IR nodes by invoking API functions. This binary IR form is still a serialised data format that needs to be parsed. The text form, however, is still useful for debugging and for using in statically compiled implementations.

Overview

The Mu IR Binary Form is similar to the Text Form in structure, but has notable differences.

Numerical IDs are used exclusively instead of textual names. The binary form also provides a special "name binding" pseudo-top-level definition which associates IDs with names.

Binary format

A bundle in the binary form consists of many numbers encoded in bytes. All numbers are encoded in little endian and are tightly packed which means there are no padding bytes between two adjacent numbers. For floating point numbers, it is equivalent to convert them bit-by-bit into integer types of the same length and convert to bytes in little-endian.

Binary Types

A sequence of bytes has a binary type which maps the bytes to the value they represent. Possible binary types are:

i8, i16, i32, i64
Integer types of the respective lengths.
float, double
Floating point types of 32 bits and 64 bits, respectively.
idt
Alias to i32. Used for IDs.
lent
Alias to i16. Used for lengths of variable-length structures including the number of fields in a struct and the number of items in a parameter list.
aryszt
Alias to i64. Used for the length of arrays.
opct
Alias to i8. Used for instruction opcodes, operations or flags.
other structures
One structure can contain other structures defined separately.

A table is used to represent a contiguous structure. The first row is a list of binary types specifying the type of each column and the second row specifies for each column either a symbolic name for that field or the exact binary content expected. Such a structure consists of a sequence of numbers of the types of the first row.

type1 type2
num or symbolic name

Common Structures

Some structures are common in multiple structures.

ID List

An ID list, denoted as idList, is a list of IDs. It has the general form:

lent idt idt ...
nIDs id1 id2 ...

nIDs specifies the number of IDs and there are nIDs IDs following it.

Top-level Structure

A bundle starts with a 4-byte magic "x7F' 'U' 'I' 'R', or 0x7F 0x55 0x49 0x52. Then there are many top-level definitions until the end of the bundle.

Type Definition

Type definition has the following form:

opct idt type constructor
0x01 id cons

id is the identifier of the defined type. A type constructor follows the opcode 0x01 and the ID. See type-system.rest for a complete list of type constructors.

NOTE: this is equivalent to: .typedef id = cons.

Function Signature Definition

Function signature definition has the following form:

opct idt idList idList
0x02 id paramtys rettys

id is the identifier of the defined function signature. paramtys is a list of IDs of its parameter types. rettys is a list of IDs of the return types.

NOTE: this is equivalent to: .funcsig id = (paramtys) -> (rettys)

Constant Definition

Constant definition has the following form:

opct idt idt constant constructor
0x03 id type cons

id is the identifier of the defined constant. type is the type of the constant and must match the constant constructor. A constant constructor follows the type.

NOTE: this is equivalent to: .const id <type> = cons

Integer Constant Constructor

An integer constant constructor has the following form:

opct i64
0x01 number

number is the integer constant number. If the integer constant has a type with fewer bits, only the least significant bits are valid. The binary form cannot encode integer constants larger than 64 bits.

NOTE: this is equivalent to an integer literal in the text form.

Floating Point Constant Constructors

A float constant constructor has the following form:

opct float
0x02 number

number is the float constant number.

NOTE: this is equivalent to a float literal in the text form.

A double constant constructor has the following form:

opct double
0x03 number

number is the double constant number.

NOTE: this is equivalent to a double literal in the text form.

List Constant Constructor

A list constant constructor has the following form:

opct idList
0x04 elems

elems is a list of IDs, each of which refers to another constant which is the value of the corresponding field of the struct or element of array/vector.

NOTE: this is equivalent to the struct literal {elems} in the text form.

NULL Constant Constructor

A NULL constant constructor has the following form:

opct
0x05
NOTE: this is equivalent to the NULL keyword in the text form.

Global Cell Definition

Global cell definition has the following form:

opct idt idt
0x04 id type

id is the ID of the defined global cell. type is the type of the global cell.

NOTE: this is equivalent to: .global id <type>

Function Definition and Declaration

Function declaration has the following form:

opct idt idt
0x05 id sig

id is the ID of the declared function. sig is the function signature of it.

NOTE: this is equivalent to: .funcdecl id <sig>

Function definition has the following form:

opct idt idt idt function body
0x06 id verid sig body

id is the ID of the defined function. verid is the ID of the version of the function. sig is the function signature of it. params is a list of IDs, each of which is the ID of its parameter. body is the function body.

NOTE: this is equivalent to: .funcdef id VERSION verid <sig> { body }

Function Body

A function body has the following form:

lent basic block basic block ...
nbbs bb1 bb2 ...

nbbs is the number of basic blocks. bbx are basic blocks.

A basic block has the following form:

idt lent idPairs idt lent instruction instruction ...
id nparams params exc ninsts inst1 inst2 ...

id is the ID of the basic block. Every basic block must have an ID, even the entry block. nparams is the number of parameters in params, which a list of pairs of IDs, each of which is:

idt idt
type param

where type is the ID of the type of the parameter, and param is a parameter to the basic block.

exc is the ID of the exceptional parameter. It is omitted when the ID is 0.

ninsts is the number of instructions in the current basic block. There are ninsts instructions following the header.

An instruction has the following form:

idList idt instruction body
resIDs id instbody

resIDs is a list of IDS for the results. id is the ID of the instruction. instbody is instruction body which is specific to each instruction. See instruction-set.rest for an exhaustive list.

Function Exposing Definition

Function declaration has the following form:

opct idt idt opct idt
0x07 id func callConv cookie

id is the ID of the exposed value. func is the ID of the function to expose. callConv is the calling convention flag. cookie is the cookie, the ID of an int<64> constant.

Name Binding

Name binding is a definition specific to the binary form. It binds a name to an ID. It is designed for debugging purposes and is optional. The name must be a valid textual global identifier (including the prefix '@').

A name binding has the following form:

opct idt lent i8 i8 ...
0x08 id nbytes byte1 byte2 ...

id is the ID to bind. nbytes is the number of bytes in the name and bytex is the value of each byte.

The name is encoded in ASCII and must follow the rules of global names, local names and allowed characters as defined in uvm-ir.rest.