Intermediate Representation (Binary Form)

This document describes the binary form of the Mu intermediate representation. For the text form, see uvm-ir.rest.

DEPRECATED: The binary format is deprecated. As mentioned in this ticket, we have come to the conclusion that the interface between the client and the micro VM should be a functional interface, i.e. constructing IR nodes by invoking API functions. This binary IR form is still a serialised data format that needs to be parsed. The text form, however, is still useful for debugging and for using in statically compiled implementations.

Overview

The Mu IR Binary Form is similar to the Text Form in structure, but has notable differences.

Numerical IDs are used exclusively instead of textual names. The binary form also provides a special "name binding" pseudo-top-level definition which associates IDs with names.

Binary format

A bundle in the binary form consists of many numbers encoded in bytes. All numbers are encoded in little endian and are tightly packed which means there are no padding bytes between two adjacent numbers. For floating point numbers, it is equivalent to convert them bit-by-bit into integer types of the same length and convert to bytes in little-endian.

Binary Types

A sequence of bytes has a binary type which maps the bytes to the value they represent. Possible binary types are:

i8, i16, i32, i64: Integer types of the respective lengths.
float, double: Floating point types of 32 bits and 64 bits, respectively.
idt: Alias to i32. Used for IDs.
lent: Alias to i16. Used for lengths of variable-length structures including the number of fields in a struct and the number of items in a parameter list.
aryszt: Alias to i64. Used for the length of arrays.
opct: Alias to i8. Used for instruction opcodes, operations or flags.
other structures: One structure can contain other structures defined separately.

A table is used to represent a contiguous structure. The first row is a list of binary types specifying the type of each column and the second row specifies for each column either a symbolic name for that field or the exact binary content expected. Such a structure consists of a sequence of numbers of the types of the first row.

type1	type2
num	or symbolic name

Common Structures

Some structures are common in multiple structures.

ID List

An ID list, denoted as idList, is a list of IDs. It has the general form:

lent	idt	idt	...
nIDs	id1	id2	...

nIDs specifies the number of IDs and there are nIDs IDs following it.

Top-level Structure

A bundle starts with a 4-byte magic "x7F' 'U' 'I' 'R', or 0x7F 0x55 0x49 0x52. Then there are many top-level definitions until the end of the bundle.

Type Definition

Type definition has the following form:

opct	idt	type constructor
0x01	id	cons

id is the identifier of the defined type. A type constructor follows the opcode 0x01 and the ID. See type-system.rest for a complete list of type constructors.

NOTE: this is equivalent to: .typedef id = cons.

Function Signature Definition

Function signature definition has the following form:

opct	idt	idList	idList
0x02	id	paramtys	rettys

id is the identifier of the defined function signature. paramtys is a list of IDs of its parameter types. rettys is a list of IDs of the return types.

NOTE: this is equivalent to: .funcsig id = (paramtys) -> (rettys)

Constant Definition

Constant definition has the following form:

opct	idt	idt	constant constructor
0x03	id	type	cons

id is the identifier of the defined constant. type is the type of the constant and must match the constant constructor. A constant constructor follows the type.

NOTE: this is equivalent to: .const id <type> = cons

Integer Constant Constructor

An integer constant constructor has the following form:

opct	i64
0x01	number

number is the integer constant number. If the integer constant has a type with fewer bits, only the least significant bits are valid. The binary form cannot encode integer constants larger than 64 bits.

NOTE: this is equivalent to an integer literal in the text form.

Floating Point Constant Constructors

A float constant constructor has the following form:

opct	float
0x02	number

number is the float constant number.

NOTE: this is equivalent to a float literal in the text form.

A double constant constructor has the following form:

opct	double
0x03	number

number is the double constant number.

NOTE: this is equivalent to a double literal in the text form.

List Constant Constructor

A list constant constructor has the following form:

opct	idList
0x04	elems

elems is a list of IDs, each of which refers to another constant which is the value of the corresponding field of the struct or element of array/vector.

NOTE: this is equivalent to the struct literal {elems} in the text form.

NULL Constant Constructor

A NULL constant constructor has the following form:

opct
0x05

NOTE: this is equivalent to the NULL keyword in the text form.

Global Cell Definition

Global cell definition has the following form:

opct	idt	idt
0x04	id	type

id is the ID of the defined global cell. type is the type of the global cell.

NOTE: this is equivalent to: .global id <type>

Function Definition and Declaration

Function declaration has the following form:

opct	idt	idt
0x05	id	sig

id is the ID of the declared function. sig is the function signature of it.

NOTE: this is equivalent to: .funcdecl id <sig>

Function definition has the following form:

opct	idt	idt	idt	function body
0x06	id	verid	sig	body

id is the ID of the defined function. verid is the ID of the version of the function. sig is the function signature of it. params is a list of IDs, each of which is the ID of its parameter. body is the function body.

NOTE: this is equivalent to: .funcdef id VERSION verid <sig> { body }

Function Body

A function body has the following form:

lent	basic block	basic block	...
nbbs	bb1	bb2	...

nbbs is the number of basic blocks. bbx are basic blocks.

A basic block has the following form:

idt	lent	idPairs	idt	lent	instruction	instruction	...
id	nparams	params	exc	ninsts	inst1	inst2	...

id is the ID of the basic block. Every basic block must have an ID, even the entry block. nparams is the number of parameters in params, which a list of pairs of IDs, each of which is:

idt	idt
type	param

where type is the ID of the type of the parameter, and param is a parameter to the basic block.

exc is the ID of the exceptional parameter. It is omitted when the ID is 0.

ninsts is the number of instructions in the current basic block. There are ninsts instructions following the header.

An instruction has the following form:

idList	idt	instruction body
resIDs	id	instbody

resIDs is a list of IDS for the results. id is the ID of the instruction. instbody is instruction body which is specific to each instruction. See instruction-set.rest for an exhaustive list.

Function Exposing Definition

Function declaration has the following form:

opct	idt	idt	opct	idt
0x07	id	func	callConv	cookie

id is the ID of the exposed value. func is the ID of the function to expose. callConv is the calling convention flag. cookie is the cookie, the ID of an int<64> constant.

Name Binding

Name binding is a definition specific to the binary form. It binds a name to an ID. It is designed for debugging purposes and is optional. The name must be a valid textual global identifier (including the prefix '@').

A name binding has the following form:

opct	idt	lent	i8	i8	...
0x08	id	nbytes	byte1	byte2	...

id is the ID to bind. nbytes is the number of bytes in the name and bytex is the value of each byte.

The name is encoded in ASCII and must follow the rules of global names, local names and allowed characters as defined in uvm-ir.rest.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!