HAIL may not be the best tool. The most efficient way to initialise a micro VM is by building a boot image (this is implementation-specific). The most efficient way to load objects from a serialised file is to build the de-serialiser (such as the ".class" file parser) in Mu IR.
The Heap Allocation and Initialisation Language (HAIL) is a Mu IR-like language that allocates heap objects and initialise Mu memory locations with values.
It is designed to initialise language-specific objects, such as class
meta-objects (e.g. the java.lang.Class
objects and the virtual tables in JVM
created during class loading), heap-allocated constant string objects, and
language-level constants which are implemented as Mu-level global cells (because
Mu does not allow "constant object references").
HAIL should be faster than initialising the memory via the client API, and more
space-efficient than a naively implemented Mu function which creates and
initialises objects by executing a (usually very long) sequence of NEW
and
STORE
instructions. But keep in mind that is not the only efficient method.
A well-written Mu program can read from a serialised file (e.g. the ".class"
file in Java) and interpret it in a similar way the Mu micro VM interprets the
HAIL script. The client can also rely on object pinning and initialise objects
via pointers, bypassing the handle-based API.
A HAIL script has a text format and a binary format. The text format is similar to the text-based Mu IR, and the binary format is similar to the binary form Mu IR.
This document uses EBNF to define the text-form syntax. A non-terminal starts
with a capital letter and a terminal starts with a lower-case letter. Literal
characters are quoted within pairs of '
or "
. *
means repeating 0 or
more times. +
means repeating 1 or more times. ?
means optional. |
means either its left-hand-side or its right-hand-side. |
has the lowest
precedence than simple concatenation; unary suffixes have the highest
precedence. (
and )
group terms together to override the precedence.
[
and ]
denote a set of characters.
Comments start with two slashes //
and ends at the end of the line. White
spaces between lexicons are ignored.
In HAIL, a HAIL name, i.e. the name of a heap-allocated object, start with a
dollar sign $
followed by one or more characters in the set:
[0-9a-zA-Z_-.]
:
hailName ::= '$' [0-9a-zA-Z_-.]+
The scope of a HAIL name is within a single HAIL script. In other words, they are temporary, and they become invalid as soon as the HAIL script is fully evaluated. Storing them to global cells is one way to keep references to the allocated objects.
Global name, integer literal, floating point literal and null
literal are defined the same way as in Mu IR. They are
denoted as globalName
, intLit
, floatLit
, 'NULL'
, respectively.
An LValue specifies a memory location:
LValue ::= Name Index* Name ::= globalName | hailName Index ::= '[' IntExpr ']' IntExpr ::= intLit | globalName
It is either a component of a global cell (when Name
is a globalName
),
or a component of a newly allocated heap object in the current HAIL script (when
Name
is a hailName
). If the name appears alone, the memory location is
the global cell or the heap object itself. Its fields or elements can be
selected using indices. The index can be an integer literal (intLit
) or a
global name of a Mu constant of int<n>
type of any n
, treated as
unsigned integer.
If a memory location l
holds a struct or hybrid, then l[n]
is its n-th
field (n = 0, 1, 2...). Specifically, for hybrid, it means the n-th field in the
fixed part. If n equals the number of fixed-part fields, it selects the variable
part. In such cases, l[n][m]
is the m-th element of the variable part.
If a memory location l
holds an array or a vector, then l[n]
is its n-th
element (n = 0, 1, 2...).
An RValue specifies a Mu value:
RValue ::= globalName | intLit | floatLit | ``'NULL'`` | hailName | '&' LValue | '*' LValue | List List ::= '{' RValue* '}'
It can be the address of an LValue, in which case the value is one of:
- A Mu global SSA variable (constant, global cell, function, exposed function)
(
globalName
) - An integer literal (
intLit
) - A floating point literal (
floatLit
) - The
NULL
value of an appropriate type ('NULL'
) - An object reference to an object just created in HAIL (
hailName
) - An internal reference of an LValue (
'&' LValue
) - The current value held at the memory location of an LValue (
'*' LValue
) - A list of 0 or more RValue.
See memory initialisation below for more details.
Top-level definitions in HAIL include fixed object allocation, hybrid allocation and memory initialisation. All object allocations are evaluated before any memory initialisation definitions are evaluated.
A fixed object allocation allocates a fixed-size object:
FixedAlloc ::= '.new' hailName '<' Type '>' Type ::= globalName
where hailName
is a HAIL name of the allocated object, and Type
is the
global name of the type of the object. Type
must not be a hybrid
type.
A hybrid allocation allocates a hybrid:
HybridAlloc ::= '.newhybrid' hailName '<' Type '>' IntExpr
where hailName
and Type
are the name and the type. IntExpr
specifies
the length of the variable part of the hybrid, which can either be an integer
literal, or a Mu int<n>
constant of any n
, treated as unsigned integer.
Type
must be a hybrid
type.
A memory initialisation initialises a memory location:
MemInit ::= '.init' LValue = RValue
RValue
must be appropriate for the LValue
type. Specifically, the star
notation *LValue
copies the value from the memory location of the LValue
after the *
. It is applicable to all types as long as the type matches the
LValue
being written to. In addition:
- If
LValue
isint<n>
,uptr<T>
orufuncptr<sig>
, thenRValue
can be anintLit
, a constant of the sameLValue
type. - If
LValue
isfloat
ordouble
, thenRValue
can be afloatLit
or a Mu constant of the same type asLValue
. - If
LValue
isref<T>
orweakref<T>
, thenRValue
can beNULL
or a HAIL name of a newly created object. - If
LValue
isiref<T>
, thenRValue
can beNULL
, the global name of a global cell, or anLValue
(with the&
sign). ImplicitREFCAST
applies. - If
LValue
isfuncref<sig>
, thenRValue
can beNULL
or the global name of a Mu function. ImplicitREFCAST
applies. - If
LValue
isstackref
orthreadref
, then the only applicableRValue
isNULL
. - If
LValue
istagref64
, thenRValue
can be the appropriate value suitable fordouble
,int<52>
orstruct<ref<void> int<6>>
. - If
LValue
is a struct, hybrid, array or vector, thenRValue
must be aList
ofRValue
items. Each item will initialise a field or element of the composite type. The entire variable part of a hybrid is treated as one additional field to its fixed part fields, and is treated as an array of the actual length. The list may have less fields/elements of theLValue
, in which case only the first fields/elements are initialised, and others remain their old values. (Note: All newly allocated memory locations, including heap objects, stack cells and global cells, have initial values: 0 or NULL.)
When assigning to an LValue of ref<T>
, weakref<T>
, iref<T>
,
funcref<sig>
, uptr<T>
or ufuncptr<sig>
types, if the RValue only
differs in the T
or sig
parameter, then implicit REFCAST
or
PTRCAST
are applied. weakref
and ref
can be assigned to each other.
PTRCAST
can only change the type/sig parameters T
and sig
, but not
the base type int
, uptr
and ufuncptr
. (Note: This makes sub-class
instances assignable to a location that refers to a super-class instance.)
Multiple top-level definitions are applied in the order they appear in the HAIL
script. In order to deal with cyclic references, it is advisable to put .new
and .newhybrid
before .init
.
All loads (via *LValue
) and stores (via .init
) are non-atomic. In
.init
, it has undefined behaviour if any values in the LValue
in the
left-hand-side of the =
is accessed by the right-hand-side RValue
.
NOTE: This is to say, don't load from the memory location being initialised because the implementation may write into the left-hand-side in any order.
Example 1:
// Assume the following definitions in a previously loaded Mu IR bundle. // .typedef @i8 = int<8> // .typedef @i32 = int<32> // .typedef @i64 = int<64> // .typedef @float = float // .typedef @double = double // // .typedef @NakedArray = hybrid<@i8> // no fields in the fixed part // .typedef @LengthedArray = hybrid<@i32 @i8> // no fields in the fixed part // .typedef @JavaStyleArray = hybrid<@TID @i32 @i8> // two fields in the fiexed part // // .typedef @TID = int<64> // .typedef @SmallFloatArray = array<@float 4> // .typedef @irefi64 = iref<@i64> // .typedef @Object = struct<@TID @SmallFloatArray @double @irefi64> // // .typedef @vtable = hybrid<...> // .typedef @vtable_r = ref<@vtable> // .global @g_vtable <@vtable_r> // // .typedef @Object2 = struct<@vtable_r @i64> // // .typedef @LinkedList = struct<@i64 @LinkedList_r> // .typedef @LinkedList_r = ref<@LinkedList> // // .typedef @irefi64 = iref<@i64> // // .const @MAGICAL_NUMBER <@i64> = 42 // .const @PI <@double> = 3.14d // // .global @my_global <@i64> // .global @a_global_iref_cell <@irefi64> // .global @another_global_iref_cell <@irefi64> // // .global @my_favourite_linked_list_node <@LinkedList> // .new $my_long_obj <@i64> .init $my_long_obj = 0x123456789abcdef0 .newhybrid $my_array1 <@NakedArray> 4 .newhybrid $my_array2 <@LengthedArray> 10000 .newhybrid $my_array3 <@JavaStyleArray> @MAGICAL_NUMBER .init $my_array1 = {1 2 3 4} .init $my_array2 = {100 // claim length 100, while the capacity is really 10000 {0 1 2 3 4}} // Only init 5 elems .init $my_array2[1][99] = 99 // Also init the 99-th elem .init $my_array3 = {1001 42 {1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2}} .new $my_obj <@Object> .init $my_obj = {@MAGICAL_NUMBER {1.0f 2.0f 3.0f 4.0f} @PI @my_global} .new $my_obj2 <@Object2> // This object has a pointer to an existing v-table allocated before loading // this HAIL script. A reference to the v-table is held in the global cell @g_vtable. // The star notation *@g_vtable loads the value from an LValue and assign it // to the field. .init $my_obj2 = {*@g_vtable 42} .new $node0 <@LinkedList> .new $node1 <@LinkedList> .new $node2 <@LinkedList> .init $node0 = {0 $node1} // All objects are allocated before init. .init $node1 = {1 $node0} // so they can form a ring .init $node2 = {2 NULL} // Isolated node // Global cells can be initialised, too. .init @my_global = -1 // @a_global_iref_cell will hold an iref<@i64> to the global cell @my_global .init @a_global_iref_cell = &@myglobal // Equivalent. The global variable @myglobal is already an iref. .init @a_global_iref_cell = @myglobal .new $foo <@i64> // This refers into the gut of $foo .init @another_global_iref_cell = &$foo // In fact, all objects except $node0 and $node1 will become garbages after // this HAIL script is fully evaluated. .init @my_favourite_linked_list_node = $node0
Example 2: String constant initialisation. In order to keep references to these objects, we need to store them to global cells:
// Assume the following Java code: // System.out.println("Hello world!"); // // We want to create a String object for the string literal "Hello world!". // In a real JVM, more strings would be created for class names and method // names for reflection. // // We assume the Java class loader defines the String like this: // .typedef @RefString = ref <@String> // .typedef @String = struct <@TID @RefCharArray @i32 @i32> // tid, buf, begin, size // .typedef @RefCharArray = ref <@CharArray> // .typedef @CharArray = hybrid <@ArrayHeader @i16> // header, elements // .typedef @ArrayHeader = struct <@TID @i32> // tid, length // // It makes a global cell to store a reference to the String: // .global @const_hello_world <@RefString> // // Then we can create and initialise the string in HAIL: .new $hw <@String> // The String object .newhybrid $hwbuf <@CharArray> 12 // The underlying array .init $hw = {0xabcd $hwbuf 0 12} .init $hwbuf = {{0x1234 12} {0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21}} .init @const_hello_world = $hw // Store it to the global cell. // Then out.println("Hello world!") can be compiled to: // %1 = LOAD <@RefString> @const_hello_world // CALL <@sig1> @PrintStream.println (%out %1) // in the real world it may need dynamic dispatching
A binary HAIL script starts with a 4-byte magic 'x7f' 'H' 'A' 'I', or 0x7f 0x48 0x41 0x49.
HAIL IDs are the counterpart of HAIL names. HAIL IDs are 32-bit integers. 0 is an invalid HAIL ID. HAIL ID has a different namespace from Mu IDs, i.e. they refer to different things even if their values are equal. HAIL IDs only refer to heap-allocated objects in the current HAIL script.
In the following paragraphs, binary types defined in Mu IR Binary Form are used. For convenience, we use "hID" for HAIL ID and "mID" for Mu ID.
A fixed object allocation definition has the form:
opct | idt | idt |
---|---|---|
0x01 | hID | type |
hID is the HAIL ID of the object. type is the Mu ID of the type.
A variable-length object allocation definition has the form:
opct | idt | idt | i64 |
---|---|---|---|
0x02 | hID | type | length |
hID is the HAIL ID of the object. type is the Mu ID of the type. length is the length of the variable part.
A memory initialisation definition has the form:
opct | LValue | RValue |
---|---|---|
0x03 | LValue | RValue |
LValue:
Name | lent | IntExpr | ... |
---|---|---|---|
Name | n | IntExpr | ... |
n is the number of IntExpr following.
Name:
opct | idt |
---|---|
tag | id |
If tag is 0x04, id is the HAIL ID; if tag is 0x05, id is the Mu ID.
IntExpr can be intLit or a global name (Name with tag=2)
intLit:
opct | i64 |
---|---|
0x12 | lit |
lit is the literal. There is currently no way to express integer literals longer than 64 bits. lit is truncated or zero-extended to the LValue length.
RValue can be one of the following:
- A Mu global SSA variable:
opct | idt |
---|---|
0x11 | gv |
gv it the ID of the global SSA variable.
- An integer literal (see intLit above)
- A 32-bit float literal
opct | float |
---|---|
0x13 | value |
- A 64-bit float literal
opct | double |
---|---|
0x14 | value |
- A
NULL
literal
opct |
---|
0x15 |
- An object reference to an object allocated in HAIL
opct | idt |
---|---|
0x16 | id |
id is the HAIL id.
- An internal reference of an LValue
opct | LValue |
---|---|
0x17 | LValue |
- A list of other values of any kinds.
opct | i64 | RValue | RValue | ... |
---|---|---|---|---|
0x18 | nelems | rv1 | rv2 | ... |
nelems is the number of RValues following it. This structure is recursive.
- A list of other values of the same kind of literals.
opct | i64 | opct | literal | literal | ... |
---|---|---|---|---|---|
0x19 | nelems | kind | lit1 | lit2 | ... |
nelems is the number of literals following. kind can be one in the following table, and kind determines the following literal element type.
opct | literal type |
0x12 | i64 |
0x13 | float |
0x14 | double |
0x1a | i8 |
0x1b | i16 |
0x1c | i32 |
This allows more compact encoding of large arrays of simple elements. The literal type, however, does not need to match the actual type of the LValue, because implicit truncating or zero-extension always happen.
The binary format is not the most efficient format possible. HAIL is still an interpreted format, even when it is binary. It is designed to be convenient, reasonably efficient and platform-independent.
There could be implementation-specific ways of serialising data faster than this portable interface.
The native interface can also potentially outperform HAIL. Using object pinning
and pointers, Mu IR programs can directly memcpy data from files, such as
copying strings from .class
files.