This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Heap Allocation and Initialisation Language (HAIL) #29

Open
wks opened this issue Mar 31, 2015 · 0 comments

wks commented Mar 31, 2015

This proposal describes a language that allocates and initialises heap objects (and also global memory).

This proposal does not address initialiser functions. They will be addressed in a separate issue.

Rationale

A code bundle (or simply "bundle" in our current terminology) contains types, function signatures, constants, global memory cells and functions. This is insufficient for a standalone Mu IR program.

A typical program usually contains statically declared, load-time initialised heap objects, e.g. strings, class objects (java.lang.Class) and so on. A developer from the PyPy project has indicated that there can be a large number of statically declared heap objects. Currently those objects can be created and initialised in two ways:

  1. The client allocates and initialises heap objects via the Mu Client API. This approach suffers from one particular shortcoming: performance. The API can only initialise one memory location (e.g. one element of an array, or one scalar field of a struct) per API call.
  2. Each bundle includes a dedicated function that creates and initialises the heap objects. This approach has both performance and complexity problems. The function must contain a full description of every heap object (its type and the values of all, or at least all non-zero, fields), so it can be huge. All of this information has to be encoded as Mu IR instructions and Mu IR constants, and the compiler has to translate this humongous "initialiser function" into runnable form before executing it, yet the function runs only once. Compiling such a one-shot function is a waste of time and memory.
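
To make the cost of the first approach concrete, the sketch below counts the memory locations in a value, assuming (as stated above) one API call per location. The helper `count_locations` and the use of nested Python lists as stand-ins for Mu structs, arrays and hybrids are illustrative assumptions, not part of any Mu API.

```python
def count_locations(value):
    """Count the scalar memory locations in a nested value.

    Nested lists stand in for structs, arrays and hybrids; anything
    else is treated as a single scalar location.
    """
    if isinstance(value, list):
        return sum(count_locations(elem) for elem in value)
    return 1


# A string object shaped like hybrid<@i64 @i8>: a length field plus 11 bytes.
hello = [11, [ord(c) for c in "Hello world"]]

# With one API call per location, initialising this object alone
# takes one call for the length field and one per character.
calls_per_location = count_locations(hello)
```

Even this short string needs a dozen calls under that assumption, which is the overhead a bulk-loading format is meant to avoid.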

Solution

The proposed solution is a compact file format that describes heap objects and initialises the memory.

Sample:

Assume we have a "traditional" Mu IR Bundle:

.typedef @i64 = int<64>
.typedef @i32 = int<32>
.typedef @i8 = int<8>
.typedef @double = double
.typedef @string = hybrid<@i64 @i8>
.typedef @void = void
.typedef @refstring = ref<@string>
.typedef @refvoid = ref<@void>
.typedef @ClassFoo = struct<@i64 @double @refstring>
.typedef @intarray = hybrid<@i64 @i32>

.global @HW <@refstring> // A global memory cell, initialised to NULL, which may hold a string reference later.

After loading the previous bundle, load this Heap Allocation and Initialisation Language (HAIL) file:

// HAIL file
.new $a <@i64>     // A new object of just a number
.newhybrid $hw <@string> 12
.new $classFoo <@ClassFoo>

.new $x <@refvoid>  // An object whose content is only a heap reference to void
.new $y <@refvoid>  // ditto

.newhybrid $hugeArray <@intarray> 10000

.init $a = 42
.init $hw = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}}
.init $classFoo = {42, 42.0d, $hw}   // Objects can directly refer to each other
.init $x = $y  // Objects are first allocated and then initialised
.init $y = $x  // So they can form circular references

.init @HW = $hw // @HW is a global cell declared in the previous "traditional" bundle. HAIL can initialise global cells in traditional bundles, too.

.init $hugeArray[1][5000] = 42  // Initialise only one element of the var part (subscript 1). Other elements are 0.

// NOTE: only $hw is retained because it is referenced by the global cell @HW. Other objects may immediately be garbage-collected (or not allocated at all if the Mu VM can "cheat")
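
The sample relies on objects being allocated before any of them is initialised, which is what makes the circular references between $x and $y possible. A minimal Python sketch of that two-phase order follows; `HeapObject` and `load` are hypothetical names used only for illustration, and each object is simplified to a single reference field.

```python
class HeapObject:
    """Stand-in for a heap object holding a single reference field."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.value = None  # Fields start out zeroed (NULL for references).


def load(allocations, inits):
    """Allocate every object first, then run every initialisation.

    allocations: list of (name, type_name) pairs, like `.new` lines.
    inits: list of (target_name, source_name) pairs, like `.init` lines
           whose value is another object's name.
    """
    # Phase 1: allocation. After this, every name resolves to an object.
    objects = {name: HeapObject(type_name) for name, type_name in allocations}

    # Phase 2: initialisation. Forward and circular references are fine
    # because the referenced objects already exist.
    for target, source in inits:
        objects[target].value = objects[source]
    return objects


objs = load(
    allocations=[("$x", "@refvoid"), ("$y", "@refvoid")],
    inits=[("$x", "$y"), ("$y", "$x")],
)
```

If allocation and initialisation were interleaved instead, `.init $x = $y` could not be resolved before $y had been created.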

Structure

Names of heap objects allocated in a HAIL file carry the special sigil $ and are local to the current file.

A Heap Allocation and Initialisation Language (HAIL) file consists of any number of the following top-level definitions:

.new: Allocates a fixed-size (non-hybrid) object in the heap. Has the form: .new $name <@type>

  • $name: the local name of the object.
  • @type: the type of the object.

.newhybrid: Allocates a hybrid object in the heap. Has the form: .newhybrid $name <@type> length

  • $name, @type: same as ".new"
  • length: the number of elements in the var part
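
As an illustration of the two allocation forms, here is a small Python sketch that parses them. The regular expression is an assumption inferred from the forms given above; the proposal does not define a formal grammar, and `parse_alloc` is a hypothetical helper.

```python
import re

# Hypothetical grammar, inferred from the forms ".new $name <@type>" and
# ".newhybrid $name <@type> length" given above.
ALLOC_RE = re.compile(
    r"^\.(new|newhybrid)\s+(\$\w+)\s*<\s*(@\w+)\s*>\s*(\d+)?\s*$"
)


def parse_alloc(line):
    """Parse one allocation line into (kind, name, type, length).

    length is None for ".new" and an int for ".newhybrid".
    Trailing "//" comments are stripped before matching.
    """
    code = line.split("//", 1)[0].strip()
    match = ALLOC_RE.match(code)
    if match is None:
        raise ValueError(f"not an allocation: {line!r}")
    kind, name, type_name, length = match.groups()
    if kind == "newhybrid" and length is None:
        raise ValueError(".newhybrid requires a length")
    if kind == "new" and length is not None:
        raise ValueError(".new does not take a length")
    return kind, name, type_name, None if length is None else int(length)
```

For example, `parse_alloc(".newhybrid $hw <@string> 12")` yields the kind, the local name, the type name and the var-part length as a tuple.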

.init: Initialise a heap object or a global cell. Has the form: .init name[sub1][sub2]... = val

  • name: The name of the heap object or global cell. In this format, heap objects use the special sigil ($xxx) while global cells use global names in the Mu IR (@xxx).
  • sub1, sub2, ...: Subscripts, which navigate through structs, arrays and hybrids. In a hybrid, the fixed part is subscript 0 and the var part is subscript 1.
  • val: The value. It can be one of the following:
    • Integer literals: 1, 24, -345, 0x456, 'H'
    • FP literals: 1.0f, 3.14d, nanf, nand, +infd, -infd, bitsd(0x7ff0000000000001)
    • Struct/array/hybrid literals: {elem0, elem1, elem2, ...}
    • NULL
    • Other names: other heap objects or Mu IR constants, global cells (as internal references) and functions (as function references), e.g. $hw @HW @main
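
The following Python sketch shows one way the pieces of an .init definition could be interpreted: splitting the target into a name and subscripts, parsing the non-aggregate value forms listed above, and following subscripts to a location. All function names are hypothetical, aggregate literals and escaped characters are left out, and nested lists stand in for Mu memory with a hybrid modelled as [fixed part, var part].

```python
import re
import struct

TARGET_RE = re.compile(r"^([$@]\w+)((?:\[\d+\])*)$")


def parse_target(text):
    """Split "name[sub1][sub2]..." into (name, [sub1, sub2, ...])."""
    match = TARGET_RE.match(text.strip())
    if match is None:
        raise ValueError(f"bad .init target: {text!r}")
    name, subs = match.groups()
    return name, [int(s) for s in re.findall(r"\[(\d+)\]", subs)]


def parse_scalar(text):
    """Parse a non-aggregate value into a tagged tuple."""
    text = text.strip()
    if not text:
        raise ValueError("empty value")
    if text == "NULL":
        return ("null", None)
    if text[0] in "$@":
        return ("name", text)
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        return ("int", ord(text[1]))  # Character literal, e.g. 'H'.
    bits = re.fullmatch(r"bits([fd])\((0x[0-9a-fA-F]+)\)", text)
    if bits:
        kind, raw = bits.group(1), int(bits.group(2), 16)
        # Reinterpret the raw bit pattern as an IEEE 754 float or double.
        int_fmt, fp_fmt = ("<I", "<f") if kind == "f" else ("<Q", "<d")
        return ("fp" + kind, struct.unpack(fp_fmt, struct.pack(int_fmt, raw))[0])
    fp = re.fullmatch(r"([+-]?(?:nan|inf|\d+\.\d+))([fd])", text)
    if fp:
        return ("fp" + fp.group(2), float(fp.group(1)))
    return ("int", int(text, 0))  # Base 0 accepts both 24 and 0x456.


def store(root, subscripts, value):
    """Write value at the location reached by following the subscripts."""
    if not subscripts:
        raise ValueError("at least one subscript is required")
    node = root
    for index in subscripts[:-1]:
        node = node[index]
    node[subscripts[-1]] = value


# A zero-initialised hybrid<@i64 @i32> with 10000 var-part elements.
huge = [0, [0] * 10000]
name, subs = parse_target("$hugeArray[1][5000]")
store(huge, subs, parse_scalar("42")[1])
```

Here subscript 1 selects the var part of the hybrid and 5000 selects the element within it; every other element keeps its zero value.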

Comparison with API-based object allocation and initialisation

A HAIL file is a unit of delivery to the Mu VM. Only one API call is needed to load a whole HAIL file and it can allocate and initialise many objects.

"Loading a HAIL file" will be a new API message (or function).

Performance Concerns

For better performance, this format should also have a compact binary encoding. Ideally the binary encoding would be very close to the in-memory representation of objects, so that loading requires little more than copying data from the file into memory and handling data sizes, padding and alignment. It cannot be perfectly identical to the in-memory representation because Mu's object layout is platform-dependent.
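
As a rough sketch of what "close to the in-memory representation" could mean, the Python snippet below serialises an object shaped like struct<@i64 @double> with the standard `struct` module. The record layout (little-endian, no padding) is purely an assumption for illustration; the proposal does not define a binary encoding.

```python
import struct

# Hypothetical record layout for an object shaped like struct<@i64 @double>:
# a little-endian 64-bit integer followed by a 64-bit IEEE 754 double.
# "<" disables native padding, so the file layout is the same everywhere
# and the loader is responsible for the platform's in-memory alignment.
RECORD = struct.Struct("<qd")


def encode(int_field, double_field):
    """Serialise the two fields into 16 bytes."""
    return RECORD.pack(int_field, double_field)


def decode(data):
    """Recover the two fields from their 16-byte encoding."""
    return RECORD.unpack(data)


payload = encode(42, 42.0)
```

A fixed, platform-independent file layout like this leaves the final placement in memory to the Mu VM, which matches the point that the file cannot simply mirror a platform-dependent object layout.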

When to use the HAIL format

HAIL should be used when the client wishes to allocate many objects and bulk-initialise the memory. For example, when loading a Java .class file, a Mu IR bundle is loaded for the Java functions, and then a HAIL file is loaded to create and initialise the Class object, the virtual table, string literals and so on.

Another example: Assume there is a PyPy interpreter implemented on Mu IR. The executable PyPy interpreter is represented as a Mu IR bundle, but a HAIL file can be used to initialise the interpreter instance and associated objects.

When HAIL may not be ideal

If the Mu VM is metacircular, the client is written in the Mu IR and accessing the Mu memory from the client has no overhead. The HAIL format can still be implemented for compatibility reasons, but it would not have any performance advantage over direct memory accesses. For example, a metacircular Mu-based JVM can load a .class file and compile its methods to Mu IR, but the Class object can be created directly in the Mu IR because the JVM client itself is in Mu IR. It does not need to serialise the sequence of object allocations and initialisations into HAIL before performing them.
