Skip to content
This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Latest commit

 

History

History
667 lines (475 loc) · 23 KB

type-system.rest

File metadata and controls

667 lines (475 loc) · 23 KB

Type System

Overview

Mu has a comprehensive type system. It is close to the machine level, but also has reference types for exact garbage collection.

In the Mu IR, a type is created by a (possibly recursive) combination of type constructors.

By convention, types are written in lower cases. Parameters to types are written in angular brackets < >.

Type and Data Value

A Mu type defines a set where a data value, or value when unambiguous, is one of its elements.

Both SSA variables and the Mu memory can hold values in this type system. Some restrictions can limit what type a variable or a memory location can hold.

Types and Type Constructors

A type constructor represents an abstract type. A concrete type is created by applying a type constructor and supplying necessary parameters. The following type constructors are available in Mu:

  • int < length >
  • float
  • double
  • uptr < T >
  • ufuncptr < sig >
  • struct < T1 T2 ... >
  • hybrid < F1 F2 ... V >
  • array < T length >
  • vector < T length >
  • void
  • ref < T >
  • iref < T >
  • weakref < T >
  • tagref64
  • funcref < sig >
  • threadref
  • stackref
  • framecursorref
  • irnoderef

For C programmers: In Mu IR, types cannot be "inlined", i.e. all types referenced by other definitions (such as other types, constants, globals, functions, ...) must be defined at top-level. For example:

.typedef @refi64 = ref<int<64>>     // WRONG. Cannot write int<64> inside.

.typedef @i64 = int<64>
.typedef @refi64 = ref<@i64>        // Right.

%sum = FADD <double> %a %b          // WRONG. "double" is a type constructor, not a type

.typedef @double = double
%sum = FADD <@double> %a %b         // Right.

Parameters of a type are in the angular brackets. They can be integer literals, types and function signatures. In the text form, the latter two are global names (See uvm-ir.rest).

There are several kinds of types.

  • float and double are floating point types.
  • ref and weakref are object referenct types.
  • ref, iref and weakref are reference types.
  • funcref, threadref, stackref, framecursorref and irnoderef are opaque reference types.
  • Reference types and opaque reference types are general reference types.
  • int, float, double, pointer types, general reference types and tagref64 are scalar types.
  • struct, hybrid, array and vector are composite types.
    • void is neither a scalar type nor a composite type.
  • hybrid is the only variable-length type. All other types are fixed-length types.
  • uptr and ufuncptr are pointer types.
  • int, pointer types, ref, iref and opaque reference types are EQ-comparable types.
  • int, iref and pointer types are ULT-comparable types.
  • ref<T> is the strong variant of weakref<T>; weakref<T> is the weak variant of ref<T>. All other types are the strong variant or weak variant of themselves.

A member of a composite type T is either a field of T if T is a struct, or an element of T if T is an array or a vector, or a field in the fixed part or any element in the variable part if T is a hybrid. A component of type T is either itself or a member of any component of T.

NOTE: This means a component is anything in a type, including itself and any arbitrarily nested members.

The type parameter T in uptr<T> and the return type and all parameter types of sig in ufuncptr<sig> must be native-safe. It is defined as following:

  • void, int<n>, float and double are native-safe.
  • struct<T1 T2 ...>, array<T n>, vec<T n> and hybrid<F1 F2 ... V> are native-safe if all of their type arguments T1, T2, ..., T, F1, F2, ..., V are native-safe.
  • uptr<T> and ufuncptr<sig> are native-safe if T and the return type and all parameter types in sig are native-safe. Otherwise the uptr or the ufuncptr type is not well-formed.
  • All other types are not native-safe. (Specifically, they are all general reference types as well as struct, array or hybrid that contains them.)

Primitive Non-reference Types

Integer Types

int < length >

length
integer literal: The length of the integer in bits.

int is an integer type of length bits.

int is neutral to signedness. Negative numbers are represented in the 2's complement notation where the highest bit is the sign bit.

int<1> is a Boolean type, in which case 1 means true and 0 means false.

NOTE: The signedness of an int type is determined by the operations rather than the type. For example, UDIV treats both operands as unsigned numbers, SDIV treats both operands as signed numbers and ASHR treats the first operand as signed and the second operand as unsigned.
NOTE: Although int<1> is required and int<6> and int<52> are also required when tagref64 is implemented, they should not be part of any in-memory structure because their corresponding LOAD and STORE operations are not required for Mu implementations. int<1> is meant to represent register flags, such as the result of comparison and some overflow or carry flags. int<6> and int<52> are supposed to be used transiently when packing or unpacking a tagref64 value.
For LLVM users: these types are directly borrowed from LLVM.

Example:

.typedef @i1 = int<1>
.typedef @i8 = int<8>
.typedef @i16 = int<16>
.typedef @i32 = int<32>
.typedef @i64 = int<64>

Floating Point Types

float

double

float and double are the IEEE754 single-precision and double-precision floating point number type, respectively.

For LLVM users: these types are directly borrowed from LLVM.

Example:

.typedef @float = float
.typedef @double = double

Pointer Types

uptr < T >

T
type: The type of the referent.

ufuncptr < sig >

sig
function signature: The signature of the pointed function.

uptr and ufuncptr are (untraced) pointer types which are represented by the integral address of the referent. They are part of the (unsafe) native interface. The "u" in their names stand for "untraced". Their values are not affected by the garbage collection, even if their addresses are obtained from pinning heap objects which are later unpinned.

uptr is the data pointer type. It points to a region in the native address space which represents the data type T.

ufuncptr is the function pointer type. It points to a native function whose signature is sig.

The type parameter T and both the return types and the parameter types of sig must be native-safe. It is implementation-defined whether multiple return values are allowed for a particular calling convention.

For LLVM users: uptr<T> is the counterpart of pointer types T*. ufuncptr<sig> is the counterpart of function pointers R (P1 P2 ...)*. The PTRCAST instruction can cast between different pointer types as well as integer types.

For C users: Similar to LLVM, uptr and ufuncptr are the equivalent of C pointers to objects and functions, respectively. However, since Mu interfaces with the native world at the ABI level rather than the C programming language level, pointers are defined as addresses and casting between pointers and integers has semantics.

Example:

.typedef @i32           = int<32>       // int
.typedef @i32_p         = uptr<@i32>    // int*

// ssize_t write(int fildes, const void *buf, size_t nbyte);
// See man (2) write.
.typedef @void          = void          // void
.typedef @void_p        = uptr<@void>   // void*
.typedef @size_t        = int<64>       // size_t, ssize_t
.funcsig @write_s       = (@i32 @void_p @size_t) -> (@size_t)
.typedef @write_fp      = ufuncptr<@write_s>
// @write_fp may point to the native function "write".

// typedef void (*sig_t) (int);
// sig_t signal(int sig, sig_t func);
// See man (3) signal.
.funcsig @sig_s         = (@i32) -> ()
.typedef @sig_t         = ufuncptr<@sig_s>
.funcsig @signal_s      = (@i32 @sig_t) -> (@sig_t)
.typedef @signal_fp     = ufuncptr<@signal_s>
// @signal_fp may point to the native function "signal".

Aggregate Types

Struct

struct < T1 T2 ... >

T1, T2, ...
type: The type of fields.

A struct is a Cartesian product type of several types. T1, T2, ... are its field types. A struct must have at least one member. struct members cannot be void.

NOTE: For C programs: C does not allow empty structures, either, but many programmers create empty structures in practice. C++ does allow empty classes. g++ treats empty classes as having one char element. In Mu, if it is desired to allocate an empty unit in the heap, the appropriate type is void.

A struct cannot have itself as a component.

NOTE: If it could, the struct would be infinitely large. However, a struct may contain a reference to the same struct type, since the size of references is not dictated by the thing it points to. For example:

.typedef @foo = struct <@i32 @foo>      // WRONG. @foo is infinitely big.

.typedef @foo = struct <@i32 @fooref>
.typedef @fooref = ref<@foo>            // Okay. It is a linked list.

struct cannot be the type of an SSA variable if any of its field types cannot be the type of an SSA variable.

NOTE: For example, a struct with a weakref field cannot be the type of an SSA variable. However, there can be references to such structs.
For LLVM users: This is the same as LLVM's structure type, except structures with a "flexible array member" (a 0-length array as the last element) corresponds to the hybrid type in Mu.

Example:

.typedef @byte = int<8>
.typedef @short = int<16>
.typedef @int = int<32>
.typedef @f = float
.typedef @d = double

.typedef @struct1 = struct<@byte @short @int @f @d>
.typedef @struct2 = struct<@f @f @struct1 @d @d> // nesting structs

Hybrid

hybrid < F1 F2 ... V >

F1, F2, ...
list of types: The types in the fixed part
V
type: The type of the elements of the variable part

A hybrid is a combination of a fixed-length prefix, i.e. its fixed part, and a variable-length array suffix, i.e. its variable part, whose length is decided at allocation time. F1 F2 ... are the types of fields in the fixed part. V is the type of the elements of the variable part.

NOTE: This is intended to play the part of "struct with flexible array member" in C99, i.e. struct { F1 f1; F2 f2; ... V v[]; }.

The fixed part may contain 0 fields. In this case, the fixed part is empty, and the variable part is in the beginning of this hybrid memory location, like a variable-length array without a header. The variable part cannot be omitted. Neither any fixed-part field nor V can be void.

hybrid cannot be the type of any SSA variable.

NOTE: There can be references to hybrids.

hybrid is the only type in Mu whose length is determined at allocation site rather than determined by the type itself.

hybrid cannot be contained in any other composite types, including other hybrids.

NOTE: Since the length of hybrids are only known at allocation time, allowing embedding hybrid members will make the size of other types variable. In Mu's design, hybrid is the only type whose length is determined at allocation site.

For C programmers: Just like struct { F f; V v[]; } cannot be embedded in other types, hybrid cannot, either. However, pointers/references to hybrids are allowed.

Example:

.typedef @byte = int<8>
.typedef @long = int<64>
.typedef @double = double

.typedef @struct1 = struct<@long @long @long>

.typedef @hybrid1 = hybrid<@long @byte>         // one int<64> followed by many int<8>
.typedef @hybrid2 = hybrid<@long @long @long @double>    // three int<64>, followed by many double
.typedef @hybrid3 = hybrid<@struct1 @double>    // similar to @hybrid2, but using struct as the header
.typedef @hybrid4 = hybrid<@byte>               // no fixed-part header. Just many int<8>.

Array

array < T length >

T
type: The type of elements.
length
integer literal: The number of elements.

An array is a sequence of values of the same type. T is its element type, i.e. the type of its elements, and length is the length of the array. T must not be void. An array must have at least one element.

An array cannot have itself as a component.

It is not recommended to have SSA variables of array type.

NOTE: The most useful feature of arrays is indexing by a variable index. But an SSA variable has more in common with registers than memory, and SSA variables are designed to be allocated in registers when possible. Mu implementations, as supposed to be minimal, may not be able to implement indexing more efficiently than storing the array to the memory and load back and element.

Using arrays as an SSA variable is most useful when passing a struct value (not pointer) that contains an array member to a C function that requires such a parameter, although such C functions are, themselves, very rare.

For LLVM users: Like LLVM arrays, a Mu array must have a size, but cannot have size 0. The closest counterpart of the "variable length array" (VLA) type in C is the hybrid type.

Example:

.typedef @u8 = int<8>
.typedef @real = double
.typedef @cmpx = struct<@real @real>

.typedef @array1 = array<@u8 4096>      // array of 4096 bytes
.typedef @array2 = array<@real 100>     // array of 100 doubles
.typedef @array3 = array<@cmpx 16>      // array of 16 structs
.typedef @array4 = array<@array2 1024>  // array of 1024 nested arrays

Vector Type

vector < T length >

T
type: The type of elements.
length
integer literal: The number of elements.

vector is the vector type for single-instruction multiple-data (SIMD) operations. A vector value is a packed value of multiple values of the same type. T is the type of its elements and length is the number of elements. T cannot be void. length must be at least one.

It is allowed to have SSA variables of vector types.

Only some primitive element types, such as int<32>, float and double, are required for implementations. If the implementation allows other types, then any vector cannot directly or indirectly contain itself as a member.

For LLVM users: This is the counterpart of the LLVM vector type.

Example:

.typedef @i32 = int<32>
.typedef @float = float
.typedef @double = double

.typedef @vector1 = vector<@i32 4>
.typedef @vector2 = vector<@float 4>
.typedef @vector3 = vector<@double 2>
.typedef @vector4 = vector<@double 4>

Void Type

void

The void type has no value.

It can only be used as the type of allocation units that do not store any values. This allows allocating void in the heap/stack/global memory. Particularly, the NEW instruction with the type void creates a new empty heap object which is not the same as any others. This is similar to the new Object() expression in Java. ref<void>, iref<void>, weakref<void> and uptr<void> are also allowed, which can refer/point to "anything".

Reference Types and General Reference Types

Reference Types

ref < T >

iref < T >

weakref < T >

T
type: The type of referent.

ref is an object reference type. A ref value is a strong reference to a heap objects.

iref is an internal reference type. An iref value is an internal reference to a memory location.

weakref is a weak object reference type. It is the weak variant of ref. A memory location of weakref holds a weak reference to a heap object and can be clear to NULL by the garbage collector when there is no strong references to the object the weakref value refers to.

NOTE: There is no weak internal reference.

The type parameter T is the referent type, which is the type of the heap object or memory location its value refers to.

All reference types can have NULL value which does not refer to any heap object or memory location.

Weakref can only be the type of a memory location, not an SSA variable. When a weakref location is loaded from, the result is a ref to the same object; when a ref is stored to a weakref location, the location holds a weakref to that object.

NOTE: Allowing SSA variables to hold weak references may cause many problems. The semantic allows the GC to change it to NULL at any time as long as the GC decides the referent object is no longer reachable. For this reason, it is impossible to guarantee a weak reference is NULL before accessing. Consider the following program:

%entry():
    %notnull = NE <@RefT> %weakref @NULLREF
    // Just at this moment, GC changed %weakref to NULL
    BRANCH2 %notnull %bb_cont(%weakref) %bb_abnormal(...)

%bb_cont(<@WeakRefT> %weakref):
    %val = LOAD <@T> %weakref       // null reference access
    ...

GC may clear the weak reference right after the program decided it is not NULL.

Requiring an explicit conversion from weakref<T> to ref<T> is not very useful. In that case, the only operation allowed for weakref<T> is to convert to ref<T>.

So letting this conversion happen implicitly during memory access is a natural choice, though not intuitive at all. In Mu's conceptual model, a memory load is like an IO operation: it does not simply "get" the value (such as an object reference) in the memory, but is a communication with the memory system and queries the global state. So it is natural for a load operation to return a different value each time executed.

For LLVM users: there is no equivalence in LLVM. Mu guarantees that all references are identified both in the heap and in the stack and are subject to garbage collection. The closest counterpart in LLVM is the pointer type. Mu does not encourage the use of pointers, though pointer types will be introduced in Mu in the future.

Example:

.typedef @i8  = int<8>
.typedef @i16 = int<16>
.typedef @i32 = int<32>
.typedef @float = float
.typedef @double = double
.typedef @some_struct = struct<@i32 @i16 @i8 @double @float>
.typedef @some_array = array<@i8 100>

.typedef @ref1 = ref<@i32>
.typedef @ref2 = ref<@some_struct>
.typedef @ref3 = ref<@some_array>
.typedef @iref1 = iref<@i32>
.typedef @iref2 = iref<@some_struct>
.typedef @iref3 = iref<@some_array>
.typedef @weakref1 = weakref<@i32>
.typedef @weakref2 = weakref<@some_struct>
.typedef @weakref3 = weakref<@some_array>

Tagged Reference

tagref64

tagref64 is a union type of double, int<52> and struct<ref<void> int<6>. It occupies 64 bits. A tagref64 value holds both a state which identifies the type it is currently representing and a value of that type.

NOTE: When a tagref64 contains an object reference, it can hold an int<6> together as a user-defined tag. It is useful to store type information.

When a tagref64 represents a double NaN value, it does not preserve the bit-wise representation of the NaN.

NOTE: This type is intended to reuse the NaN space of the IEEE754 double value to multiplex with integers and object references. For this reason, when storing NaN values, it will still be NaN, but may not have the same bit representation.
NOTE: This type is only available on some architectures including x86-64 with 48-bit addresses.

Function Reference Type

funcref < sig >

sig
function signature: The signature of the referred function.

funcref is a function reference type. It is an opaque reference to a Mu function and is not interchangeable with reference types. sig is the signature of the function it refers to.

A NULL value of a funcref type does not refer to any function.

NOTE: The value of a funcref may refer to a function that is declared but not defined. The value of a funcref type does not change even the function it refers to becomes defined or redefined.
For C and LLVM users: The funcref type is similar to the "pointer to function" type in C and LLVM, but it only refer to Mu functions. It is not a pointer (see the ufuncptr type). It may be implemented under the hood as a pointer to a function, which will be an implementation detail.

Example:

.typedef @i64 = int<64>

.funcsig @sig1 = (@i64 @i64) -> (@i64)
.funcsig @sig2 = () -> ()

.typedef @func1 = funcref<@sig1>
.typedef @func2 = funcref<@sig2>

Other Opaque Reference Types

threadref

stackref

framecursorref

irnoderef

These types are opaque references to things within Mu. They are not interchangeable with reference types. Only some special instructions (e.g. @uvm.new_stack, NEWTHREAD, @uvm.meta.new_cursor) or API calls can operate on them.

All opaque reference values can be NULL, which does not refer to anything.

threadref and stackref refer to Mu Threads and Mu stacks, respectively. They are used to manipulate the threads and stacks. In particular, stackref is used in the SWAPSTACK instruction. stackref is not a pointer to the top of the stack. It refers to the same stack even if frames are added or removed.

framecursorref refers to to frame cursors. A frame cursor is an internal structure used by the stack introspection API to iterate through stack frames. Its content is mutable but opaque. See Threads and Stacks for more details.

irnoderef refers to a Mu IR node being constructed by the IR Builder API.