Mu has a comprehensive type system. It is close to the machine level, but also has reference types for exact garbage collection.
In the Mu IR, a type is created by a (possibly recursive) combination of type constructors.
By convention, types are written in lower cases. Parameters to types are written
in angular brackets < >
.
A Mu type defines a set where a data value, or value when unambiguous, is one of its elements.
Both SSA variables and the Mu memory can hold values in this type system. Some restrictions can limit what type a variable or a memory location can hold.
A type constructor represents an abstract type. A concrete type is created by applying a type constructor and supplying necessary parameters. The following type constructors are available in Mu:
- int < length >
- float
- double
- uptr < T >
- ufuncptr < sig >
- struct < T1 T2 ... >
- hybrid < F1 F2 ... V >
- array < T length >
- vector < T length >
- void
- ref < T >
- iref < T >
- weakref < T >
- tagref64
- funcref < sig >
- threadref
- stackref
- framecursorref
- irnoderef
For C programmers: In Mu IR, types cannot be "inlined", i.e. all types referenced by other definitions (such as other types, constants, globals, functions, ...) must be defined at top-level. For example:
.typedef @refi64 = ref<int<64>> // WRONG. Cannot write int<64> inside. .typedef @i64 = int<64> .typedef @refi64 = ref<@i64> // Right. %sum = FADD <double> %a %b // WRONG. "double" is a type constructor, not a type .typedef @double = double %sum = FADD <@double> %a %b // Right.
Parameters of a type are in the angular brackets. They can be integer literals, types and function signatures. In the text form, the latter two are global names (See uvm-ir.rest).
There are several kinds of types.
float
anddouble
are floating point types.ref
andweakref
are object referenct types.ref
,iref
andweakref
are reference types.funcref
,threadref
,stackref
,framecursorref
andirnoderef
are opaque reference types.- Reference types and opaque reference types are general reference types.
int
,float
,double
, pointer types, general reference types andtagref64
are scalar types.struct
,hybrid
,array
andvector
are composite types.void
is neither a scalar type nor a composite type.
hybrid
is the only variable-length type. All other types are fixed-length types.uptr
andufuncptr
are pointer types.int
, pointer types,ref
,iref
and opaque reference types are EQ-comparable types.int
,iref
and pointer types are ULT-comparable types.ref<T>
is the strong variant ofweakref<T>
;weakref<T>
is the weak variant ofref<T>
. All other types are the strong variant or weak variant of themselves.
A member of a composite type T is either a field of T if T is a struct, or an element of T if T is an array or a vector, or a field in the fixed part or any element in the variable part if T is a hybrid. A component of type T is either itself or a member of any component of T.
NOTE: This means a component is anything in a type, including itself and any arbitrarily nested members.
The type parameter T in uptr<T>
and the return type and all parameter
types of sig in ufuncptr<sig>
must be native-safe. It is defined as
following:
void
,int<n>
,float
anddouble
are native-safe.struct<T1 T2 ...>
,array<T n>
,vec<T n>
andhybrid<F1 F2 ... V>
are native-safe if all of their type arguments T1, T2, ..., T, F1, F2, ..., V are native-safe.uptr<T>
andufuncptr<sig>
are native-safe if T and the return type and all parameter types in sig are native-safe. Otherwise theuptr
or theufuncptr
type is not well-formed.- All other types are not native-safe. (Specifically, they are all general
reference types as well as
struct
,array
orhybrid
that contains them.)
int
<
length >
- length
- integer literal: The length of the integer in bits.
int
is an integer type of length bits.
int
is neutral to signedness. Negative numbers are represented in the 2's
complement notation where the highest bit is the sign bit.
int<1>
is a Boolean type, in which case 1 means true and 0 means false.
NOTE: The signedness of anint
type is determined by the operations rather than the type. For example,UDIV
treats both operands as unsigned numbers,SDIV
treats both operands as signed numbers andASHR
treats the first operand as signed and the second operand as unsigned.
NOTE: Althoughint<1>
is required andint<6>
andint<52>
are also required whentagref64
is implemented, they should not be part of any in-memory structure because their correspondingLOAD
andSTORE
operations are not required for Mu implementations.int<1>
is meant to represent register flags, such as the result of comparison and some overflow or carry flags.int<6>
andint<52>
are supposed to be used transiently when packing or unpacking atagref64
value.
For LLVM users: these types are directly borrowed from LLVM.
Example:
.typedef @i1 = int<1> .typedef @i8 = int<8> .typedef @i16 = int<16> .typedef @i32 = int<32> .typedef @i64 = int<64>
float
double
float
and double
are the IEEE754 single-precision and double-precision
floating point number type, respectively.
For LLVM users: these types are directly borrowed from LLVM.
Example:
.typedef @float = float .typedef @double = double
uptr < T >
T
- type: The type of the referent.
ufuncptr < sig >
sig
- function signature: The signature of the pointed function.
uptr
and ufuncptr
are (untraced) pointer types which are represented by
the integral address of the referent. They are part of the (unsafe) native
interface. The "u" in their names stand for "untraced". Their values are not
affected by the garbage collection, even if their addresses are obtained from
pinning heap objects which are later unpinned.
uptr
is the data pointer type. It points to a region in the native address
space which represents the data type T.
ufuncptr
is the function pointer type. It points to a native function whose
signature is sig.
The type parameter T and both the return types and the parameter types of sig must be native-safe. It is implementation-defined whether multiple return values are allowed for a particular calling convention.
For LLVM users:
uptr<T>
is the counterpart of pointer typesT*
.ufuncptr<sig>
is the counterpart of function pointersR (P1 P2 ...)*
. ThePTRCAST
instruction can cast between different pointer types as well as integer types.For C users: Similar to LLVM,
uptr
andufuncptr
are the equivalent of C pointers to objects and functions, respectively. However, since Mu interfaces with the native world at the ABI level rather than the C programming language level, pointers are defined as addresses and casting between pointers and integers has semantics.
Example:
.typedef @i32 = int<32> // int .typedef @i32_p = uptr<@i32> // int* // ssize_t write(int fildes, const void *buf, size_t nbyte); // See man (2) write. .typedef @void = void // void .typedef @void_p = uptr<@void> // void* .typedef @size_t = int<64> // size_t, ssize_t .funcsig @write_s = (@i32 @void_p @size_t) -> (@size_t) .typedef @write_fp = ufuncptr<@write_s> // @write_fp may point to the native function "write". // typedef void (*sig_t) (int); // sig_t signal(int sig, sig_t func); // See man (3) signal. .funcsig @sig_s = (@i32) -> () .typedef @sig_t = ufuncptr<@sig_s> .funcsig @signal_s = (@i32 @sig_t) -> (@sig_t) .typedef @signal_fp = ufuncptr<@signal_s> // @signal_fp may point to the native function "signal".
struct
<
T1 T2 ... >
- T1, T2, ...
- type: The type of fields.
A struct
is a Cartesian product type of several types. T1, T2, ... are
its field types. A struct
must have at least one member. struct
members cannot be void
.
NOTE: For C programs: C does not allow empty structures, either, but many programmers create empty structures in practice. C++ does allow empty classes. g++ treats empty classes as having onechar
element. In Mu, if it is desired to allocate an empty unit in the heap, the appropriate type isvoid
.
A struct
cannot have itself as a component.
NOTE: If it could, the struct would be infinitely large. However, a struct may contain a reference to the same struct type, since the size of references is not dictated by the thing it points to. For example:
.typedef @foo = struct <@i32 @foo> // WRONG. @foo is infinitely big. .typedef @foo = struct <@i32 @fooref> .typedef @fooref = ref<@foo> // Okay. It is a linked list.
struct
cannot be the type of an SSA variable if any of its field types
cannot be the type of an SSA variable.
NOTE: For example, astruct
with aweakref
field cannot be the type of an SSA variable. However, there can be references to such structs.
For LLVM users: This is the same as LLVM's structure type, except structures
with a "flexible array member" (a 0-length array as the last element)
corresponds to the hybrid
type in Mu.
Example:
.typedef @byte = int<8> .typedef @short = int<16> .typedef @int = int<32> .typedef @f = float .typedef @d = double .typedef @struct1 = struct<@byte @short @int @f @d> .typedef @struct2 = struct<@f @f @struct1 @d @d> // nesting structs
hybrid
<
F1 F2 ... V >
- F1, F2, ...
- list of types: The types in the fixed part
- V
- type: The type of the elements of the variable part
A hybrid is a combination of a fixed-length prefix, i.e. its fixed part
, and
a variable-length array suffix, i.e. its variable part
, whose length is
decided at allocation time. F1 F2 ... are the types of fields in the fixed
part. V is the type of the elements of the variable part.
NOTE: This is intended to play the part of "struct with flexible array
member" in C99, i.e. struct { F1 f1; F2 f2; ... V v[]; }
.
The fixed part may contain 0 fields. In this case, the fixed part is empty, and
the variable part is in the beginning of this hybrid
memory location, like a
variable-length array without a header. The variable part cannot be omitted.
Neither any fixed-part field nor V can be void
.
hybrid
cannot be the type of any SSA variable.
NOTE: There can be references to hybrids.
hybrid
is the only type in Mu whose length is determined at allocation site
rather than determined by the type itself.
hybrid
cannot be contained in any other composite types, including other
hybrids.
NOTE: Since the length of hybrids are only known at allocation time, allowing embedding hybrid members will make the size of other types variable. In Mu's design, hybrid is the only type whose length is determined at allocation site.
For C programmers: Just like
struct { F f; V v[]; }
cannot be embedded in other types,hybrid
cannot, either. However, pointers/references to hybrids are allowed.
Example:
.typedef @byte = int<8> .typedef @long = int<64> .typedef @double = double .typedef @struct1 = struct<@long @long @long> .typedef @hybrid1 = hybrid<@long @byte> // one int<64> followed by many int<8> .typedef @hybrid2 = hybrid<@long @long @long @double> // three int<64>, followed by many double .typedef @hybrid3 = hybrid<@struct1 @double> // similar to @hybrid2, but using struct as the header .typedef @hybrid4 = hybrid<@byte> // no fixed-part header. Just many int<8>.
array
<
T length >
- T
- type: The type of elements.
- length
- integer literal: The number of elements.
An array
is a sequence of values of the same type. T is its element
type, i.e. the type of its elements, and length is the length of the array.
T must not be void
. An array must have at least one element.
An array
cannot have itself as a component.
It is not recommended to have SSA variables of array
type.
NOTE: The most useful feature of arrays is indexing by a variable index. But an SSA variable has more in common with registers than memory, and SSA variables are designed to be allocated in registers when possible. Mu implementations, as supposed to be minimal, may not be able to implement indexing more efficiently than storing the array to the memory and load back and element.
Using arrays as an SSA variable is most useful when passing a
struct
value (not pointer) that contains an array member to a C function that requires such a parameter, although such C functions are, themselves, very rare.
For LLVM users: Like LLVM arrays, a Mu array must have a size, but cannot
have size 0. The closest counterpart of the "variable length array" (VLA)
type in C is the hybrid
type.
Example:
.typedef @u8 = int<8> .typedef @real = double .typedef @cmpx = struct<@real @real> .typedef @array1 = array<@u8 4096> // array of 4096 bytes .typedef @array2 = array<@real 100> // array of 100 doubles .typedef @array3 = array<@cmpx 16> // array of 16 structs .typedef @array4 = array<@array2 1024> // array of 1024 nested arrays
vector < T length >
T
- type: The type of elements.
length
- integer literal: The number of elements.
vector
is the vector type for single-instruction multiple-data (SIMD)
operations. A vector
value is a packed value of multiple values of the same
type. T is the type of its elements and length is the number of elements.
T cannot be void. length must be at least one.
It is allowed to have SSA variables of vector types.
Only some primitive element types, such as int<32>
, float
and
double
, are required for implementations. If the
implementation allows other types, then any vector cannot directly or indirectly
contain itself as a member.
For LLVM users: This is the counterpart of the LLVM vector type.
Example:
.typedef @i32 = int<32> .typedef @float = float .typedef @double = double .typedef @vector1 = vector<@i32 4> .typedef @vector2 = vector<@float 4> .typedef @vector3 = vector<@double 2> .typedef @vector4 = vector<@double 4>
void
The void
type has no value.
It can only be used as the type of allocation units that do not store any
values. This allows allocating void
in the heap/stack/global memory.
Particularly, the NEW
instruction with the type void
creates a new empty
heap object which is not the same as any others. This is similar to the new
Object()
expression in Java. ref<void>
, iref<void>
, weakref<void>
and uptr<void>
are also allowed, which can refer/point to "anything".
ref
<
T >
iref
<
T >
weakref
<
T >
- T
- type: The type of referent.
ref
is an object reference type. A ref
value is a strong reference to a
heap objects.
iref
is an internal reference type. An iref
value is an internal
reference to a memory location.
weakref
is a weak object reference type. It is the weak variant of ref
.
A memory location of weakref
holds a weak reference to a heap object and can
be clear to NULL
by the garbage collector when there is no strong references
to the object the weakref
value refers to.
NOTE: There is no weak internal reference.
The type parameter T
is the referent type, which is the type of the heap
object or memory location its value refers to.
All reference types can have NULL
value which does not refer to any heap
object or memory location.
Weakref can only be the type of a memory location, not an SSA variable. When a
weakref
location is loaded from, the result is a ref
to the same object;
when a ref
is stored to a weakref
location, the location holds a
weakref
to that object.
NOTE: Allowing SSA variables to hold weak references may cause many problems. The semantic allows the GC to change it to
NULL
at any time as long as the GC decides the referent object is no longer reachable. For this reason, it is impossible to guarantee a weak reference isNULL
before accessing. Consider the following program:%entry(): %notnull = NE <@RefT> %weakref @NULLREF // Just at this moment, GC changed %weakref to NULL BRANCH2 %notnull %bb_cont(%weakref) %bb_abnormal(...) %bb_cont(<@WeakRefT> %weakref): %val = LOAD <@T> %weakref // null reference access ...GC may clear the weak reference right after the program decided it is not
NULL
.Requiring an explicit conversion from
weakref<T>
toref<T>
is not very useful. In that case, the only operation allowed forweakref<T>
is to convert toref<T>
.So letting this conversion happen implicitly during memory access is a natural choice, though not intuitive at all. In Mu's conceptual model, a memory load is like an IO operation: it does not simply "get" the value (such as an object reference) in the memory, but is a communication with the memory system and queries the global state. So it is natural for a load operation to return a different value each time executed.
For LLVM users: there is no equivalence in LLVM. Mu guarantees that all references are identified both in the heap and in the stack and are subject to garbage collection. The closest counterpart in LLVM is the pointer type. Mu does not encourage the use of pointers, though pointer types will be introduced in Mu in the future.
Example:
.typedef @i8 = int<8> .typedef @i16 = int<16> .typedef @i32 = int<32> .typedef @float = float .typedef @double = double .typedef @some_struct = struct<@i32 @i16 @i8 @double @float> .typedef @some_array = array<@i8 100> .typedef @ref1 = ref<@i32> .typedef @ref2 = ref<@some_struct> .typedef @ref3 = ref<@some_array> .typedef @iref1 = iref<@i32> .typedef @iref2 = iref<@some_struct> .typedef @iref3 = iref<@some_array> .typedef @weakref1 = weakref<@i32> .typedef @weakref2 = weakref<@some_struct> .typedef @weakref3 = weakref<@some_array>
tagref64
tagref64
is a union type of double
, int<52>
and struct<ref<void>
int<6>
. It occupies 64 bits. A tagref64
value holds both a state which
identifies the type it is currently representing and a value of that type.
NOTE: When atagref64
contains an object reference, it can hold anint<6>
together as a user-defined tag. It is useful to store type information.
When a tagref64
represents a double
NaN value, it does not preserve the
bit-wise representation of the NaN.
NOTE: This type is intended to reuse the NaN space of the IEEE754 double value to multiplex with integers and object references. For this reason, when storing NaN values, it will still be NaN, but may not have the same bit representation.
NOTE: This type is only available on some architectures including x86-64 with 48-bit addresses.
funcref
<
sig >
- sig
- function signature: The signature of the referred function.
funcref
is a function reference type. It is an opaque reference to a Mu
function and is not interchangeable with reference types. sig is the signature
of the function it refers to.
A NULL
value of a funcref
type does not refer to any function.
NOTE: The value of afuncref
may refer to a function that is declared but not defined. The value of afuncref
type does not change even the function it refers to becomes defined or redefined.
For C and LLVM users: Thefuncref
type is similar to the "pointer to function" type in C and LLVM, but it only refer to Mu functions. It is not a pointer (see theufuncptr
type). It may be implemented under the hood as a pointer to a function, which will be an implementation detail.
Example:
.typedef @i64 = int<64> .funcsig @sig1 = (@i64 @i64) -> (@i64) .funcsig @sig2 = () -> () .typedef @func1 = funcref<@sig1> .typedef @func2 = funcref<@sig2>
threadref
stackref
framecursorref
irnoderef
These types are opaque references to things within Mu. They are not
interchangeable with reference types. Only some special instructions (e.g.
@uvm.new_stack
, NEWTHREAD
, @uvm.meta.new_cursor
) or API calls can
operate on them.
All opaque reference values can be NULL
, which does not refer to anything.
threadref
and stackref
refer to Mu Threads and Mu stacks, respectively.
They are used to manipulate the threads and stacks. In
particular, stackref
is used in the SWAPSTACK
instruction. stackref
is not a pointer to the top of the stack. It refers to the same stack even if
frames are added or removed.
framecursorref
refers to to frame cursors. A frame cursor is an internal
structure used by the stack introspection API to iterate through stack frames.
Its content is mutable but opaque. See Threads and Stacks for more details.
irnoderef
refers to a Mu IR node being constructed by the IR Builder API.