Types and their representations
Users should hardly ever need to write types or declarations. This is more an aspect of the implementation, and a matter of precisely documenting how various important types work and how type inference works. However, we want the types involved in inference, user-defined types, and core operations (like array indexing) to be the same things.
Users of dynamic languages often observe that optional type declarations would allow them to fix many of their performance problems. However, few systems have actually implemented this (the only real example I can think of is Common Lisp). The problem is that providing only a few simple types forces users to pick between flexibility and performance. The only sensible way to solve this problem is to provide a fairly sophisticated type system. The question is whether an elaborate run-time type system can perform well.
You can think of a type as an object that can identify when a value belongs to it, and when another type is a subtype of it. There are multiple kinds of types, each characterized by how it defines these two predicates (see the sketch after the list below). We have 6 kinds of types:
- tuple types, (a,b,c,...). These have any number of constituent types and are covariant in them.
- function types, a-->b. These have 2 constituent types; contravariant in a, covariant in b.
- union types, Union(a,b,c,...). The set-union of several types.
- tag types. These have a name, a supertype, and some user-defined data. Tag types implement user-defined disjoint types with an explicit type hierarchy. A value belongs to tag type T if it is marked with that tag or one of its explicit subtypes, and subtype relations are determined just by looking at the supertype pointers and matching on the user-defined-data part. Tags are a lot like interfaces, and we could imagine them specifying methods in the future.
- struct or record types. This is a *subkind* of Tag that adds named fields, plus new() and convert() functions. These are the concrete types a user can define to actually hold data. A struct type T is concrete because you can say T.new(...) to make an instance; other types simply lack this field, so you naturally cannot make instances of them.
- "bits" types. Like a struct type, except that instead of named fields its representation is a string of N bits.
To start with, the following types are built in:

```
Tuple  Symbol  TagKind  StructKind  FuncKind  UnionKind  BitsKind
```
Given those, we make some primitive types:
```
Any  Type  TypeName  TypeVar  TypeConstructor  Buffer[T]
```
Then the following can be defined normally:
```
...[T]
Bottom
Tensor, Scalar, Number, Real, Int, Float
Boolean
Int8, Uint8, Int16, Uint16, Int32, Uint32, Int64, Uint64, Float32, Float64
Size
```
Size is a pointer-sized integer, equivalent to size_t in C.
Buffer is a primitive array, a pair of a pointer and a length.
Bottom is the empty set of values; there can be no direct or indirect instances of it and it is a subtype of every type. It is identical to Union().
Tuple is a statically-sized heterogeneous container where the type of each element is known. This type is for declaring things like "this function returns three integers". If you want to return an arbitrary number of integers, you would use a list or an array.
Notice that every real value is either a tuple, struct, or bitstring. A tuple has a length and a sequence of values, and its type is self-described (the type of a tuple is a tuple of the types of its elements). A struct is a tag plus a fixed number of values, and the tag tells you the type and the names and types of the fields. A bitstring is a tag (as in a struct) plus bytes of data.
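As a rough illustration of the three representations and the self-describing type of a tuple, here is a Python sketch; the names (Tag, StructValue, BitsValue, typeof) are hypothetical stand-ins, and real values would of course be raw memory rather than Python objects.

```python
from dataclasses import dataclass

@dataclass
class Tag:                    # a tag/struct/bits type: name, supertype, field info
    name: str
    supertype: "Tag | None" = None
    fieldnames: tuple = ()

@dataclass
class StructValue:            # tag plus a fixed number of field values
    tag: Tag
    fields: tuple

@dataclass
class BitsValue:              # tag plus raw bytes of data
    tag: Tag
    bits: bytes

def typeof(v):
    # a tuple's type is self-described: it is the tuple of its elements' types
    if isinstance(v, tuple):
        return tuple(typeof(x) for x in v)
    return v.tag              # structs and bits values carry their type in the tag

Int32 = Tag("Int32")
Complex = Tag("Complex", fieldnames=("real", "imag"))

z = StructValue(Complex, (1.0, 2.0))
pair = (z, BitsValue(Int32, (42).to_bytes(4, "little")))
print(typeof(pair))           # prints a tuple of the two element types
```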
A function is a bit different. Its representation is the same as a struct's (a function pointer, closure data, and metadata), but its type describes what the function does instead of describing the representation. This is not very important, though, because functions are opaque to the programmer and function types will not really be used in Julia 1.0.
Some example type definitions:

```
type Real <: Number

struct Complex[a] <: Number
    real:: a
    imag:: a
end

struct List[a]
    length:: Size
    offset:: Size
    items:: Buffer[a]
end

struct Array[a,ndims] <: Tensor[a,ndims]
    dims:: Buffer[Size]
    data:: Buffer[a]
end
```
The syntax a <: b declares a to be a subtype of b.
Here is a simple specialization of a function for a type:
```
function eltype(arr::Array)
    return typeof(arr).parameters[0]
end
```
A type alias can be defined like this:

```
typealias Vector[T] Tensor[T,1]
```
Is this an object system? Kind of! It depends on your definition.
Since we have generic functions, the above is an object system. But I don't think of it this way because it is a bit more primitive than that; all we have here is tagged tuples. It's the base of an object system rather than a complete one. You can think of these types as C structs. To allocate one, you would use a primitive like new(Complex(double),1,2) where the arguments are a type and field values. The user would typically never write code like this; it would be wrapped in more convenient functions.
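As an illustration only (the new and complex names here are hypothetical stand-ins, not the actual primitive), here is a Python sketch of the tagged-tuple allocation primitive and a friendlier wrapper around it:

```python
# Sketch: the low-level allocation primitive is just "tag + field values",
# i.e. a tagged tuple. All names here are hypothetical.

def new(typ, *fields):
    # allocate an instance of a struct type: a pair of (type tag, field values)
    assert len(fields) == len(typ["fieldnames"]), "wrong number of fields"
    return {"tag": typ, "fields": fields}

Complex = {"name": "Complex", "fieldnames": ("real", "imag")}

# users would normally go through a convenience wrapper, not call new() directly
def complex(re, im=0.0):
    return new(Complex, float(re), float(im))

z = complex(1, 2)
print(z["tag"]["name"], z["fields"])   # Complex (1.0, 2.0)
```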
The types described above do not have constructors, methods, or private data. My reasoning is that if all fields are public, you don't need getters and setters or any other methods; generic functions suffice. If objects are immutable you have no problem.
But there are still three more features you want: constructors, private data, and inheritance. For this I propose layering a dynamic object system on top, similar to Julia 1's. The idea is to have class definition syntax that declares a type and creates a constructor that does the allocation, initializes fields, assigns methods (which would be function-valued fields), and runs any custom logic.
Example:
```
class Rectangle(x, y, width, height, color=rgb(0,0,0)) inherits Shape()
    public x, y, width, height, color

    function self.move(dx, dy)
        self.x += dx
        self.y += dy
    end
end
```
This would expand into something like:
```
struct Rectangle
    x:: Any
    y:: Any
    width:: Any
    height:: Any
    color:: Any
    move:: Function
end

function Rectangle(x, y, width, height, color=rgb(0,0,0), self=())
    function move(dx, dy)
        self.x += dx
        self.y += dy
    end
    if isnull(self)
        self = Rectangle.new(x, y, width, height, color, move)
    end
    Shape(self=self)  # call superclass constructor
    return self
end
```
Private fields are simply local variables of the constructor, which are captured by normal lexical scope.
With optional and keyword arguments, there is little need to have multiple constructors.
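A Python analogue of these two points, purely as a sketch (the Counter example and its names are hypothetical): private state is a local variable of the constructor captured by the methods' closures, and default arguments stand in for multiple constructors.

```python
# Sketch: "private" data is just a local variable of the constructor,
# captured by the methods' closures. Names are hypothetical.

def Counter(start=0, step=1):          # default arguments replace extra constructors
    count = start                      # private: only reachable via the closures below

    def increment():
        nonlocal count
        count += step
        return count

    def value():
        return count

    # the "object" is a record of public fields and function-valued fields
    return {"increment": increment, "value": value}

c = Counter()                          # uses the default arguments
c["increment"]()
print(c["value"]())                    # 1; the count itself is not accessible
```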
Some implementation notes:

- boxed representation: typetag, refcount, fields...
- we can automatically inline space for small primitive vectors
- constructors will be automatically generated
- each type will have an automatically generated traversal routine that knows where to find pointers and that automatically redirects self-internal pointers (to handle inline allocations)
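As a rough sketch of the boxed representation and the generated traversal routine (Python, hypothetical names; the real thing would operate on raw memory):

```python
# Sketch of a boxed value and an auto-generated traversal routine that a GC
# could use to find and redirect pointers. Everything here is hypothetical.

class Boxed:
    __slots__ = ("typetag", "refcount", "fields")
    def __init__(self, typetag, fields):
        self.typetag = typetag
        self.refcount = 1
        self.fields = list(fields)

def make_traverser(typetag, pointer_field_indexes):
    # generated once per type: it knows which field slots hold pointers
    def traverse(obj, visit):
        assert obj.typetag is typetag
        for i in pointer_field_indexes:
            obj.fields[i] = visit(obj.fields[i])   # let the visitor redirect pointers
    return traverse

# e.g. a List type whose third field (the item buffer) is the only pointer
ListTag = "List"
traverse_list = make_traverser(ListTag, pointer_field_indexes=(2,))

buf = Boxed("Buffer", [10, 20, 30])
lst = Boxed(ListTag, [3, 0, buf])                  # length, offset, items
traverse_list(lst, visit=lambda p: p)              # identity visit: finds the pointer
```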
Do we need an Object type to distinguish tuples and unions from other types? Do we need Object as the root for user-defined dynamic objects?
Do we need arrow (function) types? (Is this ML with Matlab syntax? That doesn't sound like too bad an idea actually.) We might use them internally during inference. The big trouble is that while you can tell the type of any other value just by looking at it, you can't for functions! If somebody declared an argument to have type int->int, it seems it would be restricted to functions actually declared that way.
Should we expose both generic and non-generic functions to the user? The issue is that overloading certain primitive functions does not make sense; for example, if get-field is overloaded you can no longer access the actual fields of an object, so you want it to be a "simple" function not subject to dispatch.
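To spell out the problem, here is a hypothetical Python sketch (the overloads table and get_field are made up for illustration) of why an overloadable field accessor cannot reach the real fields:

```python
# Sketch (hypothetical names) of why get-field should not be overloadable:
# inside an overload, the only way to read a field is the same generic
# function, which dispatches right back to the overload.

overloads = {}                        # per-type overloads of the generic accessor

def get_field(obj, name):             # the *generic* accessor
    handler = overloads.get(type(obj))
    if handler is not None:
        return handler(obj, name)
    return obj.__dict__[name]         # stand-in for raw "simple" field access

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# someone overloads get-field for Point, intending to wrap the real value
overloads[Point] = lambda obj, name: get_field(obj, name) + 1

p = Point(1, 2)
# get_field(p, "x") would now recurse forever; without a non-generic,
# dispatch-free get-field primitive, the actual field is unreachable.
```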
A strange (and possibly over-complicated) idea: give the user some control of where the JIT specializes by compiling type-specific code for types not derived from Object, and dynamic dispatch code for types derived from Object.