Types and their representations
Users should hardly ever need to write types or declarations. This is more an aspect of the implementation, and a matter of precisely documenting how various important types work and how type inference works. However, we want the types involved in inference, user-defined types, and core operations (like array indexing) to be the same things.
Users of dynamic languages often observe that optional type declarations would allow them to fix many of their performance problems. However, few systems have actually implemented this (the only real example I can think of is Common Lisp). The problem is that providing only a few simple types forces users to pick between flexibility and performance. The only sensible way to solve this problem is to provide a fairly sophisticated type system. The question is whether an elaborate run-time type system can perform well.
You can think of a type as an object that can identify when a value belongs to it, and when another type is a subtype of it. There are multiple kinds of types, each characterized by how it defines these two predicates (see the sketch after the list below). We have 6 kinds of types:
- tuple types, (a,b,c,...). These have any number of constituent types and are covariant in them.
- function types, a-->b. These have 2 constituent types; contravariant in a, covariant in b.
- union types, Union(a,b,c,...). The set-union of several types.
- tag types. These have a name, a supertype, and some user-defined data. Tag types implement user-defined disjoint types with an explicit type hierarchy. A value belongs to tag type T if it is marked with that tag or one of its explicit subtypes, and subtype relations are determined just by looking at the supertype pointers and matching on the user-defined-data part. Tags are a lot like interfaces, and we could imagine them specifying methods in the future.
- struct or record types. This is a *subkind* of Tag that adds named fields, plus new() and convert() functions. These are the concrete types a user can define to actually hold data. A struct type T is concrete because you can say T.new(...) to make an instance; other types simply lack this field, so you naturally cannot make instances of them.
- "bits" types. Like a struct type, except that instead of named fields its representation is a string of N bits.
To start with, the following types are built in:

```
Tuple  Symbol  TagKind  StructKind  FuncKind  UnionKind  BitsKind
```
Given those, we make some primitive types:
```
Any  Type  TypeName  TypeVar  TypeConstructor  Buffer[T]
```
Then the following can be defined normally:
```
...[T]
Bottom
Tensor, Scalar, Number, Real, Int, Float
Boolean
Int8, Uint8, Int16, Uint16, Int32, Uint32, Int64, Uint64, Float32, Float64
Size
```
Size is a pointer-sized integer, equivalent to size_t in C.
Buffer is a primitive array, a pair of a pointer and a length.
Bottom is the empty set of values; there can be no direct or indirect instances of it and it is a subtype of every type. It is identical to Union().
Tuple is a statically-sized heterogeneous container where the type of each element is known. This type is for declaring things like "this function returns three integers". If you want to return an arbitrary number of integers, you would use a list or an array.
Notice that every real value is either a tuple, struct, or bitstring. A tuple has a length and a sequence of values, and its type is self-described (the type of a tuple is a tuple of the types of its elements). A struct is a tag plus a fixed number of values, and the tag tells you the type and the names and types of the fields. A bitstring is a tag (as in a struct) plus bytes of data.
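As a rough illustration of the three representations and the self-describing type of a tuple, here is a Python sketch; the names (Tag, StructValue, BitsValue, typeof) are hypothetical stand-ins, and real values would of course be raw memory rather than Python objects.

```python
from dataclasses import dataclass

@dataclass
class Tag:                    # a tag/struct/bits type: name, supertype, field info
    name: str
    supertype: "Tag | None" = None
    fieldnames: tuple = ()

@dataclass
class StructValue:            # tag plus a fixed number of field values
    tag: Tag
    fields: tuple

@dataclass
class BitsValue:              # tag plus raw bytes of data
    tag: Tag
    bits: bytes

def typeof(v):
    # a tuple's type is self-described: it is the tuple of its elements' types
    if isinstance(v, tuple):
        return tuple(typeof(x) for x in v)
    return v.tag              # structs and bits values carry their type in the tag

Int32 = Tag("Int32")
Complex = Tag("Complex", fieldnames=("real", "imag"))

z = StructValue(Complex, (1.0, 2.0))
pair = (z, BitsValue(Int32, (42).to_bytes(4, "little")))
print(typeof(pair))           # prints a tuple of the two element types
```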
A function is a bit different. Its representation is the same as a struct's (a function pointer, closure data, and metadata), but its type describes what the function does instead of describing the representation. This is not very important, though, because functions are opaque to the programmer and function types will not really be used in Julia 1.0.
Some example type definitions:

```
type Real <: Number

struct Complex[a] <: Number
    real:: a
    imag:: a
end

struct List[a]
    length:: Size
    offset:: Size
    items:: Buffer[a]
end

struct Array[a,ndims] <: Tensor[a,ndims]
    dims:: Buffer[Size]
    data:: Buffer[a]
end
```
The syntax a <: b declares a to be a subtype of b.
Here is a simple specialization of a function for a type:
```
function eltype(arr::Array)
    return typeof(arr).parameters[0]
end
```
A type alias can be defined like this:

```
typealias Vector[T] Tensor[T,1]
```
Is this an object system? Kind of! It depends on your definition.
Since we have generic functions, the above is an object system. But I don't think of it this way because it is a bit more primitive than that; all we have here is tagged tuples. It's the base of an object system rather than a complete one. You can think of these types as C structs. To allocate one, you would use a primitive like new(Complex(double),1,2) where the arguments are a type and field values. The user would typically never write code like this; it would be wrapped in more convenient functions.
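As an illustration only (the new and complex names here are hypothetical stand-ins, not the actual primitive), here is a Python sketch of the tagged-tuple allocation primitive and a friendlier wrapper around it:

```python
# Sketch: the low-level allocation primitive is just "tag + field values",
# i.e. a tagged tuple. All names here are hypothetical.

def new(typ, *fields):
    # allocate an instance of a struct type: a pair of (type tag, field values)
    assert len(fields) == len(typ["fieldnames"]), "wrong number of fields"
    return {"tag": typ, "fields": fields}

Complex = {"name": "Complex", "fieldnames": ("real", "imag")}

# users would normally go through a convenience wrapper, not call new() directly
def complex(re, im=0.0):
    return new(Complex, float(re), float(im))

z = complex(1, 2)
print(z["tag"]["name"], z["fields"])   # Complex (1.0, 2.0)
```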
The types described above do not have constructors, methods, or private data. My reasoning is that if all fields are public, you don't need getters and setters or any other methods; generic functions suffice. If objects are immutable you have no problem.
But there are still three more features you want: constructors, private data, and inheritance. For this I propose layering a dynamic object system on top, similar to Julia 1's. The idea is to have class definition syntax that declares a type and creates a constructor that does the allocation, initializes fields, assigns methods (which would be function-valued fields), and runs any custom logic.
Example:
```
class Rectangle(x, y, width, height, color=rgb(0,0,0)) inherits Shape()
    public x, y, width, height, color

    function self.move(dx, dy)
        self.x += dx
        self.y += dy
    end
end
```
This would expand into something like:
```
struct Rectangle
    x:: Any
    y:: Any
    width:: Any
    height:: Any
    color:: Any
    move:: Function
end

function Rectangle(x, y, width, height, color=rgb(0,0,0), self=())
    function move(dx, dy)
        self.x += dx
        self.y += dy
    end
    if isnull(self)
        self = Rectangle.new(x, y, width, height, color, move)
    end
    Shape(self=self)  # call superclass constructor
    return self
end
```
Private fields are simply local variables of the constructor, which are captured by normal lexical scope.
With optional and keyword arguments, there is little need to have multiple constructors.
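A Python analogue of these two points, purely as a sketch (the Counter example and its names are hypothetical): private state is a local variable of the constructor captured by the methods' closures, and default arguments stand in for multiple constructors.

```python
# Sketch: "private" data is just a local variable of the constructor,
# captured by the methods' closures. Names are hypothetical.

def Counter(start=0, step=1):          # default arguments replace extra constructors
    count = start                      # private: only reachable via the closures below

    def increment():
        nonlocal count
        count += step
        return count

    def value():
        return count

    # the "object" is a record of public fields and function-valued fields
    return {"increment": increment, "value": value}

c = Counter()                          # uses the default arguments
c["increment"]()
print(c["value"]())                    # 1; the count itself is not accessible
```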
Some implementation notes:

- boxed representation: typetag, refcount, fields...
- we can automatically inline space for small primitive vectors
- constructors will be automatically generated
- each type will have an automatically generated traversal routine that knows where to find pointers and that automatically redirects self-internal pointers (to handle inline allocations)
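As a rough sketch of the boxed representation and the generated traversal routine (Python, hypothetical names; the real thing would operate on raw memory):

```python
# Sketch of a boxed value and an auto-generated traversal routine that a GC
# could use to find and redirect pointers. Everything here is hypothetical.

class Boxed:
    __slots__ = ("typetag", "refcount", "fields")
    def __init__(self, typetag, fields):
        self.typetag = typetag
        self.refcount = 1
        self.fields = list(fields)

def make_traverser(typetag, pointer_field_indexes):
    # generated once per type: it knows which field slots hold pointers
    def traverse(obj, visit):
        assert obj.typetag is typetag
        for i in pointer_field_indexes:
            obj.fields[i] = visit(obj.fields[i])   # let the visitor redirect pointers
    return traverse

# e.g. a List type whose third field (the item buffer) is the only pointer
ListTag = "List"
traverse_list = make_traverser(ListTag, pointer_field_indexes=(2,))

buf = Boxed("Buffer", [10, 20, 30])
lst = Boxed(ListTag, [3, 0, buf])                  # length, offset, items
traverse_list(lst, visit=lambda p: p)              # identity visit: finds the pointer
```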
Do we need an Object type to distinguish tuples and unions from other types? Do we need Object as the root for user-defined dynamic objects?
Do we need arrow (function) types? (Is this ML with Matlab syntax? That doesn't sound like too bad an idea actually.) We might use them internally during inference. The big trouble is that while you can tell the type of any other value just by looking at it, you can't for functions! If somebody declared an argument to have type int->int, it seems it would be restricted to functions actually declared that way.
Should we expose both generic and non-generic functions to the user? The issue is that overloading certain primitive functions does not make sense; for example, if get-field is overloaded you can no longer access the actual fields of an object, so you want it to be a "simple" function not subject to dispatch.
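To spell out the problem, here is a hypothetical Python sketch (the overloads table and get_field are made up for illustration) of why an overloadable field accessor cannot reach the real fields:

```python
# Sketch (hypothetical names) of why get-field should not be overloadable:
# inside an overload, the only way to read a field is the same generic
# function, which dispatches right back to the overload.

overloads = {}                        # per-type overloads of the generic accessor

def get_field(obj, name):             # the *generic* accessor
    handler = overloads.get(type(obj))
    if handler is not None:
        return handler(obj, name)
    return obj.__dict__[name]         # stand-in for raw "simple" field access

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# someone overloads get-field for Point, intending to wrap the real value
overloads[Point] = lambda obj, name: get_field(obj, name) + 1

p = Point(1, 2)
# get_field(p, "x") would now recurse forever; without a non-generic,
# dispatch-free get-field primitive, the actual field is unreachable.
```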
A strange (and possibly over-complicated) idea: give the user some control of where the JIT specializes by compiling type-specific code for types not derived from Object, and dynamic dispatch code for types derived from Object.