You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Performance comparison with Python, PyPy, Ruby, JS, Scala, OCAML
on x86-64, AARH64, ARM, PPC64.
Some history
1993: Original language design and implementation
Was used in russian computer game company ANIMATEK as a simple
scripting language for describing dinosaurus movements
1998, 2002, 2007, 2016 : Major language and implementation revisions
This document describes the current state of Dino language and
its implementation.
The first taste of Dino
Eratosthenes sieve:
var i, prime, k, count = 0, SieveSize = 8191, flags = [SieveSize : 1];
for (i = 0; i < SieveSize; i++)
if (flags[i]) {
prime = i + i + 3;
k = i + prime;
for (;;) {
if (k >= SieveSize)
break;
flags[k] = 0;
k += prime;
}
count++;
}
putln (count);
DINO as a scripting language
Dino aims to look like C language
High-Level scripting object-oriented language:
Multi-precision integers
Heterogeneous extensible arrays, array slices
Associative tables with possibility to delete elements
Powerful and safe class composition operation for (multiple)
inheritance and traits description
First class functions, classes, and fibers with closures,
anonymous functions, classes, fibers
Exception handling
Concurrency
Pattern matching
Unicode 8 support
Arrays and Tables
Associative tables
elements can be added and deleted
elements can be any values, e.g. other tables
Implemented as hash tables without buckets for compactness
and data locality
Secondary hash for conflict resolutions
Murmur hash function for most values
Array Slices
Eratosthenes sieve with slices:
var i, prime, count = 0, SieveSize = 8191, flags = [SieveSize : 1];
for (i = 0; i < SieveSize; i++)
if (flags[i]) {
prime = i + i + 3;
flags[i + prime:SieveSize:prime] = 0;
count++;
}
putln (count);
Functions
Example:
fun even;
fun odd (i) {i == 0 ? 0 : even (i - 1);}
fun even (i) {i == 0 ? 1 : odd (i - 1);}
putln (odd (1_000_000));
fiber t (n) {for (var i = 0; i < n; i++) putln (i);}
t(100); // the following code don't wait for t finish
for (var i = 0; i < 1000; i++) putln (“main”, i);
Implemented as green threads:
Much faster than OS threads with Global Interpreter Lock (Python/Ruby)
Deterministic behaviour and automatic deadlock recognition
Plans to implement through OS threads without GIL for parallelism
There is a low level sync-statement wait
wait (cond) [stmt];
Simple statements are atomic
Object orientation
Class is just a special type of function:
Returns closure, public visibility by default
class num (i) {fun print {put (i);}}
class binop (l, r) { fun print_op;
fun print {l.print(); print_op (); r.print ();}
}
Special class/function composition:
emulates (multiple) inheritance, traits, and dynamic dispatching
class add (l, r) {
use binop former l, r later print_op;
fun print_op {put (“ + “);}
}
Object orientation -- continuation
A safe and powerful way to support object orientation
Declarations of class mentioned in use are inlayed
Declarations before the use rewrite corresponding inlayed
declarations mentioned in former-clause
Declarations after the use rewrite corresponding inserted
declarations mentioned in later-clause
The declarations should be matched
The original and new declarations should be present
if they are in former- or later-clause
The original declaration can be renamed and still used
instead of just rewriting if necessary
Object orientation -- continuation
Special function isa to check subtyping of class or object:
isa (add, binop);
isa (add (num (1), num (2)), binop);
Optimization for removing code duplication
Syntactic sugar for a singleton object (it is implemented as
anonymous class and corresponding object creation)
obj int_pair {
val min = 0, max = 10;
}
Object orientation -- continuation
Optimizations for access to object members permit to use
objects as name spaces
obj sorts {
var compare_fun;
fun quick_sort (...) {...}
fun heap_sort (...) {...}
}
...
sorts.fft (...);
To make access to object more brief an expose-clause exists
Exposing an object member
expose sorts.quick_sort;
quick_sort (...);
You can have a member access with a different name
expose sorts.quick_sort (asort);
asort (...);
You can expose all public declarations of an object
Dino has 5 standard spaces (singleton objects) avalaible by default
to any Dino program
lang -- provides interface to fundamental Dino features
io -- provides interface for input/output and to work with file system
sys -- provides interace to some features of underlying OS
math -- mostly provides some mathematical functions
re -- provides regular expression matching functions
yaep -- provides interface to Yet Another Earley Parser
All items in spaces lang and io are always exposed
If you redefine some exposed item, you still can have access to it
as a member of the space.
Pattern matching
Pattern can be
pattern matching anything _
pattern variable matching any value, e.g. a
the value is assigned to the variable
vector pattern matching vectors, e.g. [v, _, 5, ...]
table pattern matching tables, e.g. tab ["a": v, ...]
object pattern matching objects of given class or
its derived class, e.g. node (a, 10)
... in a pattern list matching zero or more values
expression which looks different from the above and matches the equal value
pattern can be nested, e.g. node ([v, ...], 10)
Pattern matching in variable declaration, e.g. var [a, b, ...] = v;
v should be a vector with at least 2 elements, new declared variables
a and b will hold values of the two first elements
Pattern matching -- pmatch statement
pmatch (v) {
case [...]: putln ("array"); continue;
case [a, ...]: if (a == 0) break; putln ("array with non-zero 1st element");
case node (v) if v != 0: putln ("object of class node with nozero parameter");
case _: putln ("any but array");
}
Try to match value with case patterns in a particular order,
execute the corresponding code for the first matched pattern
Scope of pattern variables is the corresponding case
Continue means continue to try subsequent patterns
Break means finish the match statement execution
There is an implicit break at the end of each case
Example: classes and functions with pattern matching
Simple binary tree and its check:
class tree {}
class leaf (i) {use tree;}
class node (l, r) {use tree;}
fun exists_leaf (test, t) {
pmatch (t) {
case leaf (v): return test (v);
case node (l, r):
return exists_leaf (test, l) || exists_leaf (test, r);
}
}
fun has_odd_leaf (t) {
exists_leaf (fun (n) {type (n) == int && n % 2 == 1;}, t);
}
Regular expression matching -- rmatch statement
rmatch (str) {
case "[a-zA-Z]+": putln ("word starting at ", m[0]);
case "[0-9]+": putln ("number starting at ", m[0]);
case _: putln ("anything else, m is undefined");
}
Try to match string with case regular expressions in a particular order,
execute the corresponding code for the first matched regular expression
Continue and break statements behave the same way as
in pattern match statement
Exception handling
Exceptions are objects of sub-classes of class except
Exceptions can be generated by Dino interpreter, e.g. floating point exception
or by throw-statement:
class my_except (msg) {use except;}
throw my_except ("my special exceptions");
Exceptions can be processed by try-block or try-operator
Exceptions are propagated to previous blocks on the block stack until they are processed
Unprocessed exceptions finish the program execution
Exception handling -- continuation
Try-block
Exception occurring inside the block is processed in the first catch-block
whose class mentioned in the catch clauses is a super-class of the processed exception class
The processed exception is in variable e implicitly defined in
the corresponding catch-block
If there is no matched catch-block, the exception is propagated further
try {
var ln;
for (;;) {
var ln = getln (); putln (ln);
}
} catch (eof) { putln ("end of file"); }
Try-operator
The operator returns non-zero if no exceptions occurs in the statement given as the first argument
The operator returns zero if an exception occurs and its class is a sub-class (see isa)
of one exception class given by the subsequent arguments
If there is no matched argument class, the exception is propagated further
var ln;
for (; try (ln = getln (), eof);) putln (ln);
In the previous example, try (ln = getln (), eof) can be considered as
abbreviation of anonymous function call:
Simple escape analysis to transform heap allocations into stack ones
Combination of Mark and Sweep and fast Mark and Copy algorithm
permitting to decrease program memory requirement
Implementation -- Continuation
JIT
Function Level for functions marked by hint (! jit)
fun fact (n) !jit {n <=1 ? 1 : n * fact (n - 1);}
JIT details:
Triggered by the first call
Portable implementation through C code generation
memory file system is used (can be persistent memory in future)
option --save-temps can be used for the C code inspecting
Usage of the same code as interpreter to simplify implementation
C code generator is less 100 lines on C
a script used to minimize the code (about 1/10 of C code
interpreter definitions are used for generated code.)
Small function code generation takes about 50-70ms
using GCC on modern Intel CPUs
Implementation -- Type Inference
Dino is dynamic type programming language
Still many types of operations can be recognized during compilation time:
E.g. general Bcode add can be changed by iadd (integer variant)
or fadd (floating point variant) which are executed
without operand type checking
Type recognition (inference) is very important for better
object code generation, especially for JIT
It can speed up code in many times
Implementation -- Type Inference 2
Major steps:
Building CFG (control flow graph) of all program:
basic blocks and CFG edges connecting them
Calculating available results of Bcode insns -- a forward
data-flow problem on CFG
Using the availability, building def-use chains connecting
operands and results of Bcode insns and variables
Calculating types of Bcode insn operands and results -- another
forward data flow problem on the built def-use graph
Changing Bcode insns on specialized ones, e.g. add on iadd
Implementation -- Type Inference 3
Major complications:
Higher order functions
Closures
Threads
Possible use of variable (undefined) value before assigning a value to it
Therefore we don't recognize all types theoretically possible to recognize
Recognizing types of variable values even if the variable changes
its type -- difference to type inference in static type languages
Implementation -- Tools and libraries
Dino is implemented with COCOM tools
SPRUT - compiler of IR object oriented description. Used for
implementation of semantic IR, Bcode and run-time data.
In debugging mode, it permits to check all described constraints and relations
MSTA - faster superset of YACC with better error recovery
SHILKA - fast keyword recognizer generator
AMMUNITION - different packages (source position handling, error reporting,
Earley parser etc)
** Calls *** Time **** Name **************************************
761087 0.43 -- search1: "meteor.d": 229
561264 0.07 -- ctz: "meteor.d": 28
1260 0.01 -- GoodPiece: "meteor.d": 37
...
0.51 -- All Program
Adding hints: !inline for ctz and !jit for search1
** Calls *** Time **** Name **************************************
761087 0.15 -- search1: "meteor.d": 229
...
0 0.00 -- ctz: "meteor.d": 28
...
0.17 -- All Program
Implementation -- C Interface
Dino has declarations to describe C functions and variables external
to Dino programs. For example, the following describes external
variable v and function f
extern v, f ();
The variable v in C code will be of type val_t. The function
f in C code will have teh following prototype
val_t f (int npars, val_t *args);
The file d_api.h provides C descriptions of Dino internals (the
type val_t, functions to create vectors, tables etc). The file is
generated from SPRUT description d_extern.d
The external C code is responsible for providing correct Dino
values
There are two ways to use C code: pre-compiled and compiled on the
fly
External variables and functions in C code preliminary compiled
as shared objects can be used if Dino knows where the objects are located
The option -L provides such knowledge. For example, options
-L/home/dino/obj1.so -L../obj2.so says Dino to load the shared
objects
Externals will be searched in some standard objects first, then in
objects provided by options -L in the same order as they stay on
the command line
A standard Dino external C code in file d_socket.c from Dino
sources can be used as an example of pre-compiled C code
Implementation -- C Interface 3
C code compiled on the fly looks in Dino code like
%{
...
val_t f (int n_pars, val_t *args) {
...
}
%}
extern f ();
val r = f(10);
All C code between pairs of brackets %{ and %} in one Dino file
is concatenated
The result code with pre-appended code from d_api.h is compiled
when the execution the first time achieves the location of the first
%{
The result shared object is loaded and external variables and
functions are searched lately also in this object as it was
described in pre-compiled C code
To check all C code before execution you can use option --check
Code Metrics
sloccount output as of 2/18/2016 for Dino + tools:
Totals grouped by language (dominant language first):
sh: 265452 (54.10%)
ansic: 194472 (39.64%)
yacc: 23297 (4.75%)
cpp: 7403 (1.51%)
Dino directory only:
Totals grouped by language (dominant language first):
sh: 161561 (62.13%)
ansic: 95124 (36.58%)
yacc: 3365 (1.29%)
Benchmarking -- Programs
Some old computer language shootout benchmarks:
loop - empty loop body
hash - associative tables
fact, fib - factorial and fibonacci (recursive functions with and
without tail recursion)
A lot of research was done on Dino implementation
(see article about this)
It can be used to improve performance of popular dynamic language implementations:
to make them faster
to make them more portable
to require less resources (compilation time and memory)
to have very quick start up and big compiler speed
Future directions of research
Type annotation. Two goals:
More cases for compile type checking which is in a direction of
Dino language development (introduction of early undefined value
recognition, the same number of actual and formal parameter
numbers etc.)
Faster code generated by JIT
Light-weight direct JIT
Using GCC for JIT is portable but too heavy for some system as CYGWIN
Direct JIT from bytecode to machine instructions is necessary
Goal is very fast code generation and simple fast optimizations
to decrease memory traffic
With type inference and type annotation the direct JIT can
achives 1/2-1/3 of speed optimized C code for majority of programs