This library is part of NUMCL. It provides a macro `SPECIALIZED` that performs a Julia-like dispatch on the arguments, lazily compiling a type-specific version of the function from the same code. The main target of this macro is the speed.
It supports SBCL, CCL, and LispWorks (untested), and provides a limited support to other lisps.
On other lisps, the code inside specialized
is not able to read
the variables that are not specified to be the dispatched arguments.
This is because it relies on the implementation-specific interface to the
environment object to collect the list of all lexical variables.
NEWS (2019/09/28): lexical variables NOT specialized by the user are type-annotated inside specialized-function based on the information available from cltl2.
Suppose computing a vector dot:
(defun dot-original (a b c)
(declare (optimize (speed 3) (safety 0) (debug 0)))
(loop
for ai across a
for bi across b
do (incf c (* ai bi)))
c)
This function is untyped and thus slow due to the calls to generic-*
and
generic-+
. While we could write a macro that automatically duplicates this
code with some additional type declaration, notice that the number of types that
this function accepts is large — a
could be a simple or non-simple array of
4 float types, 4 float complex types, and the integer subtypes for the
specialized arrays, such as (unsigned-byte 4)
. On SBCL x64, there are at
least 56 variants – 2*(4+4+12+7+1) – and since there are 2 arguments, it
should compile more than 2500 specialized variants. This wastes the compilation
time because most combinations are not actually used in the user code.
Julia language has chosen a strategy that lazily compiles the typed variant of the function when the function was invoked with a new combination of types. Specialized-function provides this same functionality in Common Lisp — lazy compilation of the type-specific variant of the same function.
Note that this cannot be achieved by CLOS and the behavior is orthogonal/complimentary to the CLOS dispatch. One critical difference is that CLOS generally assumes that the different code/algorithm is used for implementing each method, while specialized-function assumes that all specialized functions share the same code.
Another reason it cannot be achieved by CLOS is that CLOS in general cannot specify the array element types, which is critical in the high-performance code (while it is subject to a debate — MOP can extend it to support array specifiers).
Since they are orthogonal, you could also combine CLOS and specialized-function as follows:
(defmethod print-all ((obj array) (s stream))
(specialized (obj) ()
(loop for c across obj do (write-char c s))))
(defmethod print-all ((obj list) (s stream))
(loop for elem in obj do (write elem s)))
Note that the first print-all
for an array
could dispatch between the
base-string
(with base-char
elements) and string
(with character
elements).
We compile each dispatched branch lazily and store them in a tree of vectors (each node is called a table).
The number of entries in each table depends on the implementation’s type-upgrading rule
and the number of supported specialized array element types.
There is a widetag
function containing a nested etypecase
rule
which takes an object, investigate its type and returns the corresponding index in the table.
If the function is not available at the leaf node, it compiles a new version of the function. The overhead of performing a dispatch is
[simple-array access]x[number of arguments]+[single function call].
CCL does not obtain much speedup on the numerical code as it cannot infer the
return type of (aref A ...)
from the element-type declaration of A
.
On unsupported implementations, SPECIALIZED
is equivalent to =PROGN=.
not able to pass values of lexical variables to the body inside SPECIALIZED
.
For array types, it dispatches based on the (upgraded) element type and the simpleness of the runtime array object. It does not dispatch based on the rank and the dimensions. Also, once each function is called/compiled for the arrays of a certain rank, this rank information is fixed. This is based on a heuristic that the same code/algorithm does not work for an array of the different ranks. For example, it is unreasonable to assume that a code for matrix multiplication can process a 1D vector or a 3D tensor.
For integer types, all values that can be expressed within fixnum
will dispatch to the fixnum-specialized branch. That is, entering a
(specialized (n) ...)
with n
being 12
does not dispatch to (unsinged-byte 4)
but simply to fixnum
.
specialized
dispatches to the branches for standard-object
and
structure-object
, but does not provide any dispatch for their subtypes.
This should be handled by CLOS mainly because, as a heuristic, different
classes would require the different code/algorithm.
However, for specific purposes, we expose a function register-base-type
which allows you to add a specific type to the tables.
Masataro Asai (guicho2.71828@gmail.com)
Licensed under LGPL v3.
Copyright (c) 2019 IBM Corporation