The s-expression-syntax
library provides declarative rules for
parsing the various kinds of s-expression syntax used in Common
Lisp:
- Special operators
- Lambda lists
- Declarations
- Type specifiers
- Standard macros
~cl:loop~(work in progress)
- Feature expressions
Format control strings(work in progress)
Note that this library is not a code walker but it could be used to implement a code walker.
Explaining things in more detail requires a little bit of terminology:
- Expression
- A data structure which conceptually consists of atoms and conses (possibly meant for evaluation in which case the expression is a form). In contrast to the usual definition, expressions processed by this library may be represented as arbitrary objects which are interpreted as atoms and conses according to rules defined by the client using the library. That said, processing expressions using the usual interpretation of atoms and conses is of course supported. Cons expressions can contain other expressions which are sub-expressions of the respective containing expression.
- Syntax Description
- An object which describes the syntax of one
kind of expression. For example, there is a syntax
description for the special operator
cl:let
which describes (the infinite set) of well-formedlet
expressions. Besides recognizing well-formed expressions of a certain kind, a syntax description also contains descriptions of the parts that can appear within well-formed expressions of that kind. For example, forcl:let
, the parts are- zero or more bindings, in turn consisting of
- a variable name
- an optional initial value form
- zero or more declarations
- and zero or more body forms.
More precisely, a syntax description consists of a name, a list of parts and a parser which consumes expressions conforming to the described syntax and produces a parse result tree.
- zero or more bindings, in turn consisting of
- Part
- An object which describes a sub-structure of
the syntax described by a particular syntax description. For
example, the syntax description for
cl:let
has a part for declaration expressions contained inlet
expressions.More precisely, a part consists of a name, a cardinality, which is either one, zero-or-one, one-or-more or zero-or-more, and an evaluation which is roughly “evaluated”, “not evaluated” or “composed of evaluated and non-evaluated parts”.
For example,
cl:defun
has the following partsName Cardinality Evaluation name one not evaluated lambda list one composed of evaluated and non-evaluated parts documentation string zero-or-one not evaluated declaration zero-or-more not evaluated body forms zero-or-more evaluated The “composed of evaluated and non-evaluated parts” evaluation of the lambda list is due to the fact that lambda lists contain unevaluated variable names and lambda list keywords as well as, possibly, evaluated initialization forms for optional and keyword parameters and
&aux
variables. - Result Tree
- A tree that is result of parsing an
input expression. Each node in the tree corresponds to a
sub-expression of the input expression (multiple nodes may
correspond to the same sub-expression). A node is
characterized by its kind, its initargs and its relations
to other nodes.
The result tree could possibly be called an Abstract Syntax Tree (AST), but we avoid that term because the result directly reflects the structure of the input expression.
We may call a result tree a partial result tree if it contains unparsed sub-expressions of the input expression in some of its leaf nodes.
The most common use of this library probably is turning a given expression into an AST. This process happens in multiple steps
- Determine an appropriate syntax description for parsing the
expression. For example, the expression
(locally (declare …) 1 (+ a b) 3)
must be parsed using the syntax description for the special operatorcl:locally
. - Apply the obtained syntax description in conjunction with a parse result builder to obtain a partial (see 3.) AST for the expression.
- Optionally parse evaluated sub-expressions recursively. In the
above example
(declare …)
is a sub-expression that is not evaluated while1
,(+ a b)
and3
are sub-expressions that are evaluated. The latter are not automatically parsed and thus must be recursively processed in the way described here in order to obtain a fully parsed AST. A complete AST can generally only be produced by consulting an environment as well as interleaving parsing with macroexpansion and is therefore out of scope for this library.
The following code performs steps 1. and 2. and prints the resulting
(partially parsed) AST in a human-readable form. Note how the list
builder of the architecture.builder-protocol system is passed in the
parse
call and later used to destructure the AST node node
by
calling the functions node-relations
and node-relation
.
(let* ((expression '(defun foo (a &optional (b 2))
(declare (type integer a))
(declare (type integer b))
(format t "~S" a)
(list a b)))
(syntax (s-expression-syntax:find-syntax 'defun))
;; Alternatively, determine the appropriate syntax description
;; for EXPRESSION automatically:
;; (syntax (s-expression-syntax:classify t expression))
(builder 'list)
(node (s-expression-syntax:parse builder syntax expression)))
(flet ((describe-sub-expression (sub-expression relation-args)
(format t "~2@T-> ~S~%~
~2@T evaluation: ~S~%"
sub-expression (getf relation-args :evaluation))))
(loop :for relation :in (architecture.builder-protocol:node-relations builder node)
:for part-name = (find-symbol (symbol-name (first relation))
(find-package "S-EXPRESSION-SYNTAX"))
:for part = (s-expression-syntax:find-part part-name syntax)
:for cardinality = (s-expression-syntax:cardinality part)
:for (sub-expression evaluation)
= (multiple-value-list (architecture.builder-protocol:node-relation
builder relation node))
:do (format t "~A (~A)~%" part-name cardinality)
(ecase (s-expression-syntax:cardinality part)
((1) (describe-sub-expression sub-expression evaluation))
((*) (mapc #'describe-sub-expression sub-expression evaluation))))))
Evaluating the code results in the following output which
illustrates the four parts of the defun
expression: name,
lambda-list, declaration and form. The latter two have a cardinality
of *
, so multiple child nodes may be related to the parent node
through the relation in question. In this example, both relations
contain two child nodes: two declarations and two body forms.
NAME (1) -> (:FUNCTION-NAME NIL :NAME FOO :SOURCE FOO) evaluation: NIL LAMBDA-LIST (1) -> (:ORDINARY-LAMBDA-LIST ((:REQUIRED . *) (((:REQUIRED-PARAMETER ((:NAME . 1) (((:VARIABLE-NAME NIL :NAME A :SOURCE A) :EVALUATION NIL))) :SOURCE A))) (:OPTIONAL . *) (((:OPTIONAL-PARAMETER ((:NAME . 1) (((:VARIABLE-NAME NIL :NAME B :SOURCE B))) (:DEFAULT . 1) (((:UNPARSED NIL :EXPRESSION 2 :CONTEXT :FORM :SOURCE 2) :EVALUATION T))) :SOURCE (B 2)) :EVALUATION :COMPOUND))) :SOURCE (A &OPTIONAL (B 2))) evaluation: :COMPOUND DECLARATION (*) -> (:DECLARATION ((:ARGUMENT . *) (((:ATOMIC-TYPE-SPECIFIER ((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER)))) :SOURCE INTEGER)) ((:VARIABLE-NAME NIL :NAME A :SOURCE A)))) :KIND TYPE :SOURCE (TYPE INTEGER A)) evaluation: NIL -> (:DECLARATION ((:ARGUMENT . *) (((:ATOMIC-TYPE-SPECIFIER ((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER)))) :SOURCE INTEGER)) ((:VARIABLE-NAME NIL :NAME B :SOURCE B)))) :KIND TYPE :SOURCE (TYPE INTEGER B)) evaluation: NIL FORM (*) -> (:UNPARSED NIL :EXPRESSION (FORMAT T "~S" A) :CONTEXT :FORM :SOURCE (FORMAT T "~S" A)) evaluation: T -> (:UNPARSED NIL :EXPRESSION (LIST A B) :CONTEXT :FORM :SOURCE (LIST A B)) evaluation: T
We can also focus on the overall tree structure and print the
(partially parsed) AST as a tree. The following code again uses the
architecture.builder-protocol
system to destructure the AST, this
time as part of a generic tree printer.
(let* ((expression '(defun foo (a &optional (b 2))
(declare (type integer a))
(declare (type integer b))
(format t "~S" a)
(list a b)))
(builder 'list)
(node (s-expression-syntax:parse builder t expression)))
(let ((*print-case* :downcase))
(architecture.builder-protocol.print-tree:print-tree
builder node *standard-output*)))
Note the unparsed leafs indicated by the unparsed
node kind.
defun │ source: (defun foo (a &optional (b 2)) │ (declare (type integer a)) │ (declare (type integer b)) │ (format t "~S" a) │ (list a b)) ├─name: function-name │ name: foo │ source: foo ├─lambda-list: ordinary-lambda-list │ │ source: (a &optional (b 2)) │ ├─required: required-parameter │ │ │ source: a │ │ └─name: variable-name │ │ name: a │ │ source: a │ └─optional: optional-parameter │ │ source: (b 2) │ ├─name: variable-name │ │ name: b │ │ source: b │ └─default: unparsed │ expression: 2 │ context: :form │ source: 2 ├─declaration: declaration │ │ kind: type │ │ source: (type integer a) │ ├─argument: atomic-type-specifier │ │ │ source: integer │ │ └─name: type-name │ │ name: integer │ │ source: integer │ └─argument: variable-name │ name: a │ source: a ├─declaration: declaration │ │ kind: type │ │ source: (type integer b) │ ├─argument: atomic-type-specifier │ │ │ source: integer │ │ └─name: type-name │ │ name: integer │ │ source: integer │ └─argument: variable-name │ name: b │ source: b ├─form: unparsed │ expression: (format t "~S" a) │ context: :form │ source: (format t "~S" a) └─form: unparsed expression: (list a b) context: :form source: (list a b)
find-syntax NAME &KEY IF-DOES-NOT-EXIST Return the syntax description named NAME, if any. IF-DOES-NOT-EXIST controls the behavior in case a syntax description named NAME does not exist. The following values are allowed: #'ERROR Signal an error if a syntax description named NAME does not exist. OBJECT Return OBJECT if a syntax description named NAME does not exist.
(setf find-syntax) NEW-VALUE NAME &KEY IF-DOES-NOT-EXIST Set the syntax description associated with NAME to NEW-VALUE. An existing association for NAME, if any, is replaced. IF-DOES-NOT-EXISTS is accepted for parity with FIND-SYNTAX but ignored.
ensure-syntax NAME CLASS &REST INITARGS Associate NAME with a syntax description based on CLASS and INITARGS. Return the new or updated syntax description object associated with NAME. If the database of syntax descriptions already contains a syntax description for NAME, the existing syntax description object is reinitialized with INITARGS. If the database of syntax descriptions does not contain a syntax description for NAME, a new association is created by making an instance of CLASS, initializing it with INITARGS and registering the new object for NAME.
classify CLIENT EXPRESSION Classify EXPRESSION, possibly according to specialized behavior of CLIENT. Return a syntax description object that roughly reflects the kind of EXPRESSION. Note that a precise classification would have to take into account aspects beyond the syntax, such as the environment, to, for example, distinguish function and macro application or variable references and symbol macro applications. It is always possible to find an appropriate syntax description: + If EXPRESSION is a special form, this function returns the syntax description for the corresponding special operator. + If EXPRESSION is an application of a standard macro, this function returns the syntax description for that macro. + If EXPRESSION a list not covered by the above cases, this function returns the syntax description for a generic (that is, function or macro) application. Note that this case also covers invalid applications such as (1 2 3). + If EXPRESSION is a symbol but not a keyword, this function returns a syntax description for a variable reference. + If EXPRESSION is any object that is not covered by the above cases, this function returns a syntax description for a self-evaluating object.
parse CLIENT SYNTAX EXPRESSION Parse EXPRESSION according to SYNTAX, possibly specialized to CLIENT. SYNTAX is a designator for a syntax description: + If SYNTAX is `t', `classify' is applied to CLIENT and EXPRESSION to obtain an appropriate syntax description object. + If SYNTAX is any other symbol, `find-syntax' is called to obtain the syntax description named by SYNTAX. An error is signaled if SYNTAX does not name a syntax description. + Otherwise SYNTAX must be a syntax description object. EXPRESSION is either one of the kinds of expressions that make up Common Lisp programs (such as forms, type specifiers and declarations) or a particular non-standard representation of such expressions which is specific to CLIENT. For example, a client may choose to represent every sub-expression contained in an expression as a standard object in order to store additional information. If CLIENT employs such a non-standard representation, the protocol named by symbols exported from the `s-expression-syntax.expression-grammar' package has to be implemented by defining appropriate methods. If EXPRESSION does not conform to the syntax described by SYNTAX, an error of type `invalid-syntax-error' is signaled. If EXPRESSION does conform to the syntax described by SYNTAX, a parse result that associates the parts of SYNTAX with sub-expressions of EXPRESSION is returned. The type and structure of the return value depends on CLIENT as the parse result is constructed using the builder protocol with CLIENT as the builder.