Ast pattern matching, templating and rewriting. Supports regex-like
patterns for sequences. Is not tied to particular AST - NimNode
is
supported, but not the only type of AST.
This library is intended to be used not only as pattern matching for nim ast but also as foundation for any sort of operations on heterogeneous AST trees. At least I hope it to be suitable for this kind of task.
The readme might be a little too verbose (especially for the people who are familiar with prolog/unification etc).
nimble install nimtrs
This library is build around prolog-like unification algorithm and term notion. This terminology is used in documentation/API etc.
Code examples use either NimAst
(where appropriate) or simple type
Ast
, defined in tests tests/ast_term.nim.
Term is either of
- variable :
$a
or@a
- constant :
1, 3, 4
- list :
[t1, t2, t3, ...]
- functor head + list of arguments (other terms) :
f(t1, t2, t3 ...)
- pattern
*(12 | 13) & $a
All of the functions and types in the library have to generic
parameters: V
and F
. First one (V
)is a type of constants
(values), second one (F
) type of a functor symbol.
- If you want to model mathematical expressions
F
would be a string andV
in integer type. Possible example of a functor:cos(0.2)
orsin($a + $b)
. Note:+
is also a functor - binary operator is just a 2-ary function. - For working with nim ast functor head is a
NimNodeKind
and constant is aNimNode
. For example for expression like3 + 3
we get ast dump (in lisp form) that looks like this.Infix
is a functor head,(Ident "+")
is a constant since it is not possible for it to have arguments.(Infix (Ident "+") (IntLit 3) (IntLit 3))
Functor parts (terminology used)
Functor argument /list/
|
vvvvvvvvvv
Functor(a1, a2, a3)
^ ^ ^ ^
| | | |
| `---+---'
| |
| Functor /arguments/
|
Functor head
Simple example of AST node type (identical to tests/ast_term.nim)
type
AstKind* = enum
akStrLit, akIntLit, akIdent # Constant values
akCall, akCondition # Functors
Ast* = object
case kind: AstKind
of akStrLit, akIdent:
strVal: string
of akIntLit:
intVal: int
else:
subnodes: seq[Ast]
Unification is an algorithm for finding substitution (mapping of variables to values).
Signature for unfication is: (t1, t2: Term[V, F], env: TermEnv[V,
F]): Option[TermEnv[V, F]]
. Given some environment env
and two
terms t1
, t2
try to find all necessary substitutions for which
Each variable represented using symbol (VarSym
) and itself does not
have any value. Environment is a mapping (substitution) between
variable symbols and terms.
There are two types of variables - scalar and list.
- Scalar variable (denoted using
$varName
) is identical regular prolog variable and quite similar tolet
variable in nim - once assigned value cannot be changed, unification of two bound variables succesed only if their values can be unified. - List variable was introduces to allow capturing multiple values
from term. Unification with this type of variables always
succedes. List variable is always bound to list term but spliced
into arguments on
fromTerm
conversion.
NOTE: ex
and exUnif
are simpler helper templates defined in
src/nimtrs/doc_example.nim
echo "--- Single use of scalar variable ---"
exUnif(ex(Condition($a)), ex(Condition(%!mkLit("hello"))))
echo "--- Scalar variable used twice ---"
exUnif(
ex(Condition($a, $a)),
ex(Condition(%!mkLit("vareee"), %!mkLit("vareee")))
)
echo "--- Scalar variable used twice, different arguments ---"
exUnif(
ex(Condition($a, $a)),
ex(Condition(
%!mkLit("vareee"), # different values for constants. Variable will
# be unified with first one `"vareee"`, but fails on second one.
%!mkLit("((((()))))")
))
)
echo "--- List variable ---"
exUnif(
ex(Condition(@a, @a)), # List variable captures all terms
ex(Condition(%!mkLit("vareee"), %!mkLit("((((()))))")))
)
--- Single use of scalar variable --- {($a -> 'hello')} --- Scalar variable used twice --- {($a -> 'vareee')} --- Scalar variable used twice, different arguments --- Unification failed --- List variable --- {(@a -> ['vareee', '((((()))))'])}
List variables are always spliced. In most cases notion of ‘list’ is
not present in AST - things like StmtList
is just another functor
(with many arguments, yes, but functor still). This is not supported
right now, but using something like A([@listVariable])
to really
generate list will be added.
Terms are build as heterogeneous representation for trees - there is
only single type for functor head, but since constants themself can
differ ((IntLit 3)
and (Ident "+")
) it is necessary to distinguish
between them somehow.
One possible solution is to take OOP-style approach and model term as
a object hierarchy, with each one implementing some kind of
isConstant
and getFunctorSymbol
methods. This library uses
somewhat similar approach, but more suitable for nim case objects.
Instead of deriving from parent object type and implementing some
abstract methods it is necessarty to declare set of callback functions
that will be used on conversion from/to term.
func isFunctor*(nnk: NimNodeKind): bool =
nnk notin { # set of node kinds that cannot be considered 'functor'.
# I.e. it is not possible to have a child for
# `nnkFloatLit` for example, therefore it is not a
# functor.
nnkNone, nnkEmpty, nnkNilLit, # Empty node
nnkCharLit..nnkUInt64Lit, # Int literal
nnkFloatLit..nnkFloat64Lit, # Float literal
nnkStrLit..nnkTripleStrLit, nnkCommentStmt, nnkIdent, nnkSym # Str lit
}
const nimAstImpl* = TermImpl[NimNode, NimNodeKind](
getsym: ( # Get functor symbol from value. `V -> F`
proc(n: NimNode): NimNodeKind = n.kind
),
isFunctorSym: ( # Check if functor is a symbol. `F -> bool`
proc(kind: NimNodeKind): bool = kind.isFunctor()
),
makeFunctor: ( # Construct functor from head symbol and list of
# arguments. `F x seq[V] -> V`
proc(op: NimNodeKind, sub: seq[NimNode]): NimNode =
if sub.len == 0: newNimNode(op)
else: newTree(op, sub)
),
getArguments: ( # Get list of arguments from term. No checking is
# necessary - only functor terms would be queried
# for arguments. `V -> seq[V]`
proc(n: NimNode): seq[NimNode] = toSeq(n.children)
),
valStrGen: ( # Generate string representation for term. Used for
# pretty-printing terms. `V -> string`
proc(n: NimNode): string = n.toStrLit().strVal()
),
)
This ‘implementation’ is passed to toTerm
and fromTerm
converters
to convert value of type V
to Term[V, F]
.
import nimtrs/[trscore, trspprint, trsdsl, nimast_trs]
import options
template matchPatternNim(term: NodeTerm, patt: untyped): untyped =
matchPattern(term, nimAstImpl, patt)
macro ifTest(body: untyped): untyped =
for stmt in body:
let term = stmt.toTerm(nimAstImpl)
# ^^^^^^^^^^
# 'Implementation' - used for converting
# value of type `V` to term.
if term.matchPatternNim(
# Match head of the `term`
# | Match pattern one or more times
# | |Match `nnkElifBrach` - `nnk` prefix might be omiited
# | || Concatenation of two parts Optional artument in the term
# v_____ vv_________ v v
IfStmt(*ElifBranch(@conds, @bodies) & ?Else($elsebody))):
# A ^ ^ A ^‾‾‾‾‾‾‾‾‾‾
# ! |_______| ! Possible nullable variable,
# ! | ! inserted as `Option[F]`
# ! | !
# ! Two list variables, will be inserted as `seq[V]`
# ! !
# [ This part will consume as]
# [ much functor arguments as]
# [ possible. ]
for cond in conds:
echo cond.lispRepr()
if elsebody.isSome():
echo "Has `else`"
ifTest:
if 12 == 22:
echo "123"
elif false:
echo "123"
else:
echo "123123"
if 20 == 29:
echo "123"
(Infix (Ident "==") (IntLit 12) (IntLit 22)) (Ident "false") Has `else` (Infix (Ident "==") (IntLit 20) (IntLit 29))
Generate term with variables and then substitute them from environment. Examples of use (pretty simple but should illustrate the point).
import nimtrs/[trscore, trspprint, trsdsl, nimast_trs]
# import options
template makeNimTerm(body: untyped): untyped =
makeTerm(nimAstImpl, body)
macro templating(arg: untyped): untyped =
let env = makeEnvironment(@{
parseVarSym("$a") : arg.toTerm()
})
let templ = makeNimTerm:
IfStmt(
ElifBranch($a, %!ident("hello"))
)
let res = templ.substitute(env).fromTerm()
echo res.toStrLit()
templating(1 + 2)
if 1 + 2: hello
Pattern matching DSL is intentionally similar to EBNF grammar from hparse dsl. Of course there are differences, but I tried to keep DSLs as similar as possible.
NOTE: there are some missing pieces (alternatives, `interpol`
syntax), but it works in general (passes test suite at least).
- functor construction
Functor(a1, a2 ...)
make functor with constant head and argumentsa1, a2
. Argment might be a pattern.%?predFunctor(a1, a2 ...)
- make functor with predicate head, not binding variable. [2]%?predFunctor[$var](...)
or%?predFunctor[@var]
predicate head functor, binding variable$var
[$var](...)
functor with variable head.
- constant construction
%constGen
create constant of typeTerm[V, F]
, add it directly to the term.%!constGen
constant of typeV
, automatically converted toTerm[V, F]
[2]%?constant[%var]
predicate constant, binding variable$var
- variable declaration
$scalar
- scalar variable@list
- list variable
- pattern construction
E1 & E2
- concatenation. MatchE1
, followed byE2
E1 | E2
- alternative. MatchE1
orE2
[1]!E1
- negation. MatchE1
. If unification is successful returnnone()
env, otherwise return original environment. Does not modify env. on success.+E1
one-or-more match ofE1
*E2
zero-or-more matches ofE2
?E1
Optional match ofE1
- [1] alternative is not actually supported right now as it requires
much more more work than any other pattern. Reason? supporting
alternative will require implementing large portion of prolog
backtracking system to keep track of variables bound in each
alternative. Why? consider this pattern:
($a | (1 & $a)) & $a
unified with list[1, 2, 2]
. If we select first alternative we get{$a -> 1}
after first element - unification of$a
in environment{$a -> 1}
fails. We need to rollback to the start, dropping all values for$a
and match second alternative. After we do this unification succedes. - [2] more convinient
`interpol`
syntax in the todo list. - [ ] todo: add shorthand for
(!E1 E2)* E1
- matchE2
untilE1
is found.
This library uses hmisc/hexceptions for DSL error reporting.
discard initTRS(astImpl):
Condition($a, 0) => Condition($a, $b)
Undeclared variable $b 2 discard initTRS(astImpl): 5:36 Condition($a, 0) => Condition($a, $b) ^~ | Not declared in LHS Raised in :0 [CodeError:ObjectType]
Some things are informally described in devnotes.org, most of the functions and types are documented in the source code. there.
- [ ] support
`functor`(`value`)
to interpolate variables/expressions from surrounding environment (similar toquote do:
) - [ ] Fully support operations on lists
- [ ] Implement alternative and negation for pattern matchin
- [ ] Suppoirt
`interpol`
syntax for pattern description to splice values. - [ ] Debugging (pretty printing or something similar (simple pretty-printing is not going to cut it for large terms. Need good tool for visualizing and debugging failed unification)).
- [ ] Pass string primitive type literals (strings, integers, floats,
bools etc) as-is, without requiring quioting. Require user to
implement
fromBasicType
callback in form(val: PrimitiveType): Term[V, F]
wherePrimitiveType
is a case object for all ‘passthrough’ types.
- [ ] functor abbreviation in DSL