Skip to content
This repository has been archived by the owner on Jan 27, 2022. It is now read-only.

Latest commit

 

History

History
454 lines (363 loc) · 14.5 KB

readme.org

File metadata and controls

454 lines (363 loc) · 14.5 KB

nimtrs

Ast pattern matching, templating and rewriting. Supports regex-like patterns for sequences. Is not tied to particular AST - NimNode is supported, but not the only type of AST.

This library is intended to be used not only as pattern matching for nim ast but also as foundation for any sort of operations on heterogeneous AST trees. At least I hope it to be suitable for this kind of task.

The readme might be a little too verbose (especially for the people who are familiar with prolog/unification etc).

Installation

nimble install nimtrs

Links

Features

Core concepts

This library is build around prolog-like unification algorithm and term notion. This terminology is used in documentation/API etc.

Code examples use either NimAst (where appropriate) or simple type Ast, defined in tests tests/ast_term.nim.

Term

Term is either of

  • variable : $a or @a
  • constant : 1, 3, 4
  • list : [t1, t2, t3, ...]
  • functor head + list of arguments (other terms) : f(t1, t2, t3 ...)
  • pattern *(12 | 13) & $a

All of the functions and types in the library have to generic parameters: V and F. First one (V )is a type of constants (values), second one (F) type of a functor symbol.

  • If you want to model mathematical expressions F would be a string and V in integer type. Possible example of a functor: cos(0.2) or sin($a + $b). Note: + is also a functor - binary operator is just a 2-ary function.
  • For working with nim ast functor head is a NimNodeKind and constant is a NimNode. For example for expression like 3 + 3 we get ast dump (in lisp form) that looks like this. Infix is a functor head, (Ident "+") is a constant since it is not possible for it to have arguments.
    (Infix (Ident "+") (IntLit 3) (IntLit 3))
        

Functor parts (terminology used)

        Functor argument /list/
        |
        vvvvvvvvvv
Functor(a1, a2, a3)
^       ^   ^   ^
|       |   |   |
|       `---+---'
|           |
|           Functor /arguments/
|
Functor head

Simple example of AST node type (identical to tests/ast_term.nim)

type
  AstKind* = enum
    akStrLit, akIntLit, akIdent # Constant values
    akCall, akCondition # Functors

  Ast* = object
    case kind: AstKind
      of akStrLit, akIdent:
        strVal: string
      of akIntLit:
        intVal: int
      else:
        subnodes: seq[Ast]

Unification

Unification is an algorithm for finding substitution (mapping of variables to values).

Signature for unfication is: (t1, t2: Term[V, F], env: TermEnv[V, F]): Option[TermEnv[V, F]]. Given some environment env and two terms t1, t2 try to find all necessary substitutions for which

Variables & environment

Each variable represented using symbol (VarSym) and itself does not have any value. Environment is a mapping (substitution) between variable symbols and terms.

There are two types of variables - scalar and list.

  • Scalar variable (denoted using $varName) is identical regular prolog variable and quite similar to let variable in nim - once assigned value cannot be changed, unification of two bound variables succesed only if their values can be unified.
  • List variable was introduces to allow capturing multiple values from term. Unification with this type of variables always succedes. List variable is always bound to list term but spliced into arguments on fromTerm conversion.

NOTE: ex and exUnif are simpler helper templates defined in src/nimtrs/doc_example.nim

echo "--- Single use of scalar variable ---"
exUnif(ex(Condition($a)), ex(Condition(%!mkLit("hello"))))

echo "--- Scalar variable used twice ---"
exUnif(
  ex(Condition($a, $a)),
  ex(Condition(%!mkLit("vareee"), %!mkLit("vareee")))
)

echo "--- Scalar variable used twice, different arguments ---"
exUnif(
  ex(Condition($a, $a)),
  ex(Condition(
    %!mkLit("vareee"), # different values for constants. Variable will
    # be unified with first one `"vareee"`, but fails on second one.
    %!mkLit("((((()))))")
  ))
)


echo "--- List variable ---"
exUnif(
  ex(Condition(@a, @a)), # List variable captures all terms
  ex(Condition(%!mkLit("vareee"), %!mkLit("((((()))))")))
)
--- Single use of scalar variable ---
{($a -> 'hello')}
--- Scalar variable used twice ---
{($a -> 'vareee')}
--- Scalar variable used twice, different arguments ---
Unification failed
--- List variable ---
{(@a -> ['vareee', '((((()))))'])}

List variables are always spliced. In most cases notion of ‘list’ is not present in AST - things like StmtList is just another functor (with many arguments, yes, but functor still). This is not supported right now, but using something like A([@listVariable]) to really generate list will be added.

TermImpl

Terms are build as heterogeneous representation for trees - there is only single type for functor head, but since constants themself can differ ((IntLit 3) and (Ident "+")) it is necessary to distinguish between them somehow.

One possible solution is to take OOP-style approach and model term as a object hierarchy, with each one implementing some kind of isConstant and getFunctorSymbol methods. This library uses somewhat similar approach, but more suitable for nim case objects. Instead of deriving from parent object type and implementing some abstract methods it is necessarty to declare set of callback functions that will be used on conversion from/to term.

func isFunctor*(nnk: NimNodeKind): bool =
  nnk notin { # set of node kinds that cannot be considered 'functor'.
              # I.e. it is not possible to have a child for
              # `nnkFloatLit` for example, therefore it is not a
              # functor.
    nnkNone, nnkEmpty, nnkNilLit, # Empty node
    nnkCharLit..nnkUInt64Lit, # Int literal
    nnkFloatLit..nnkFloat64Lit, # Float literal
    nnkStrLit..nnkTripleStrLit, nnkCommentStmt, nnkIdent, nnkSym # Str lit
  }

const nimAstImpl* = TermImpl[NimNode, NimNodeKind](
  getsym: ( # Get functor symbol from value. `V -> F`
    proc(n: NimNode): NimNodeKind = n.kind
  ),
  isFunctorSym: ( # Check if functor is a symbol. `F -> bool`
    proc(kind: NimNodeKind): bool = kind.isFunctor()
  ),
  makeFunctor: ( # Construct functor from head symbol and list of
                 # arguments. `F x seq[V] -> V`
    proc(op: NimNodeKind, sub: seq[NimNode]): NimNode =
      if sub.len == 0: newNimNode(op)
      else: newTree(op, sub)
  ),
  getArguments: ( # Get list of arguments from term. No checking is
                  # necessary - only functor terms would be queried
                  # for arguments. `V -> seq[V]`
    proc(n: NimNode): seq[NimNode] = toSeq(n.children)
  ),
  valStrGen: ( # Generate string representation for term. Used for
               # pretty-printing terms. `V -> string`
    proc(n: NimNode): string = n.toStrLit().strVal()
  ),
)

This ‘implementation’ is passed to toTerm and fromTerm converters to convert value of type V to Term[V, F].

Regex-like pattern matching (TermPattern)

import nimtrs/[trscore, trspprint, trsdsl, nimast_trs]
import options

template matchPatternNim(term: NodeTerm, patt: untyped): untyped =
  matchPattern(term, nimAstImpl, patt)

macro ifTest(body: untyped): untyped =
  for stmt in body:
    let term = stmt.toTerm(nimAstImpl)
    #                      ^^^^^^^^^^
    #                      'Implementation' - used for converting
    #                       value of type `V` to term.
    if term.matchPatternNim(
    # Match head of the `term`
    # |      Match pattern one or more times
    # |      |Match `nnkElifBrach` - `nnk` prefix might be omiited
    # |      ||  Concatenation of two parts Optional artument in the term
    # v_____ vv_________                  v v
      IfStmt(*ElifBranch(@conds, @bodies) & ?Else($elsebody))):
      #      A           ^       ^      A         ^‾‾‾‾‾‾‾‾‾‾
      #      !           |_______|      !         Possible nullable variable,
      #      !           |              !         inserted as `Option[F]`
      #      !           |              !
      #      !           Two list variables, will be inserted as `seq[V]`
      #      !                          !
      #      [ This part will consume as]
      #      [ much functor arguments as]
      #      [ possible.                ]

      for cond in conds:
        echo cond.lispRepr()

      if elsebody.isSome():
        echo "Has `else`"

ifTest:
  if 12 == 22:
    echo "123"
  elif false:
    echo "123"
  else:
    echo "123123"

  if 20 == 29:
    echo "123"
(Infix (Ident "==") (IntLit 12) (IntLit 22))
(Ident "false")
Has `else`
(Infix (Ident "==") (IntLit 20) (IntLit 29))

AST templating

Generate term with variables and then substitute them from environment. Examples of use (pretty simple but should illustrate the point).

import nimtrs/[trscore, trspprint, trsdsl, nimast_trs]
  # import options

template makeNimTerm(body: untyped): untyped =
  makeTerm(nimAstImpl, body)

macro templating(arg: untyped): untyped =
  let env = makeEnvironment(@{
    parseVarSym("$a") : arg.toTerm()
  })

  let templ = makeNimTerm:
    IfStmt(
      ElifBranch($a, %!ident("hello"))
    )

  let res = templ.substitute(env).fromTerm()
  echo res.toStrLit()

templating(1 + 2)
if 1 + 2:
  hello

Rewriting TODO:DOC

Term construction DSL

Pattern matching DSL is intentionally similar to EBNF grammar from hparse dsl. Of course there are differences, but I tried to keep DSLs as similar as possible.

NOTE: there are some missing pieces (alternatives, `interpol` syntax), but it works in general (passes test suite at least).

  • functor construction
    • Functor(a1, a2 ...) make functor with constant head and arguments a1, a2. Argment might be a pattern.
    • %?predFunctor(a1, a2 ...) - make functor with predicate head, not binding variable. [2]
    • %?predFunctor[$var](...) or %?predFunctor[@var] predicate head functor, binding variable $var
    • [$var](...) functor with variable head.
  • constant construction
    • %constGen create constant of type Term[V, F], add it directly to the term.
    • %!constGen constant of type V, automatically converted to Term[V, F] [2]
    • %?constant[%var] predicate constant, binding variable $var
  • variable declaration
    • $scalar - scalar variable
    • @list - list variable
  • pattern construction
    • E1 & E2 - concatenation. Match E1, followed by E2
    • E1 | E2 - alternative. Match E1 or E2 [1]
    • !E1 - negation. Match E1. If unification is successful return none() env, otherwise return original environment. Does not modify env. on success.
    • +E1 one-or-more match of E1
    • *E2 zero-or-more matches of E2
    • ?E1 Optional match of E1

  • [1] alternative is not actually supported right now as it requires much more more work than any other pattern. Reason? supporting alternative will require implementing large portion of prolog backtracking system to keep track of variables bound in each alternative. Why? consider this pattern: ($a | (1 & $a)) & $a unified with list [1, 2, 2]. If we select first alternative we get {$a -> 1} after first element - unification of $a in environment {$a -> 1} fails. We need to rollback to the start, dropping all values for $a and match second alternative. After we do this unification succedes.
  • [2] more convinient `interpol` syntax in the todo list.
  • [ ] todo: add shorthand for (!E1 E2)* E1 - match E2 until E1 is found.

Error reporting in DSL

This library uses hmisc/hexceptions for DSL error reporting.

discard initTRS(astImpl):
  Condition($a, 0) => Condition($a, $b)
Undeclared variable $b

 2    discard initTRS(astImpl):
 5:36   Condition($a, 0) => Condition($a, $b)
                                          ^~
                                          |
                                          Not declared in LHS



Raised in :0


 [CodeError:ObjectType]

Development

Some things are informally described in devnotes.org, most of the functions and types are documented in the source code. there.

  • [ ] support `functor`(`value`) to interpolate variables/expressions from surrounding environment (similar to quote do:)
  • [ ] Fully support operations on lists
  • [ ] Implement alternative and negation for pattern matchin
  • [ ] Suppoirt `interpol` syntax for pattern description to splice values.
  • [ ] Debugging (pretty printing or something similar (simple pretty-printing is not going to cut it for large terms. Need good tool for visualizing and debugging failed unification)).
  • [ ] Pass string primitive type literals (strings, integers, floats, bools etc) as-is, without requiring quioting. Require user to implement fromBasicType callback in form (val: PrimitiveType): Term[V, F] where PrimitiveType is a case object for all ‘passthrough’ types.

TODO:DOC

  • [ ] functor abbreviation in DSL