treehugger.js is a Javascript library for program processing. It has generic means to represent and manipulate (analyze, transform) abstract syntax trees (ASTs).
It consists of three parts:
- A generic AST representation format, inspired by ATerms, that can be used to represent programs written in any language (Java, Ruby, Javascript): lib/treehugger/tree.js
- A set of generic traversals to query, manipulate and annotate these ASTs, inspired by Stratego/XT: lib/treehugger/traverse.js
- A set of analyses for specific languages. Currently:
  - Javascript: a (UglifyJS-based) parser and analyses reconstructing the type structure, along with first attempts at type inference: lib/treehugger/js/*.js
The project relies on require.js for library loading.
tree.js uses a few simple data structures to represent ASTs, together with a textual representation that makes these ASTs easy to write and debug. The textual representation is best introduced by example.
Consider a simple expression language with expressions such as the following:
2 + 3 * 1
A parser could turn this into the following AST:
      Add
     /   \
   Num    Mul
    |    /   \
    2  Num   Num
        |     |
        3     1
In treehugger.js's textual representation, this AST is written as:
Add(Num("2"), Mul(Num("3"), Num("1")))
Using the treehugger.js API, this AST can be constructed as follows:
var tree = require('treehugger/tree');
var node = tree.cons("Add", [tree.cons("Num", [tree.string("2")]),
                             tree.cons("Mul", [tree.cons("Num", [tree.string("3")]),
                                               tree.cons("Num", [tree.string("1")])])]);
Or, more simply:
var node = tree.parse('Add(Num("2"), Mul(Num("3"), Num("1")))');
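tree.parse turns the textual notation into nodes. The following minimal recursive-descent parser is a rough illustration of what that involves. It is a sketch and not treehugger's implementation. The function name parseTerm and the plain-object node model are assumptions made for this example: JS strings stand for string nodes, arrays for lists, and { cons, args } objects for constructors. String escapes are decoded with JSON.parse, so only JSON-style escapes are handled.

```javascript
// Sketch of a parser for the textual AST notation, e.g.
//   Add(Num("2"), Mul(Num("3"), Num("1")))
// Hypothetical plain-JS model (not treehugger's internals):
//   string node -> JS string, list node -> JS array,
//   cons node   -> { cons: name, args: [...] }
function parseTerm(text) {
  var pos = 0;

  function skipSpace() {
    while (pos < text.length && /\s/.test(text[pos])) pos++;
  }

  function expect(ch) {
    skipSpace();
    if (text[pos] !== ch) {
      throw new Error("Expected '" + ch + "' at position " + pos);
    }
    pos++;
  }

  // Parses a comma-separated sequence of nodes up to the closing delimiter.
  function parseSeq(close) {
    var items = [];
    skipSpace();
    if (text[pos] === close) { pos++; return items; }
    for (;;) {
      items.push(parseNode());
      skipSpace();
      if (text[pos] === ",") { pos++; continue; }
      expect(close);
      return items;
    }
  }

  function parseNode() {
    skipSpace();
    var ch = text[pos];
    if (ch === '"') {
      // String node: scan to the closing quote, skipping backslash escapes.
      var start = pos;
      pos++;
      while (pos < text.length && text[pos] !== '"') {
        if (text[pos] === "\\") pos++;
        pos++;
      }
      expect('"');
      return JSON.parse(text.slice(start, pos));
    }
    if (ch === "[") {
      // List node.
      pos++;
      return parseSeq("]");
    }
    // Cons node: an identifier followed by a parenthesised argument list.
    var m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(text.slice(pos));
    if (!m) throw new Error("Unexpected input at position " + pos);
    pos += m[0].length;
    expect("(");
    return { cons: m[0], args: parseSeq(")") };
  }

  var result = parseNode();
  skipSpace();
  if (pos !== text.length) throw new Error("Trailing input at position " + pos);
  return result;
}

var node = parseTerm('Add(Num("2"), Mul(Num("3"), Num("1")))');
console.log(JSON.stringify(node));
```

Unbalanced input such as Num("2" makes parseTerm throw instead of returning a partial tree.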
treehugger.js has three kinds of AST nodes:
- Strings (e.g. "2", "myVariable"), usually representing identifiers or other textual values.
- Lists (e.g. ["a", "b", None()]).
- Constructors, or simply cons nodes (e.g. None() or Num("2")), used to represent language constructs such as operators, loop constructs, etc.
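The three kinds of nodes map naturally onto plain JavaScript values. That mapping also shows how the textual representation can be produced. The following sketch assumes a hypothetical model: JS strings for string nodes, arrays for lists, and { cons, args } objects for constructors. The helper names cons, kindOf and toText are illustrative only and are not part of tree.js. The sketch prints a node in the textual notation.

```javascript
// Hypothetical plain-JS model for illustration (not tree.js internals):
//   string node -> JS string
//   list node   -> JS array
//   cons node   -> { cons: name, args: [...] }
function cons(name, args) {
  return { cons: name, args: args };
}

// Classifies a node as one of the three kinds.
function kindOf(node) {
  if (typeof node === "string") return "string";
  if (Array.isArray(node)) return "list";
  return "cons";
}

// Renders a node in the textual notation: quoted strings,
// bracketed lists, and Name(arg, ...) for cons nodes.
function toText(node) {
  switch (kindOf(node)) {
    case "string":
      return JSON.stringify(node);
    case "list":
      return "[" + node.map(toText).join(", ") + "]";
    default:
      return node.cons + "(" + node.args.map(toText).join(", ") + ")";
  }
}

var ast = cons("Add", [cons("Num", ["2"]),
                      cons("Mul", [cons("Num", ["3"]), cons("Num", ["1"])])]);
console.log(toText(ast));                          // Add(Num("2"), Mul(Num("3"), Num("1")))
console.log(toText(["a", "b", cons("None", [])])); // ["a", "b", None()]
```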
Treehugger.js is based on the concept of generic traversals. A generic traversal visits the tree in a particular order, for instance from top to bottom or from the bottom up. At every node the traversal comes across, you can apply one or more transformations. Transformations can be AST patterns, transformation functions, or a combination of both.
The treehugger/traverse.js library adds a number of methods to AST nodes that make traversals simpler:
- collectTopDown (traverses the tree top to bottom, collecting all matches and returning them as a list; it does not descend into a node once that node has matched)
- traverseTopDown (traverses the tree top to bottom until finding a match)
- (more are coming)
A transformation can be either:
- A textual AST pattern
- A textual AST pattern followed by a transformation function that is passed a bindings object
- A transformation function
Each transformation either matches or doesn't. If a transformation function matches, it returns a non-falsy value (usually a new AST node); if it doesn't, it returns false.
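The match/no-match convention can be illustrated with ordinary functions. This sketch uses a hypothetical { cons, args } object model rather than treehugger's own node classes. In it, numValue is a transformation function that matches Num nodes. The hypothetical helper applyFirst tries several transformations in turn and keeps the first non-falsy result.

```javascript
// Hypothetical node model (not treehugger's internals):
// cons node -> { cons: name, args: [...] }, string node -> JS string.
function cons(name, args) {
  return { cons: name, args: args };
}

// A transformation function: returns the number string of a Num node
// (a non-falsy value) on a match, and false otherwise.
function numValue(node) {
  if (node !== null && typeof node === "object" &&
      node.cons === "Num" && node.args.length === 1) {
    return node.args[0];
  }
  return false;
}

// Tries each transformation in order and returns the first non-falsy
// result, or false if none of them matches.
function applyFirst(transforms, node) {
  for (var i = 0; i < transforms.length; i++) {
    var result = transforms[i](node);
    if (result) return result;
  }
  return false;
}

console.log(JSON.stringify(numValue(cons("Num", ["2"]))));               // "2"
console.log(numValue(cons("Mul", [])));                                  // false
console.log(JSON.stringify(applyFirst([numValue], cons("Num", ["3"])))); // "3"
```

The non-falsy convention has one consequence. A transformation whose legitimate result is itself falsy, such as an empty string, cannot be told apart from a failed match.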
A simple example:
var node = tree.parse('Add(Num("2"), Mul(Num("3"), Num("1")))');
node.collectTopDown("Num(_)").debug();
This will traverse the AST and look for nodes that match the Num(_) pattern, where _ (a wildcard) can be anything. The collectTopDown traversal traverses the AST from top to bottom, collects all matches on the way and returns them as a ListNode. In this case, the collectTopDown call returns:
[Num("2"), Num("3"), Num("1")]
The .debug() call prints the result to the Javascript console (console.log).
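The behaviour of collectTopDown described above can be sketched in a few lines. This is not the library's code. The names collectTopDownSketch and isNum and the { cons, args } node model are assumptions for illustration, and isNum is a hand-written stand-in for the Num(_) pattern. The sketch visits nodes in pre-order. It collects every node for which the predicate returns a non-falsy value and does not descend into a node once that node has matched.

```javascript
// Hypothetical node model (not treehugger's internals):
// cons -> { cons, args }, list -> array, string -> JS string.
function cons(name, args) {
  return { cons: name, args: args };
}

// Children of a node: the elements of a list, or the arguments of a cons node.
function children(node) {
  if (Array.isArray(node)) return node;
  if (node !== null && typeof node === "object") return node.args;
  return [];
}

// Visits nodes top-down (pre-order). When `match` returns a non-falsy value,
// that value is collected and the traversal does not descend into the node.
function collectTopDownSketch(node, match) {
  var results = [];
  (function visit(n) {
    var r = match(n);
    if (r) {
      results.push(r);
      return;
    }
    children(n).forEach(visit);
  })(node);
  return results;
}

// Plays the role of the Num(_) pattern: matches any Num node with one argument.
function isNum(n) {
  return n !== null && typeof n === "object" && n.cons === "Num" &&
    n.args.length === 1 ? n : false;
}

var ast = cons("Add", [cons("Num", ["2"]),
                      cons("Mul", [cons("Num", ["3"]), cons("Num", ["1"])])]);
var nums = collectTopDownSketch(ast, isNum);
console.log(nums.map(function (n) { return n.args[0]; })); // [ '2', '3', '1' ]
```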
So, what if we want to return only the numbers, not the Num(...) constructors? If we follow a textual pattern with a function, we can transform the result:
var node = tree.parse('Add(Num("2"), Mul(Num("3"), Num("1")))');
node.collectTopDown("Num(n)", function(b) { return b.n; }).debug();
Instead of the placeholder _, we now use n. The function is passed a bindings object whose n property contains the value matched by the placeholder.
So, the following will be printed to the Javascript console:
["2", "3", "1"]
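Binding a placeholder such as n amounts to recording which subtree it matched. The following is a minimal structural matcher that rests on a few assumptions. Patterns are built from the same hypothetical { cons, args } model as the nodes instead of being parsed from text as treehugger does. Placeholders are written as { variable: name }, with _ as the wildcard. The helpers match, matchAll, v and patternTransform are illustrative names only.

```javascript
// Hypothetical model (not treehugger's internals): string node -> JS string,
// list node -> JS array, cons node -> { cons: name, args: [...] }.
// Pattern placeholders are written as { variable: name }; "_" is the wildcard.
function cons(name, args) {
  return { cons: name, args: args };
}

function v(name) {
  return { variable: name };
}

function isObject(x) {
  return x !== null && typeof x === "object" && !Array.isArray(x);
}

// Matches `node` against `pattern`, recording placeholder values in `bindings`.
function match(pattern, node, bindings) {
  if (isObject(pattern) && "variable" in pattern) {
    // A placeholder matches any node; named ones remember what they matched.
    if (pattern.variable !== "_") bindings[pattern.variable] = node;
    return true;
  }
  if (typeof pattern === "string") return pattern === node;
  if (Array.isArray(pattern)) {
    return Array.isArray(node) && matchAll(pattern, node, bindings);
  }
  // Cons patterns require the same constructor name and the same arity.
  return isObject(node) && node.cons === pattern.cons &&
    matchAll(pattern.args, node.args, bindings);
}

function matchAll(patterns, nodes, bindings) {
  if (patterns.length !== nodes.length) return false;
  for (var i = 0; i < patterns.length; i++) {
    if (!match(patterns[i], nodes[i], bindings)) return false;
  }
  return true;
}

// Pairs a pattern with a function that receives a fresh bindings object;
// returns false when the pattern does not match.
function patternTransform(pattern, fn) {
  return function (node) {
    var bindings = {};
    return match(pattern, node, bindings) ? fn(bindings) : false;
  };
}

// Roughly corresponds to: "Num(n)", function(b) { return b.n; }
var numN = patternTransform(cons("Num", [v("n")]), function (b) { return b.n; });
console.log(JSON.stringify(numN(cons("Num", ["2"]))));                     // "2"
console.log(numN(cons("Mul", [cons("Num", ["3"]), cons("Num", ["1"])]))); // false
```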
If we want to match either Num or Mul nodes, we can add a pattern for that to the collectTopDown call:
var node = tree.parse('Add(Num("2"), Mul(Num("3"), Num("1")))');
node.collectTopDown(
"Num(n)", function(b) {
return b.n;
},
"Mul(op1, op2)", function(b) {
return b.op1;
}
).debug();
This will print:
["2", Num("3")]
Why is that? The AST is traversed top to bottom by collectTopDown. On its way, it tries to match every node first against the Num(n) pattern. If that succeeds, the function after it in the argument list is executed; if the function returns a value other than false, the traversal stops at that branch and the result is added to the result list. If the "Num(n)" pattern does not match, the node is matched against "Mul(op1, op2)". Again, if this pattern matches, the function is executed, and so on. The collectTopDown traversal will not traverse down to the Num("3") and Num("1") nodes, because the traversal stopped when it matched the Mul(..., ...) node. If, instead, the transformation function had returned false, the traversal would have proceeded down into those nodes as well.
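The traversal order just described can be reproduced with a small sketch that takes the transformations as an ordered list. This is an illustration rather than treehugger's implementation. The hand-written functions numToValue and mulToFirstOperand stand in for the "Num(n)" and "Mul(op1, op2)" pattern/function pairs. The { cons, args } node model and the name collectTopDownSketch are also assumptions.

```javascript
// Hypothetical node model (not treehugger's internals):
// cons -> { cons, args }, list -> array, string -> JS string.
function cons(name, args) {
  return { cons: name, args: args };
}

function children(node) {
  if (Array.isArray(node)) return node;
  if (node !== null && typeof node === "object") return node.args;
  return [];
}

// Each node is tried against the transformations in order. The first
// non-falsy result is collected and the traversal does not descend into
// that node; if nothing matches, the node's children are visited.
function collectTopDownSketch(node, transforms) {
  var results = [];
  (function visit(n) {
    for (var i = 0; i < transforms.length; i++) {
      var r = transforms[i](n);
      if (r) {
        results.push(r);
        return;
      }
    }
    children(n).forEach(visit);
  })(node);
  return results;
}

function isCons(n, name, arity) {
  return n !== null && typeof n === "object" && n.cons === name &&
    n.args.length === arity;
}

// Stand-in for: "Num(n)", function(b) { return b.n; }
function numToValue(n) {
  return isCons(n, "Num", 1) ? n.args[0] : false;
}

// Stand-in for: "Mul(op1, op2)", function(b) { return b.op1; }
function mulToFirstOperand(n) {
  return isCons(n, "Mul", 2) ? n.args[0] : false;
}

var ast = cons("Add", [cons("Num", ["2"]),
                      cons("Mul", [cons("Num", ["3"]), cons("Num", ["1"])])]);
var result = collectTopDownSketch(ast, [numToValue, mulToFirstOperand]);
console.log(JSON.stringify(result)); // ["2",{"cons":"Num","args":["3"]}]
```

If mulToFirstOperand is dropped from the list, the traversal continues into the Mul node, and the result becomes ["2", "3", "1"].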
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
You can see treehugger.js in action in this simple demo.