design
To do macros we need to `read`. To `read` we need to match delimiters and build a token tree to know where a macro invocation ends. But JS complicates this because delimiters can appear inside of a regex literal and deciding if `/` is the start of a regex or the division operator depends on parsing context.
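As a rough illustration of the delimiter matching step, here is a minimal sketch that folds a flat token list into a token tree. It assumes the input is already split into string tokens, so it sidesteps the `/` problem entirely. The name `buildTokenTree` and the `{ open, inner }` group shape are hypothetical and not taken from any implementation.

```javascript
// Hypothetical tree shape: leaves are strings, matched delimiters become
// a single group token { open: "(" | "[" | "{", inner: [...] }.
const PAIRS = new Map([["(", ")"], ["[", "]"], ["{", "}"]]);
const CLOSING = new Set(PAIRS.values());

function buildTokenTree(tokens) {
  const root = { open: null, inner: [] };
  const stack = [root]; // innermost open group is last
  for (const tok of tokens) {
    const top = stack[stack.length - 1];
    if (PAIRS.has(tok)) {
      // Opening delimiter: start a nested group.
      const group = { open: tok, inner: [] };
      top.inner.push(group);
      stack.push(group);
    } else if (CLOSING.has(tok)) {
      // Closing delimiter: it must match the innermost open group.
      if (top.open === null || PAIRS.get(top.open) !== tok) {
        throw new SyntaxError("unmatched " + tok);
      }
      stack.pop();
    } else {
      top.inner.push(tok);
    }
  }
  if (stack.length !== 1) {
    throw new SyntaxError("unclosed " + stack[stack.length - 1].open);
  }
  return root.inner;
}
```

With a tree like this, a macro invocation such as `m (...) {...}` is a name followed by whole group tokens, so the reader never has to scan inside a group to find where the invocation ends.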
The read algorithm follows but see also the DLS 14 paper which goes into more detail.
So to handle the problem of `/` we can use "almost one" lookbehind to disambiguate. Algorithm:
skip over comments
if tok is /
    if tok-1 is ()
        if tok-2 in "if" "while" "for" "with"
            tok is start of regex literal
        else
            tok is divide
    else if tok-1 is {}
        if isBlock(tok-1)
            // named or anonymous function
            if tok-2 is () and tok-3 is "function" or tok-4 is "function"
                if function expression // how to determine is described below
                    tok is divide
                else
                    tok is start of regex literal
            else
                tok is start of regex literal
        else
            tok is divide
    else if tok-1 in punctuator // e.g. ";", "==", ">", "/", "+", etc.
        tok is start of regex literal
    else if tok-1 in keywords and not "this"
        // though some keywords will eventually result in a parse error (eg. debugger, break)
        tok is start of regex literal
    else
        tok is divide
assignOps = ["=", "+=", "-=", "*=", "/=", "%=",
"<<=", ">>=", ">>>=", "&=", "|=", "^=", ","];
binaryOps = ["+", "-", "*", "/", "%","<<", ">>", ">>>",
"&", "|", "^","&&", "||", "?", ":",
"instanceof", "in",
"===", "==", ">=", "<=", "<", ">", "!=", "!=="];
unaryOps = ["++", "--", "~", "!", "delete", "void", "typeof", "throw", "new"];
function isBlock(tok)
    if tok-1 is ( or [
        // ... ({...} ...)
        return false
    else if tok-1 is ":" and parent token is {}
        // ... {a:{...} ...}
        return isBlock(the parent {})
    else if tok-1 is one of assignOps unaryOps binaryOps
        // ... + {...}
        // ... typeof {...}
        return false
    else if tok-1 is one of "return" "yield"
        // handle ASI
        if lineNumber(tok) isnt lineNumber(tok-1)
            // return
            // {...}
            return true
        else
            // return {...}
            return false
    else if tok-1 is "case"
        // case {...}
        return false
    else
        return true
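The pseudocode above could be rendered in JavaScript roughly as follows. This is only a sketch under stated assumptions: the token shapes, the `pos` record (`siblings`, `index`, `enclosing`), and the names `word` and `slashStartsRegex` are hypothetical, and the function-expression test is passed in as a callback because it is described separately. Treating a `/` or `{` with no preceding sibling at the top level as regex or block is also an assumption, since the pseudocode does not cover start of input.

```javascript
// Hypothetical token shapes: leaves are { type, value, line } with type in
// "ident" | "num" | "str" | "regex" | "punc" | "keyword"; matched delimiters
// are single tokens { type: "()" | "[]" | "{}", inner: [...], line }.
const assignOps = ["=", "+=", "-=", "*=", "/=", "%=",
  "<<=", ">>=", ">>>=", "&=", "|=", "^=", ","];
const binaryOps = ["+", "-", "*", "/", "%", "<<", ">>", ">>>",
  "&", "|", "^", "&&", "||", "?", ":", "instanceof", "in",
  "===", "==", ">=", "<=", "<", ">", "!=", "!=="];
const unaryOps = ["++", "--", "~", "!", "delete", "void", "typeof", "throw", "new"];
const operators = new Set([...assignOps, ...binaryOps, ...unaryOps]);

// Value of a punctuator or keyword token; null for anything else.
const word = (tok) =>
  tok && (tok.type === "punc" || tok.type === "keyword") ? tok.value : null;

// pos = { siblings, index, enclosing }: siblings[index] is a `{}` token and
// enclosing is the pos of the delimiter token containing it (null at top level).
function isBlock(pos) {
  const { siblings, index, enclosing } = pos;
  const outer = enclosing ? enclosing.siblings[enclosing.index] : null;
  const prev = siblings[index - 1];
  if (prev === undefined) {
    // First token in its group, so tok-1 is the opening delimiter:
    // `(` or `[` means not a block. Top level is assumed to be a block.
    return outer === null || outer.type === "{}";
  }
  const w = word(prev);
  if (w === ":" && outer && outer.type === "{}") return isBlock(enclosing);
  if (operators.has(w)) return false;
  if (w === "return" || w === "yield") {
    return siblings[index].line !== prev.line; // ASI: newline ends the statement
  }
  if (w === "case") return false;
  return true;
}

// Decide whether a `/` about to be read at siblings[index] starts a regex.
function slashStartsRegex(pos, isFunctionExpression) {
  const { siblings, index } = pos;
  const back = (n) => siblings[index - n]; // back(1) is tok-1
  const t1 = back(1);
  if (t1 === undefined) return true; // assumption: nothing to divide yet
  if (t1.type === "()") {
    return ["if", "while", "for", "with"].includes(word(back(2)));
  }
  if (t1.type === "{}") {
    if (!isBlock({ ...pos, index: index - 1 })) return false; // object literal
    const fn = back(2) !== undefined && back(2).type === "()" &&
      (word(back(3)) === "function" || word(back(4)) === "function");
    return fn ? !isFunctionExpression(pos) : true;
  }
  if (t1.type === "punc") return true;
  if (t1.type === "keyword" && t1.value !== "this") return true;
  return false;
}
```

Because delimiters are already collapsed into single tokens, "tok-1 is ()" becomes a plain type check, which is what keeps the lookbehind bounded to a handful of tokens.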
Depending on context, `function name() {}` is either a function declaration or a function expression. If it's a function expression then a following `/` will be interpreted as a divide but if it's a function declaration a following `/` will be interpreted as a regex.
For example,
// a declaration so / is regex
f(); function foo() {} /42/i
vs
// an expression so / is divide
x = function foo() {} /42/i
Looking one token behind the `function` keyword (ignoring newlines), the following imply it is a function declaration:
; } ) ] ident literal (including regex literal so need to be careful about /)
debugger break continue else
And these imply it is a function expression.
( [ , (assignment operators) (binary operators) (unary operators)
in typeof instanceof new return case delete
throw void
And these will result in a parse error:
do break default finally for function if switch this
try var while with
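The three lists above amount to a lookup on the token before `function`. A minimal sketch, assuming tokens shaped like `{ type, value }` (hypothetical) and a hypothetical function name `classifyFunction`: note that `break` appears in both the declaration list and the parse-error list, and this sketch simply checks the declaration list first. Tokens that appear in none of the lists (for example `{`) are reported as "unlisted" rather than guessed at.

```javascript
const assignOps = ["=", "+=", "-=", "*=", "/=", "%=",
  "<<=", ">>=", ">>>=", "&=", "|=", "^=", ","];
const binaryOps = ["+", "-", "*", "/", "%", "<<", ">>", ">>>",
  "&", "|", "^", "&&", "||", "?", ":", "instanceof", "in",
  "===", "==", ">=", "<=", "<", ">", "!=", "!=="];
const unaryOps = ["++", "--", "~", "!", "delete", "void", "typeof", "throw", "new"];

const DECLARATION = new Set([";", "}", ")", "]",
  "debugger", "break", "continue", "else"]);
const EXPRESSION = new Set(["(", "[", ",",
  ...assignOps, ...binaryOps, ...unaryOps,
  "in", "typeof", "instanceof", "new", "return", "case", "delete",
  "throw", "void"]);
const PARSE_ERROR = new Set(["do", "break", "default", "finally", "for",
  "function", "if", "switch", "this", "try", "var", "while", "with"]);

// before: the token immediately preceding `function`, ignoring newlines.
function classifyFunction(before) {
  // Assumption: `function` at the very start of input is a declaration.
  if (before === undefined) return "declaration";
  const isWord = before.type === "punc" || before.type === "keyword";
  // Identifiers and literals (including regex literals) imply a declaration.
  if (!isWord) return "declaration";
  if (DECLARATION.has(before.value)) return "declaration";
  if (EXPRESSION.has(before.value)) return "expression";
  if (PARSE_ERROR.has(before.value)) return "error";
  return "unlisted";
}
```

For `f(); function foo() {} /42/i` the token before `function` is `;`, so this yields "declaration" and the `/` is read as a regex. For `x = function foo() {} /42/i` it is `=`, so it yields "expression" and the `/` is a divide.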
What should we do with FutureReservedWords? Treat as identifiers?
Some examples:
// `=` comes first so regex
x = /foo/
// `x` so divide
x = x / foo /
// `(` so regex
x = (/foo/)
x = 10 {/foo/}
do { /foo/ }
// `)` so actually have to look back all the way to `if` to see regex
if (true) /foo/
// `=` before the `()` so divide
x = (a) / foo
// needs to be divide since call
bar (true) /foo/
This means that inside of a macro call we have to follow this context sensitivity for regex literals. So the following reasonable macro isn't allowed:
macro rcond {
    rcond (s:expr) { instance e:expr... } => // ...
}
rcond ("foo") {
    instance /foo}bar/
}
The "instance" makes the first `/` be interpreted as divide. So we could just leave this as is and call it a limitation of macros. They need to respect the same structure as JS. This might actually be ok. The above could be done as:
rcond ("foo") {
    instance "foo}bar"
}
(note that if we used `case` instead of `instance` then the following `/` would be interpreted as the start of a regex since `case` is a keyword, but in general this is a non-obvious rule for macro writers to be aware of)
Not too bad of a change I think. We're already forcing delimiter matching anyway. e.g. the following is bad because of the extra unmatched paren:
macro m {
    case m (e1: expr ( e2:expr) => // ...
}
If we want to allow macros to shadow statements like `if` we have another complication:
macro if {
    case if(c:expr) => ...
}
if (true) / foo
// should be divide?!
So I think we are going to treat the reserved keywords (`if`, `while`, etc.) as really reserved. Macros can't override their meaning.
Should we disallow FutureReservedWords too (`class`, `enum`, etc.)?
Some example macro code. In various stages of wrong and impossible.
Syntax-rules flavored macros:
macro swap {
    case swap (x:var, y:var) => {
        tmp = x;
        x = y;
        y = tmp;
    }
}
var a = 2, b = 4;
swap(a, b)
macro unless {
    case unless (condition:expr) { body:expr } => {
        if(!condition) { body } else {}
    }
}
Recursive and refers to previously defined macro:
macro rotate {
    case rotate (a:var) => ;
    case rotate (a:var, b:var, c:var ...) => {
        swap(a, b);
        rotate(b, c ...);
    }
}
var a = 2, b = 4, c = 6, d = 8;
rotate(a, b, c, d)
Syntax-case flavored macros:
macro swap {
    case swap (x:var, y:var) => {
        #'tmp = x;
        #'x = y;
        #'y = tmp;
    }
}
macro thunk {
    case thunk (e: expr) =>
        #'function() { return e; }
}
thunk(2+2)
macro let {
    case let (x:var = v:expr) { body:expr } => {
        #'(function(x) { body })(v)
    }
}
macro or {
    case or () => #'false;
    case or (e:expr) => #'e;
    case or (e1:expr, e2:expr, e3:expr...) => {
        #'let (t = e1) { t ? t : or(e2, e3) }
    }
}
macro cond {
    case cond { default: def:expr } => {
        #'def
    }
    case cond { case condition:expr => val:expr, ... default => def:expr } => {
        #'condition ? val : cond { ... default => def }
    }
}
var type = cond {
    case (x === null) => "null",
    case Array.isArray(x) => "array",
    case (typeof x === "object") => "object",
    default => typeof x
};
macro {
    case let(x_1:var = v_1:expr, ... , x_n:var = v_n:expr)
    { body: expr } => {
        // ....
    }
}
macro {
    case cond (e:expr) {
        case pat_1:expr => val_1:expr,
        ...
        case pat_n:expr => val_n:expr,
        default => val:expr
    }
}
Misc:
- Optional cases (like `default` above)?
- primitive forms to fall back on?
- have the interesting stuff be just another macro?
Possible primitive design that can then build other more expressive macro forms.
macro name {
    function(stx) {
        return stx;
    }
}
The `macro name {...}` form wraps a transformer function. `macro` is the signal to the expander that what follows is a macro definition. At any macro call matching the macro name the expander passes the read tree to the transformer and replaces it with the transformer result.
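To make the expander's role concrete, here is a small sketch in plain JavaScript. It is not the proposed design: it assumes a token tree whose leaves are strings and whose delimiter groups are `{ open, inner }` objects, it assumes a macro call is a macro name followed by one `()` group, and the names `expand` and `macros` are hypothetical. The transformer receives the read tree for the call and returns replacement tokens.

```javascript
// Hypothetical tree shape: leaves are strings, delimiter groups are
// { open: "(" | "[" | "{", inner: [...] }.
const isGroup = (t) => typeof t === "object" && t !== null;

// macros: Map from macro name to transformer function(stx) -> token array.
function expand(tree, macros) {
  const out = [];
  for (let i = 0; i < tree.length; i++) {
    const tok = tree[i];
    const next = tree[i + 1];
    const isCall = typeof tok === "string" && macros.has(tok) &&
      isGroup(next) && next.open === "(";
    if (isCall) {
      // Hand the read tree to the transformer and splice in its result.
      const result = macros.get(tok)({ name: tok, body: next.inner });
      // Re-expand so macros can use other macros. A transformer that
      // returns its own call unchanged would loop forever here.
      out.push(...expand(result, macros));
      i++; // skip the argument group that was consumed
    } else if (isGroup(tok)) {
      out.push({ open: tok.open, inner: expand(tok.inner, macros) });
    } else {
      out.push(tok);
    }
  }
  return out;
}

// Example transformer: thunk(e) becomes function() { return e; }
const macros = new Map();
macros.set("thunk", (stx) => [
  "function",
  { open: "(", inner: [] },
  { open: "{", inner: ["return", ...stx.body, ";"] },
]);

const expanded = expand(
  ["var", "t", "=", "thunk", { open: "(", inner: ["2", "+", "2"] }],
  macros
);
```

Here `expanded` holds the tokens for `var t = function() { return 2 + 2; }`. The sketch ignores hygiene and scoping entirely; it only shows the "pass the read tree in, splice the result back" step.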
So how do we name macros? Some possibilities...
In the macro form:
macro name {
    function(stx) {
        return stx;
    }
}
(probably the best option)
In function name:
macro {
    function name(stx) {
        return stx;
    }
}
(really don't like this)
As assign:
name = macro {
    function name(stx) {
        return stx;
    }
}
(not too bad but confuses compile time vs runtime semantics?)
As assign with `var`:
var name = macro {
    function name(stx) {
        return stx;
    }
}
(even worse, what happens if we forget the `var`? No global object but this is what a programmer is thinking of)
How are these scoped? Block scoping, ignores shadowing via hoisting. So:
function foo(m) {
    m();
    macro m { ... }
    var m
}
Or should we say that macro definitions are hoisted to the top of their scope? And scope follows normal JS semantics (so no blocks?). But then what about new macro forms? eg:
function foo() {
    // def is a "function" form
    macro def {...}
    bar();
    def name(arg1, arg2) {
        // so bar shouldn't be hoisted out of `def name(...) {...}`
        macro bar {...}
    }
}
So we could say that macros are hoisted out of known blocks (`for`, `if`, etc.) but not macros or normal function definitions. But then what if we wanted a block-like macro form. eg:
macro until { ... } // very much like while
function foo() {
    foo(); // from def inside while
    bar(); // not from def inside until
    while (x !== 4) {
        macro foo {...}
    }
    until (x === 4) {
        macro bar {...}
    }
}
Not consistent so I think macro definitions must be block scoped always.
To build out syntax objects we need a few primitive functions like in Scheme.
syntax :: (Value) -> SyntaxObject
// aka #'
syntax-e :: (SyntaxObject) -> Value
// aka syntax->datum (unwrap all levels)
// and syntax-e (unwrap single level)
make-syntax :: (Value, SyntaxObject) -> SyntaxObject
// aka datum->syntax
These names aren't quite right. What is JavaScripty?
syntax
// just overload syntax to include make-syntax power?
// usually will be using the sugar anyway
unwrapSyntax
unwrapSyntaxAll
// single function with flat?
// ugly
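One way these primitives might look as ordinary JavaScript functions is sketched below. Everything here is an assumption made for illustration: the `SyntaxObject` class, its `datum` and `context` fields, and the choice to let `syntax` take an optional second argument that supplies the lexical context (the "overload syntax to include make-syntax power" idea).

```javascript
// Hypothetical representation: a datum plus opaque lexical context.
class SyntaxObject {
  constructor(datum, context) {
    this.datum = datum;
    this.context = context;
  }
}

// syntax(value) wraps a value; syntax(value, like) also copies the
// lexical context from an existing syntax object (datum->syntax).
function syntax(value, like) {
  if (value instanceof SyntaxObject) return value; // already wrapped
  const context = like instanceof SyntaxObject ? like.context : null;
  const datum = Array.isArray(value)
    ? value.map((v) => syntax(v, like)) // wrap nested values too
    : value;
  return new SyntaxObject(datum, context);
}

// Unwrap a single level (syntax-e).
function unwrapSyntax(stx) {
  return stx instanceof SyntaxObject ? stx.datum : stx;
}

// Unwrap all levels (syntax->datum).
function unwrapSyntaxAll(stx) {
  const datum = unwrapSyntax(stx);
  return Array.isArray(datum) ? datum.map(unwrapSyntaxAll) : datum;
}
```

Keeping `unwrapSyntax` and `unwrapSyntaxAll` as two functions avoids the flag argument that the notes above call ugly, at the cost of one more name.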
var t = thunk(2+2) // function(a) { return 2 + 2;}
// where
macro thunk {
    function(stx) {
        return syntax(
            function(a) {
                return unwrapSyntax(stx.body);
            }, stx)
    }
}
// using #{...} as sugar for syntax(...)
macro thunk {
    function(stx) {
        return #{
            function(a) {
                return unwrapSyntax(stx.body);
            }
        }
    }
}
macro swap {
    function(stx) {
        var x = unwrapSyntax(stx.body[0])
        var y = unwrapSyntax(stx.body[1])
        return #{
            var tmp = x;
            x = y;
            y = tmp;
        }
    }
}
Useful papers on macros:
- Macros that work
- Macros that work together
- A Theory of Typed Hygienic Macros
- Macro-By-Example: Deriving Syntactic Transformations from their Specifications
- Syntactic Abstraction in Scheme
- SuperC: Parsing All of C by Taming the Preprocessor
- Composable and Compilable Macros
- Refining Syntactic Sugar: Tools for Supporting Macro Development
- Fortifying Macros