-
Notifications
You must be signed in to change notification settings - Fork 9
Introduction to Macros
Metaprogramming can be very useful in reducing or removing boilerplate code. However, some languages have less-than-satisfactory solutions while others have none at all. Template Haskell, for example, has several major problems (e.g. it is notoriously unstable between GHC releases, according to the 2017 State of Haskell), and PureScript doesn't have a macro-like system in the first place.
However, by following in Lisp's footsteps and adopting a homoiconic syntax (yay, parentheses!), Axel is equipped with a simple yet powerful metaprogramming system. Unlike in most languages, Axel source code is effectively a textual representation of the corresponding abstract syntax tree (AST) that the transpiler uses to represent the program internally. Macros, then, are just functions that take lists of Axel expressions and return lists of other Axel expressions. The Axel transpiler runs these macros at compile-time, allowing for code that creates code.
For an example of the power that macros provide us with, let's try to implement the LambdaCase
GHC language extension as a macro ourselves.
First, for a brief introduction to Axel's metaprogramming abilities. Let's use the following Axel expression as our example:
(+ 1 2)
This applies the +
function to two arguments, 1
and 2
. Internally, Axel represents this not as a function application per se, but rather as an s-expression (the Lisp name for a parenthesized list) containing the symbols +
, 1
, and 2
. The cool part about this, which Axel inherits from Lisp, is that by passing the expression to the quote
"special form," we can get that internal representation and manipulate it. For example:
(quote (+ 1 2)) --=> (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])
You can quote any expression you'd like:
(quote 1) -- => (LiteralInt 1)
(quote "test") --=> (LiteralString "test")
You can even quote another application of quote
:
(quote (quote 1)) -- (SExpression [(Symbol "quote") (LiteralInt 1)])
Once we have the quoted form of an expression, we can manipulate it like any other data structure, like so:
(case (quote (+ 1 2)))
((SExpression sexp) (SExpression (reverse sexp))))
--=> (SExpression [(LiteralInt 2) (LiteralInt 1) (Symbol "+")])
Let's define a function, backwards
, to reverse whatever Axel expression we pass in.
(= backwards (-> Expression Expression) -- It will take an Axel expression and return an Axel expression
((expression)
Now, an Axel Expression
can be one of several cases: LiteralChar Char
, LiteralInt Int
, LiteralString String
, SExpression [Expression]
, or Symbol String
. The only thing we can reverse is an SExpression
, so let's return the input unchanged for all other cases.
(case expression
((LiteralChar _) expression)
((LiteralInt _) expression)
((LiteralString _) expression)
((Symbol _) expression)
For the case that we're passed an SExpression
, however, let's extract the list of Expression
s inside it and reverse it. We'll have to wrap the new list in an SExpression
, since our function expects an Expression
to be returned.
((SExpression expressions) (SExpression (reverse expressions))))))
Now, we can run our backwards
function like so:
-- We have to quote `(2 1 +)` because otherwise it would be evaluated (which is not what we want).
-- Rather, we want the internal representation of `(+ 2 1)` to be passed to `backwards` (which requires a value of type `Expression`).
(backwards '(+ 2 1)) --=> (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])
When we evaluate backwards
, we get back a new Expression
. So, we now have the data representing our new code, but how can we actually execute it to get 3
?
Macros to the rescue! Currently, backwards
is just a plain old Axel function. But, Axel provides an alternative to normal function definition (=
) called defmacro
, and it lets us define a function that will actually be run at compile-time. Macros take a list of Expression
s and return a list of Expression
s. Macro calls in an Axel program are replaced by the results they return.
To see this in action, let's turn backwards
into a macro. defmacro
doesn't require a type signature to be passed, since it automatically uses the type [Expression] -> IO [Expression]
. You'll notice that backwards
is a partial function, since it wouldn't be able to handle being passed more than one argument. However, this isn't the worry it would usually be if this were a normal function, since errors will be thrown at compile-time and fixed before the program is actually run.
(defmacro backwards
((list expression)
Because backwards
will not have any side-effects, we will return
whatever our result will be (since we need to return a value of type IO [Expression]
rather than just [Expression]
).
(return
Also, we'll wrap the whole thing in a list, because (again) we're looking to return a list of Expression
s rather than just a single Expression
. From here, our definition is the same as before:
[(case expression
((LiteralChar _) expression)
((LiteralInt _) expression)
((LiteralString _) expression)
((SExpression expressions) (SExpression (reverse expressions)))
((Symbol _) expression))])))
Now, if we actually call backwards
in our code, the macro call will be "expanded" during compilation. Also, if you remember from when backwards
was a function, we had to explicitly quote our input for it not to be evaluated. However, when passing a form to a macro, it will be automatically quoted. This helps macros remain unobtrusive and makes them feel more like first-class syntactic constructs.
To see how our macro will actually be evaluated, let's walk through it step-by-step. If the program we provide to Axel looks like this:
(= main (IO Unit)
(() (backwards ("Hello, world!" putStrLn))))
backwards
will be executed as the program compiles, after which the program will look like this:
(= main (IO Unit)
(() (putStrLn "Hello, world!")))
And, when the program is executed at the end of this process, the output will be, as expected:
"Hello, world!"
Congratulations, you've created your very first macro!
Now, let's try to create a more complicated macro and implement GHC's LambdaCase
extension from scratch. We're going to write our macro using what we know now, and then we'll take a look at "quasiquoting" and see how it helps us make our macros more concise. Finally, we'll see it in action by using it to clean up our example.
For some context, take the following Haskell code:
\x -> case x of
True -> 1
False -> 2
Notice that x
is immediately passed to the case
expression and never used again. This happens commonly enough that the LambdaCase
extension was designed to help remove the duplication. It allows us to write the above lambda more concisely, like so:
\case
True -> 1
False -> 2
So, it's a pretty simple (yet useful) syntactic transformation. If we want Axel to have the same feature, it could look like the following:
-- Before
(\ (x) (case x
((True) 1)
((False) 2)
-- After
(\case
((True) 1)
((False) 2)
The rule to implement this feature is pretty simple: If you see (\case <case 1, case 2, ...>)
, replace that with (\ (<identifier>) (case <identifier> <case 1, case 2, ...>)
, where <identifier>
is some as-of-yet unused identifier.
Let's start our macro definition like usual:
(defmacro \case
((cases) undefined))
Now, we'll fill in the basic details, based on the discussion above. Since we want to return a list, we'll use return
to lift our result into IO
and then wrap our result with the SExpression
constructor.
(defmacro \case
((cases) (return [(SExpression TODO)])))
We want to fill in TODO
with \ (<some identifier> (case <some identifier> <cases>)
. Since we want to use \
and case
as symbols (instead of using them to refer to what they mean at runtime), we'll quote them with the '
operator.
(defmacro \case
((cases) (return [(SExpression ['\ (SExpression [TODO])
(SExpression ['case TODO TODO])])])))
Now, we need to somehow get an unused identifier that we can use as the variable in our lambda. Luckily, Axel provides the gensym
helper function, which is designed just for this! Because gensym
runs in the IO
monad (gensym
is of type IO Expression
because the Symbol
it creates must be unique), our code will as well.
(defmacro \case
((cases)
(do
<- identifier gensym
(return [(SExpression ['\ (SExpression [identifier])
(SExpression ['case identifier TODO])])]))))
And now, for the final TODO
: the cases themselves.
The cases
parameter of our macro is a list of Expression
s that we want to splice directly into our output (that is, we're looking for (case <identifier> <case 1> <case 2> <...>)
rather than (case <identifier> (<case 1> <case 2> <...>))
). So, we can finish our macro's definition like so:
(defmacro \case
((cases)
(do
<- identifier gensym
(return [(SExpression ['\ (SExpression [identifier])
(SExpression (: 'case (: identifier cases)))])]))))
And, there we have it!
To be honest, that was a bit tedious to write, and it's not the easiest to understand. Manually constructing our SExpression
s adds a lot of extra noise to our macro definition. Fortunately, though, there's a better way!
Meet quasiquote
, quote
's big sibling. It behaves just like quote
on normal expressions:
(quasiquote 1) --=> (LiteralInt 1)
(quasiquote (1 2 3)) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])
However, quasiquote
has some sidekicks that give it superpowers. If you wrap quasiquote
's argument with unquote
, it will undo the quasiquotation for just that section. For example:
(quasiquote 1) --=> (LiteralInt 1)
(quasiquote (unquote 1)) --=> 1
(quasiquote (1 (LiteralInt 2) 3)) --=> (SExpression [(LiteralInt 1) (SExpression [(Symbol "LiteralInt") (LiteralInt 2)]) (LiteralInt 3)])
(quasiquote (1 (unquote (LiteralInt 2)) 3)) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])
There's also unquote-splicing
, which not only undoes quasiquotation but also "splices" the result directly into its enclosing s-expression. For example:
(quasiquote (1 (unquote (SExpression [(LiteralInt 2) (LiteralInt 3)])))) --=> (SExpression [(LiteralInt 1) (SExpression [(LiteralInt 2) (LiteralInt 3)])])
(quasiquote (1 (unquote-splicing (SExpression [(LiteralInt 2) (LiteralInt 3)]))) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])
In effect, quasiquote
acts as a sort of templating system for creating Axel expressions, where you selectively choose what parts should be evaluated at compile time.
One last note: quasiquoting, like quoting, is common enough that it gets special syntax. While quote
becomes '
, quasiquote
becomes \`` (e.g.
(quasiquote (1 2 3))is usually written
`(1 2 3)).
unquotecan be written with a comma (e.g.
(unquote 1)is usually written
,1), and
unquote-splicingcan be written with a comma and then an at-sign (e.g.
(unquote-splicing (1 2 3))is usually written
,@(1 2 3)`).
Now, we have all the building blocks we need to clean up our macro definition from earlier. Here's the original definition, for reference:
(defmacro \case
((cases)
(do
<- identifier gensym
(return [(SExpression ['\ (SExpression [identifier])
(SExpression (: 'case (: identifier cases)))])]))))
And here's a version which uses quasiquote
, unquote
, and unquote-splicing
:
(defmacro \case
((cases)
(do
<- identifier gensym
(return `(\ (,identifier)
(case ,identifier
,@cases)))))
Notice how the second representation closely mirrors the final output (much more so than our first try does). Also, try to figure out how we derived the second representation from the first one. While it might be difficult at first, getting a hang of quoting and quasiquoting just takes practice.
** TODO Add more examples
** do-notation
This is an excellent example of a core syntax feature that need not be baked in to the compiler at all.
*** TODO Link to issue
** Infix Function Application
Haskell's common use of infix operators (everything from Prelude.(+) to Control.Lens.(^.)) can translate to excessive boilerplate when converted to prefix notation. For example:
#+BEGIN_SRC haskell
1 + 2 + 3
#+END_SRC
becomes
#+BEGIN_SRC lisp
(+ (+ 1 2) 3)
#+END_SRC
The basic implementation might not be difficult, but retaining precedence and associativity could be. That could be ignored in favor of left-to-right order, for example, but that would break a lot of existing code and add a lot of otherwise unnecessary parentheses.
*** TODO Give potential Axel implementation
- Questions
** "Quasiquote" vs "Backquote" vs "Syntax Quote"
TODO Elaborate
** Variadic Arguments
In Common Lisp, macros (like the common
with-gensyms) can take a variadic number of arguments through the&bodykeyword. The use case of requiring an unknown number of arguments would normally require wrapping them in a list (so that they become a single s-expression), but an auto-wrapping mechanism (like&body) may reduce syntactic noise when calling such macros. ** Side Effects Should macros be able to returnIOvalues? On the one hand, this adds great flexibility. On the other, however, it makes macro behavior nondeterministic (and has the same drawbacks as normal, impure code). ** Return Values Should macros returnExpressions or[Expression]s? In Template Haskell, functions run at the top level must return a list of declarations while functions inside expressions, for example, must return just a single expression. Common Lisp, on the other hand, forces the return of only one value but provides thebeginspecial form (e.g. for creating multiple top-level statements, such as function definitions). ** Reader Macros How important are reader macros? They /would/ grant the ability to implement many more extensions (e.g. those which modify literal syntax). However, this could add significantly more complexity. Clojure does not have user-implementable reader macros, so there's something to be said for deferring them indefinitely. ** Helper Scope Should macros be able to automatically access functions which do not depend on them or their consumers? Alternatively, Axel could follow Racket's approach, which has a [[http://blog.ezyang.com/2016/07/what-template-haskell-gets-wrong-and-racket-gets-right/][separate form for defining compile-time functions]]. If there is no hard stage restriction, the advantage is a lot of extra flexibility. However, some potential drawbacks come to mind. For example, if functions require functions from external modules, that could complicate macro expansion (depending on the implementation used). Also, macros could unexpectedly break if a macro becomes used, directly or indirectly, by one of the functions it depends on (again, it is unknown how likely this situation would be in practice). ** Hygiene Racket and Scheme are made distinct from Clojure and Common Lisp, for example, by their "hygenic" macro system (for example, symbols inside their macros are automaticallygensym-ed). They also contain several features which help with the ergonomics of macro-writing, such assyntax-rules. In Axel's specific case, this more complicated system will likely be more difficult to implement. So, for the sake of creating a PoC, a non-hygenic CL-esque macro system will probably come first. However, there will definitely be possibility for expansion in the future. *** Whole-File Wrapping Haskell programmers make liberal use of various GHC extensions; however, in derivatives such as PureScript, these are (obviously) not all available. Have no fear, though: Axel is here! TheUnicodeSyntaxGHC extension could be implemented as a macro which traverses the AST and replaces any Unicode characters with the appropriate substitutions (as it would be a first-class construct in Axel, it could even be user-extensible). For another example,ApplicativeDowould similarly be a user-land extension rather than a built-in extension. One problem with this approach, however, is that the entire source file would be need to wrapped with the appropriate macro, like so: #+BEGIN_SRC lisp (unicode-syntax ...) #+END_SRC This extra level of nesting may not be desirable, especially if multiple "extension" macros are required: #+BEGIN_SRC lisp (unicode-syntax (applicative-do (...))) #+END_SRC One solution is to break convention and denest the bodies of these macro calls, like so: #+BEGIN_SRC lisp (unicode-syntax (applicative-do ...)) #+END_SRC This may turn out to be acceptable. An alternative solution (which /would/ require a change to the compiler itself, to implementwrap-file) is as follows: #+BEGIN_SRC lisp (wrap-file unicode-syntax) (wrap-file applicative-do) ... #+END_SRC This would most likely desugar to the explicit, nested example. Racket's#langdeclaration, for example, takes the same approach:#langat the top of a file is converted to the appropriate(module ...)call around the rest of the file's contents. As Axel progresses in development and usage, the answer may become clearer. (I'm sure there are plenty of other ways to do this as well, so if you have ideas of your own, /please/ don't hesitate to chime in!) **defmacrodefmacros contain implicit type signatures, namely(-> (List Expression) (List Expression)). *** TODO Shoulddefmacrobe automatically "variadic"? *** TODO Shoulddefmacro's definition be anaphoric (e.g. have a predefinedexprsargument)? Maybe that's the automatic argument name if none is provided, but that would remove the ability to define point-free macros. ** Unique Identifier Names TODO Should macros run insideIOso that they can generate random identifiers (via agensymhelper function)?