Skip to content

Introduction to Macros

Joshua Grosso edited this page Feb 1, 2018 · 23 revisions

Metaprogramming can be very useful in reducing or removing boilerplate code. However, some languages have less-than-satisfactory solutions while others have none at all. Template Haskell, for example, has several major problems (e.g. it is notoriously unstable between GHC releases, according to the 2017 State of Haskell), and PureScript doesn't have a macro-like system in the first place.

However, by following in Lisp's footsteps and adopting a homoiconic syntax (yay, parentheses!), Axel is equipped with a simple yet powerful metaprogramming system. Unlike in most languages, Axel source code is effectively a textual representation of the corresponding abstract syntax tree (AST) that the transpiler uses to represent the program internally. Macros, then, are just functions that take lists of Axel expressions and return lists of other Axel expressions. The Axel transpiler runs these macros at compile-time, allowing for code that creates code.

For an example of the power that macros provide us with, let's try to implement the LambdaCase GHC language extension as a macro ourselves.

Quoting

First, for a brief introduction to Axel's metaprogramming abilities. Let's use the following Axel expression as our example:

(+ 1 2)

This applies the + function to two arguments, 1 and 2. Internally, Axel represents this not as a function application per se, but rather as an s-expression (the Lisp name for a parenthesized list) containing the symbols +, 1, and 2. The cool part about this, which Axel inherits from Lisp, is that by passing the expression to the quote "special form," we can get that internal representation and manipulate it. For example:

(quote (+ 1 2)) --=> (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])

You can quote any expression you'd like:

(quote 1) -- => (LiteralInt 1)
(quote "test") --=> (LiteralString "test")

You can even quote another application of quote:

(quote (quote 1)) -- (SExpression [(Symbol "quote") (LiteralInt 1)])

Once we have the quoted form of an expression, we can manipulate it like any other data structure, like so:

(case (quote (+ 1 2)))
  ((SExpression sexp) (SExpression (reverse sexp))))
--=> (SExpression [(LiteralInt 2) (LiteralInt 1) (Symbol "+")])

Macros

Let's define a function, backwards, to reverse whatever Axel expression we pass in.

(= backwards (-> Expression Expression) -- It will take an Axel expression and return an Axel expression
  ((expression)

Now, an Axel Expression can be one of several cases: LiteralChar Char, LiteralInt Int, LiteralString String, SExpression [Expression], or Symbol String. The only thing we can reverse is an SExpression, so let's return the input unchanged for all other cases.

   (case expression
     ((LiteralChar _) expression)
     ((LiteralInt _) expression)
     ((LiteralString _) expression)
     ((Symbol _) expression)

For the case that we're passed an SExpression, however, let's extract the list of Expressions inside it and reverse it. We'll have to wrap the new list in an SExpression, since our function expects an Expression to be returned.

     ((SExpression expressions) (SExpression (reverse expressions))))))

Now, we can run our backwards function like so:

-- We have to quote `(2 1 +)` because otherwise it would be evaluated (which is not what we want).
-- Rather, we want the internal representation of `(+ 2 1)` to be passed to `backwards` (which requires a value of type `Expression`).
(backwards '(+ 2 1)) --=> (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])

When we evaluate backwards, we get back a new Expression. So, we now have the data representing our new code, but how can we actually execute it to get 3?

Macros to the rescue! Currently, backwards is just a plain old Axel function. But, Axel provides an alternative to normal function definition (=) called defmacro, and it lets us define a function that will actually be run at compile-time. Macros take a list of Expressions and return a list of Expressions. Macro calls in an Axel program are replaced by the results they return.

To see this in action, let's turn backwards into a macro. defmacro doesn't require a type signature to be passed, since it automatically uses the type [Expression] -> IO [Expression]. You'll notice that backwards is a partial function, since it wouldn't be able to handle being passed more than one argument. However, this isn't the worry it would usually be if this were a normal function, since errors will be thrown at compile-time and fixed before the program is actually run.

(defmacro backwards
  ((list expression) 

Because backwards will not have any side-effects, we will return whatever our result will be (since we need to return a value of type IO [Expression] rather than just [Expression]).

   (return

Also, we'll wrap the whole thing in a list, because (again) we're looking to return a list of Expressions rather than just a single Expression. From here, our definition is the same as before:

    [(case expression
      ((LiteralChar _) expression)
      ((LiteralInt _) expression)
      ((LiteralString _) expression)
      ((SExpression expressions) (SExpression (reverse expressions)))
      ((Symbol _) expression))])))

Now, if we actually call backwards in our code, the macro call will be "expanded" during compilation. Also, if you remember from when backwards was a function, we had to explicitly quote our input for it not to be evaluated. However, when passing a form to a macro, it will be automatically quoted. This helps macros remain unobtrusive and makes them feel more like first-class syntactic constructs.

To see how our macro will actually be evaluated, let's walk through it step-by-step. If the program we provide to Axel looks like this:

(= main (IO Unit)
  (() (backwards ("Hello, world!" putStrLn))))

backwards will be executed as the program compiles, after which the program will look like this:

(= main (IO Unit)
  (() (putStrLn "Hello, world!")))

And, when the program is executed at the end of this process, the output will be, as expected:

"Hello, world!"

Congratulations, you've created your very first macro!

Quasiquoting

Now, let's try to create a more complicated macro and implement GHC's LambdaCase extension from scratch. We're going to write our macro using what we know now, and then we'll take a look at "quasiquoting" and see how it helps us make our macros more concise. Finally, we'll see it in action by using it to clean up our example.

For some context, take the following Haskell code:

\x -> case x of
        True -> 1
        False -> 2

Notice that x is immediately passed to the case expression and never used again. This happens commonly enough that the LambdaCase extension was designed to help remove the duplication. It allows us to write the above lambda more concisely, like so:

\case
  True -> 1
  False -> 2

So, it's a pretty simple (yet useful) syntactic transformation. If we want Axel to have the same feature, it could look like the following:

-- Before
(\ (x) (case x
         ((True) 1)
         ((False) 2)

-- After
(\case
  ((True) 1)
  ((False) 2)

The rule to implement this feature is pretty simple: If you see (\case <case 1, case 2, ...>), replace that with (\ (<identifier>) (case <identifier> <case 1, case 2, ...>), where <identifier> is some as-of-yet unused identifier.

Let's start our macro definition like usual:

(defmacro \case
  ((cases) undefined))

Now, we'll fill in the basic details, based on the discussion above. Since we want to return a list, we'll use return to lift our result into IO and then wrap our result with the SExpression constructor.

(defmacro \case
  ((cases) (return [(SExpression TODO)])))

We want to fill in TODO with \ (<some identifier> (case <some identifier> <cases>). Since we want to use \ and case as symbols (instead of using them to refer to what they mean at runtime), we'll quote them with the ' operator.

(defmacro \case
  ((cases) (return [(SExpression ['\ (SExpression [TODO])
                                     (SExpression ['case TODO TODO])])])))

Now, we need to somehow get an unused identifier that we can use as the variable in our lambda. Luckily, Axel provides the gensym helper function, which is designed just for this! Because gensym runs in the IO monad (gensym is of type IO Expression because the Symbol it creates must be unique), our code will as well.

(defmacro \case
  ((cases)
   (do
     <- identifier gensym
     (return [(SExpression ['\ (SExpression [identifier])
                               (SExpression ['case identifier TODO])])]))))

And now, for the final TODO: the cases themselves. The cases parameter of our macro is a list of Expressions that we want to splice directly into our output (that is, we're looking for (case <identifier> <case 1> <case 2> <...>) rather than (case <identifier> (<case 1> <case 2> <...>))). So, we can finish our macro's definition like so:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return [(SExpression ['\ (SExpression [identifier])
                              (SExpression (: 'case (: identifier cases)))])]))))

And, there we have it!

To be honest, that was a bit tedious to write, and it's not the easiest to understand. Manually constructing our SExpressions adds a lot of extra noise to our macro definition. Fortunately, though, there's a better way!

Meet quasiquote, quote's big sibling. It behaves just like quote on normal expressions:

(quasiquote 1) --=> (LiteralInt 1)
(quasiquote (1 2 3)) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

However, quasiquote has some sidekicks that give it superpowers. If you wrap quasiquote's argument with unquote, it will undo the quasiquotation for just that section. For example:

(quasiquote 1) --=> (LiteralInt 1)
(quasiquote (unquote 1)) --=> 1

(quasiquote (1 (LiteralInt 2) 3)) --=> (SExpression [(LiteralInt 1) (SExpression [(Symbol "LiteralInt") (LiteralInt 2)]) (LiteralInt 3)])
(quasiquote (1 (unquote (LiteralInt 2)) 3)) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

There's also unquote-splicing, which not only undoes quasiquotation but also "splices" the result directly into its enclosing s-expression. For example:

(quasiquote (1 (unquote (SExpression [(LiteralInt 2) (LiteralInt 3)])))) --=> (SExpression [(LiteralInt 1) (SExpression [(LiteralInt 2) (LiteralInt 3)])])
(quasiquote (1 (unquote-splicing (SExpression [(LiteralInt 2) (LiteralInt 3)]))) --=> (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

In effect, quasiquote acts as a sort of templating system for creating Axel expressions, where you selectively choose what parts should be evaluated at compile time.

One last note: quasiquoting, like quoting, is common enough that it gets special syntax. While quote becomes ', quasiquote becomes \`` (e.g. (quasiquote (1 2 3))is usually written`(1 2 3)). unquotecan be written with a comma (e.g.(unquote 1)is usually written,1), and unquote-splicingcan be written with a comma and then an at-sign (e.g.(unquote-splicing (1 2 3))is usually written,@(1 2 3)`).

Now, we have all the building blocks we need to clean up our macro definition from earlier. Here's the original definition, for reference:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return [(SExpression ['\ (SExpression [identifier])
                              (SExpression (: 'case (: identifier cases)))])]))))

And here's a version which uses quasiquote, unquote, and unquote-splicing:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return `(\ (,identifier)
                (case ,identifier
                      ,@cases)))))

Notice how the second representation closely mirrors the final output (much more so than our first try does). Also, try to figure out how we derived the second representation from the first one. While it might be difficult at first, getting a hang of quoting and quasiquoting just takes practice.

** TODO Add more examples ** do-notation This is an excellent example of a core syntax feature that need not be baked in to the compiler at all. *** TODO Link to issue ** Infix Function Application Haskell's common use of infix operators (everything from Prelude.(+) to Control.Lens.(^.)) can translate to excessive boilerplate when converted to prefix notation. For example: #+BEGIN_SRC haskell 1 + 2 + 3 #+END_SRC becomes #+BEGIN_SRC lisp (+ (+ 1 2) 3) #+END_SRC The basic implementation might not be difficult, but retaining precedence and associativity could be. That could be ignored in favor of left-to-right order, for example, but that would break a lot of existing code and add a lot of otherwise unnecessary parentheses. *** TODO Give potential Axel implementation

  • Questions ** "Quasiquote" vs "Backquote" vs "Syntax Quote" TODO Elaborate ** Variadic Arguments In Common Lisp, macros (like the common with-gensyms) can take a variadic number of arguments through the &body keyword. The use case of requiring an unknown number of arguments would normally require wrapping them in a list (so that they become a single s-expression), but an auto-wrapping mechanism (like &body) may reduce syntactic noise when calling such macros. ** Side Effects Should macros be able to return IO values? On the one hand, this adds great flexibility. On the other, however, it makes macro behavior nondeterministic (and has the same drawbacks as normal, impure code). ** Return Values Should macros return Expression s or [Expression] s? In Template Haskell, functions run at the top level must return a list of declarations while functions inside expressions, for example, must return just a single expression. Common Lisp, on the other hand, forces the return of only one value but provides the begin special form (e.g. for creating multiple top-level statements, such as function definitions). ** Reader Macros How important are reader macros? They /would/ grant the ability to implement many more extensions (e.g. those which modify literal syntax). However, this could add significantly more complexity. Clojure does not have user-implementable reader macros, so there's something to be said for deferring them indefinitely. ** Helper Scope Should macros be able to automatically access functions which do not depend on them or their consumers? Alternatively, Axel could follow Racket's approach, which has a [[http://blog.ezyang.com/2016/07/what-template-haskell-gets-wrong-and-racket-gets-right/][separate form for defining compile-time functions]]. If there is no hard stage restriction, the advantage is a lot of extra flexibility. However, some potential drawbacks come to mind. For example, if functions require functions from external modules, that could complicate macro expansion (depending on the implementation used). Also, macros could unexpectedly break if a macro becomes used, directly or indirectly, by one of the functions it depends on (again, it is unknown how likely this situation would be in practice). ** Hygiene Racket and Scheme are made distinct from Clojure and Common Lisp, for example, by their "hygenic" macro system (for example, symbols inside their macros are automatically gensym-ed). They also contain several features which help with the ergonomics of macro-writing, such as syntax-rules. In Axel's specific case, this more complicated system will likely be more difficult to implement. So, for the sake of creating a PoC, a non-hygenic CL-esque macro system will probably come first. However, there will definitely be possibility for expansion in the future. *** Whole-File Wrapping Haskell programmers make liberal use of various GHC extensions; however, in derivatives such as PureScript, these are (obviously) not all available. Have no fear, though: Axel is here! The UnicodeSyntax GHC extension could be implemented as a macro which traverses the AST and replaces any Unicode characters with the appropriate substitutions (as it would be a first-class construct in Axel, it could even be user-extensible). For another example, ApplicativeDo would similarly be a user-land extension rather than a built-in extension. One problem with this approach, however, is that the entire source file would be need to wrapped with the appropriate macro, like so: #+BEGIN_SRC lisp (unicode-syntax ...) #+END_SRC This extra level of nesting may not be desirable, especially if multiple "extension" macros are required: #+BEGIN_SRC lisp (unicode-syntax (applicative-do (...))) #+END_SRC One solution is to break convention and denest the bodies of these macro calls, like so: #+BEGIN_SRC lisp (unicode-syntax (applicative-do ...)) #+END_SRC This may turn out to be acceptable. An alternative solution (which /would/ require a change to the compiler itself, to implement wrap-file) is as follows: #+BEGIN_SRC lisp (wrap-file unicode-syntax) (wrap-file applicative-do) ... #+END_SRC This would most likely desugar to the explicit, nested example. Racket's #lang declaration, for example, takes the same approach: #lang at the top of a file is converted to the appropriate (module ...) call around the rest of the file's contents. As Axel progresses in development and usage, the answer may become clearer. (I'm sure there are plenty of other ways to do this as well, so if you have ideas of your own, /please/ don't hesitate to chime in!) ** defmacro defmacro s contain implicit type signatures, namely (-> (List Expression) (List Expression)). *** TODO Should defmacro be automatically "variadic"? *** TODO Should defmacro's definition be anaphoric (e.g. have a predefined exprs argument)? Maybe that's the automatic argument name if none is provided, but that would remove the ability to define point-free macros. ** Unique Identifier Names TODO Should macros run inside IO so that they can generate random identifiers (via a gensym helper function)?
Clone this wiki locally