diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 12b95cb6c..a7777e80c 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -1 +1,161 @@ # Macro expansion + +Macro expansion happens during parsing. `rustc` has two parsers, in fact: the +normal Rust parser, and the macro parser. During the parsing phase, the normal +Rust parser will set aside the contents of macros and their invokations. Later, +before name resolution, macros are expanded using these portions of the code. +The macro parser, in turn, may call the normal Rust parser when it needs to +bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro +invocation. The code for macro expansion is in +[`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to explain how macro +expansion works. + +### Example + +It's helpful to have an example to refer to. For the remainder of this chapter, +whenever we refer to the "example _definition_", we mean the following: + +```rust +macro_rules! printer { + (print $mvar:ident) => { + println!("{}", $mvar); + } + (print twice $mvar:ident) => { + println!("{}", $mvar); + println!("{}", $mvar); + } +} +``` + +`$mvar` is called a _metavariable_. Unlike normal variables, rather than +binding to a value in a computation, a metavariable binds _at compile time_ to +a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an +identifier (e.g., `foo`) or punctuation (e.g., `=>`). There are also other +special tokens, such as `EOF`, which indicates that there are no more tokens. +Token trees resulting from paired parentheses-like characters (`(`...`)`, +`[`...`]`, and `{`...`}`) -- they include the open and close and all the tokens +in between (we do require that parentheses-like characters be balanced). Having +macro expansion operate on token streams rather than the raw bytes of a source +file abstracts away a lot of complexity. The macro expander (and much of the +rest of the compiler) doesn't really care that much about the exact line and +column of some syntactic construct in the code; it cares about what constructs +are used in the code. Using tokens allows us to care about _what_ without +worrying about _where_. For more information about tokens, see the +[Parsing][parsing] chapter of this book. + +Whenever we refer to the "example _invocation_", we mean the following snippet: + +```rust +printer!(print foo); // Assume `foo` is a variable defined somewhere else... +``` + +The process of expanding the macro invocation into the syntax tree +`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is +called _macro expansion_, and it is the topic of this chapter. + +### The macro parser + +There are two parts to macro expansion: parsing the definition and parsing the +invocations. Interestingly, both are done by the macro parser. + +Basically, the macro parser is like an NFA-based regex parser. It uses an +algorithm similar in spirit to the [Earley parsing +algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is +defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp]. + +The interface of the macro parser is as follows (this is slightly simplified): + +```rust +fn parse( + sess: ParserSession, + tts: TokenStream, + ms: &[TokenTree] +) -> NamedParseResult +``` + +In this interface: + +- `sess` is a "parsing session", which keeps track of some metadata. Most + notably, this is used to keep track of errors that are generated so they can + be reported to the user. +- `tts` is a stream of tokens. The macro parser's job is to consume the raw + stream of tokens and output a binding of metavariables to corresponding token + trees. +- `ms` a _matcher_. This is a sequence of token trees that we want to match + `tts` against. + +In the analogy of a regex parser, `tts` is the input and we are matching it +against the pattern `ms`. Using our examples, `tts` could be the stream of +tokens containing the inside of the example invocation `print foo`, while `ms` +might be the sequence of token (trees) `print $mvar:ident`. + +The output of the parser is a `NamedParserResult`, which indicates which of +three cases has occured: + +- Success: `tts` matches the given matcher `ms`, and we have produced a binding + from metavariables to the corresponding token trees. +- Failure: `tts` does not match `ms`. This results in an error message such as + "No rule expected token _blah_". +- Error: some fatal error has occured _in the parser_. For example, this happens + if there are more than one pattern match, since that indicates the macro is + ambiguous. + +The full interface is defined [here][code_parse_int]. + +The macro parser does pretty much exactly the same as a normal regex parser with +one exception: in order to parse different types of metavariables, such as +`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the +normal Rust parser. + +As mentioned above, both definitions and invocations of macros are parsed using +the macro parser. This is extremely non-intuitive and self-referential. The code +to parse macro _definitions_ is in +[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for +matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, +a `macro_rules` defintion should have in its body at least one occurence of a +token tree followed by `=>` followed by another token tree. When the compiler +comes to a `macro_rules` definition, it uses this pattern to match the two token +trees per rule in the definition of the macro _using the macro parser itself_. +In our example definition, the metavariable `$lhs` would match the patterns of +both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` +would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ +println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this +knowledge around for when it needs to expand a macro invocation. + +When the compiler comes to a macro invocation, it parses that invocation using +the same NFA-based macro parser that is described above. However, the matcher +used is the first token tree (`$lhs`) extracted from the arms of the macro +_definition_. Using our example, we would try to match the token stream `print +foo` from the invocation against the matchers `print $mvar:ident` and `print +twice $mvar:ident` that we previously extracted from the definition. The +algorithm is exactly the same, but when the macro parser comes to a place in the +current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), +it calls back to the normal Rust parser to get the contents of that +non-terminal. In this case, the Rust parser would look for an `ident` token, +which it finds (`foo`) and returns to the macro parser. Then, the macro parser +proceeds in parsing as normal. Also, note that exactly one of the matchers from +the various arms should match the invocation (otherwise, the macro is +ambiguous). + +For more information about the macro parser's implementation, see the comments +in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp]. + +### Hygiene + +TODO + +### Procedural Macros + +TODO + +### Custom Derive + +TODO + + + +[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt +[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs +[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs +[code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421 +[parsing]: ./the-parser.md