diff --git a/src/SUMMARY.md b/src/SUMMARY.md index bf479ca31..dc38ff669 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -111,8 +111,8 @@ - [Constant Evaluation](const_eval.md) -[Appendix: Influences](influences.md) - -[Appendix: As-yet-undocumented Features](undocumented.md) - -[Appendix: Glossary](glossary.md) +- [Appendices](appendices.md) + - [Macro Follow-Set Ambiguity Formal Specification](macro-ambiguity.md) + - [Influences](influences.md) + - [As-Yet-Undocumented Features](undocumented.md) + - [Glossary](glossary.md) diff --git a/src/appendices.md b/src/appendices.md new file mode 100644 index 000000000..28acb81ce --- /dev/null +++ b/src/appendices.md @@ -0,0 +1 @@ +# Appendices diff --git a/src/attributes.md b/src/attributes.md index 7454789cf..85c540e9c 100644 --- a/src/attributes.md +++ b/src/attributes.md @@ -175,8 +175,6 @@ which can be used to control type layout. macros named. The `extern crate` must appear at the crate root, not inside `mod`, which ensures proper function of the `$crate` macro variable. -- `macro_reexport` on an `extern crate` — re-export the named macros. - - `macro_export` - export a `macro_rules` macro for cross-crate usage. - `no_link` on an `extern crate` — even if we load this crate for macros, don't diff --git a/src/macro-ambiguity.md b/src/macro-ambiguity.md new file mode 100644 index 000000000..582ee684d --- /dev/null +++ b/src/macro-ambiguity.md @@ -0,0 +1,378 @@ +# Appendix: Macro Follow-Set Ambiguity Formal Specification + +This page documents the formal specification of the follow rules for [Macros +By Example]. They were originally specified in [RFC 550], from which the bulk +of this text is copied, and expanded upon in subsequent RFCs. + +## Definitions & Conventions + + - `macro`: anything invokable as `foo!(...)` in source code. + - `MBE`: macro-by-example, a macro defined by `macro_rules`. + - `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a + subportion thereof. 
+ - `macro parser`: the bit of code in the Rust parser that will parse the + input using a grammar derived from all of the matchers. + - `fragment`: The class of Rust syntax that a given matcher will accept (or + "match"). + - `repetition`: a fragment that follows a regular repeating pattern. + - `NT`: non-terminal, the various "meta-variables" or repetition matchers + that can appear in a matcher, specified in MBE syntax with a leading `$` + character. + - `simple NT`: a "meta-variable" non-terminal (further discussion below). + - `complex NT`: a repetition matching non-terminal, specified via repetition + operators (`*`, `+`, `?`). + - `token`: an atomic element of a matcher; i.e. identifiers, operators, + open/close delimiters, *and* simple NT's. + - `token tree`: a tree structure formed from tokens (the leaves), complex + NT's, and finite sequences of token trees. + - `delimiter token`: a token that is meant to divide the end of one fragment + and the start of the next fragment. + - `separator token`: an optional delimiter token in a complex NT that + separates each pair of elements in the matched repetition. + - `separated complex NT`: a complex NT that has its own separator token. + - `delimited sequence`: a sequence of token trees with appropriate open- and + close-delimiters at the start and end of the sequence. + - `empty fragment`: The class of invisible Rust syntax that separates tokens, + i.e. whitespace, or (in some lexical contexts), the empty token sequence. + - `fragment specifier`: The identifier in a simple NT that specifies which + fragment the NT accepts. + - `language`: a context-free language. + +Example: + +```rust,compile_fail +macro_rules! i_am_an_mbe { + (start $foo:expr $($i:ident),* end) => ($foo) +} +``` + +`(start $foo:expr $($i:ident),* end)` is a matcher. 
The whole matcher is a +delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo` +and `$i` are simple NT's with `expr` and `ident` as their respective fragment +specifiers. + +`$($i:ident),*` is *also* an NT; it is a complex NT that matches a +comma-separated repetition of identifiers. The `,` is the separator token for +the complex NT; it occurs in between each pair of elements (if any) of the +matched fragment. + +Another example of a complex NT is `$(hi $e:expr ;)+`, which matches any +fragment of the form `hi <expr>; hi <expr>; ...` where `hi <expr>;` occurs at +least once. Note that this complex NT does not have a dedicated separator +token. + +(Note that Rust's parser ensures that delimited sequences always occur with +proper nesting of token tree structure and correct matching of open- and +close-delimiters.) + +We will tend to use the variable "M" to stand for a matcher, variables "t" and +"u" for arbitrary individual tokens, and the variables "tt" and "uu" for +arbitrary token trees. (The use of "tt" does present potential ambiguity with +its additional role as a fragment specifier; but it will be clear from context +which interpretation is meant.) + +"SEP" will range over separator tokens, "OP" over the repetition operators +`*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a +delimited sequence (e.g. `[` and `]`). + +Greek letters "α" "β" "γ" "δ" stand for potentially empty token-tree sequences. +(However, the Greek letter "ε" (epsilon) has a special role in the presentation +and does not stand for a token-tree sequence.) + + * This Greek letter convention is usually just employed when the presence of + a sequence is a technical detail; in particular, when we wish to *emphasize* + that we are operating on a sequence of token-trees, we will use the notation + "tt ..." for the sequence, not a Greek letter. + +Note that a matcher is merely a token tree. 
A "simple NT", as mentioned above, +is a meta-variable NT; thus it is a non-repetition. For example, `$foo:ty` is +a simple NT but `$($foo:ty)+` is a complex NT. + +Note also that in the context of this formalism, the term "token" generally +*includes* simple NTs. + +Finally, it is useful for the reader to keep in mind that according to the +definitions of this formalism, no simple NT matches the empty fragment, and +likewise no token matches the empty fragment of Rust syntax. (Thus, the *only* +NT that can match the empty fragment is a complex NT.) This is not actually +true, because the `vis` matcher can match an empty fragment. Thus, for the +purposes of the formalism, we will treat `$v:vis` as actually being +`$($v:vis)?`, with a requirement that the matcher match an empty fragment. + +### The Matcher Invariants + +To be valid, a matcher must meet the following three invariants. The definitions +of FIRST and FOLLOW are described later. + +1. For any two successive token tree sequences in a matcher `M` (i.e. `M = ... + tt uu ...`) with `uu ...` nonempty, we must have FOLLOW(`... tt`) ∪ {ε} ⊇ + FIRST(`uu ...`). +1. For any separated complex NT in a matcher, `M = ... $(tt ...) SEP OP ...`, + we must have `SEP` ∈ FOLLOW(`tt ...`). +1. For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if + OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`). + +The first invariant says that whatever actual token comes after a matcher, +if any, must be somewhere in the predetermined follow set. This ensures that a +legal macro definition will continue to assign the same determination as to +where `... tt` ends and `uu ...` begins, even as new syntactic forms are added +to the language. + +The second invariant says that a separated complex NT must use a separator token +that is part of the predetermined follow set for the internal contents of the +NT. 
This ensures that a legal macro definition will continue to parse an input +fragment into the same delimited sequence of `tt ...`'s, even as new syntactic +forms are added to the language. + +The third invariant says that when we have a complex NT that can match two or +more copies of the same thing with no separation in between, it must be +permissible for them to be placed next to each other as per the first invariant. +This invariant also requires they be nonempty, which eliminates a possible +ambiguity. + +**NOTE: The third invariant is currently unenforced due to historical oversight +and significant reliance on the behaviour. It is currently undecided what to do +about this going forward. Macros that do not respect the behaviour may become +invalid in a future edition of Rust. See the [tracking issue].** + +### FIRST and FOLLOW, informally + +A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M). + +Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also +contain a distinguished non-token element ε ("epsilon"), which indicates that M +can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.) + +Informally: + + * FIRST(M): collects the tokens potentially used first when matching a + fragment to M. + + * LAST(M): collects the tokens potentially used last when matching a fragment + to M. + + * FOLLOW(M): the set of tokens allowed to follow immediately after some + fragment matched by M. + + In other words: t ∈ FOLLOW(M) if and only if there exist (potentially + empty) token sequences α, β, γ, δ where: + + * M matches β, + + * t matches γ, and + + * The concatenation α β γ δ is a parseable Rust program. + +We use the shorthand ANYTOKEN to denote the set of all tokens (including simple +NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) = +ANYTOKEN. 
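As a rough illustration of what membership in a follow set means in practice, here is a minimal sketch. The macro `size_and_value` and the function `demo_size_and_value` are hypothetical names invented for this example. The sketch relies on `,` being in FOLLOW(`ty`), as listed in the FOLLOW(M) section of this appendix, and on `-` not being in it.

```rust
// Hypothetical macro, used only to illustrate follow-set checking.
macro_rules! size_and_value {
    // `,` is in FOLLOW(ty), so a `ty` fragment may be followed by `,`.
    ($t:ty , $e:expr) => {
        (::std::mem::size_of::<$t>(), $e)
    };
    // A rule such as `($t:ty - $e:expr) => { ... }` would be rejected when the
    // macro is defined, because `-` is not in FOLLOW(ty).
}

// Hypothetical helper that invokes the macro once.
fn demo_size_and_value() -> (usize, i32) {
    size_and_value!(u16, 40 + 2)
}
```

Calling `demo_size_and_value()` from `main` or a test evaluates the expansion. The rejected rule is kept as a comment so that the block still compiles.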
+ +(To review one's understanding of the above informal descriptions, the reader +at this point may want to jump ahead to the [examples of +FIRST/LAST](#examples-of-first-and-last) before reading their formal +definitions.) + +### FIRST, LAST + +Below are formal inductive definitions for FIRST and LAST. + +"A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B" +denotes set difference (i.e. all elements of A that are not present in B). + +#### FIRST + +FIRST(M) is defined by case analysis on the sequence M and the structure of its +first token-tree (if any): + + * if M is the empty sequence, then FIRST(M) = { ε }, + + * if M starts with a token t, then FIRST(M) = { t }, + + (Note: this covers the case where M starts with a delimited token-tree + sequence, `M = OPEN tt ... CLOSE ...`, in which case `t = OPEN` and thus + FIRST(M) = { `OPEN` }.) + + (Note: this critically relies on the property that no simple NT matches the + empty fragment.) + + * Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt + ... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially + empty) sequence of token trees for the rest of the matcher). + + * Let SEP\_SET(M) = { SEP } if SEP is present and ε ∈ FIRST(`tt ...`); + otherwise SEP\_SET(M) = {}. + + * Let ALPHA\_SET(M) = FIRST(`α`) if OP = `*` or `?` and ALPHA\_SET(M) = {} if + OP = `+`. + * FIRST(M) = (FIRST(`tt ...`) \\ {ε}) ∪ SEP\_SET(M) ∪ ALPHA\_SET(M). + +The definition for complex NTs deserves some justification. SEP\_SET(M) defines +the possibility that the separator could be a valid first token for M, which +happens when there is a separator defined and the repeated fragment could be +empty. ALPHA\_SET(M) defines the possibility that the complex NT could be empty, +meaning that M's valid first tokens are those of the following token-tree +sequences `α`. This occurs when either `*` or `?` is used, in which case there +could be zero repetitions. 
In theory, this could also occur if `+` was used with +a potentially-empty repeating fragment, but this is forbidden by the third +invariant. + +From there, clearly FIRST(M) can include any token from SEP\_SET(M) or +ALPHA\_SET(M), and if the complex NT match is nonempty, then any token starting +FIRST(`tt ...`) could work too. The last piece to consider is ε. SEP\_SET(M) and +FIRST(`tt ...`) \ {ε} cannot contain ε, but ALPHA\_SET(M) could. Hence, this +definition allows M to accept ε if and only if ALPHA\_SET(M) does. This is +correct because for M to accept ε in the complex NT case, both the complex NT +and α must accept it. If OP = `+`, meaning that the complex NT cannot be empty, +then by definition ε ∉ ALPHA\_SET(M). Otherwise, the complex NT can accept zero +repetitions, and then ALPHA\_SET(M) = FIRST(`α`). So this definition is correct +with respect to ε as well. + +#### LAST + +LAST(M), defined by case analysis on M itself (a sequence of token-trees): + + * if M is the empty sequence, then LAST(M) = { ε } + + * if M is a singleton token t, then LAST(M) = { t } + + * if M is the singleton complex NT repeating zero or more times, `M = $( tt + ... ) *`, or `M = $( tt ... ) SEP *` + + * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}. + + * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set + + * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt + ...`) ∪ {ε}. + + * if M is the singleton complex NT repeating one or more times, `M = $( tt ... + ) +`, or `M = $( tt ... ) SEP +` + + * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}. + + * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set + + * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt + ...`) + + * if M is the singleton complex NT repeating zero or one time, `M = $( tt ...) + ?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}. + + * if M is a delimited token-tree sequence `OPEN tt ... 
CLOSE`, then LAST(M) = + { `CLOSE` }. + + * if M is a non-empty sequence of token-trees `tt uu ...`, + + * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }). + + * Otherwise, the sequence `uu ...` must be non-empty; then LAST(M) = + LAST(`uu ...`). + +### Examples of FIRST and LAST +[examples-of-first-and-last]: #examples-of-first-and-last + +Below are some examples of FIRST and LAST. +(Note in particular how the special ε element is introduced and +eliminated based on the interaction between the pieces of the input.) + +Our first example is presented in a tree structure to elaborate on how +the analysis of the matcher composes. (Some of the simpler subtrees +have been elided.) + +```text +INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~ ~~~~~~~ ~ + | | | +FIRST: { $d:ident } { $e:expr } { h } + + +INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ + ~~~~~~~~~~~~~~~~~~ ~~~~~~~ ~~~ + | | | +FIRST: { $d:ident } { h, ε } { f } + +INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~ ~ + | | | | +FIRST: { $d:ident, ε } { h, ε, ; } { f } { g } + + +INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + | +FIRST: { $d:ident, h, ;, f } +``` + +Thus: + + * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g`) = { `$d:ident`, `h`, `;`, `f` } + +Note however that: + + * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*`) = { `$d:ident`, `h`, `;`, `f`, ε } + +Here are similar examples but now for LAST. + + * LAST(`$d:ident $e:expr`) = { `$e:expr` } + * LAST(`$( $d:ident $e:expr );*`) = { `$e:expr`, ε } + * LAST(`$( $d:ident $e:expr );* $(h)*`) = { `$e:expr`, ε, `h` } + * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+`) = { `;` } + * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+ g`) = { `g` } + +### FOLLOW(M) + +Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc. 
+represent simple nonterminals with the given fragment specifier. + + * FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}. + + * FOLLOW(expr) = FOLLOW(stmt) = {`=>`, `,`, `;`}. + + * FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`, + `|`, `as`, `where`, block nonterminals}. + + * FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any + token that can begin a type; ident, ty, and path nonterminals}. + + * FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident, + tt, item, lifetime, literal and meta simple nonterminals, and all terminals. + + * FOLLOW(M), for any other M, is defined as the intersection, as t ranges over + (LAST(M) \ {ε}), of FOLLOW(t). + +The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`, +`&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`, +`self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`, +`typeof`, `dyn`}, although this list may not be complete because people won't +always remember to update the appendix when new ones are added. + +Examples of FOLLOW for complex M: + + * FOLLOW(`$( $d:ident $e:expr )*`) = FOLLOW(`$e:expr`) + * FOLLOW(`$( $d:ident $e:expr )* $(;)*`) = FOLLOW(`$e:expr`) ∩ ANYTOKEN = FOLLOW(`$e:expr`) + * FOLLOW(`$( $d:ident $e:expr )* $(;)* $( f |)+`) = ANYTOKEN + +### Examples of valid and invalid matchers + +With the above specification in hand, we can present arguments for +why particular matchers are legal and others are not. + + * `($ty:ty < foo ,)` : illegal, because FIRST(`< foo ,`) = { `<` } ⊈ FOLLOW(`ty`) + + * `($ty:ty , foo <)` : legal, because FIRST(`, foo <`) = { `,` } is ⊆ FOLLOW(`ty`). + + * `($pa:pat $pb:pat $ty:ty ,)` : illegal, because FIRST(`$pb:pat $ty:ty ,`) = { `$pb:pat` } ⊈ FOLLOW(`pat`), and also FIRST(`$ty:ty ,`) = { `$ty:ty` } ⊈ FOLLOW(`pat`). 
+ + * `( $($a:tt $b:tt)* ; )` : legal, because FIRST(`$b:tt`) = { `$b:tt` } is ⊆ FOLLOW(`tt`) = ANYTOKEN, as is FIRST(`;`) = { `;` }. + + * `( $($t:tt),* , $($t:tt),* )` : legal, (though any attempt to actually use this macro will signal a local ambiguity error during expansion). + + * `($ty:ty $(; not sep)* -)` : illegal, because FIRST(`$(; not sep)* -`) = { `;`, `-` } is not a subset of FOLLOW(`ty`), since `-` is not in FOLLOW(`ty`). + + * `($($ty:ty)-+)` : illegal, because separator `-` is not in FOLLOW(`ty`). + + * `($($e:expr)*)` : illegal, because expr NTs are not in FOLLOW(expr NT). + +[Macros by Example]: macros-by-example.html +[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.html +[tracking issue]: https://github.com/rust-lang/rust/issues/56575 diff --git a/src/macros-by-example.md b/src/macros-by-example.md index c62e357c0..ff113765a 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -24,20 +24,17 @@ >       [_Token_]_except $ and delimiters_\ >    | _MacroMatcher_\ >    | `$` [IDENTIFIER] `:` _MacroFragSpec_\ ->    | `$` `(` _MacroMatch_+ `)` _MacroRepSep_? _MacroKleeneOp_ +>    | `$` `(` _MacroMatch_+ `)` _MacroRepSep_? _MacroRepOp_ > > _MacroFragSpec_ :\ >       `block` | `expr` | `ident` | `item` | `lifetime` | `literal`\ >    | `meta` | `pat` | `path` | `stmt` | `tt` | `ty` | `vis` > > _MacroRepSep_ :\ ->    [_Token_]_except delimiters and kleene operators_ +>    [_Token_]_except delimiters and repetition operators_ > -> _MacroKleeneOp_2015 :\ ->    `*` | `+` -> -> _MacroKleeneOp_2018+ :\ ->    `*` | `+` | `?` +> _MacroRepOp_2018+ :\ +>    `*` | `+` | `?`2018+ > > _MacroTranscriber_ :\ >    [_DelimTokenTree_] @@ -45,38 +42,400 @@ `macro_rules` allows users to define syntax extension in a declarative way. We call such extensions "macros by example" or simply "macros". -Macros can expand to expressions, statements, items, types, or patterns. 
- -The macro expander looks up macro invocations by name, and tries each macro -rule in turn. 
It transcribes the first successful match. Matching and -transcription are closely related to each other, and we will describe them -together. - -The macro expander matches and transcribes every token that does not begin with -a `$` literally, including delimiters. For parsing reasons, delimiters must be -balanced, but they are otherwise not special. - -In the matcher, `$` _name_ `:` _designator_ matches the nonterminal in the Rust -syntax named by _designator_. Valid designators are: - -* `item`: an [_Item_] -* `block`: a [_BlockExpression_] -* `stmt`: a [_Statement_] without the trailing semicolon -* `pat`: a [_Pattern_] -* `expr`: an [_Expression_] -* `ty`: a [_Type_] -* `ident`: an [IDENTIFIER_OR_KEYWORD] -* `path`: a [_TypePath_] style path -* `tt`: a [_TokenTree_] (a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`) -* `meta`: a [_MetaItem_], the contents of an attribute -* `lifetime`: a [LIFETIME_TOKEN] -* `vis`: a [_Visibility_] qualifier -* `literal`: matches `-`?[_LiteralExpression_] +Each macro by example has a name, and one or more _rules_. Each rule has two +parts: a _matcher_, describing the syntax that it matches, and a _transcriber_, +describing the syntax that will replace a successfully matched invocation. Both +the matcher and the transcriber must be surrounded by delimiters. Macros can +expand to expressions, statements, items (including traits, impls, and foreign +items), types, or patterns. + +When a macro is invoked, the macro expander looks up macro invocations by name, +and tries each macro rule in turn. It transcribes the first successful match; if +this results in an error, then future matches are not tried. When matching, no +lookahead is performed; if the compiler cannot unambiguously determine how to +parse the macro invocation one token at a time, then it is an error. 
In the +following example, the compiler does not look ahead past the identifier to see +if the following token is a `)`, even though that would allow it to parse the +invocation unambiguously: + +```rust,compile_fail +macro_rules! ambiguity { + ($($i:ident)* $j:ident) => { ($($i)-*) * $j }; +} + +ambiguity!(error); // Error: local ambiguity +``` + +In both the matcher and the transcriber, the `$` token is used to invoke special +behaviours from the macro engine. Tokens that aren't part of such an invocation +are matched and transcribed literally, with one exception. The exception is that +the outer delimiters for the matcher will match any pair of delimiters. Thus, +for instance, the matcher `(())` will match `{()}` but not `{{}}`. The character +`$` cannot be matched or transcribed literally. + +## Metavariables + +In the matcher, `$` _name_ `:` _fragment-specifier_ matches a Rust syntax +fragment of the kind specified and binds it to the metavariable `$`_name_. Valid +fragment specifiers are: + + * `item`: an [_Item_] + * `block`: a [_BlockExpression_] + * `stmt`: a [_Statement_] without the trailing semicolon (except for item + statements that require semicolons) + * `pat`: a [_Pattern_] + * `expr`: an [_Expression_] + * `ty`: a [_Type_] + * `ident`: an [IDENTIFIER_OR_KEYWORD] + * `path`: a [_TypePath_] style path + * `tt`: a [_TokenTree_] (a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`) + * `meta`: a [_MetaItem_], the contents of an attribute + * `lifetime`: a [LIFETIME_TOKEN] + * `vis`: a possibly empty [_Visibility_] qualifier + * `literal`: matches `-`?[_LiteralExpression_] + +In the transcriber, metavariables are referred to simply by `$`_name_, since +the fragment kind is specified in the matcher. Metavariables are replaced with +the syntax element that matched them. The keyword metavariable `$crate` can be +used to refer to the current crate; see [Hygiene] below. Metavariables can be +transcribed more than once or not at all. 
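To make the binding and transcription rules above concrete, here is a minimal sketch. The macro `make_pair` and the generated function `twice_three` are hypothetical names chosen only for illustration. The sketch binds four fragment kinds, transcribes some metavariables more than once, and leaves one of them out of the transcriber entirely.

```rust
// Hypothetical macro, used only to illustrate metavariable binding.
macro_rules! make_pair {
    // `$name` binds an identifier, `$ty` a type, `$val` an expression, and
    // `$unused` a literal. Each metavariable is followed by a token that its
    // fragment specifier permits (`:`, `=`, `;`).
    ($name:ident : $ty:ty = $val:expr ; $unused:literal) => {
        // `$ty` and `$val` are transcribed more than once.
        // `$unused` is matched but never transcribed, which is allowed.
        fn $name() -> ($ty, $ty) {
            let first: $ty = $val;
            let second: $ty = $val;
            (first, second)
        }
    };
}

// Expands to a function item named `twice_three` returning `(u32, u32)`.
make_pair!(twice_three: u32 = 1 + 2; "ignored");
```

The generated `twice_three` can be called like any other function, for example from `main` or a test.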
+ +## Repetitions + +In both the matcher and transcriber, repetitions are indicated by placing the +tokens to be repeated inside `$( ... )`, followed by a repetition operator, +optionally with a separator token between. The separator token can be any token +other than a delimiter or one of the repetition operators, but `;` and `,` are +the most common. For instance, `$( $i:ident ),*` represents any number of +identifiers separated by commas. Nested repetitions are permitted. + +The repetition operators are `*`, which indicates any number of repetitions, +`+`, which indicates any number but at least one, and `?` which indicates an +optional fragment with zero or one occurrences. Since `?` represents at most one +occurrence, it cannot be used with a separator. + +The repeated fragment both matches and transcribes to the specified number of +the fragment, separated by the separator token. Metavariables are matched to +every repetition of their corresponding fragment. For instance, the `$( $i:ident +),*` example above matches `$i` to all of the identifiers in the list. + +During transcription, additional restrictions apply to repetitions so that the +compiler knows how to expand them properly: + +1. A metavariable must appear in exactly the same number, kind, and nesting + order of repetitions in the transcriber as it did in the matcher. So for the + matcher `$( $i:ident ),*`, the transcribers `=> $i`, `=> $( $( $i)* )*`, and + `=> $( $i )+` are all illegal, but `=> { $( $i );* }` is correct and + replaces a comma-separated list of identifiers with a semicolon-separated + list. +1. Each repetition in the transcriber must contain at least one + metavariable to decide how many times to expand it. If multiple + metavariables appear in the same repetition, they must be bound to the same + number of fragments. For instance, `( $( $i:ident ),* ; $( $j:ident ),* ) => + ( $( ($i,$j) ),* )` must bind the same number of `$i` fragments as `$j` + fragments. 
This means that invoking the macro with `(a, b, c; d, e, f)` is + legal and expands to `((a,d), (b,e), (c,f))`, but `(a, b, c; d, e)` is + illegal because it does not have the same number. This requirement applies + to every layer of nested repetitions. + +> **Edition Differences**: The `?` repetition operator did not exist before the +> 2018 edition. Prior to the 2018 Edition, `?` was an allowed +> separator token, rather than a repetition operator. + +## Scoping, Exporting, and Importing + +For historical reasons, the scoping of macros by example does not work entirely like +items. Macros have two forms of scope: textual scope, and path-based scope. +Textual scope is based on the order that things appear in source files, or even +across multiple files, and is the default scoping. It's explained further below. +Path-based scope works exactly the same way that item scoping does. The scoping, +exporting, and importing of macros is controlled largely by attributes. + +When a macro is invoked by an unqualified identifier (not part of a multi-part +path), it's first looked up in textual scoping. If this does not yield any +results, then it is looked up in path-based scoping. If the macro's name is +qualified with a path, then it is only looked up in path-based scoping. + +```rust,ignore +use lazy_static::lazy_static; // Path-based import. + +macro_rules! lazy_static { // Textual definition. + (lazy) => {}; +} + +lazy_static!{lazy} // Textual lookup finds our macro first. +self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one. +``` + +### Textual Scope + +Textual scope is based largely on the order that things appear in source files, +and works similarly to the scope of local variables declared with `let` except +it also applies at the module level. 
When `macro_rules!` is used to define a +macro, the macro enters the scope after the definition (note that it can still +be used recursively, since names are looked up from the invocation site), up +until its surrounding scope, typically a module, is closed. This can enter child +modules and even span across multiple files: + +```rust,ignore +//// src/lib.rs +mod has_macro { + // m!{} // Error: m is not in scope. + + macro_rules! m { + () => {}; + } + m!{} // OK: appears after declaration of m. + + mod uses_macro; +} + +// m!{} // Error: m is not in scope. + +//// src/has_macro/uses_macro.rs + +m!{} // OK: appears after declaration of m in src/lib.rs +``` + +It is not an error to define a macro multiple times; the most recent declaration +will shadow the previous one unless it has gone out of scope. + +```rust +macro_rules! m { + (1) => {}; +} + +m!(1); + +mod inner { + m!(1); + + macro_rules! m { + (2) => {}; + } + // m!(1); // Error: no rule matches '1' + m!(2); + + macro_rules! m { + (3) => {}; + } + m!(3); +} + +m!(1); +``` + +Macros can be declared and used locally inside functions as well, and work +similarly: + +```rust +fn foo() { + // m!(); // Error: m is not in scope. + macro_rules! m { + () => {}; + } + m!(); +} + + +// m!(); // Error: m is not in scope. +``` + +The `#[macro_use]` attribute has two purposes. First, it can be used to make a +module's macro scope not end when the module is closed, by applying it to a +module: + +```rust +#[macro_use] +mod inner { + macro_rules! m { + () => {}; + } +} + +m!(); +``` + +Second, it can be used to import macros from another crate, by attaching it to +an `extern crate` declaration appearing in the crate's root module. Macros +imported this way are imported into the prelude of the crate, not textually, +which means that they can be shadowed by any other name. While macros imported +by `#[macro_use]` can be used before the import statement, in case of a +conflict, the last macro imported wins. 
Optionally, a list of macros to import +can be specified; this is not supported when `#[macro_use]` is applied to a +module. + +```rust,ignore +#[macro_use(lazy_static)] // Or #[macro_use] to import all macros. +extern crate lazy_static; + +lazy_static!{}; +// self::lazy_static!{} // Error: lazy_static is not defined in `self` +``` + +Macros to be imported with `#[macro_use]` must be exported with +`#[macro_export]`, which is described below. + +### Path-Based Scope + +By default, a macro has no path-based scope. However, if it has the +`#[macro_export]` attribute, then it is declared in the crate root scope and can +be referred to normally as such: + +```rust +self::m!(); +m!(); // OK: Path-based lookup finds m in the current module. + +mod inner { + super::m!(); + crate::m!(); +} + +mod mac { + #[macro_export] + macro_rules! m { + () => {}; + } +} +``` + +Macros labeled with `#[macro_export]` are always `pub` and can be referred to +by other crates, either by path or by `#[macro_use]` as described above. + +## Hygiene + +By default, all identifiers referred to in a macro are expanded as-is, and are +looked up at the macro's invocation site. This can lead to issues if a macro +refers to an item or macro which isn't in scope at the invocation site. To +alleviate this, the `$crate` metavariable can be used at the start of a path to +force lookup to occur inside the crate defining the macro. + +```rust,ignore +//// Definitions in the `helper_macro` crate. +#[macro_export] +macro_rules! helped { + // () => { helper!() } // This might lead to an error due to 'helper' not being in scope. + () => { $crate::helper!() } +} + +#[macro_export] +macro_rules! helper { + () => { () } +} + +//// Usage in another crate. +// Note that `helper_macro::helper` is not imported! 
+use helper_macro::helped; + +fn unit() { + helped!(); +} +``` + +Note that, because `$crate` refers to the current crate, it must be used with a +fully qualified module path when referring to non-macro items: + +```rust +pub mod inner { + #[macro_export] + macro_rules! call_foo { + () => { $crate::inner::foo() }; + } + + pub fn foo() {} +} +``` + +Additionally, even though `$crate` allows a macro to refer to items within its +own crate when expanding, its use has no effect on visibility. An item or macro +referred to must still be visible from the invocation site. In the following +example, any attempt to invoke `call_foo!()` from outside its crate will fail +because `foo()` is not public. + +```rust +#[macro_export] +macro_rules! call_foo { + () => { $crate::foo() }; +} + +fn foo() {} +``` + +> **Version & Edition Differences**: Prior to Rust 1.30, `$crate` and +> `local_inner_macros` (below) were unsupported. They were added alongside +> path-based imports of macros (described above), to ensure that helper macros +> did not need to be manually imported by users of a macro-exporting crate. +> Crates written for earlier versions of Rust that use helper macros need to be +> modified to use `$crate` or `local_inner_macros` to work well with path-based +> imports. + +When a macro is exported, the `#[macro_export]` attribute can have the +`local_inner_macros` keyword added to automatically prefix all contained macro +invocations with `$crate::`. This is intended primarily as a tool to migrate +code written before `$crate` was added to the language to work with Rust 2018's +path-based imports of macros. Its use is discouraged in new code. + +```rust +#[macro_export(local_inner_macros)] +macro_rules! helped { + () => { helper!() } // Automatically converted to $crate::helper!(). +} + +#[macro_export] +macro_rules! 
helper { + () => { () } +} +``` + +## Follow-set Ambiguity Restrictions + +The parser used by the macro system is reasonably powerful, but it is limited in +order to prevent ambiguity in current or future versions of the language. In +particular, in addition to the rule about ambiguous expansions, a nonterminal +matched by a metavariable must be followed by a token which has been decided can +be safely used after that kind of match. + +As an example, a macro matcher like `$i:expr [ , ]` could in theory be accepted +in Rust today, since `[,]` cannot be part of a legal expression and therefore +the parse would always be unambiguous. However, because `[` can start trailing +expressions, `[` is not a character which can safely be ruled out as coming +after an expression. If `[,]` were accepted in a later version of Rust, this +matcher would become ambiguous or would misparse, breaking working code. +Matchers like `$i:expr,` or `$i:expr;` would be legal, however, because `,` and +`;` are legal expression separators. The specific rules are: + + * `expr` and `stmt` may only be followed by one of: `=>`, `,`, or `;`. + * `pat` may only be followed by one of: `=>`, `,`, `=`, `|`, `if`, or `in`. + * `path` and `ty` may only be followed by one of: `=>`, `,`, `=`, `|`, `;`, + `:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block` + fragment specifier. + * `vis` may only be followed by one of: `,`, an identifier other than a + non-raw `priv`, any token that can begin a type, or a metavariable with an + `ident`, `ty`, or `path` fragment specifier. + * All other fragment specifiers have no restrictions. + +When repetitions are involved, then the rules apply to every possible number of +expansions, taking separators into account. This means: + + * If the repetition includes a separator, that separator must be able to + follow the contents of the repetition. 
+ * If the repetition can repeat multiple times (`*` or `+`), then the contents + must be able to follow themselves. + * The contents of the repetition must be able to follow whatever comes + before, and whatever comes after must be able to follow the contents of the + repetition. + * If the repetition can match zero times (`*` or `?`), then whatever comes + after must be able to follow whatever comes before. + + +For more detail, see the [formal specification]. [IDENTIFIER]: identifiers.html [IDENTIFIER_OR_KEYWORD]: identifiers.html [LIFETIME_TOKEN]: tokens.html#lifetimes-and-loop-labels +[formal specification]: macro-ambiguity.html [_BlockExpression_]: expressions/block-expr.html +[_DelimTokenTree_]: macros.html [_Expression_]: expressions.html [_Item_]: items.html [_LiteralExpression_]: expressions/literal-expr.html @@ -84,89 +443,9 @@ syntax named by _designator_. Valid designators are: [_Pattern_]: patterns.html [_Statement_]: statements.html [_TokenTree_]: macros.html#macro-invocation +[_Token_]: tokens.html [_TypePath_]: paths.html#paths-in-types [_Type_]: types.html#type-expressions [_Visibility_]: visibility-and-privacy.html [token]: tokens.html - -In the transcriber, the -designator is already known, and so only the name of a matched nonterminal comes -after the dollar sign. - -In both the matcher and transcriber, the Kleene star-like operator indicates -repetition. The Kleene star operator consists of `$` and parentheses, -optionally followed by a separator token, followed by `*`, `+`, or `?`. `*` -means zero or more repetitions; `+` means _at least_ one repetition; `?` means -at most one repetition. The parentheses are not matched or transcribed. On the -matcher side, a name is bound to _all_ of the names it matches, in a structure -that mimics the structure of the repetition encountered on a successful match. -The job of the transcriber is to sort that structure out. 
Also, `?`, unlike `*` -and `+`, does _not_ allow a separator, since one could never match against it -anyway. - -> **Edition Differences**: The `?` Kleene operator did not exist before the -> 2018 edition. - -> **Edition Differences**: Prior to the 2018 Edition, `?` was an allowed -> separator token, rather than a Kleene operator. It is no longer allowed as a -> separator as of the 2018 edition. This avoids ambiguity with the `?` Kleene -> operator. - -The rules for transcription of these repetitions are called "Macro By Example". -Essentially, one "layer" of repetition is discharged at a time, and all of them -must be discharged by the time a name is transcribed. Therefore, `( $( $i:ident -),* ) => ( $i )` is an invalid macro, but `( $( $i:ident ),* ) => ( $( $i:ident -),* )` is acceptable (if trivial). - -When Macro By Example encounters a repetition, it examines all of the `$` -_name_ s that occur in its body. At the "current layer", they all must repeat -the same number of times, so ` ( $( $i:ident ),* ; $( $j:ident ),* ) => ( $( -($i,$j) ),* )` is valid if given the argument `(a,b,c ; d,e,f)`, but not -`(a,b,c ; d,e)`. The repetition walks through the choices at that layer in -lockstep, so the former input transcribes to `(a,d), (b,e), (c,f)`. - -Nested repetitions are allowed. - -### Parsing limitations - -The parser used by the macro system is reasonably powerful, but the parsing of -Rust syntax is restricted in two ways: - -1. Macro definitions are required to include suitable separators after parsing - expressions and other bits of the Rust grammar. This implies that - a macro definition like `$i:expr [ , ]` is not legal, because `[` could be part - of an expression. A macro definition like `$i:expr,` or `$i:expr;` would be legal, - however, because `,` and `;` are legal separators. See [RFC 550] for more information. - Specifically: - - * `expr` and `stmt` may only be followed by one of `=>`, `,`, or `;`. 
- * `pat` may only be followed by one of `=>`, `,`, `=`, `|`, `if`, or `in`. - * `path` and `ty` may only be followed by one of `=>`, `,`, `=`, `|`, `;`, - `:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block` - fragment type. - * `vis` may only be followed by one of `,`, `priv`, a raw identifier, any - token that can begin a type, or a macro variable of `ident`, `ty`, or - `path` fragment type. - * All other fragment types have no restrictions. - -2. The parser must have eliminated all ambiguity by the time it reaches a `$` - _name_ `:` _designator_. This requirement most often affects name-designator - pairs when they occur at the beginning of, or immediately after, a `$(...)*`; - requiring a distinctive token in front can solve the problem. For example: - - ```rust - // The matcher `$($i:ident)* $e:expr` would be ambiguous because the parser - // would be forced to choose between an identifier or an expression. Use some - // token to distinguish them. - macro_rules! example { - ($(I $i:ident)* E $e:expr) => { ($($i)-*) * $e }; - } - let foo = 2; - let bar = 3; - // The following expands to `(foo - bar) * 5` - example!(I foo I bar E 5); - ``` - -[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.md -[_DelimTokenTree_]: macros.html -[_Token_]: tokens.html +[Hygiene]: #hygiene