From 884c42977d8d6649d0e076fcecd60ba9b7dfc20c Mon Sep 17 00:00:00 2001 From: Alexis Hunt Date: Tue, 15 Jan 2019 05:22:29 -0500 Subject: [PATCH 1/4] Expand docs on Macros By Example. The primary motivation here was to increase clarity and fully address the scoping and naming details. The inclusion of RFC 550's formal specification is to move it to the reference where it can be updated. I made several changes, motivated by accommodating `?` and new fragment specifiers, but there are some other things which need highlighting so that they can be double-checked for correctness. * Permit the empty string to follow on in the first invariant; this is a technical oversight in the definition I believe. * Added a requirement that repetitions obey the follow rules; this was an oversight in the original RFC and currently planned for fix. * Rewrote the definition of FIRST for complex NTs to be more clear. * Added a case to LAST for `?` repetitions * Removed the last example of LAST, because it is wrong. * Rearranged the definition of FOLLOW to be more clear * Added Shl to FOLLOW(ty) and FOLLOW(path), as documented in the Reference already. * Added missing follow sets for newer fragment specifiers. The scoping text is probably not completely accurate, but it's certainly much better than what was there before (i.e. basically nothing). 
--- src/SUMMARY.md | 10 +- src/appendices.md | 1 + src/macro-ambiguity.md | 378 +++++++++++++++++++++++++++++ src/macros-by-example.md | 511 ++++++++++++++++++++++++++++++--------- 4 files changed, 779 insertions(+), 121 deletions(-) create mode 100644 src/appendices.md create mode 100644 src/macro-ambiguity.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 3a2a46ec0..6810f644b 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -115,8 +115,8 @@ - [The Rust runtime](runtime.md) -[Appendix: Influences](influences.md) - -[Appendix: As-yet-undocumented Features](undocumented.md) - -[Appendix: Glossary](glossary.md) +- [Appendices](appendices.md) + - [Macro Follow-Set Ambiguity Formal Specification](macro-ambiguity.md) + - [Influences](influences.md) + - [As-Yet-Undocumented Features](undocumented.md) + - [Glossary](glossary.md) diff --git a/src/appendices.md b/src/appendices.md new file mode 100644 index 000000000..28acb81ce --- /dev/null +++ b/src/appendices.md @@ -0,0 +1 @@ +# Appendices diff --git a/src/macro-ambiguity.md b/src/macro-ambiguity.md new file mode 100644 index 000000000..582ee684d --- /dev/null +++ b/src/macro-ambiguity.md @@ -0,0 +1,378 @@ +# Appendix: Macro Follow-Set Ambiguity Formal Specification + +This page documents the formal specification of the follow rules for [Macros +By Example]. They were originally specified in [RFC 550], from which the bulk +of this text is copied, and expanded upon in subsequent RFCs. + +## Definitions & Conventions + + - `macro`: anything invokable as `foo!(...)` in source code. + - `MBE`: macro-by-example, a macro defined by `macro_rules`. + - `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a + subportion thereof. + - `macro parser`: the bit of code in the Rust parser that will parse the + input using a grammar derived from all of the matchers. + - `fragment`: The class of Rust syntax that a given matcher will accept (or + "match"). 
+  - `repetition` : a fragment that follows a regular repeating pattern
+  - `NT`: non-terminal, the various "meta-variables" or repetition matchers
+    that can appear in a matcher, specified in MBE syntax with a leading `$`
+    character.
+  - `simple NT`: a "meta-variable" non-terminal (further discussion below).
+  - `complex NT`: a repetition matching non-terminal, specified via repetition
+    operators (`*`, `+`, `?`).
+  - `token`: an atomic element of a matcher; i.e. identifiers, operators,
+    open/close delimiters, *and* simple NT's.
+  - `token tree`: a tree structure formed from tokens (the leaves), complex
+    NT's, and finite sequences of token trees.
+  - `delimiter token`: a token that is meant to divide the end of one fragment
+    and the start of the next fragment.
+  - `separator token`: an optional delimiter token in a complex NT that
+    separates each pair of elements in the matched repetition.
+  - `separated complex NT`: a complex NT that has its own separator token.
+  - `delimited sequence`: a sequence of token trees with appropriate open- and
+    close-delimiters at the start and end of the sequence.
+  - `empty fragment`: The class of invisible Rust syntax that separates tokens,
+    i.e. whitespace, or (in some lexical contexts), the empty token sequence.
+  - `fragment specifier`: The identifier in a simple NT that specifies which
+    fragment the NT accepts.
+  - `language`: a context-free language.
+
+Example:
+
+```rust,compile_fail
+macro_rules! i_am_an_mbe {
+    (start $foo:expr $($i:ident),* end) => ($foo)
+}
+```
+
+`(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
+delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
+and `$i` are simple NT's with `expr` and `ident` as their respective fragment
+specifiers.
+
+`$($i:ident),*` is *also* an NT; it is a complex NT that matches a
+comma-separated repetition of identifiers.
The `,` is the separator token for
+the complex NT; it occurs in between each pair of elements (if any) of the
+matched fragment.
+
+Another example of a complex NT is `$(hi $e:expr ;)+`, which matches any
+fragment of the form `hi <expr>; hi <expr>; ...` where `hi <expr>;` occurs at
+least once. Note that this complex NT does not have a dedicated separator
+token.
+
+(Note that Rust's parser ensures that delimited sequences always occur with
+proper nesting of token tree structure and correct matching of open- and
+close-delimiters.)
+
+We will tend to use the variable "M" to stand for a matcher, variables "t" and
+"u" for arbitrary individual tokens, and the variables "tt" and "uu" for
+arbitrary token trees. (The use of "tt" does present potential ambiguity with
+its additional role as a fragment specifier; but it will be clear from context
+which interpretation is meant.)
+
+"SEP" will range over separator tokens, "OP" over the repetition operators
+`*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
+delimited sequence (e.g. `[` and `]`).
+
+Greek letters "α" "β" "γ" "δ" stand for potentially empty token-tree sequences.
+(However, the Greek letter "ε" (epsilon) has a special role in the presentation
+and does not stand for a token-tree sequence.)
+
+  * This Greek letter convention is usually just employed when the presence of
+    a sequence is a technical detail; in particular, when we wish to *emphasize*
+    that we are operating on a sequence of token-trees, we will use the notation
+    "tt ..." for the sequence, not a Greek letter.
+
+Note that a matcher is merely a token tree. A "simple NT", as mentioned above,
+is a meta-variable NT; thus it is a non-repetition. For example, `$foo:ty` is
+a simple NT but `$($foo:ty)+` is a complex NT.
+
+Note also that in the context of this formalism, the term "token" generally
+*includes* simple NTs.
+
+Finally, it is useful for the reader to keep in mind that according to the
+definitions of this formalism, no simple NT matches the empty fragment, and
+likewise no token matches the empty fragment of Rust syntax. (Thus, the *only*
+NT that can match the empty fragment is a complex NT.) This is not actually
+true, because the `vis` matcher can match an empty fragment. Thus, for the
+purposes of the formalism, we will treat `$v:vis` as actually being
+`$($v:vis)?`, with a requirement that the matcher match an empty fragment.
+
+### The Matcher Invariants
+
+To be valid, a matcher must meet the following three invariants. The definitions
+of FIRST and FOLLOW are described later.
+
+1.  For any two successive token tree sequences in a matcher `M` (i.e. `M = ...
+    tt uu ...`) with `uu ...` nonempty, we must have FOLLOW(`... tt`) ∪ {ε} ⊇
+    FIRST(`uu ...`).
+1.  For any separated complex NT in a matcher, `M = ... $(tt ...) SEP OP ...`,
+    we must have `SEP` ∈ FOLLOW(`tt ...`).
+1.  For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
+    OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).
+
+The first invariant says that whatever actual token comes after a matcher,
+if any, must be somewhere in the predetermined follow set. This ensures that a
+legal macro definition will continue to assign the same determination as to
+where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
+to the language.
+
+The second invariant says that a separated complex NT must use a separator token
+that is part of the predetermined follow set for the internal contents of the
+NT. This ensures that a legal macro definition will continue to parse an input
+fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
+forms are added to the language.
+
+The third invariant says that when we have a complex NT that can match two or
+more copies of the same thing with no separation in between, it must be
+permissible for them to be placed next to each other as per the first invariant.
+This invariant also requires they be nonempty, which eliminates a possible
+ambiguity.
+
+**NOTE: The third invariant is currently unenforced due to historical oversight
+and significant reliance on the behaviour. It is currently undecided what to do
+about this going forward. Macros that do not respect the behaviour may become
+invalid in a future edition of Rust. See the [tracking issue].**
+
+### FIRST and FOLLOW, informally
+
+A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).
+
+Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
+contain a distinguished non-token element ε ("epsilon"), which indicates that M
+can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)
+
+Informally:
+
+  * FIRST(M): collects the tokens potentially used first when matching a
+    fragment to M.
+
+  * LAST(M): collects the tokens potentially used last when matching a fragment
+    to M.
+
+  * FOLLOW(M): the set of tokens allowed to follow immediately after some
+    fragment matched by M.
+
+    In other words: t ∈ FOLLOW(M) if and only if there exist (potentially
+    empty) token sequences α, β, γ, δ where:
+
+      * M matches β,
+
+      * t matches γ, and
+
+      * The concatenation α β γ δ is a parseable Rust program.
+
+We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
+NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
+ANYTOKEN.
+
+(To review one's understanding of the above informal descriptions, the reader
+at this point may want to jump ahead to the [examples of
+FIRST/LAST][examples-of-first-and-last] before reading their formal
+definitions.)
+
+### FIRST, LAST
+
+Below are formal inductive definitions for FIRST and LAST.
+
+"A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
+denotes set difference (i.e. all elements of A that are not present in B).
+
+#### FIRST
+
+FIRST(M) is defined by case analysis on the sequence M and the structure of its
+first token-tree (if any):
+
+  * if M is the empty sequence, then FIRST(M) = { ε },
+
+  * if M starts with a token t, then FIRST(M) = { t },
+
+    (Note: this covers the case where M starts with a delimited token-tree
+    sequence, `M = OPEN tt ... CLOSE ...`, in which case `t = OPEN` and thus
+    FIRST(M) = { `OPEN` }.)
+
+    (Note: this critically relies on the property that no simple NT matches the
+    empty fragment.)
+
+  * Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
+    ... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
+    empty) sequence of token trees for the rest of the matcher).
+
+      * Let SEP\_SET(M) = { SEP } if SEP is present and ε ∈ FIRST(`tt ...`);
+        otherwise SEP\_SET(M) = {}.
+
+  * Let ALPHA\_SET(M) = FIRST(`α`) if OP = `*` or `?` and ALPHA\_SET(M) = {} if
+    OP = `+`.
+  * FIRST(M) = (FIRST(`tt ...`) \\ {ε}) ∪ SEP\_SET(M) ∪ ALPHA\_SET(M).
+
+The definition for complex NTs deserves some justification. SEP\_SET(M) defines
+the possibility that the separator could be a valid first token for M, which
+happens when there is a separator defined and the repeated fragment could be
+empty. ALPHA\_SET(M) defines the possibility that the complex NT could be empty,
+meaning that M's valid first tokens are those of the following token-tree
+sequences `α`. This occurs when either `*` or `?` is used, in which case there
+could be zero repetitions. In theory, this could also occur if `+` was used with
+a potentially-empty repeating fragment, but this is forbidden by the third
+invariant.
+
+From there, clearly FIRST(M) can include any token from SEP\_SET(M) or
+ALPHA\_SET(M), and if the complex NT match is nonempty, then any token starting
+FIRST(`tt ...`) could work too.
The last piece to consider is ε. SEP\_SET(M) and
+FIRST(`tt ...`) \ {ε} cannot contain ε, but ALPHA\_SET(M) could. Hence, this
+definition allows M to accept ε if and only if ε ∈ ALPHA\_SET(M). This is
+correct because for M to accept ε in the complex NT case, both the complex NT
+and α must accept it. If OP = `+`, meaning that the complex NT cannot be empty,
+then by definition ε ∉ ALPHA\_SET(M). Otherwise, the complex NT can accept zero
+repetitions, and then ALPHA\_SET(M) = FIRST(`α`). So this definition is correct
+with respect to ε as well.
+
+#### LAST
+
+LAST(M), defined by case analysis on M itself (a sequence of token-trees):
+
+  * if M is the empty sequence, then LAST(M) = { ε }
+
+  * if M is a singleton token t, then LAST(M) = { t }
+
+  * if M is the singleton complex NT repeating zero or more times, `M = $( tt
+    ... ) *`, or `M = $( tt ... ) SEP *`
+
+      * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.
+
+      * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set
+
+      * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
+        ...`) ∪ {ε}.
+
+  * if M is the singleton complex NT repeating one or more times, `M = $( tt ...
+    ) +`, or `M = $( tt ... ) SEP +`
+
+      * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.
+
+      * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set
+
+      * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
+        ...`)
+
+  * if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
+    ?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.
+
+  * if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
+    { `CLOSE` }.
+
+  * if M is a non-empty sequence of token-trees `tt uu ...`,
+
+      * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
+
+      * Otherwise, the sequence `uu ...` must be non-empty; then LAST(M) =
+        LAST(`uu ...`).
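As an illustration only, the inductive definition of FIRST given earlier can be sketched as a small recursive function. This is a minimal sketch under stated assumptions, not the compiler's implementation: the `Tt` and `Op` types and the `first`, `tok`, `rep`, and `example` helpers are hypothetical names invented here, tokens and simple NTs are modeled as plain strings, a delimited sequence is assumed to be represented by its opening delimiter token, and ε is modeled as the string `"ε"`.

```rust
use std::collections::BTreeSet;

// Hypothetical model of a matcher, for illustration only.
#[derive(Clone, Copy, PartialEq)]
enum Op { Star, Plus, Question }

enum Tt {
    // A terminal or a simple NT, written as text (e.g. "g" or "$e:expr").
    // A delimited sequence is assumed to be modeled by its OPEN token.
    Token(&'static str),
    // A complex NT: `$( body ) sep? op`.
    Repeat { body: Vec<Tt>, sep: Option<&'static str>, op: Op },
}

// Stand-in for the distinguished non-token element ε.
const EPSILON: &str = "ε";

// FIRST(M), by case analysis on the first token tree of M.
fn first(m: &[Tt]) -> BTreeSet<&'static str> {
    let mut set = BTreeSet::new();
    match m.split_first() {
        // Empty sequence: FIRST(M) = { ε }.
        None => { set.insert(EPSILON); }
        // M starts with a token t: FIRST(M) = { t }.
        Some((Tt::Token(t), _)) => { set.insert(*t); }
        // M starts with a complex NT followed by the rest, alpha.
        Some((Tt::Repeat { body, sep, op }, alpha)) => {
            let body_first = first(body);
            // SEP_SET(M): the separator can come first if the body may be empty.
            if let Some(s) = sep {
                if body_first.contains(&EPSILON) { set.insert(*s); }
            }
            // ALPHA_SET(M): FIRST(alpha) for `*` and `?`, empty for `+`.
            if *op != Op::Plus { set.extend(first(alpha)); }
            // FIRST(tt ...) \ {ε}.
            set.extend(body_first.into_iter().filter(|t| *t != EPSILON));
        }
    }
    set
}

fn tok(t: &'static str) -> Tt { Tt::Token(t) }

fn rep(body: Vec<Tt>, sep: Option<&'static str>, op: Op) -> Tt {
    Tt::Repeat { body, sep, op }
}

// Models `$( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g`.
fn example() -> Vec<Tt> {
    vec![
        rep(vec![tok("$d:ident"), tok("$e:expr")], Some(";"), Op::Star),
        rep(vec![rep(vec![tok("h")], None, Op::Star)], Some(";"), Op::Star),
        rep(vec![tok("f"), tok(";")], None, Op::Plus),
        tok("g"),
    ]
}
```

Under this model, `first(&example())` contains `$d:ident`, `h`, `;`, and `f` but not ε, because the trailing `$( f ; )+ g` cannot match the empty fragment. Wrapping that tail in a further `*` repetition adds ε to the result.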
+
+### Examples of FIRST and LAST
+[examples-of-first-and-last]: #examples-of-first-and-last
+
+Below are some examples of FIRST and LAST.
+(Note in particular how the special ε element is introduced and
+eliminated based on the interaction between the pieces of the input.)
+
+Our first example is presented in a tree structure to elaborate on how
+the analysis of the matcher composes. (Some of the simpler subtrees
+have been elided.)
+
+```text
+INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
+            ~~~~~~~~   ~~~~~~~                ~
+                |         |                   |
+FIRST:   { $d:ident }  { $e:expr }          { h }
+
+
+INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+
+            ~~~~~~~~~~~~~~~~~~             ~~~~~~~           ~~~
+                    |                         |               |
+FIRST:        { $d:ident }                { h, ε }          { f }
+
+INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~~    ~~~~~~~~~   ~
+                      |                        |              |       |
+FIRST:         { $d:ident, ε }            { h, ε, ; }       { f }   { g }
+
+
+INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+                                       |
+FIRST:                       { $d:ident, h, ;, f }
+```
+
+Thus:
+
+ * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g`) = { `$d:ident`, `h`, `;`, `f` }
+
+Note however that:
+
+ * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*`) = { `$d:ident`, `h`, `;`, `f`, ε }
+
+Here are similar examples but now for LAST.
+
+ * LAST(`$d:ident $e:expr`) = { `$e:expr` }
+ * LAST(`$( $d:ident $e:expr );*`) = { `$e:expr`, ε }
+ * LAST(`$( $d:ident $e:expr );* $(h)*`) = { `$e:expr`, ε, `h` }
+ * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+`) = { `;` }
+ * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+ g`) = { `g` }
+
+### FOLLOW(M)
+
+Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
+represent simple nonterminals with the given fragment specifier.
+
+  * FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}.
+
+  * FOLLOW(expr) = FOLLOW(stmt) = {`=>`, `,`, `;`}.
+
+  * FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
+    `|`, `as`, `where`, block nonterminals}.
+
+  * FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any
+    token that can begin a type; ident, ty, and path nonterminals}.
+
+  * FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
+    tt, item, lifetime, literal and meta simple nonterminals, and all terminals.
+
+  * FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
+    (LAST(M) \ {ε}), of FOLLOW(t).
+
+The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
+`&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
+`self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,
+`typeof`, `dyn`}, although this list may not be complete because people won't
+always remember to update the appendix when new ones are added.
+
+Examples of FOLLOW for complex M:
+
+ * FOLLOW(`$( $d:ident $e:expr )*`) = FOLLOW(`$e:expr`)
+ * FOLLOW(`$( $d:ident $e:expr )* $(;)*`) = FOLLOW(`$e:expr`) ∩ ANYTOKEN = FOLLOW(`$e:expr`)
+ * FOLLOW(`$( $d:ident $e:expr )* $(;)* $( f |)+`) = ANYTOKEN
+
+### Examples of valid and invalid matchers
+
+With the above specification in hand, we can present arguments for
+why particular matchers are legal and others are not.
+
+ * `($ty:ty < foo ,)` : illegal, because FIRST(`< foo ,`) = { `<` } ⊈ FOLLOW(`ty`)
+
+ * `($ty:ty , foo <)` : legal, because FIRST(`, foo <`) = { `,` } is ⊆ FOLLOW(`ty`).
+
+ * `($pa:pat $pb:pat $ty:ty ,)` : illegal, because FIRST(`$pb:pat $ty:ty ,`) = { `$pb:pat` } ⊈ FOLLOW(`pat`), and also FIRST(`$ty:ty ,`) = { `$ty:ty` } ⊈ FOLLOW(`pat`).
+
+ * `( $($a:tt $b:tt)* ; )` : legal, because FIRST(`$b:tt`) = { `$b:tt` } is ⊆ FOLLOW(`tt`) = ANYTOKEN, as is FIRST(`;`) = { `;` }.
+
+ * `( $($t:tt),* , $($t:tt),* )` : legal, (though any attempt to actually use this macro will signal a local ambiguity error during expansion).
+ + * `($ty:ty $(; not sep)* -)` : illegal, because FIRST(`$(; not sep)* -`) = { `;`, `-` } is not in FOLLOW(`ty`). + + * `($($ty:ty)-+)` : illegal, because separator `-` is not in FOLLOW(`ty`). + + * `($($e:expr)*)` : illegal, because expr NTs are not in FOLLOW(expr NT). + +[Macros by Example]: macros-by-example.html +[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.html +[tracking issue]: https://github.com/rust-lang/rust/issues/56575 diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 8156114a1..fe35d1844 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -24,20 +24,17 @@ >       [_Token_]_except $ and delimiters_\ >    | _MacroMatcher_\ >    | `$` [IDENTIFIER] `:` _MacroFragSpec_\ ->    | `$` `(` _MacroMatch_+ `)` _MacroRepSep_? _MacroKleeneOp_ +>    | `$` `(` _MacroMatch_+ `)` _MacroRepSep_? _MacroRepOp_ > > _MacroFragSpec_ :\ >       `block` | `expr` | `ident` | `item` | `lifetime` | `literal`\ >    | `meta` | `pat` | `path` | `stmt` | `tt` | `ty` | `vis` > > _MacroRepSep_ :\ ->    [_Token_]_except delimiters and kleene operators_ +>    [_Token_]_except delimiters and repetition operators_ > -> _MacroKleeneOp_2015 :\ ->    `*` | `+` -> -> _MacroKleeneOp_2018+ :\ ->    `*` | `+` | `?` +> _MacroRepOp_2018+ :\ +>    `*` | `+` | `?`2018+ > > _MacroTranscriber_ :\ >    [_DelimTokenTree_] @@ -45,38 +42,400 @@ `macro_rules` allows users to define syntax extension in a declarative way. We call such extensions "macros by example" or simply "macros". -Macros can expand to expressions, statements, items, types, or patterns. - -The macro expander looks up macro invocations by name, and tries each macro -rule in turn. It transcribes the first successful match. Matching and -transcription are closely related to each other, and we will describe them -together. - -The macro expander matches and transcribes every token that does not begin with -a `$` literally, including delimiters. 
For parsing reasons, delimiters must be -balanced, but they are otherwise not special. - -In the matcher, `$` _name_ `:` _designator_ matches the nonterminal in the Rust -syntax named by _designator_. Valid designators are: - -* `item`: an [_Item_] -* `block`: a [_BlockExpression_] -* `stmt`: a [_Statement_] without the trailing semicolon -* `pat`: a [_Pattern_] -* `expr`: an [_Expression_] -* `ty`: a [_Type_] -* `ident`: an [IDENTIFIER_OR_KEYWORD] -* `path`: a [_TypePath_] style path -* `tt`: a [_TokenTree_] (a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`) -* `meta`: a [_MetaItem_], the contents of an attribute -* `lifetime`: a [LIFETIME_TOKEN] -* `vis`: a [_Visibility_] qualifier -* `literal`: matches `-`?[_LiteralExpression_] +Each macro by example has a name, and one or more _rules_. Each rule has two +parts: a _matcher_, describing the syntax that it matches, and a _transcriber_, +describing the syntax that will replace a successfully matched invocation. Both +the matcher and the transcriber must be surrounded by delimiters. Macros can +expand to expressions, statements, items (including traits, impls, and foreign +items), types, or patterns. + +When a macro is invoked, the macro expander looks up macro invocations by name, +and tries each macro rule in turn. It transcribes the first successful match; if +this results in an error, then future matches are not tried. When matching, no +lookahead is performed; if the compiler cannot unambiguously determine how to +parse the macro invocation one token at a time, then it is an error. In the +following example, the compiler does not look ahead past the identifier to see +if the following token is a `)`, even though that would allow it to parse the +invocation unambiguously: + +```rust,compile_fail +macro_rules! 
ambiguity {
+    ($($i:ident)* $j:ident) => { ($($i)-*) * $j };
+}
+
+ambiguity!(error); // Error: local ambiguity
+```
+
+In both the matcher and the transcriber, the `$` token is used to invoke special
+behaviours from the macro engine. Tokens that aren't part of such an invocation
+are matched and transcribed literally, with one exception. The exception is that
+the outer delimiters for the matcher will match any pair of delimiters. Thus,
+for instance, the matcher `(())` will match `{()}` but not `{{}}`. The character
+`$` cannot be matched or transcribed literally.
+
+## Metavariables
+
+In the matcher, `$` _name_ `:` _fragment-specifier_ matches a Rust syntax
+fragment of the kind specified and binds it to the metavariable `$`_name_. Valid
+fragment specifiers are:
+
+  * `item`: an [_Item_]
+  * `block`: a [_BlockExpression_]
+  * `stmt`: a [_Statement_] without the trailing semicolon (except for item
+    statements that require semicolons)
+  * `pat`: a [_Pattern_]
+  * `expr`: an [_Expression_]
+  * `ty`: a [_Type_]
+  * `ident`: an [IDENTIFIER_OR_KEYWORD]
+  * `path`: a [_TypePath_] style path
+  * `tt`: a [_TokenTree_] (a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`)
+  * `meta`: a [_MetaItem_], the contents of an attribute
+  * `lifetime`: a [LIFETIME_TOKEN]
+  * `vis`: a possibly empty [_Visibility_] qualifier
+  * `literal`: matches `-`?[_LiteralExpression_]
+
+In the transcriber, metavariables are referred to simply by `$`_name_, since
+the fragment kind is specified in the matcher. Metavariables are replaced with
+the syntax element that matched them. The keyword metavariable `$crate` can be
+used to refer to the current crate; see [Hygiene] below. Metavariables can be
+transcribed more than once or not at all.
+
+## Repetitions
+
+In both the matcher and transcriber, repetitions are indicated by placing the
+tokens to be repeated inside `$( ... )`, followed by a repetition operator,
+optionally with a separator token between.
The separator token can be any token
+other than a delimiter or one of the repetition operators, but `;` and `,` are
+the most common. For instance, `$( $i:ident ),*` represents any number of
+identifiers separated by commas. Nested repetitions are permitted.
+
+The repetition operators are `*`, which indicates any number of repetitions,
+`+`, which indicates any number but at least one, and `?` which indicates an
+optional fragment with zero or one occurrences. Since `?` represents at most one
+occurrence, it cannot be used with a separator.
+
+The repeated fragment both matches and transcribes to the specified number of
+the fragment, separated by the separator token. Metavariables are matched to
+every repetition of their corresponding fragment. For instance, the `$( $i:ident
+),*` example above matches `$i` to all of the identifiers in the list.
+
+During transcription, additional restrictions apply to repetitions so that the
+compiler knows how to expand them properly:
+
+1.  A metavariable must appear in exactly the same number, kind, and nesting
+    order of repetitions in the transcriber as it did in the matcher. So for the
+    matcher `$( $i:ident ),*`, the transcribers `=> $i`, `=> $( $( $i)* )*`, and
+    `=> $( $i )+` are all illegal, but `=> { $( $i );* }` is correct and
+    replaces a comma-separated list of identifiers with a semicolon-separated
+    list.
+1.  Each repetition in the transcriber must contain at least one
+    metavariable to decide how many times to expand it. If multiple
+    metavariables appear in the same repetition, they must be bound to the same
+    number of fragments. For instance, `( $( $i:ident ),* ; $( $j:ident ),* ) =>
+    ( $( ($i,$j) ),* )` must bind the same number of `$i` fragments as `$j`
+    fragments. This means that invoking the macro with `(a, b, c; d, e, f)` is
+    legal and expands to `((a,d), (b,e), (c,f))`, but `(a, b, c; d, e)` is
+    illegal because it does not have the same number.
This requirement applies + to every layer of nested repetitions. + +> **Edition Differences**: The `?` repetition operator did not exist before the +> 2018 edition. Prior to the 2018 Edition, `?` was an allowed +> separator token, rather than a repetition operator. + +## Scoping, Exporting, and Importing + +For historical reasons, the scoping of macros by example does not work entirely like +items. Macros have two forms of scope: textual scope, and path-based scope. +Textual scope is based on the order that things appear in source files, or even +across multiple files, and is the default scoping. It's explained further below. +Path-based scope works exactly the same way that item scoping does. The scoping, +exporting, and importing of macros is controlled largely by attributes. + +When a macro is invoked by an unqualified identifier (not part of a multi-part +path), it's first looked up in textual scoping. If this does not yield any +results, then it is looked up in path-based scoping. If the macro's name is +qualified with a path, then it is only looked up in path-based scoping. + +```rust,ignore +use lazy_static::lazy_static; // Path-based import. + +macro_rules! lazy_static { // Textual definition. + (lazy) => {}; +} + +lazy_static!{lazy} // Textual lookup finds our macro first. +self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one. +``` + +### Textual Scope + +Textual scope is based largely on the order that things appear in source files, +and works similarly to the scope of local variables declared with `let` except +it also applies at the module level. When `macro_rules!` is used to define a +macro, the macro enters the scope after the definition (note that it can still +be used recursively, since names are looked up from the invocation site), up +until its surrounding scope, typically a module, is closed. 
This can enter child
+modules and even span across multiple files:
+
+```rust,ignore
+//// src/lib.rs
+mod has_macro {
+    // m!{} // Error: m is not in scope.
+
+    macro_rules! m {
+        () => {};
+    }
+    m!{} // OK: appears after declaration of m.
+
+    mod uses_macro;
+}
+
+// m!{} // Error: m is not in scope.
+
+//// src/has_macro/uses_macro.rs
+
+m!{} // OK: appears after declaration of m in src/lib.rs
+```
+
+It is not an error to define a macro multiple times; the most recent declaration
+will shadow the previous one unless it has gone out of scope.
+
+```rust
+macro_rules! m {
+    (1) => {};
+}
+
+m!(1);
+
+mod inner {
+    m!(1);
+
+    macro_rules! m {
+        (2) => {};
+    }
+    // m!(1); // Error: no rule matches '1'
+    m!(2);
+
+    macro_rules! m {
+        (3) => {};
+    }
+    m!(3);
+}
+
+m!(1);
+```
+
+Macros can be declared and used locally inside functions as well, and work
+similarly:
+
+```rust
+fn foo() {
+    // m!(); // Error: m is not in scope.
+    macro_rules! m {
+        () => {};
+    }
+    m!();
+}
+
+
+// m!(); // Error: m is not in scope.
+```
+
+The `#[macro_use]` attribute has two purposes. First, it can be used to make a
+module's macro scope not end when the module is closed, by applying it to a
+module:
+
+```rust
+#[macro_use]
+mod inner {
+    macro_rules! m {
+        () => {};
+    }
+}
+
+m!();
+```
+
+Second, it can be used to import macros from another crate, by attaching it to
+an `extern crate` declaration appearing in the crate's root module. Macros
+imported this way are imported into the prelude of the crate, not textually,
+which means that they can be shadowed by any other name. While macros imported
+by `#[macro_use]` can be used before the import statement, in case of a
+conflict, the last macro imported wins. Optionally, a list of macros to import
+can be specified; this is not supported when `#[macro_use]` is applied to a
+module.
+
+```rust,ignore
+#[macro_use(lazy_static)] // Or #[macro_use] to import all macros.
+extern crate lazy_static;
+
+lazy_static!{};
+// self::lazy_static!{} // Error: lazy_static is not defined in self
+```
+
+Macros to be imported with `#[macro_use]` must be exported with
+`#[macro_export]`, which is described below.
+
+### Path-Based Scope
+
+By default, a macro has no path-based scope. However, if it has the
+`#[macro_export]` attribute, then it is declared in the crate root scope and can
+be referred to normally as such:
+
+```rust
+self::m!();
+m!(); // OK: Path-based lookup finds m in the current module.
+
+mod inner {
+    super::m!();
+    crate::m!();
+}
+
+mod mac {
+    #[macro_export]
+    macro_rules! m {
+        () => {};
+    }
+}
+```
+
+Macros labeled with `#[macro_export]` are always `pub` and can be referred to
+by other crates, either by path or by `#[macro_use]` as described above.
+
+## Hygiene
+
+By default, all identifiers referred to in a macro are expanded as-is, and are
+looked up at the macro's invocation site. This can lead to issues if a macro
+refers to an item or macro which isn't in scope at the invocation site. To
+alleviate this, the `$crate` metavariable can be used at the start of a path to
+force lookup to occur inside the crate defining the macro.
+
+```rust,ignore
+//// Definitions in the `helper_macro` crate.
+#[macro_export]
+macro_rules! helped {
+    // () => { helper!() } // This might lead to an error due to 'helper' not being in scope.
+    () => { $crate::helper!() }
+}
+
+#[macro_export]
+macro_rules! helper {
+    () => { () }
+}
+
+//// Usage in another crate.
+// Note that `helper_macro::helper` is not imported!
+use helper_macro::helped;
+
+fn unit() {
+    helped!();
+}
+```
+
+Note that, because `$crate` refers to the current crate, it must be used with a
+fully qualified module path when referring to non-macro items:
+
+```rust
+pub mod inner {
+    #[macro_export]
+    macro_rules! 
call_foo { + () => { $crate::inner::foo() }; + } + + pub fn foo() {} +} +``` + +Additionally, even though `$crate` allows a macro to refer to items within its +own crate when expanding, its use has no effect on visibility. An item or macro +referred to must still be visible from the invocation site. In the following +example, any attempt to invoke `call_foo!()` from outside its crate will fail +because `foo()` is not public. + +```rust +#[macro_export] +macro_rules! call_foo { + () => { $crate::foo() }; +} + +fn foo() {} +``` + +> **Version & Edition Differences**: Prior to Rust 1.30, `$crate` and +> `local_inner_macros` (below) were unsupported. They were added alongside +> path-based imports of macros (described above), to ensure that helper macros +> did not need to be manually imported by users of a macro-exporting crate. +> Crates written for earlier versions of Rust that use helper macros need to be +> modified to use `$crate` or `local_inner_macros` to work well with path-based +> imports. + +When a macro is exported, the `#[macro_export]` attribute can have the +`local_inner_macros` keyword added to automatically prefix all contained macro +invocations with `$crate::`. This is intended primarily as a tool to migrate +code written before `$crate` was added to the language to work with Rust 2018's +path-based imports of macros. Its use is discouraged in new code. + +```rust +#[macro_export(local_inner_macros)] +macro_rules! helped { + () => { helper!() } // Automatically converted to $crate::helper!(). +} + +#[macro_export] +macro_rules! helper { + () => { () } +} +``` + +## Follow-set Ambiguity Restrictions + +The parser used by the macro system is reasonably powerful, but it is limited in +order to prevent ambiguity in current or future versions of the language. 
In +particular, in addition to the rule about ambiguous expansions, a nonterminal +matched by a metavariable must be followed by a token which has been decided can +be safely used after that kind of match. + +As an example, a macro matcher like `$i:expr [ , ]` could in theory be accepted +in Rust today, since `[,]` cannot be part of a legal expression and therefore +the parse would always be unambiguous. However, because `[` can start trailing +expressions, `[` is not a character which can safely be ruled out as coming +after an expression. If `[,]` were accepted in a later version of Rust, this +matcher would become ambiguous or would misparse, breaking working code. +Matchers like `$i:expr,` or `$i:expr;` would be legal, however, because `,` and +`;` are legal expression separators. The specific rules are: + + * `expr` and `stmt` may only be followed by one of: `=>`, `,`, or `;`. + * `pat` may only be followed by one of: `=>`, `,`, `=`, `|`, `if`, or `in`. + * `path` and `ty` may only be followed by one of: `=>`, `,`, `=`, `|`, `;`, + `:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block` + fragment specifier. + * `vis` may only be followed by one of: `,`, an identifier other than a + non-raw `priv`, any token that can begin a type, or a metavariable with a + `ident`, `ty`, or `path` fragment specifier. + * All other fragment specifiers have no restrictions. + +When repetitions are involved, then the rules apply to every possible number of +expansions, taking separators into account. This means: + + * If the repetition includes a separator, that separator must be able to + follow the contents of the repitition. + * If the repitition can repeat multiple times (`*` or `+`), then the contents + must be able to follow themselves. + * The contents of the repetition must be able to follow whatever comes + before, and whatever comes after must be able to follow the contents of the + repitition. 
+ * If the repitition can match zero times (`*` or `?`), then whatever comes + after must be able to follow whatever comes before. + + +For more detail, see the [formal specification]. [IDENTIFIER]: identifiers.html [IDENTIFIER_OR_KEYWORD]: identifiers.html [LIFETIME_TOKEN]: tokens.html#lifetimes-and-loop-labels +[formal specification]: macro-ambiguity.html [_BlockExpression_]: expressions/block-expr.html +[_DelimTokenTree_]: macros.html [_Expression_]: expressions.html [_Item_]: items.html [_LiteralExpression_]: expressions/literal-expr.html @@ -84,89 +443,9 @@ syntax named by _designator_. Valid designators are: [_Pattern_]: patterns.html [_Statement_]: statements.html [_TokenTree_]: macros.html#macro-invocation +[_Token_]: tokens.html [_TypePath_]: paths.html#paths-in-types [_Type_]: types.html#type-expressions [_Visibility_]: visibility-and-privacy.html [token]: tokens.html - -In the transcriber, the -designator is already known, and so only the name of a matched nonterminal comes -after the dollar sign. - -In both the matcher and transcriber, the Kleene star-like operator indicates -repetition. The Kleene star operator consists of `$` and parentheses, -optionally followed by a separator token, followed by `*`, `+`, or `?`. `*` -means zero or more repetitions; `+` means _at least_ one repetition; `?` means -at most one repetition. The parentheses are not matched or transcribed. On the -matcher side, a name is bound to _all_ of the names it matches, in a structure -that mimics the structure of the repetition encountered on a successful match. -The job of the transcriber is to sort that structure out. Also, `?`, unlike `*` -and `+`, does _not_ allow a separator, since one could never match against it -anyway. - -> **Edition Differences**: The `?` Kleene operator did not exist before the -> 2018 edition. - -> **Edition Differences**: Prior to the 2018 Edition, `?` was an allowed -> separator token, rather than a Kleene operator. 
It is no longer allowed as a -> separator as of the 2018 edition. This avoids ambiguity with the `?` Kleene -> operator. - -The rules for transcription of these repetitions are called "Macro By Example". -Essentially, one "layer" of repetition is discharged at a time, and all of them -must be discharged by the time a name is transcribed. Therefore, `( $( $i:ident -),* ) => ( $i )` is an invalid macro, but `( $( $i:ident ),* ) => ( $( $i:ident -),* )` is acceptable (if trivial). - -When Macro By Example encounters a repetition, it examines all of the `$` -_name_ s that occur in its body. At the "current layer", they all must repeat -the same number of times, so ` ( $( $i:ident ),* ; $( $j:ident ),* ) => ( $( -($i,$j) ),* )` is valid if given the argument `(a,b,c ; d,e,f)`, but not -`(a,b,c ; d,e)`. The repetition walks through the choices at that layer in -lockstep, so the former input transcribes to `(a,d), (b,e), (c,f)`. - -Nested repetitions are allowed. - -### Parsing limitations - -The parser used by the macro system is reasonably powerful, but the parsing of -Rust syntax is restricted in two ways: - -1. Macro definitions are required to include suitable separators after parsing - expressions and other bits of the Rust grammar. This implies that - a macro definition like `$i:expr [ , ]` is not legal, because `[` could be part - of an expression. A macro definition like `$i:expr,` or `$i:expr;` would be legal, - however, because `,` and `;` are legal separators. See [RFC 550] for more information. - Specifically: - - * `expr` and `stmt` may only be followed by one of `=>`, `,`, or `;`. - * `pat` may only be followed by one of `=>`, `,`, `=`, `|`, `if`, or `in`. - * `path` and `ty` may only be followed by one of `=>`, `,`, `=`, `|`, `;`, - `:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block` - fragment type. 
- * `vis` may only be followed by one of `,`, `priv`, a raw identifier, any - token that can begin a type, or a macro variable of `ident`, `ty`, or - `path` fragment type. - * All other fragment types have no restrictions. - -2. The parser must have eliminated all ambiguity by the time it reaches a `$` - _name_ `:` _designator_. This requirement most often affects name-designator - pairs when they occur at the beginning of, or immediately after, a `$(...)*`; - requiring a distinctive token in front can solve the problem. For example: - - ```rust - // The matcher `$($i:ident)* $e:expr` would be ambiguous because the parser - // would be forced to choose between an identifier or an expression. Use some - // token to distinguish them. - macro_rules! example { - ($(I $i:ident)* E $e:expr) => { ($($i)-*) * $e }; - } - let foo = 2; - let bar = 3; - // The following expands to `(foo - bar) * 5` - example!(I foo I bar E 5); - ``` - -[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.md -[_DelimTokenTree_]: macros.html -[_Token_]: tokens.html +[Hygiene]: #hygiene From fc68e62b8ca33bdf112f5e937a987ae747dec1a2 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Mon, 11 Mar 2019 16:47:13 -0700 Subject: [PATCH 2/4] Update for review comments and minor additions. - Address review comments. - Minor typo and formatting fixes. - Link `macro_use` and `macro_export` to their new home. --- src/attributes.md | 13 ++--- src/items/extern-crates.md | 2 +- src/items/modules.md | 3 +- src/macro-ambiguity.md | 6 +- src/macros-by-example.md | 109 ++++++++++++++++++++++++++----------- 5 files changed, 89 insertions(+), 44 deletions(-) diff --git a/src/attributes.md b/src/attributes.md index 8117ad847..414cca42f 100644 --- a/src/attributes.md +++ b/src/attributes.md @@ -175,15 +175,10 @@ which can be used to control type layout. 
## Macro-related attributes -- `macro_use` on a `mod` — macros defined in this module will be visible in the - module's parent, after this module has been included. +- [`macro_use`] — Expands macro visibility, or imports macros from other + crates. -- `macro_use` on an `extern crate` — load macros from this crate. An optional - list of names `#[macro_use(foo, bar)]` restricts the import to just those - macros named. The `extern crate` must appear at the crate root, not inside - `mod`, which ensures proper function of the `$crate` macro variable. - -- `macro_export` - export a `macro_rules` macro for cross-crate usage. +- [`macro_export`] — Exports a `macro_rules` macro for cross-crate usage. - `no_link` on an `extern crate` — even if we load this crate for macros, don't link it into the output. @@ -634,3 +629,5 @@ pub fn f() {} [`meta` macro fragment specifier]: macros-by-example.html [`used`]: abi.html#the-used-attribute [`panic_handler`]: runtime.html#the-panic_handler-attribute +[`macro_use`]: macros-by-example.html#the-macro_use-attribute +[`macro_export`]: macros-by-example.html#path-based-scope diff --git a/src/items/extern-crates.md b/src/items/extern-crates.md index 8648fb47b..69c7bde4a 100644 --- a/src/items/extern-crates.md +++ b/src/items/extern-crates.md @@ -98,7 +98,7 @@ into the macro-use prelude. [IDENTIFIER]: identifiers.html [RFC 940]: https://github.com/rust-lang/rfcs/blob/master/text/0940-hyphens-considered-harmful.md -[`#[macro_use]` attribute]: attributes.html#macro-related-attributes +[`#[macro_use]` attribute]: macros-by-example.html#the-macro_use-attribute [`alloc`]: https://doc.rust-lang.org/alloc/ [`crate::`]: paths.html#crate [`no_implicit_prelude`]: items/modules.html#prelude-items diff --git a/src/items/modules.md b/src/items/modules.md index f22b69bff..b12f915a2 100644 --- a/src/items/modules.md +++ b/src/items/modules.md @@ -123,7 +123,7 @@ mod thread { ## Prelude Items Modules implicitly have some names in scope. 
These name are to built-in types, -macros imported with `#[macro_use]` on an extern crate, and by the crate's +macros imported with [`#[macro_use]`] on an extern crate, and by the crate's [prelude]. These names are all made of a single identifier. These names are not part of the module, so for example, any name `name`, `self::name` is not a valid path. The names added by the [prelude] can be removed by placing the @@ -142,6 +142,7 @@ The built-in attributes that have meaning on a function are [`cfg`], [_InnerAttribute_]: attributes.html [_Item_]: items.html [_OuterAttribute_]: attributes.html +[`#[macro_use]`]: macros-by-example.html#the-macro_use-attribute [`cfg`]: conditional-compilation.html [`deprecated`]: attributes.html#deprecation [`doc`]: attributes.html#documentation diff --git a/src/macro-ambiguity.md b/src/macro-ambiguity.md index 582ee684d..de0b6ae28 100644 --- a/src/macro-ambiguity.md +++ b/src/macro-ambiguity.md @@ -118,7 +118,7 @@ legal macro definition will continue to assign the same determination as to where `... tt` ends and `uu ...` begins, even as new syntactic forms are added to the language. -The second invariant says that a separated complex NT must use a seperator token +The second invariant says that a separated complex NT must use a separator token that is part of the predetermined follow set for the internal contents of the NT. This ensures that a legal macro definition will continue to parse an input fragment into the same delimited sequence of `tt ...`'s, even as new syntactic @@ -273,7 +273,7 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees): Below are some examples of FIRST and LAST. (Note in particular how the special ε element is introduced and -eliminated based on the interation between the pieces of the input.) +eliminated based on the interaction between the pieces of the input.) Our first example is presented in a tree structure to elaborate on how the analysis of the matcher composes. 
(Some of the simpler subtrees @@ -365,7 +365,7 @@ why particular matchers are legal and others are not. * `( $($a:tt $b:tt)* ; )` : legal, because FIRST(`$b:tt`) = { `$b:tt` } is ⊆ FOLLOW(`tt`) = ANYTOKEN, as is FIRST(`;`) = { `;` }. - * `( $($t:tt),* , $(t:tt),* )` : legal, (though any attempt to actually use this macro will signal a local ambguity error during expansion). + * `( $($t:tt),* , $(t:tt),* )` : legal, (though any attempt to actually use this macro will signal a local ambiguity error during expansion). * `($ty:ty $(; not sep)* -)` : illegal, because FIRST(`$(; not sep)* -`) = { `;`, `-` } is not in FOLLOW(`ty`). diff --git a/src/macros-by-example.md b/src/macros-by-example.md index fe35d1844..24f506509 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -49,6 +49,8 @@ the matcher and the transcriber must be surrounded by delimiters. Macros can expand to expressions, statements, items (including traits, impls, and foreign items), types, or patterns. +## Transcription + When a macro is invoked, the macro expander looks up macro invocations by name, and tries each macro rule in turn. It transcribes the first successful match; if this results in an error, then future matches are not tried. When matching, no @@ -60,19 +62,56 @@ invocation unambiguously: ```rust,compile_fail macro_rules! ambiguity { - ($($i:ident)* $j:ident) => { ($($i)-*) * $j }; + ($($i:ident)* $j:ident) => { }; } ambiguity!(error); // Error: local ambiguity ``` In both the matcher and the transcriber, the `$` token is used to invoke special -behaviours from the macro engine. Tokens that aren't part of such an invocation -are matched and transcribed literally, with one exception. The exception is that -the outer delimiters for the matcher will match any pair of delimiters. Thus, -for instance, the matcher `(())` will match `{()}` but not `{{}}`. The character +behaviours from the macro engine (described below in [Metavariables] and +[Repetitions]). 
Tokens that aren't part of such an invocation are matched and +transcribed literally, with one exception. The exception is that the outer +delimiters for the matcher will match any pair of delimiters. Thus, for +instance, the matcher `(())` will match `{()}` but not `{{}}`. The character `$` cannot be matched or transcribed literally. +When forwarding a matched fragment to another macro-by-example, matchers in +the second macro will see an opaque AST of the fragment type. The second macro +can't use literal tokens to match the fragments in the matcher, only a +fragment specifier of the same type. The `ident`, `lifetime`, and `tt` +fragment types are an exception, and can be matched by literal tokens. The +following illustrates this restriction: + +```rust,compile_fail +macro_rules! foo { + ($l:expr) => { bar!($l); } +// ERROR: ^^ no rules expected this token in macro call +} + +macro_rules! bar { + (3) => {} +} + +foo!(3); +``` + +The following illustrates how tokens can be directly matched after matching a +`tt` fragment: + +```rust +// compiles OK +macro_rules! foo { + ($l:tt) => { bar!($l); } +} + +macro_rules! bar { + (3) => {} +} + +foo!(3); +``` + ## Metavariables In the matcher, `$` _name_ `:` _fragment-specifier_ matches a Rust syntax @@ -94,40 +133,44 @@ fragment specifiers are: * `vis`: a possibly empty [_Visibility_] qualifier * `literal`: matches `-`?[_LiteralExpression_] -In the transcriber, metavariables are referred to simply by $`_name_`, since +In the transcriber, metavariables are referred to simply by `$`_name_, since the fragment kind is specified in the matcher. Metavariables are replaced with the syntax element that matched them. The keyword metavariable `$crate` can be used to refer to the current crate; see [Hygiene] below. Metavariables can be transcribed more than once or not at all. -## Repititions +## Repetitions In both the matcher and transcriber, repetitions are indicated by placing the -tokens to be repeated inside `$( ... 
)`, followed by a repetition operator, +tokens to be repeated inside `$(`…`)`, followed by a repetition operator, optionally with a separator token between. The separator token can be any token other than a delimiter or one of the repetition operators, but `;` and `,` are the most common. For instance, `$( $i:ident ),*` represents any number of -identifiers separated by commas. Nested repititions are permitted. +identifiers separated by commas. Nested repetitions are permitted. + +The repetition operators are: -The repetition operators are `*`, which indicates any number of repetitions, -`+`, which indicates any number but at least one, and `?` which indicates an -optional fragment with zero or one occurrences. Since `?` represents at most one -occurrence, it cannot be used with a separator. +- `*` — indicates any number of repetitions. +- `+` — indicates any number but at least one. +- `?` — indicates an optional fragment with zero or one occurrences. + +Since `?` represents at most one occurrence, it cannot be used with a +separator. The repeated fragment both matches and transcribes to the specified number of the fragment, separated by the separator token. Metavariables are matched to every repetition of their corresponding fragment. For instance, the `$( $i:ident ),*` example above matches `$i` to all of the identifiers in the list. -During transcription, additional restrictions apply to repititions so that the +During transcription, additional restrictions apply to repetitions so that the compiler knows how to expand them properly: 1. A metavariable must appear in exactly the same number, kind, and nesting order of repetitions in the transcriber as it did in the matcher. So for the - matcher `$( $i:ident ),*`, the transcribers `=> $i`, `=> $( $( $i)* )*`, and - `=> $( $i )+` are all illegal, but `=> { $( $i );* }` is correct and - replaces a comma-separated list of identifiers with a semicolon-separated - list. 
+ matcher `$( $i:ident ),*`, the transcribers `=> { $i }`, + `=> { $( $( $i)* )* }`, and `=> { $( $i )+ }` are all illegal, but + `=> { $( $i );* }` is correct and replaces a comma-separated list of + identifiers with a semicolon-separated list. 1. Second, each repetition in the transcriber must contain at least one metavariable to decide now many times to expand it. If multiple metavariables appear in the same repetition, they must be bound to the same @@ -147,12 +190,12 @@ compiler knows how to expand them properly: For historical reasons, the scoping of macros by example does not work entirely like items. Macros have two forms of scope: textual scope, and path-based scope. Textual scope is based on the order that things appear in source files, or even -across multiple files, and is the default scoping. It's explained further below. +across multiple files, and is the default scoping. It is explained further below. Path-based scope works exactly the same way that item scoping does. The scoping, exporting, and importing of macros is controlled largely by attributes. When a macro is invoked by an unqualified identifier (not part of a multi-part -path), it's first looked up in textual scoping. If this does not yield any +path), it is first looked up in textual scoping. If this does not yield any results, then it is looked up in path-based scoping. If the macro's name is qualified with a path, then it is only looked up in path-based scoping. @@ -194,7 +237,7 @@ mod has_macro { //// src/has_macro/uses_macro.rs -m!{} // OK: appears after delcaration of m in src/lib.rs +m!{} // OK: appears after declaration of m in src/lib.rs ``` It is not an error to define a macro multiple times; the most recent declaration @@ -241,7 +284,9 @@ fn foo() { // m!(); // Error: m is not in scope. ``` -The `#[macro_use]` attribute has two purposes. First, it can be used to make a +### The `macro_use` attribute + +The *`macro_use` attribute* has two purposes. 
First, it can be used to make a module's macro scope not end when the module is closed, by applying it to a module: @@ -257,7 +302,7 @@ m!(); ``` Second, it can be used to import macros from another crate, by attaching it to -an `extern crate` declaration appearing in the crate's root module. Macros +an `extern crate` declaration appearing in the crate's root module. Macros imported this way are imported into the prelude of the crate, not textually, which means that they can be shadowed by any other name. While macros imported by `#[macro_use]` can be used before the import statement, in case of a @@ -270,7 +315,7 @@ module. extern crate lazy_static; lazy_static!{}; -// self::lazy_static!{} // Error: lazy_static is not defined inself +// self::lazy_static!{} // Error: lazy_static is not defined in `self` ``` Macros to be imported with `#[macro_use]` must be exported with @@ -302,7 +347,7 @@ mod mac { Macros labeled with `#[macro_export]` are always `pub` and can be referred to by other crates, either by path or by `#[macro_use]` as described above. -## Hygiene +## Hygiene By default, all identifiers referred to in a macro are expanded as-is, and are looked up at the macro's invocation site. This can lead to issues if a macro @@ -418,22 +463,24 @@ When repetitions are involved, then the rules apply to every possible number of expansions, taking separators into account. This means: * If the repetition includes a separator, that separator must be able to - follow the contents of the repitition. - * If the repitition can repeat multiple times (`*` or `+`), then the contents + follow the contents of the repetition. + * If the repetition can repeat multiple times (`*` or `+`), then the contents must be able to follow themselves. * The contents of the repetition must be able to follow whatever comes before, and whatever comes after must be able to follow the contents of the - repitition. 
- * If the repitition can match zero times (`*` or `?`), then whatever comes + repetition. + * If the repetition can match zero times (`*` or `?`), then whatever comes after must be able to follow whatever comes before. For more detail, see the [formal specification]. +[Hygiene]: #hygiene [IDENTIFIER]: identifiers.html [IDENTIFIER_OR_KEYWORD]: identifiers.html [LIFETIME_TOKEN]: tokens.html#lifetimes-and-loop-labels -[formal specification]: macro-ambiguity.html +[Metavariables]: #metavariables +[Repetitions]: #repetitions [_BlockExpression_]: expressions/block-expr.html [_DelimTokenTree_]: macros.html [_Expression_]: expressions.html @@ -447,5 +494,5 @@ For more detail, see the [formal specification]. [_TypePath_]: paths.html#paths-in-types [_Type_]: types.html#type-expressions [_Visibility_]: visibility-and-privacy.html +[formal specification]: macro-ambiguity.html [token]: tokens.html -[Hygiene]: #hygiene From 327d03533b05e5c95a062087e5a8b5b286fad500 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Mon, 11 Mar 2019 17:00:15 -0700 Subject: [PATCH 3/4] Fix copy/paste mistake. --- src/macros-by-example.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 24f506509..24273badf 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -80,7 +80,7 @@ When forwarding a matched fragment to another macro-by-example, matchers in the second macro will see an opaque AST of the fragment type. The second macro can't use literal tokens to match the fragments in the matcher, only a fragment specifier of the same type. The `ident`, `lifetime`, and `tt` -fragment types are an exception, and can be matched by literal tokens. The +fragment types are an exception, and *can* be matched by literal tokens. 
The following illustrates this restriction: ```rust,compile_fail From f628b29a916e90ad24dd5cbe7e95cae90518dac9 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Tue, 12 Mar 2019 14:27:51 -0700 Subject: [PATCH 4/4] Fix wrong word. --- src/macros-by-example.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 24273badf..400259e7e 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -49,7 +49,7 @@ the matcher and the transcriber must be surrounded by delimiters. Macros can expand to expressions, statements, items (including traits, impls, and foreign items), types, or patterns. -## Transcription +## Transcribing When a macro is invoked, the macro expander looks up macro invocations by name, and tries each macro rule in turn. It transcribes the first successful match; if