diff --git a/src/doc/grammar.md b/src/doc/grammar.md
index 3d9a5bafbd71e..542815e7afe3c 100644
--- a/src/doc/grammar.md
+++ b/src/doc/grammar.md
@@ -152,19 +152,19 @@ token : simple_token | ident | literal | symbol | whitespace token ;
-|          |          |          |          |        |
-|----------|----------|----------|----------|--------|
-| abstract | alignof  | as       | become   | box    |
-| break    | const    | continue | crate    | do     |
-| else     | enum     | extern   | false    | final  |
-| fn       | for      | if       | impl     | in     |
-| let      | loop     | match    | mod      | move   |
-| mut      | offsetof | once     | override | priv   |
-| proc     | pub      | pure     | ref      | return |
-| sizeof   | static   | self     | struct   | super  |
-| true     | trait    | type     | typeof   | unsafe |
-| unsized  | use      | virtual  | where    | while  |
-| yield    |          |          |          |        |
+|          |          |          |          |         |
+|----------|----------|----------|----------|---------|
+| abstract | alignof  | as       | become   | box     |
+| break    | const    | continue | crate    | do      |
+| else     | enum     | extern   | false    | final   |
+| fn       | for      | if       | impl     | in      |
+| let      | loop     | macro    | match    | mod     |
+| move     | mut      | offsetof | override | priv    |
+| proc     | pub      | pure     | ref      | return  |
+| Self     | self     | sizeof   | static   | struct  |
+| super    | trait    | true     | type     | typeof  |
+| unsafe   | unsized  | use      | virtual  | where   |
+| while    | yield    |          |          |         |
 Each of these keywords has special meaning in its grammar, and all of them are
@@ -524,6 +524,15 @@ array_elems : [expr [',' expr]*] | [expr ',' ".." expr] ;
 idx_expr : expr '[' expr ']' ;
 ```
+### Range expressions
+
+```antlr
+range_expr : expr ".." expr |
+             expr ".." |
+             ".." expr |
+             ".." ;
+```
+
 ### Unary operator expressions
 **FIXME:** grammar?
@@ -610,7 +619,7 @@ lambda_expr : '|' ident_list '|' expr ;
 ### While loops
 ```antlr
-while_expr : "while" no_struct_literal_expr '{' block '}' ;
+while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
 ```
 ### Infinite loops
@@ -634,7 +643,7 @@ continue_expr : "continue" [ lifetime ];
 ### For expressions
 ```antlr
-for_expr : "for" pat "in" no_struct_literal_expr '{' block '}' ;
+for_expr : [ lifetime ':' ] "for" pat "in" no_struct_literal_expr '{' block '}' ;
 ```
 ### If expressions
diff --git a/src/doc/reference.md b/src/doc/reference.md
index f5a4f12e5faee..fdb791374b1a9 100644
--- a/src/doc/reference.md
+++ b/src/doc/reference.md
@@ -29,41 +29,6 @@ You may also be interested in the [grammar].
 # Notation
-Rust's grammar is defined over Unicode code points, each conventionally denoted
-`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
-confined to the ASCII range of Unicode, and is described in this document by a
-dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
-supported by common automated LL(k) parsing tools such as `llgen`, rather than
-the dialect given in ISO 14977. The dialect can be defined self-referentially
-as follows:
-
-```{.ebnf .notation}
-grammar : rule + ;
-rule : nonterminal ':' productionrule ';' ;
-productionrule : production [ '|' production ] * ;
-production : term * ;
-term : element repeats ;
-element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
-repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
-```
-
-Where:
-
-- Whitespace in the grammar is ignored.
-- Square brackets are used to group rules.
-- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
-  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
-  Unicode code point `U+00QQ`.
-- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
-- The `repeat` forms apply to the adjacent `element`, and are as follows:
-  - `?` means zero or one repetition
-  - `*` means zero or more repetitions
-  - `+` means one or more repetitions
-  - NUMBER trailing a repeat symbol gives a maximum repetition count
-  - NUMBER on its own gives an exact repetition count
-
-This EBNF dialect should hopefully be familiar to many readers.
-
 ## Unicode productions
 A few productions in Rust's grammar permit Unicode code points outside the ASCII
@@ -132,13 +97,6 @@ Some productions are defined by exclusion of particular Unicode characters:
 ## Comments
-```{.ebnf .gram}
-comment : block_comment | line_comment ;
-block_comment : "/*" block_comment_body * "*/" ;
-block_comment_body : [block_comment | character] * ;
-line_comment : "//" non_eol * ;
-```
-
 Comments in Rust code follow the general C++ style of line and block-comment
 forms. Nested block comments are supported.
@@ -159,11 +117,6 @@ Non-doc comments are interpreted as a form of whitespace.
 ## Whitespace
-```{.ebnf .gram}
-whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
-whitespace : [ whitespace_char | comment ] + ;
-```
-
 The `whitespace_char` production is any nonempty Unicode string consisting of
 any of the following Unicode characters: `U+0020` (space, `' '`), `U+0009` (tab,
 `'\t'`), `U+000A` (LF, `'\n'`), `U+000D` (CR, `'\r'`).
@@ -176,41 +129,11 @@ with any other legal whitespace element, such as a single space character.
 ## Tokens
-```{.ebnf .gram}
-simple_token : keyword | unop | binop ;
-token : simple_token | ident | literal | symbol | whitespace token ;
-```
-
 Tokens are primitive productions in the grammar defined by regular
 (non-recursive) languages. "Simple" tokens are given in [string table
 production](#string-table-productions) form, and occur in the rest of the
 grammar as double-quoted strings. Other tokens have exact rules given.
-### Keywords
-
-
-
-|          |          |          |          |         |
-|----------|----------|----------|----------|---------|
-| abstract | alignof  | as       | become   | box     |
-| break    | const    | continue | crate    | do      |
-| else     | enum     | extern   | false    | final   |
-| fn       | for      | if       | impl     | in      |
-| let      | loop     | macro    | match    | mod     |
-| move     | mut      | offsetof | override | priv    |
-| proc     | pub      | pure     | ref      | return  |
-| Self     | self     | sizeof   | static   | struct  |
-| super    | trait    | true     | type     | typeof  |
-| unsafe   | unsized  | use      | virtual  | where   |
-| while    | yield    |          |          |         |
-
-
-Each of these keywords has special meaning in its grammar, and all of them are
-excluded from the `ident` rule.
-
-Note that some of these keywords are reserved, and do not currently do
-anything.
-
 ### Literals
 A literal is an expression consisting of a single token, rather than a sequence
@@ -218,11 +141,6 @@ of tokens, that immediately and directly denotes the value it evaluates to,
 rather than referring to it by name or some other evaluation rule. A literal is a
 form of constant expression, so is evaluated (primarily) at compile time.
-```{.ebnf .gram}
-lit_suffix : ident;
-literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?;
-```
-
 The optional suffix is only used for certain numeric literals, but is
 reserved for future extension, that is, the above gives the lexical
 grammar, but a Rust parser will reject everything but the 12 special
@@ -275,32 +193,6 @@ cases mentioned in [Number literals](#number-literals) below.
 #### Character and string literals
-```{.ebnf .gram}
-char_lit : '\x27' char_body '\x27' ;
-string_lit : '"' string_body * '"' | 'r' raw_string ;
-
-char_body : non_single_quote
-          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;
-
-string_body : non_double_quote
-            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
-raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;
-
-common_escape : '\x5c'
-              | 'n' | 'r' | 't' | '0'
-              | 'x' hex_digit 2
-
-unicode_escape : 'u' '{' hex_digit+ 6 '}';
-
-hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
-          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
-          | dec_digit ;
-oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
-dec_digit : '0' | nonzero_dec ;
-nonzero_dec: '1' | '2' | '3' | '4'
-           | '5' | '6' | '7' | '8' | '9' ;
-```
-
 ##### Character literals
 A _character literal_ is a single Unicode character enclosed within two
@@ -349,11 +241,10 @@ following forms:
 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
-`U+0022` (double-quote) character. The _raw string body_ is not defined in the
-EBNF grammar above: it can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
+`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
+of Unicode characters and is terminated only by another `U+0022` (double-quote)
+character, followed by the same number of `U+0023` (`#`) characters that preceded
+the opening `U+0022` (double-quote) character.
 All Unicode characters contained in the raw string body represent themselves,
 the characters `U+0022` (double-quote) (except when followed by at least as
@@ -375,19 +266,6 @@ r##"foo #"# bar"##; // foo #"# bar
 #### Byte and byte string literals
-```{.ebnf .gram}
-byte_lit : "b\x27" byte_body '\x27' ;
-byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ;
-
-byte_body : ascii_non_single_quote
-          | '\x5c' [ '\x27' | common_escape ] ;
-
-byte_string_body : ascii_non_double_quote
-                 | '\x5c' [ '\x22' | common_escape ] ;
-raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
-
-```
-
 ##### Byte literals
 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
@@ -424,11 +302,10 @@ following forms:
 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw string body_ is not defined in the EBNF grammar above: it can contain any
-sequence of ASCII characters and is terminated only by another `U+0022`
-(double-quote) character, followed by the same number of `U+0023` (`#`)
-characters that preceded the opening `U+0022` (double-quote) character. A raw
-byte string literal can not contain any non-ASCII byte.
+_raw string body_ can contain any sequence of ASCII characters and is terminated
+only by another `U+0022` (double-quote) character, followed by the same number of
+`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
+character. A raw byte string literal can not contain any non-ASCII byte.
 All characters contained in the raw string body represent their ASCII encoding,
 the characters `U+0022` (double-quote) (except when followed by at least as
@@ -450,19 +327,6 @@ b"\\x52"; br"\x52"; // \x52
 #### Number literals
-```{.ebnf .gram}
-num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
-        | '0' [ [ dec_digit | '_' ] * float_suffix ?
-              | 'b' [ '1' | '0' | '_' ] +
-              | 'o' [ oct_digit | '_' ] +
-              | 'x' [ hex_digit | '_' ] + ] ;
-
-float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;
-
-exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
-dec_lit : [ dec_digit | '_' ] + ;
-```
-
 A _number literal_ is either an _integer literal_ or a _floating-point literal_.
 The grammar for recognizing the two kinds of literals is mixed.
@@ -540,12 +404,6 @@ The two values of the boolean type are written `true` and `false`.
 ### Symbols
-```{.ebnf .gram}
-symbol : "::" | "->"
-       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
-       | ',' | ';' ;
-```
-
 Symbols are a general class of printable [token](#tokens) that play structural
 roles in a variety of grammar productions. They are catalogued here for
 completeness as the set of remaining miscellaneous printable tokens that do not
@@ -555,16 +413,6 @@ operators](#binary-operator-expressions), or [keywords](#keywords).
 ## Paths
-```{.ebnf .gram}
-expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
-expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
-               | expr_path ;
-
-type_path : ident [ type_path_tail ] + ;
-type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
-               | "::" type_path ;
-```
-
 A _path_ is a sequence of one or more path components _logically_ separated by
 a namespace qualifier (`::`). If a path consists of only one component, it may
 refer to either an [item](#items) or a [variable](#variables) in a local control
@@ -660,19 +508,6 @@ Users of `rustc` can define new syntax extensions in two ways:
 ## Macros
-```{.ebnf .gram}
-expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ;
-macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
-matcher : '(' matcher * ')' | '[' matcher * ']'
-        | '{' matcher * '}' | '$' ident ':' ident
-        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
-        | non_special_token ;
-transcriber : '(' transcriber * ')' | '[' transcriber * ']'
-            | '{' transcriber * '}' | '$' ident
-            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
-            | non_special_token ;
-```
-
 `macro_rules` allows users to define syntax extension in a declarative way. We
 call such extensions "macros by example" or simply "macros" — to be distinguished
 from the "procedural macros" defined in [compiler plugins][plugin].
@@ -811,12 +646,6 @@ Crates contain [items](#items), each of which may have some number of
 ## Items
-```{.ebnf .gram}
-item : extern_crate_decl | use_decl | mod_item | fn_item | type_item
-     | struct_item | enum_item | static_item | trait_item | impl_item
-     | extern_block ;
-```
-
 An _item_ is a component of a crate. Items are organized within a crate by a
 nested set of [modules](#modules). Every crate has a single "outermost"
 anonymous module; all further items within the crate have [paths](#paths)
@@ -863,11 +692,6 @@ no notion of type abstraction: there are no first-class "forall" types.
 ### Modules
-```{.ebnf .gram}
-mod_item : "mod" ident ( ';' | '{' mod '}' );
-mod : item * ;
-```
-
 A module is a container for zero or more [items](#items).
 A _module item_ is a module, surrounded in braces, named, and prefixed with the
@@ -928,11 +752,6 @@ mod thread {
 ##### Extern crate declarations
-```{.ebnf .gram}
-extern_crate_decl : "extern" "crate" crate_name
-crate_name: ident | ( string_lit "as" ident )
-```
-
 An _`extern crate` declaration_ specifies a dependency on an external crate.
 The external crate is then bound into the declaring scope as the `ident`
 provided in the `extern_crate_decl`.
@@ -958,17 +777,6 @@ extern crate std as ruststd; // linking to 'std' under another name
 ##### Use declarations
-```{.ebnf .gram}
-use_decl : "pub" ? "use" [ path "as" ident
-                         | path_glob ] ;
-
-path_glob : ident [ "::" [ path_glob
-                         | '*' ] ] ?
-                  | '{' path_item [ ',' path_item ] * '}' ;
-
-path_item : ident | "self" ;
-```
-
 A _use declaration_ creates one or more local name bindings synonymous with
 some other [path](#paths). Usually a `use` declaration is used to shorten the
 path required to refer to a module item. These declarations may appear at the
@@ -1413,10 +1221,6 @@ it were `Bar(i32)`, this is disallowed.
 ### Constant items
-```{.ebnf .gram}
-const_item : "const" ident ':' type '=' expr ';' ;
-```
-
 A *constant item* is a named _constant value_ which is not associated with a
 specific memory location in the program. Constants are essentially inlined
 wherever they are used, meaning that they are copied directly into the relevant
@@ -1453,10 +1257,6 @@ const BITS_N_STRINGS: BitsNStrings<'static> = BitsNStrings {
 ### Static items
-```{.ebnf .gram}
-static_item : "static" ident ':' type '=' expr ';' ;
-```
-
 A *static item* is similar to a *constant*, except that it represents a precise
 memory location in the program. A static is never "inlined" at the usage site,
 and all references to it refer to the same memory location. Static items have
@@ -1711,11 +1511,6 @@ impl Seq