Merge pull request #1305 from mattheww/2022-11_parse_all_suffixes

ehuss · web-flow · commit c7a39ca2c783 · 2022-11-25T14:44:24.000-08:00
Update literal suffix docs for rust-lang/rust#102944
diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md
@@ -8,11 +8,9 @@
 > &nbsp;&nbsp; | [BYTE_LITERAL]\
 > &nbsp;&nbsp; | [BYTE_STRING_LITERAL]\
 > &nbsp;&nbsp; | [RAW_BYTE_STRING_LITERAL]\
-> &nbsp;&nbsp; | [INTEGER_LITERAL][^out-of-range]\
+> &nbsp;&nbsp; | [INTEGER_LITERAL]\
 > &nbsp;&nbsp; | [FLOAT_LITERAL]\
 > &nbsp;&nbsp; | `true` | `false`
->
-> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed.
 
 A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule.
 
@@ -54,7 +52,7 @@ A string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_B
 
 An integer literal expression consists of a single [INTEGER_LITERAL] token.
 
-If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
+If the token has a [suffix], the suffix must be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
 
 If the token has no suffix, the expression's type is determined by type inference:
 
@@ -96,10 +94,12 @@ The value of the expression is determined from the string representation of the
 
 * If the radix is not 10, the first two characters are removed from the string.
 
+* Any suffix is removed from the string.
+
 * Any underscores are removed from the string.
 
 * The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix.
-If the value does not fit in `u128`, the expression is rejected by the parser.
+If the value does not fit in `u128`, it is a compiler error.
 
 * The `u128` value is converted to the expression's type via a [numeric cast].
 
@@ -111,9 +111,11 @@ If the value does not fit in `u128`, the expression is rejected by the parser.
 
 ## Floating-point literal expressions
 
-A floating-point literal expression consists of a single [FLOAT_LITERAL] token.
+A floating-point literal expression has one of two forms:
+ * a single [FLOAT_LITERAL] token
+ * a single [INTEGER_LITERAL] token which has a suffix and no radix indicator
 
-If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
+If the token has a [suffix], the suffix must be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
 
 If the token has no suffix, the expression's type is determined by type inference:
 
@@ -136,6 +138,8 @@ let x: f64 = 2.; // type f64
 
 The value of the expression is determined from the string representation of the token as follows:
 
+* Any suffix is removed from the string.
+
 * Any underscores are removed from the string.
 
 * The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`].
diff --git a/src/tokens.md b/src/tokens.md
@@ -72,31 +72,40 @@ Literals are tokens used in [literal expressions].
 
 #### Numbers
 
-| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
-|----------------------------------------|---------|----------------|----------|
-| Decimal integer | `98_222` | `N/A` | Integer suffixes |
-| Hex integer | `0xff` | `N/A` | Integer suffixes |
-| Octal integer | `0o77` | `N/A` | Integer suffixes |
-| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
-| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
+| [Number literals](#number-literals)`*` | Example | Exponentiation |
+|----------------------------------------|---------|----------------|
+| Decimal integer | `98_222` | `N/A` |
+| Hex integer | `0xff` | `N/A` |
+| Octal integer | `0o77` | `N/A` |
+| Binary integer | `0b1111_0000` | `N/A` |
+| Floating-point | `123.0E+77` | `Optional` |
 
 `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
 
 #### Suffixes
 
 A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.
 
-Any kind of literal (string, integer, etc) with any suffix is valid as a token,
-and can be passed to a macro without producing an error.
+
+> **<sup>Lexer</sup>**\
+> SUFFIX : IDENTIFIER_OR_KEYWORD\
+> SUFFIX_NO_E : SUFFIX <sub>_not beginning with `e` or `E`_</sub>
+
+Any kind of literal (string, integer, etc) with any suffix is valid as a token.
+
+A literal token with any suffix can be passed to a macro without producing an error.
 The macro itself will decide how to interpret such a token and whether to produce an error or not.
+In particular, the `literal` fragment specifier for by-example macros matches literal tokens with arbitrary suffixes.
 
 ```rust
 macro_rules! blackhole { ($tt:tt) => () }
+macro_rules! blackhole_lit { ($l:literal) => () }
 
 blackhole!("string"suffix); // OK
+blackhole_lit!(1suffix); // OK
 ```
 
-However, suffixes on literal tokens parsed as Rust code are restricted.
+However, suffixes on literal tokens which are interpreted as literal expressions or patterns are restricted.
 Any suffixes are rejected on non-numeric literal tokens,
 and numeric literal tokens are accepted only with suffixes from the list below.
 
@@ -110,7 +119,7 @@ and numeric literal tokens are accepted only with suffixes from the list below.
 
 > **<sup>Lexer</sup>**\
 > CHAR_LITERAL :\
-> &nbsp;&nbsp; `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
+> &nbsp;&nbsp; `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'` SUFFIX<sup>?</sup>
 >
 > QUOTE_ESCAPE :\
 > &nbsp;&nbsp; `\'` | `\"`
@@ -136,7 +145,7 @@ which must be _escaped_ by a preceding `U+005C` character (`\`).
 > &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE\
 > &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
 > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
-> &nbsp;&nbsp; )<sup>\*</sup> `"`
+> &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
 >
 > STRING_CONTINUE :\
 > &nbsp;&nbsp; `\` _followed by_ \\n
@@ -196,7 +205,7 @@ following forms:
 
 > **<sup>Lexer</sup>**\
 > RAW_STRING_LITERAL :\
-> &nbsp;&nbsp; `r` RAW_STRING_CONTENT
+> &nbsp;&nbsp; `r` RAW_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_STRING_CONTENT :\
 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
@@ -233,7 +242,7 @@ r##"foo #"# bar"##;                // foo #"# bar
 
 > **<sup>Lexer</sup>**\
 > BYTE_LITERAL :\
-> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE )  `'`
+> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE )  `'` SUFFIX<sup>?</sup>
 >
 > ASCII_FOR_CHAR :\
 > &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
@@ -253,7 +262,7 @@ _number literal_.
 
 > **<sup>Lexer</sup>**\
 > BYTE_STRING_LITERAL :\
-> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
+> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
 >
 > ASCII_FOR_STRING :\
 > &nbsp;&nbsp; _any ASCII (i.e 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
@@ -284,7 +293,7 @@ following forms:
 
 > **<sup>Lexer</sup>**\
 > RAW_BYTE_STRING_LITERAL :\
-> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
+> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_BYTE_STRING_CONTENT :\
 > &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`\
@@ -329,7 +338,7 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
 > **<sup>Lexer</sup>**\
 > INTEGER_LITERAL :\
 > &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
->              INTEGER_SUFFIX<sup>?</sup>
+>              SUFFIX_NO_E<sup>?</sup>
 >
 > DEC_LITERAL :\
 > &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
@@ -350,10 +359,6 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
 > DEC_DIGIT : \[`0`-`9`]
 >
 > HEX_DIGIT : \[`0`-`9` `a`-`f` `A`-`F`]
->
-> INTEGER_SUFFIX :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `u128` | `usize`\
-> &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`
 
 An _integer literal_ has one of four forms:
 
@@ -369,11 +374,11 @@ An _integer literal_ has one of four forms:
   (`0b`) and continues as any mixture (with at least one digit) of binary digits
   and underscores.
 
-Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]:
-`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`.
+Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above.
+The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal.
 See [literal expressions] for the effect of these suffixes.
 
-Examples of integer literals of various forms:
+Examples of integer literals which are accepted as literal expressions:
 
 ```rust
 # #![allow(overflowing_literals)]
@@ -396,27 +401,27 @@ Examples of integer literals of various forms:
 
 0usize;
 
-// These are too big for their type, but are still valid tokens
-
+// These are too big for their type, but are accepted as literal expressions.
 128_i8;
 256_u8;
 
+// This is an integer literal, accepted as a floating-point literal expression.
+5f32;
 ```
 
 Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`.
 
-Examples of invalid integer literals:
 
-```rust,compile_fail
-// uses numbers of the wrong base
+Examples of integer literals which are not accepted as literal expressions:
 
-0b0102;
-0o0581;
-
-// bin, hex, and octal literals must have at least one digit
-
-0b_;
-0b____;
+```rust
+# #[cfg(FALSE)] {
+0invalidSuffix;
+123AFB43;
+0b010a;
+0xAB_CD_EF_GH;
+0b1111_f32;
+# }
 ```
 
 #### Tuple index
@@ -442,48 +447,41 @@ let cat = example.01;  // ERROR no field named `01`
 let horse = example.0b10;  // ERROR no field named `0b10`
 ```
 
-> **Note**: The tuple index may include an `INTEGER_SUFFIX`, but this is not
-> intended to be valid, and may be removed in a future version. See
-> <https://github.com/rust-lang/rust/issues/60210> for more information.
+> **Note**: Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version.
+> See <https://github.com/rust-lang/rust/issues/60210> for more information.
 
 #### Floating-point literals
 
 > **<sup>Lexer</sup>**\
 > FLOAT_LITERAL :\
 > &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
 >   _(not immediately followed by `.`, `_` or an XID_Start character)_\
-> &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT\
-> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>\
-> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
->                    FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
+> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E<sup>?</sup>\
+> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup> FLOAT_EXPONENT SUFFIX<sup>?</sup>\
 >
 > FLOAT_EXPONENT :\
 > &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)<sup>?</sup>
 >               (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
 >
-> FLOAT_SUFFIX :\
-> &nbsp;&nbsp; `f32` | `f64`
 
-A _floating-point literal_ has one of three forms:
+A _floating-point literal_ has one of two forms:
 
 * A _decimal literal_ followed by a period character `U+002E` (`.`). This is
   optionally followed by another decimal literal, with an optional _exponent_.
 * A single _decimal literal_ followed by an _exponent_.
-* A single _decimal literal_ (in which case a suffix is required).
 
 Like integer literals, a floating-point literal may be followed by a
 suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
-There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]).
+The suffix may not begin with `e` or `E` if the literal does not include an exponent.
 See [literal expressions] for the effect of these suffixes.
 
-Examples of floating-point literals of various forms:
+Examples of floating-point literals which are accepted as literal expressions:
 
 ```rust
 123.0f64;
 0.1f64;
 0.1f32;
 12E+99_f64;
-5f32;
 let x: f64 = 2.;
 ```
 
@@ -493,39 +491,16 @@ to call a method named `f64` on `2`.
 
 Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`.
 
-#### Number pseudoliterals
-
-> **<sup>Lexer</sup>**\
-> NUMBER_PSEUDOLITERAL :\
-> &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL ( . DEC_LITERAL )<sup>?</sup> FLOAT_EXPONENT\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\
-> &nbsp;&nbsp; | DEC_LITERAL . DEC_LITERAL\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER SUFFIX )\
-> &nbsp;&nbsp; | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\
-> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )
->
-> NUMBER_PSEUDOLITERAL_SUFFIX :\
-> &nbsp;&nbsp; IDENTIFIER_OR_KEYWORD <sub>_not matching INTEGER_SUFFIX or FLOAT_SUFFIX_</sub>
->
-> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\
-> &nbsp;&nbsp; NUMBER_PSEUDOLITERAL_SUFFIX <sub>_not beginning with `e` or `E`_</sub>
-
-Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above.
-These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments.
+Examples of floating-point literals which are not accepted as literal expressions:
 
-Examples of such tokens:
-```rust,compile_fail
-0invalidSuffix;
-123AFB43;
-0b010a;
-0xAB_CD_EF_GH;
+```rust
+# #[cfg(FALSE)] {
 2.0f80;
 2e5f80;
 2e5e6;
 2.0e5e6;
 1.3e10u64;
-0b1111_f32;
+# }
 ```
 
 #### Reserved forms similar to number literals
@@ -536,7 +511,7 @@ Examples of such tokens:
 > &nbsp;&nbsp; | OCT_LITERAL \[`8`-`9`&ZeroWidthSpace;]\
 > &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \
 > &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; _(not immediately followed by `.`, `_` or an XID_Start character)_\
-> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL ) `e`\
+> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL ) (`e`|`E`)\
 > &nbsp;&nbsp; | `0b` `_`<sup>\*</sup> _end of input or not BIN_DIGIT_\
 > &nbsp;&nbsp; | `0o` `_`<sup>\*</sup> _end of input or not OCT_DIGIT_\
 > &nbsp;&nbsp; | `0x` `_`<sup>\*</sup> _end of input or not HEX_DIGIT_\
@@ -549,7 +524,7 @@ Due to the possible ambiguity these raise, they are rejected by the tokenizer in
 
 * An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).
 
-* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`.
+* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`.
 
 * Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).
 
@@ -561,13 +536,13 @@ Examples of reserved forms:
 0b0102;  // this is not `0b010` followed by `2`
 0o1279;  // this is not `0o127` followed by `9`
 0x80.0;  // this is not `0x80` followed by `.` and `0`
-0b101e;  // this is not a pseudoliteral, or `0b101` followed by `e`
-0b;      // this is not a pseudoliteral, or `0` followed by  `b`
-0b_;     // this is not a pseudoliteral, or `0` followed by  `b_`
-2e;      // this is not a pseudoliteral, or `2` followed by `e`
-2.0e;    // this is not a pseudoliteral, or `2.0` followed by `e`
-2em;     // this is not a pseudoliteral, or `2` followed by `em`
-2.0em;   // this is not a pseudoliteral, or `2.0` followed by `em`
+0b101e;  // this is not a suffixed literal, or `0b101` followed by `e`
+0b;      // this is not an integer literal, or `0` followed by `b`
+0b_;     // this is not an integer literal, or `0` followed by `b_`
+2e;      // this is not a floating-point literal, or `2` followed by `e`
+2.0e;    // this is not a floating-point literal, or `2.0` followed by `e`
+2em;     // this is not a suffixed literal, or `2` followed by `em`
+2.0em;   // this is not a suffixed literal, or `2.0` followed by `em`
 ```
 
 ## Lifetimes and loop labels