Commit c7a39ca

Merge pull request #1305 from mattheww/2022-11_parse_all_suffixes
Update literal suffix docs for rust-lang/rust#102944
2 parents e203b97 + 018b14b commit c7a39ca

2 files changed: +72 -93 lines changed

src/expressions/literal-expr.md

+11 -7
Original file line number, Diff line number, Diff line change
@@ -8,11 +8,9 @@
88
>    | [BYTE_LITERAL]\
99
>    | [BYTE_STRING_LITERAL]\
1010
>    | [RAW_BYTE_STRING_LITERAL]\
11-
>    | [INTEGER_LITERAL][^out-of-range]\
11+
>    | [INTEGER_LITERAL]\
1212
>    | [FLOAT_LITERAL]\
1313
>    | `true` | `false`
14-
>
15-
> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed.
1614
1715
A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule.
1816

@@ -54,7 +52,7 @@ A string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_B
5452

5553
An integer literal expression consists of a single [INTEGER_LITERAL] token.
5654

57-
If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
55+
If the token has a [suffix], the suffix must be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
5856

5957
If the token has no suffix, the expression's type is determined by type inference:
6058

@@ -96,10 +94,12 @@ The value of the expression is determined from the string representation of the
9694

9795
* If the radix is not 10, the first two characters are removed from the string.
9896

97+
* Any suffix is removed from the string.
98+
9999
* Any underscores are removed from the string.
100100

101101
* The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix.
102-
If the value does not fit in `u128`, the expression is rejected by the parser.
102+
If the value does not fit in `u128`, it is a compiler error.
103103

104104
* The `u128` value is converted to the expression's type via a [numeric cast].
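
The value computation listed above can be sketched in ordinary Rust. This is only an illustration of the documented steps, not the compiler's implementation: the helper name `integer_literal_value` and the `INTEGER_SUFFIXES` table are hypothetical, and the sketch assumes it is handed the text of a well-formed integer literal token with an integer-type suffix or no suffix.

```rust
/// The integer type names accepted as suffixes on integer literal expressions.
const INTEGER_SUFFIXES: [&str; 12] = [
    "u8", "i8", "u16", "i16", "u32", "i32", "u64", "i64", "u128", "i128", "usize", "isize",
];

/// Hypothetical helper: computes the `u128` value of an integer literal
/// token's text by following the steps listed in the text.
fn integer_literal_value(token: &str) -> Result<u128, std::num::ParseIntError> {
    // Choose the radix from the first two characters of the token.
    let radix = match token.get(..2) {
        Some("0b") => 2,
        Some("0o") => 8,
        Some("0x") => 16,
        _ => 10,
    };
    // If the radix is not 10, remove the first two characters.
    let mut digits = if radix == 10 { token } else { &token[2..] };
    // Remove any suffix. No suffix starts with a hex digit, so this is unambiguous.
    for &suffix in INTEGER_SUFFIXES.iter() {
        if let Some(stripped) = digits.strip_suffix(suffix) {
            digits = stripped;
            break;
        }
    }
    // Remove any underscores.
    let digits: String = digits.chars().filter(|&c| c != '_').collect();
    // Convert to `u128`; a value that does not fit is reported as an error.
    u128::from_str_radix(&digits, radix)
}
```

The final numeric cast is then an ordinary `as` conversion, for example `integer_literal_value("256_u8").unwrap() as u8`, which wraps in the same way the text describes for literals that are too big for their type.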
105105

@@ -111,9 +111,11 @@ If the value does not fit in `u128`, the expression is rejected by the parser.
111111
112112
## Floating-point literal expressions
113113

114-
A floating-point literal expression consists of a single [FLOAT_LITERAL] token.
114+
A floating-point literal expression has one of two forms:
115+
* a single [FLOAT_LITERAL] token
116+
* a single [INTEGER_LITERAL] token which has a suffix and no radix indicator
115117

116-
If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
118+
If the token has a [suffix], the suffix must be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
117119

118120
If the token has no suffix, the expression's type is determined by type inference:
119121

@@ -136,6 +138,8 @@ let x: f64 = 2.; // type f64
136138

137139
The value of the expression is determined from the string representation of the token as follows:
138140

141+
* Any suffix is removed from the string.
142+
139143
* Any underscores are removed from the string.
140144

141145
* The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`].
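
The same kind of sketch applies to floating-point literal expressions. The helper name `float_literal_value` is hypothetical, and the code assumes the token text is already known to be acceptable as a floating-point literal expression (a `FLOAT_LITERAL`, or a decimal `INTEGER_LITERAL` with an `f32` or `f64` suffix).

```rust
use std::str::FromStr;

/// Hypothetical helper: computes the value of a floating-point literal
/// token's text as `T` (intended to be `f32` or `f64`).
fn float_literal_value<T: FromStr>(token: &str) -> Result<T, T::Err> {
    // Remove any suffix.
    let unsuffixed = token
        .strip_suffix("f32")
        .or_else(|| token.strip_suffix("f64"))
        .unwrap_or(token);
    // Remove any underscores.
    let cleaned: String = unsuffixed.chars().filter(|&c| c != '_').collect();
    // Convert as if by `f32::from_str` or `f64::from_str`.
    T::from_str(&cleaned)
}
```

For instance, `float_literal_value::<f32>("5f32")` handles the second form described above: an integer literal token with a floating-point suffix.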

src/tokens.md

+61 -86
Original file line number, Diff line number, Diff line change
@@ -72,31 +72,40 @@ Literals are tokens used in [literal expressions].
7272

7373
#### Numbers
7474

75-
| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
76-
|----------------------------------------|---------|----------------|----------|
77-
| Decimal integer | `98_222` | `N/A` | Integer suffixes |
78-
| Hex integer | `0xff` | `N/A` | Integer suffixes |
79-
| Octal integer | `0o77` | `N/A` | Integer suffixes |
80-
| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
81-
| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
75+
| [Number literals](#number-literals)`*` | Example | Exponentiation |
76+
|----------------------------------------|---------|----------------|
77+
| Decimal integer | `98_222` | `N/A` |
78+
| Hex integer | `0xff` | `N/A` |
79+
| Octal integer | `0o77` | `N/A` |
80+
| Binary integer | `0b1111_0000` | `N/A` |
81+
| Floating-point | `123.0E+77` | `Optional` |
8282

8383
`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
8484

8585
#### Suffixes
8686

8787
A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.
8888

89-
Any kind of literal (string, integer, etc) with any suffix is valid as a token,
90-
and can be passed to a macro without producing an error.
89+
90+
> **<sup>Lexer</sup>**\
91+
> SUFFIX : IDENTIFIER_OR_KEYWORD\
92+
> SUFFIX_NO_E : SUFFIX <sub>_not beginning with `e` or `E`_</sub>
93+
94+
Any kind of literal (string, integer, etc) with any suffix is valid as a token.
95+
96+
A literal token with any suffix can be passed to a macro without producing an error.
9197
The macro itself will decide how to interpret such a token and whether to produce an error or not.
98+
In particular, the `literal` fragment specifier for by-example macros matches literal tokens with arbitrary suffixes.
9299

93100
```rust
94101
macro_rules! blackhole { ($tt:tt) => () }
102+
macro_rules! blackhole_lit { ($l:literal) => () }
95103

96104
blackhole!("string"suffix); // OK
105+
blackhole_lit!(1suffix); // OK
97106
```
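
As a further illustration (not part of the commit), a macro can look at a suffixed token instead of discarding it. The macro name `token_text` is hypothetical; the sketch relies only on the standard `stringify!` macro, which turns the tokens it receives into a string without interpreting them as expressions.

```rust
// Hypothetical macro: yields the source text of the single token it receives.
// The token is never interpreted as a literal expression, so a suffix that
// would be rejected in an expression is not an error here.
macro_rules! token_text {
    ($tt:tt) => {
        stringify!($tt)
    };
}
```

With this definition, `token_text!(1suffix)` evaluates to the string `"1suffix"`, whereas writing `1suffix` directly as an expression is rejected.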
98107

99-
However, suffixes on literal tokens parsed as Rust code are restricted.
108+
However, suffixes on literal tokens which are interpreted as literal expressions or patterns are restricted.
100109
Any suffixes are rejected on non-numeric literal tokens,
101110
and numeric literal tokens are accepted only with suffixes from the list below.
102111

@@ -110,7 +119,7 @@ and numeric literal tokens are accepted only with suffixes from the list below.
110119

111120
> **<sup>Lexer</sup>**\
112121
> CHAR_LITERAL :\
113-
> &nbsp;&nbsp; `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
122+
> &nbsp;&nbsp; `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'` SUFFIX<sup>?</sup>
114123
>
115124
> QUOTE_ESCAPE :\
116125
> &nbsp;&nbsp; `\'` | `\"`
@@ -136,7 +145,7 @@ which must be _escaped_ by a preceding `U+005C` character (`\`).
136145
> &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE\
137146
> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
138147
> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
139-
> &nbsp;&nbsp; )<sup>\*</sup> `"`
148+
> &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
140149
>
141150
> STRING_CONTINUE :\
142151
> &nbsp;&nbsp; `\` _followed by_ \\n
@@ -196,7 +205,7 @@ following forms:
196205

197206
> **<sup>Lexer</sup>**\
198207
> RAW_STRING_LITERAL :\
199-
> &nbsp;&nbsp; `r` RAW_STRING_CONTENT
208+
> &nbsp;&nbsp; `r` RAW_STRING_CONTENT SUFFIX<sup>?</sup>
200209
>
201210
> RAW_STRING_CONTENT :\
202211
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
@@ -233,7 +242,7 @@ r##"foo #"# bar"##; // foo #"# bar
233242

234243
> **<sup>Lexer</sup>**\
235244
> BYTE_LITERAL :\
236-
> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
245+
> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'` SUFFIX<sup>?</sup>
237246
>
238247
> ASCII_FOR_CHAR :\
239248
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
@@ -253,7 +262,7 @@ _number literal_.
253262

254263
> **<sup>Lexer</sup>**\
255264
> BYTE_STRING_LITERAL :\
256-
> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
265+
> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
257266
>
258267
> ASCII_FOR_STRING :\
259268
> &nbsp;&nbsp; _any ASCII (i.e 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
@@ -284,7 +293,7 @@ following forms:
284293

285294
> **<sup>Lexer</sup>**\
286295
> RAW_BYTE_STRING_LITERAL :\
287-
> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
296+
> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT SUFFIX<sup>?</sup>
288297
>
289298
> RAW_BYTE_STRING_CONTENT :\
290299
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`\
@@ -329,7 +338,7 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
329338
> **<sup>Lexer</sup>**\
330339
> INTEGER_LITERAL :\
331340
> &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
332-
> INTEGER_SUFFIX<sup>?</sup>
341+
> SUFFIX_NO_E<sup>?</sup>
333342
>
334343
> DEC_LITERAL :\
335344
> &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
@@ -350,10 +359,6 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
350359
> DEC_DIGIT : \[`0`-`9`]
351360
>
352361
> HEX_DIGIT : \[`0`-`9` `a`-`f` `A`-`F`]
353-
>
354-
> INTEGER_SUFFIX :\
355-
> &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `u128` | `usize`\
356-
> &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`
357362
358363
An _integer literal_ has one of four forms:
359364

@@ -369,11 +374,11 @@ An _integer literal_ has one of four forms:
369374
(`0b`) and continues as any mixture (with at least one digit) of binary digits
370375
and underscores.
371376

372-
Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]:
373-
`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`.
377+
Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above.
378+
The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal.
374379
See [literal expressions] for the effect of these suffixes.
375380

376-
Examples of integer literals of various forms:
381+
Examples of integer literals which are accepted as literal expressions:
377382

378383
```rust
379384
# #![allow(overflowing_literals)]
@@ -396,27 +401,27 @@ Examples of integer literals of various forms:
396401

397402
0usize;
398403

399-
// These are too big for their type, but are still valid tokens
400-
404+
// These are too big for their type, but are accepted as literal expressions.
401405
128_i8;
402406
256_u8;
403407

408+
// This is an integer literal, accepted as a floating-point literal expression.
409+
5f32;
404410
```
405411

406412
Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`.
407413

408-
Examples of invalid integer literals:
409414

410-
```rust,compile_fail
411-
// uses numbers of the wrong base
415+
Examples of integer literals which are not accepted as literal expressions:
412416

413-
0b0102;
414-
0o0581;
415-
416-
// bin, hex, and octal literals must have at least one digit
417-
418-
0b_;
419-
0b____;
417+
```rust
418+
# #[cfg(FALSE)] {
419+
0invalidSuffix;
420+
123AFB43;
421+
0b010a;
422+
0xAB_CD_EF_GH;
423+
0b1111_f32;
424+
# }
420425
```
421426

422427
#### Tuple index
@@ -442,48 +447,41 @@ let cat = example.01; // ERROR no field named `01`
442447
let horse = example.0b10; // ERROR no field named `0b10`
443448
```
444449

445-
> **Note**: The tuple index may include an `INTEGER_SUFFIX`, but this is not
446-
> intended to be valid, and may be removed in a future version. See
447-
> <https://github.com/rust-lang/rust/issues/60210> for more information.
450+
> **Note**: Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version.
451+
> See <https://github.com/rust-lang/rust/issues/60210> for more information.
448452
449453
#### Floating-point literals
450454

451455
> **<sup>Lexer</sup>**\
452456
> FLOAT_LITERAL :\
453457
> &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
454458
> _(not immediately followed by `.`, `_` or an XID_Start character)_\
455-
> &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT\
456-
> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>\
457-
> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
458-
> FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
459+
> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E<sup>?</sup>\
460+
> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup> FLOAT_EXPONENT SUFFIX<sup>?</sup>\
459461
>
460462
> FLOAT_EXPONENT :\
461463
> &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)<sup>?</sup>
462464
> (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
463465
>
464-
> FLOAT_SUFFIX :\
465-
> &nbsp;&nbsp; `f32` | `f64`
466466
467-
A _floating-point literal_ has one of three forms:
467+
A _floating-point literal_ has one of two forms:
468468

469469
* A _decimal literal_ followed by a period character `U+002E` (`.`). This is
470470
optionally followed by another decimal literal, with an optional _exponent_.
471471
* A single _decimal literal_ followed by an _exponent_.
472-
* A single _decimal literal_ (in which case a suffix is required).
473472

474473
Like integer literals, a floating-point literal may be followed by a
475474
suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
476-
There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]).
475+
The suffix may not begin with `e` or `E` if the literal does not include an exponent.
477476
See [literal expressions] for the effect of these suffixes.
478477

479-
Examples of floating-point literals of various forms:
478+
Examples of floating-point literals which are accepted as literal expressions:
480479

481480
```rust
482481
123.0f64;
483482
0.1f64;
484483
0.1f32;
485484
12E+99_f64;
486-
5f32;
487485
let x: f64 = 2.;
488486
```
489487

@@ -493,39 +491,16 @@ to call a method named `f64` on `2`.
493491

494492
Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`.
495493

496-
#### Number pseudoliterals
497-
498-
> **<sup>Lexer</sup>**\
499-
> NUMBER_PSEUDOLITERAL :\
500-
> &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL ( . DEC_LITERAL )<sup>?</sup> FLOAT_EXPONENT\
501-
> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\
502-
> &nbsp;&nbsp; | DEC_LITERAL . DEC_LITERAL\
503-
> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER SUFFIX )\
504-
> &nbsp;&nbsp; | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\
505-
> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\
506-
> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )
507-
>
508-
> NUMBER_PSEUDOLITERAL_SUFFIX :\
509-
> &nbsp;&nbsp; IDENTIFIER_OR_KEYWORD <sub>_not matching INTEGER_SUFFIX or FLOAT_SUFFIX_</sub>
510-
>
511-
> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\
512-
> &nbsp;&nbsp; NUMBER_PSEUDOLITERAL_SUFFIX <sub>_not beginning with `e` or `E`_</sub>
513-
514-
Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above.
515-
These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments.
494+
Examples of floating-point literals which are not accepted as literal expressions:
516495

517-
Examples of such tokens:
518-
```rust,compile_fail
519-
0invalidSuffix;
520-
123AFB43;
521-
0b010a;
522-
0xAB_CD_EF_GH;
496+
```rust
497+
# #[cfg(FALSE)] {
523498
2.0f80;
524499
2e5f80;
525500
2e5e6;
526501
2.0e5e6;
527502
1.3e10u64;
528-
0b1111_f32;
503+
# }
529504
```
530505

531506
#### Reserved forms similar to number literals
@@ -536,7 +511,7 @@ Examples of such tokens:
536511
> &nbsp;&nbsp; | OCT_LITERAL \[`8`-`9`&ZeroWidthSpace;]\
537512
> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \
538513
> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; _(not immediately followed by `.`, `_` or an XID_Start character)_\
539-
> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL ) `e`\
514+
> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL ) (`e`|`E`)\
540515
> &nbsp;&nbsp; | `0b` `_`<sup>\*</sup> _end of input or not BIN_DIGIT_\
541516
> &nbsp;&nbsp; | `0o` `_`<sup>\*</sup> _end of input or not OCT_DIGIT_\
542517
> &nbsp;&nbsp; | `0x` `_`<sup>\*</sup> _end of input or not HEX_DIGIT_\
@@ -549,7 +524,7 @@ Due to the possible ambiguity these raise, they are rejected by the tokenizer in
549524

550525
* An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).
551526

552-
* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`.
527+
* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`.
553528

554529
* Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).
555530

@@ -561,13 +536,13 @@ Examples of reserved forms:
561536
0b0102; // this is not `0b010` followed by `2`
562537
0o1279; // this is not `0o127` followed by `9`
563538
0x80.0; // this is not `0x80` followed by `.` and `0`
564-
0b101e; // this is not a pseudoliteral, or `0b101` followed by `e`
565-
0b; // this is not a pseudoliteral, or `0` followed by `b`
566-
0b_; // this is not a pseudoliteral, or `0` followed by `b_`
567-
2e; // this is not a pseudoliteral, or `2` followed by `e`
568-
2.0e; // this is not a pseudoliteral, or `2.0` followed by `e`
569-
2em; // this is not a pseudoliteral, or `2` followed by `em`
570-
2.0em; // this is not a pseudoliteral, or `2.0` followed by `em`
539+
0b101e; // this is not a suffixed literal, or `0b101` followed by `e`
540+
0b; // this is not an integer literal, or `0` followed by `b`
541+
0b_; // this is not an integer literal, or `0` followed by `b_`
542+
2e; // this is not a floating-point literal, or `2` followed by `e`
543+
2.0e; // this is not a floating-point literal, or `2.0` followed by `e`
544+
2em; // this is not a suffixed literal, or `2` followed by `em`
545+
2.0em; // this is not a suffixed literal, or `2.0` followed by `em`
571546
```
572547

573548
## Lifetimes and loop labels
