Commit ccde77e

Numeric literals: say that the parser is now more lenient
This reflects the changes made in rust-lang/rust#102944. Previously, unknown suffixes were rejected by the parser. Now they are accepted by the parser and rejected at a later stage. Similarly, integer literals too large to fit in u128 are now accepted by the parser. Forms like 5f32 are now INTEGER_LITERAL rather than FLOAT_LITERAL. The notion of a 'pseudoliteral' is no longer required.
1 parent b333c25 commit ccde77e

2 files changed, +64 -75 lines changed

src/expressions/literal-expr.md (+7 -7)
@@ -8,11 +8,9 @@
 >    | [BYTE_LITERAL]\
 >    | [BYTE_STRING_LITERAL]\
 >    | [RAW_BYTE_STRING_LITERAL]\
->    | [INTEGER_LITERAL][^out-of-range]\
+>    | [INTEGER_LITERAL]\
 >    | [FLOAT_LITERAL]\
 >    | `true` | `false`
->
-> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed.

 A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule.

@@ -54,7 +52,7 @@ A string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_B

 An integer literal expression consists of a single [INTEGER_LITERAL] token.

-If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
+If the token has a [suffix], the suffix must be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.

 If the token has no suffix, the expression's type is determined by type inference:

@@ -101,7 +99,7 @@ The value of the expression is determined from the string representation of the
 * Any underscores are removed from the string.

 * The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix.
-If the value does not fit in `u128`, the expression is rejected by the parser.
+If the value does not fit in `u128`, it is a compiler error.

 * The `u128` value is converted to the expression's type via a [numeric cast].

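The value computation described in this hunk can be sketched at runtime as follows. This is only an illustration, not how the compiler implements it: `literal_value` is a hypothetical helper, and it takes the digits with the radix prefix and suffix already stripped.

```rust
/// Hypothetical helper (not part of the reference or the compiler):
/// computes the `u128` value of a literal's digit string.
fn literal_value(digits: &str, radix: u32) -> Option<u128> {
    // Step 1: remove any underscores from the string.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    // Step 2: convert as if by `u128::from_str_radix` with the chosen radix.
    // `None` stands in for the "does not fit in `u128`" compiler error.
    u128::from_str_radix(&cleaned, radix).ok()
}

fn main() {
    // Digits of `0b1111_0000` with radix 2.
    assert_eq!(literal_value("1111_0000", 2), Some(240));
    // Digits of `98_222` with radix 10.
    assert_eq!(literal_value("98_222", 10), Some(98_222));

    // Step 3: a numeric cast to the expression's type.
    // For `128_i8` the digits are `128_`, and the cast wraps to -128.
    let value = literal_value("128_", 10).unwrap();
    assert_eq!(value as i8, -128);

    // 2^128 does not fit in `u128`.
    assert_eq!(
        literal_value("340282366920938463463374607431768211456", 10),
        None
    );
}
```

The wrapping cast in step 3 is why `128_i8` is accepted as a literal expression and is only flagged by the `overflowing_literals` lint.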
@@ -113,9 +111,11 @@ If the value does not fit in `u128`, the expression is rejected by the parser.

 ## Floating-point literal expressions

-A floating-point literal expression consists of a single [FLOAT_LITERAL] token.
+A floating-point literal expression has one of two forms:
+* a single [FLOAT_LITERAL] token
+* a single [INTEGER_LITERAL] token which has a suffix and no radix indicator

-If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
+If the token has a [suffix], the suffix must be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.

 If the token has no suffix, the expression's type is determined by type inference:

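The two forms of floating-point literal expression can be checked with a small sketch. `type_of` is a hypothetical helper built on `std::any::type_name`, whose exact output is not formally guaranteed, so the expected strings below are an assumption that holds for primitive types in practice.

```rust
use std::any::type_name;

/// Hypothetical helper: returns the name of the type of its argument.
fn type_of<T>(_: T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // Form 1: a FLOAT_LITERAL token; the suffix names the type.
    assert_eq!(type_of(1.5f32), "f32");
    assert_eq!(type_of(12E+99_f64), "f64");

    // Form 2: an INTEGER_LITERAL token with a suffix and no radix indicator.
    assert_eq!(type_of(5f32), "f32");
    assert_eq!(5f32, 5.0_f32);

    // No suffix and no other constraint: inference falls back to `f64`.
    assert_eq!(type_of(2.0), "f64");
}
```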
src/tokens.md (+57 -68)
@@ -72,31 +72,40 @@ Literals are tokens used in [literal expressions].

 #### Numbers

-| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
-|----------------------------------------|---------|----------------|----------|
-| Decimal integer | `98_222` | `N/A` | Integer suffixes |
-| Hex integer | `0xff` | `N/A` | Integer suffixes |
-| Octal integer | `0o77` | `N/A` | Integer suffixes |
-| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
-| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |
+| [Number literals](#number-literals)`*` | Example | Exponentiation |
+|----------------------------------------|---------|----------------|
+| Decimal integer | `98_222` | `N/A` |
+| Hex integer | `0xff` | `N/A` |
+| Octal integer | `0o77` | `N/A` |
+| Binary integer | `0b1111_0000` | `N/A` |
+| Floating-point | `123.0E+77` | `Optional` |

 `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`

 #### Suffixes

 A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.

-Any kind of literal (string, integer, etc) with any suffix is valid as a token,
-and can be passed to a macro without producing an error.
+
+> **<sup>Lexer</sup>**\
+> SUFFIX : IDENTIFIER_OR_KEYWORD\
+> SUFFIX_NO_E : SUFFIX <sub>_not beginning with `e` or `E`_</sub>
+
+Any kind of literal (string, integer, etc) with any suffix is valid as a token.
+
+A literal token with any suffix can be passed to a macro without producing an error.
 The macro itself will decide how to interpret such a token and whether to produce an error or not.
+In particular, the `literal` fragment specifier for by-example macros matches literal tokens with arbitrary suffixes.

 ```rust
 macro_rules! blackhole { ($tt:tt) => () }
+macro_rules! blackhole_lit { ($l:literal) => () }

 blackhole!("string"suffix); // OK
+blackhole_lit!(1suffix); // OK
 ```

-However, suffixes on literal tokens parsed as Rust code are restricted.
+However, suffixes on literal tokens which are interpreted as literal expressions or patterns are restricted.
 Any suffixes are rejected on non-numeric literal tokens,
 and numeric literal tokens are accepted only with suffixes from the list below.

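To illustrate the statement that the macro decides how to interpret a suffixed token, here is a minimal sketch. The `px!` macro and its `px` suffix are hypothetical. It uses the `tt` fragment and `stringify!` to read the token's text and gives the suffix a meaning of its own.

```rust
/// Hypothetical macro: accepts a single token, treats a `px` suffix as
/// "pixels", and yields `Some(n)` for `<n>px`, or `None` otherwise.
macro_rules! px {
    ($t:tt) => {
        stringify!($t)
            .strip_suffix("px")
            .and_then(|digits| digits.parse::<u32>().ok())
    };
}

fn main() {
    // `10px` is a valid token even though `px` is not a primitive type name.
    assert_eq!(px!(10px), Some(10));
    // The macro, not the lexer, decides that `pt` is not acceptable here.
    assert_eq!(px!(10pt), None);
}
```

Writing `10px` directly as an expression would still be rejected, because `px` is not one of the permitted suffixes.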
@@ -329,7 +338,7 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
 > **<sup>Lexer</sup>**\
 > INTEGER_LITERAL :\
 > &nbsp;&nbsp; ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
-> INTEGER_SUFFIX<sup>?</sup>
+> SUFFIX_NO_E<sup>?</sup>
 >
 > DEC_LITERAL :\
 > &nbsp;&nbsp; DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
@@ -350,10 +359,6 @@ literal_. The grammar for recognizing the two kinds of literals is mixed.
 > DEC_DIGIT : \[`0`-`9`]
 >
 > HEX_DIGIT : \[`0`-`9` `a`-`f` `A`-`F`]
->
-> INTEGER_SUFFIX :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `u8` | `u16` | `u32` | `u64` | `u128` | `usize`\
-> &nbsp;&nbsp; | `i8` | `i16` | `i32` | `i64` | `i128` | `isize`

 An _integer literal_ has one of four forms:

@@ -369,11 +374,11 @@ An _integer literal_ has one of four forms:
 (`0b`) and continues as any mixture (with at least one digit) of binary digits
 and underscores.

-Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]:
-`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`.
+Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above.
+The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal.
 See [literal expressions] for the effect of these suffixes.

-Examples of integer literals of various forms:
+Examples of integer literals which are accepted as literal expressions:

 ```rust
 # #![allow(overflowing_literals)]
@@ -396,15 +401,29 @@ Examples of integer literals of various forms:

 0usize;

-// These are too big for their type, but are still valid tokens
-
+// These are too big for their type, but are accepted as literal expressions.
 128_i8;
 256_u8;

+// This is an integer literal, accepted as a floating-point literal expression.
+5f32;
 ```

 Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`.

+
+Examples of integer literals which are not accepted as literal expressions:
+
+```rust
+# #[cfg(FALSE)] {
+0invalidSuffix;
+123AFB43;
+0b010a;
+0xAB_CD_EF_GH;
+0b1111_f32;
+# }
+```
+
 #### Tuple index

 > **<sup>Lexer</sup>**\
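
The note in this hunk that `-1i8` is analyzed as two tokens can be observed with a token-counting macro. `count_tts!` is a hypothetical helper written for this sketch and is not part of the reference.

```rust
/// Hypothetical macro: counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // A suffixed integer literal is a single token.
    assert_eq!(count_tts!(1i8), 1);
    // The sign is a separate `-` token preceding the literal.
    assert_eq!(count_tts!(-1i8), 2);
    // `5f32` is one INTEGER_LITERAL token, suffix included.
    assert_eq!(count_tts!(5f32), 1);
}
```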
@@ -428,48 +447,41 @@ let cat = example.01; // ERROR no field named `01`
 let horse = example.0b10; // ERROR no field named `0b10`
 ```

-> **Note**: The tuple index may include an `INTEGER_SUFFIX`, but this is not
-> intended to be valid, and may be removed in a future version. See
-> <https://github.com/rust-lang/rust/issues/60210> for more information.
+> **Note**: Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version.
+> See <https://github.com/rust-lang/rust/issues/60210> for more information.

 #### Floating-point literals

 > **<sup>Lexer</sup>**\
 > FLOAT_LITERAL :\
 > &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL `.`
 > _(not immediately followed by `.`, `_` or an XID_Start character)_\
-> &nbsp;&nbsp; | DEC_LITERAL FLOAT_EXPONENT\
-> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT<sup>?</sup>\
-> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup>
-> FLOAT_EXPONENT<sup>?</sup> FLOAT_SUFFIX
+> &nbsp;&nbsp; | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E<sup>?</sup>\
+> &nbsp;&nbsp; | DEC_LITERAL (`.` DEC_LITERAL)<sup>?</sup> FLOAT_EXPONENT SUFFIX<sup>?</sup>\
 >
 > FLOAT_EXPONENT :\
 > &nbsp;&nbsp; (`e`|`E`) (`+`|`-`)<sup>?</sup>
 > (DEC_DIGIT|`_`)<sup>\*</sup> DEC_DIGIT (DEC_DIGIT|`_`)<sup>\*</sup>
 >
-> FLOAT_SUFFIX :\
-> &nbsp;&nbsp; `f32` | `f64`

-A _floating-point literal_ has one of three forms:
+A _floating-point literal_ has one of two forms:

 * A _decimal literal_ followed by a period character `U+002E` (`.`). This is
 optionally followed by another decimal literal, with an optional _exponent_.
 * A single _decimal literal_ followed by an _exponent_.
-* A single _decimal literal_ (in which case a suffix is required).

 Like integer literals, a floating-point literal may be followed by a
 suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
-There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]).
+The suffix may not begin with `e` or `E` if the literal does not include an exponent.
 See [literal expressions] for the effect of these suffixes.

-Examples of floating-point literals of various forms:
+Examples of floating-point literals which are accepted as literal expressions:

 ```rust
 123.0f64;
 0.1f64;
 0.1f32;
 12E+99_f64;
-5f32;
 let x: f64 = 2.;
 ```

@@ -479,39 +491,16 @@ to call a method named `f64` on `2`.

 Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`.

-#### Number pseudoliterals
-
-> **<sup>Lexer</sup>**\
-> NUMBER_PSEUDOLITERAL :\
-> &nbsp;&nbsp; &nbsp;&nbsp; DEC_LITERAL ( . DEC_LITERAL )<sup>?</sup> FLOAT_EXPONENT\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\
-> &nbsp;&nbsp; | DEC_LITERAL . DEC_LITERAL\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER SUFFIX )\
-> &nbsp;&nbsp; | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\
-> &nbsp;&nbsp; | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\
-> &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )
->
-> NUMBER_PSEUDOLITERAL_SUFFIX :\
-> &nbsp;&nbsp; IDENTIFIER_OR_KEYWORD <sub>_not matching INTEGER_SUFFIX or FLOAT_SUFFIX_</sub>
->
-> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\
-> &nbsp;&nbsp; NUMBER_PSEUDOLITERAL_SUFFIX <sub>_not beginning with `e` or `E`_</sub>
-
-Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above.
-These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments.
+Examples of floating-point literals which are not accepted as literal expressions:

-Examples of such tokens:
-```rust,compile_fail
-0invalidSuffix;
-123AFB43;
-0b010a;
-0xAB_CD_EF_GH;
+```rust
+# #[cfg(FALSE)] {
 2.0f80;
 2e5f80;
 2e5e6;
 2.0e5e6;
 1.3e10u64;
-0b1111_f32;
+# }
 ```

 #### Reserved forms similar to number literals
@@ -547,13 +536,13 @@ Examples of reserved forms:
 0b0102; // this is not `0b010` followed by `2`
 0o1279; // this is not `0o127` followed by `9`
 0x80.0; // this is not `0x80` followed by `.` and `0`
-0b101e; // this is not a pseudoliteral, or `0b101` followed by `e`
-0b; // this is not a pseudoliteral, or `0` followed by `b`
-0b_; // this is not a pseudoliteral, or `0` followed by `b_`
-2e; // this is not a pseudoliteral, or `2` followed by `e`
-2.0e; // this is not a pseudoliteral, or `2.0` followed by `e`
-2em; // this is not a pseudoliteral, or `2` followed by `em`
-2.0em; // this is not a pseudoliteral, or `2.0` followed by `em`
+0b101e; // this is not a suffixed literal, or `0b101` followed by `e`
+0b; // this is not an integer literal, or `0` followed by `b`
+0b_; // this is not an integer literal, or `0` followed by `b_`
+2e; // this is not a floating-point literal, or `2` followed by `e`
+2.0e; // this is not a floating-point literal, or `2.0` followed by `e`
+2em; // this is not a suffixed literal, or `2` followed by `em`
+2.0em; // this is not a suffixed literal, or `2.0` followed by `em`
 ```

 ## Lifetimes and loop labels
