Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXPERIMENT] Disallow all literal suffixes except the standard numeric ones #103872

Conversation

nnethercote
Copy link
Contributor

Partly out of curiosity, and partly because this would significantly simplify parts of the lexer and parser.

r? @ghost

…c ones.

Partly out of curiosity, and partly because this would significantly
simplify parts of the lexer and parser.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 2, 2022
@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Nov 2, 2022

⌛ Trying commit 1d0b161 with merge 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6...

@nnethercote
Copy link
Contributor Author

The code is very rough, but should be good enough for the experiment.

@bors
Copy link
Contributor

bors commented Nov 2, 2022

☀️ Try build successful - checks-actions
Build commit: 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6 (8330aa5198c56eb493d06cd9c4cc91d00d70e3b6)

@nnethercote
Copy link
Contributor Author

@craterbot run mode=check-only

@craterbot
Copy link
Collaborator

👌 Experiment pr-103872 created and queued.
🤖 Automatically detected try build 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 2, 2022
@craterbot
Copy link
Collaborator

🚧 Experiment pr-103872 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Nov 2, 2022
@craterbot
Copy link
Collaborator

🎉 Experiment pr-103872 is completed!
📊 253 regressed and 11 fixed (247083 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 3, 2022
@nnethercote
Copy link
Contributor Author

Plenty of imaginative suffix use in macro-based DSLs. So disallowing them is a non-starter.

@nnethercote nnethercote closed this Nov 3, 2022
@nnethercote
Copy link
Contributor Author

nnethercote commented Nov 5, 2022

To summarize the crater results: dozens of crates use literal suffixes in all sorts of interesting ways. Here are some examples from many (but not all) of the crates that were broken by the crater run disallowing arbitrary suffixes:

Use cases:

  • Lots of custom units
  • Some coordinates
  • Some custom numeric suffixes, like _U256
  • Emulating syntax of others languages/formats, like C, HTML, CSS, UUIDs, timestamps
  • A few bizarre cases that possibly aren't doing what the author thinks they're doing

Correctly handling some of these on the macro side must be quite the task.

Note that arbitrary literal suffixes go back to #19103 RFC 463, where the idea was introduced to "futureproof" Rust for the possibility of fancier suffixes, e.g. for different kinds of literals. One of the unresolved questions in that RFC was:

Should it be the parser or the tokenizer rejecting invalid suffixes? This is effectively asking if it is legal for syntax extensions to be passed the raw literals? That is, can a foo procedural syntax extension accept and handle literals like foo!(1u2)?

The answer chosen by the implementation was "the parser", which allowed arbitrary suffixes to be used as macro inputs. This arguably broke the futureproofing. Can new suffixes can be reasonably added, given that existing macros can (and do) effectively define their own? I'm honestly not sure.

nnethercote added a commit to nnethercote/rust that referenced this pull request May 16, 2023
Integers with arbitrary suffixes are allowed as inputs to proc macros. A
number of real-world crates use this capability in interesting ways, as
seen in rust-lang#103872. For example:
- Suffixes representing units, such as `8bits`, `100px`, `20ns`, `30GB`
- CSS hex colours such as `#7CFC00` (LawnGreen)
- UUIDs, e.g. `785ada2c-f2d0-11fd-3839-b3104db0cb68`

The hex cases may be surprising.
- `#7CFC00` is tokenized as a `#` followed by a `7` integer with a
  `CFC00` suffix.
- `785ada2c` is tokenized as a `785` integer with an `ada2c` suffix.
- `f2d0` is tokenized as an identifier.
- `3839` is tokenized as an integer literal.

A proc macro will immediately stringify such tokens and reparse them
itself, and so won't care that the token types vary. All suffixes must
be consumed by the proc macro, of course; the only suffixes allowed
after macro expansion are the numeric ones like `u8`, `i32`, and `f64`.

Currently there is an annoying inconsistency in how integer literal
suffixes are handled, which is that no suffix starting with `e` is
allowed, because that it interpreted as a float literal with an
exponent. For example:
- Units: `1eV` and `1em`
- CSS colours: `#90EE90` (LightGreen)
- UUIDs: `785ada2c-f2d0-11ed-3839-b3104db0cb68`

In each case, a sequence of digits followed by an 'e' or 'E' followed by
a letter results in an "expected at least one digit in exponent" error.
This is an annoying inconsistency in general, and a problem in practice.
It's likely that some users haven't realized this inconsistency because
they've gotten lucky and never used a token with an 'e' that causes
problems. Other users *have* noticed; it's causing problems when
embedding DSLs into proc macros, as seen in rust-lang#111615, where the CSS
colours case is causing problems for two different UI frameworks (Slint
and Makepad).

We can do better. This commit changes the lexer so that, when it hits a
possible exponent, it looks ahead and only produces an exponent if a
valid one is present. Otherwise, it produces a non-exponent form, which
may be a single token (e.g. `1eV`) or multiple tokens (e.g. `1e+a`).

Consequences of this:
- All the proc macro problem cases mentioned above are fixed.
- The "expected at least one digit in exponent" error is no longer
  possible. A few tests that only worked in the presence of that error
  have been removed.
- The lexer requires unbounded lookahead due to the presence of '_'
  chars in exponents. E.g. to distinguish `1e+_______3` (a float literal
  with exponent) from `1e+_______a` (previously invalid, but now the
  tokenised as `1e`, `+`, `_______a`).

This is a backwards compatible language change: all existing valid
programs will be treated in the same way, and some previously invalid
programs will become valid. The tokens chapter of the language reference
(https://doc.rust-lang.org/reference/tokens.html) will need changing to
account for this. In particular, the "Reserved forms similar to number
literals" section will need updating, and grammar rules involving the
SUFFIX_NO_E nonterminal will need adjusting.

Fixes rust-lang#111615.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants