Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow numeric tokens containing 'e' that aren't exponents be passed to proc macros #111615

Open
ogoffart opened this issue May 15, 2023 · 6 comments
Assignees
Labels
A-grammar Area: The grammar of Rust A-proc-macros Area: Procedural macros C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@ogoffart
Copy link
Contributor

ogoffart commented May 15, 2023

I tried this code:

// This could also be a complex proc macro
macro_rules! print_token {
    ($x:tt) => { println!("{}", stringify!($x)) }
}

fn main() {
    print_token!(123aefg123);  // ok, unknown suffix
    print_token!(123e123); // ok: float literal
    
    print_token!(123ef3); // error:  expected at least one digit in exponent
}

I expected to see this happen: It should compile in print the token verbatim. If unkown/invalid suffix are OK, so should invalid float, because the macro DSL my use it for something else

Instead, this happened: Compilation error :

error: expected at least one digit in exponent
  --> src/main.rs:11:18
   |
11 |     print_token!(123ef3); // error:  expected at least one digit in exponent
   |                  ^^^^^^

(tested with Rust 1.69 as well as current nightly, this always has been a problem)

Context

This was discussed with @nnethercote and others last week and I wanted to make sure there is a track of this.

The usecase is so that Slint's color literal work in the slint! maco DSL.
Currently one can do color: #0e9 but not color:#0ea in the slint! macro. And Makepad DSL has to workaround the same issue.

@ogoffart ogoffart added the C-bug Category: This is a bug. label May 15, 2023
@makepaddev
Copy link

Chiming in that this is also absolutely a problem for our project (github.com/makepad/makepad) and would really appreciate a fix!. All proc-macro embedded UI systems for Rust need this.

@chenyukang
Copy link
Member

This error comes out at next_token, which is too early.
So it's a general issue, here is another case I read from code:

macro_rules! print_token {
    ($x:tt) => { println!("{}", stringify!($x)) }
}

fn main() {
    print_token!("hello"_);
}

Output:

error: underscore literal suffix is not allowed
  --> ./p/mac.rs:11:25
   |
11 |     print_token!("hello"_);
   |                         ^

Seems not easy to fix, we need to change to report this kind of error after macro expanding.

@nnethercote nnethercote self-assigned this May 15, 2023
@nnethercote nnethercote changed the title Invalid floating point literal token causes error before macro invocation Allow numeric tokens containing 'e' that aren't exponents be passed to proc macros May 15, 2023
@nnethercote
Copy link
Contributor

I have a partial implementation that works for most literals involving an 'e', including CSS colours of the form #xxyyzz. I need to extend it a bit more to properly distinguish cases like 1e+_______3 (a valid float literal) and 1e+_______a (not a valid float literal, and currently rejected by the lexer).

From an implementation point of view it shouldn't be hard, but the change will need sign-off from the lang team. The relevant section of the Reference will need changing; in particular the Reserved forms similar to literals section as well as the SUFFIX_NO_E nonterminal in the grammar. Working through all this will take some time.

@chenyukang "hello"_ is a different issue to the one facing @ogoffart and @makepaddev. _ is not accepted as a suffix because a suffix must be a valid identifier or keyword, and _ is not. Let's keep this issue about numeric tokens with 'e' in them.

@nnethercote
Copy link
Contributor

#111628 has an draft implementation. Zulip discussion is here.

nnethercote added a commit to nnethercote/rust that referenced this issue May 16, 2023
Integers with arbitrary suffixes are allowed as inputs to proc macros. A
number of real-world crates use this capability in interesting ways, as
seen in rust-lang#103872. For example:
- Suffixes representing units, such as `8bits`, `100px`, `20ns`, `30GB`
- CSS hex colours such as `#7CFC00` (LawnGreen)
- UUIDs, e.g. `785ada2c-f2d0-11fd-3839-b3104db0cb68`

The hex cases may be surprising.
- `#7CFC00` is tokenized as a `#` followed by a `7` integer with a
  `CFC00` suffix.
- `785ada2c` is tokenized as a `785` integer with an `ada2c` suffix.
- `f2d0` is tokenized as an identifier.
- `3839` is tokenized as an integer literal.

A proc macro will immediately stringify such tokens and reparse them
itself, and so won't care that the token types vary. All suffixes must
be consumed by the proc macro, of course; the only suffixes allowed
after macro expansion are the numeric ones like `u8`, `i32`, and `f64`.

Currently there is an annoying inconsistency in how integer literal
suffixes are handled, which is that no suffix starting with `e` is
allowed, because that it interpreted as a float literal with an
exponent. For example:
- Units: `1eV` and `1em`
- CSS colours: `#90EE90` (LightGreen)
- UUIDs: `785ada2c-f2d0-11ed-3839-b3104db0cb68`

In each case, a sequence of digits followed by an 'e' or 'E' followed by
a letter results in an "expected at least one digit in exponent" error.
This is an annoying inconsistency in general, and a problem in practice.
It's likely that some users haven't realized this inconsistency because
they've gotten lucky and never used a token with an 'e' that causes
problems. Other users *have* noticed; it's causing problems when
embedding DSLs into proc macros, as seen in rust-lang#111615, where the CSS
colours case is causing problems for two different UI frameworks (Slint
and Makepad).

We can do better. This commit changes the lexer so that, when it hits a
possible exponent, it looks ahead and only produces an exponent if a
valid one is present. Otherwise, it produces a non-exponent form, which
may be a single token (e.g. `1eV`) or multiple tokens (e.g. `1e+a`).

Consequences of this:
- All the proc macro problem cases mentioned above are fixed.
- The "expected at least one digit in exponent" error is no longer
  possible. A few tests that only worked in the presence of that error
  have been removed.
- The lexer requires unbounded lookahead due to the presence of '_'
  chars in exponents. E.g. to distinguish `1e+_______3` (a float literal
  with exponent) from `1e+_______a` (previously invalid, but now the
  tokenised as `1e`, `+`, `_______a`).

This is a backwards compatible language change: all existing valid
programs will be treated in the same way, and some previously invalid
programs will become valid. The tokens chapter of the language reference
(https://doc.rust-lang.org/reference/tokens.html) will need changing to
account for this. In particular, the "Reserved forms similar to number
literals" section will need updating, and grammar rules involving the
SUFFIX_NO_E nonterminal will need adjusting.

Fixes rust-lang#111615.
@nnethercote

This comment was marked as resolved.

@ogoffart

This comment was marked as resolved.

@ChrisDenton ChrisDenton added the needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. label Jul 16, 2023
nnethercote added a commit to nnethercote/rust that referenced this issue Aug 6, 2023
Such as `1e_3`, `1E+__3`, `1e-_________3_3`.

- They are ugly and never used in practice. (The test suite and compiler
  code have no examples of them.)
- They don't match normal decimal literals. (You can't write `x = _3;`.)
- They complicate attempts to allow integers with suffixes beginning
  with `e`, such as `1em` (currently disallowed, but desired in
  rust-lang#111615). Because when given a char sequence like `1e` the lexer must
  decide whether what follows the `e` is a decimal integer (in which
  case it's a float with exponent) or something else (in which case it's
  an integer with a suffix). But unbounded char lookahead is required to
  get past the possibly unlimited number of leading underscores.
  Disallowing the leading underscores reduces the lookahead to two: one
  for a possible `+`/`-`, and then one more for a digit or non-digit.
@fmease fmease added A-grammar Area: The grammar of Rust T-lang Relevant to the language team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-proc-macros Area: Procedural macros C-feature-request Category: A feature request, i.e: not implemented / a PR. and removed needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. C-bug Category: This is a bug. labels Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-grammar Area: The grammar of Rust A-proc-macros Area: Procedural macros C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants