Emoji in label/lifetime recovered as character literal (rather than identifier) #108019

Closed
izik1 opened this issue Feb 14, 2023 · 3 comments · Fixed by #108031
Labels
A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

izik1 commented Feb 14, 2023

Code

fn bar() {
    '🐱 loop {
        break
    }
}

Current output

error[E0762]: unterminated character literal
 --> src/lib.rs:2:5
  |
2 |     '🐱 loop {
  |     ^^^^^^^^^^

Desired output

error: identifiers cannot contain emoji
 --> src/lib.rs:2:5
  |
2 |     '🐱: loop {
  |      ^^

or something else similar to the one for

fn bar() {
    let 🐱 = ();
}

error: identifiers cannot contain emoji: `🐱`
 --> src/lib.rs:2:9
  |
2 |     let 🐱 = ();
  |         ^^

Perhaps with a help: "did you mean to use a character literal?" when applicable.

Rationale and extra context

I feel the rationale is self-explanatory, however, if it ends up not being such, I can provide one upon request.

Other cases

Small aside: I originally wrote all of this for 🥺, but bizarrely that is not recognized in identifiers at all (it gives an error: unknown start of token: \u{1f97a}). After realizing that some emoji are handled better, I decided to use 🐱. I specifically avoided 🦀 because it has extra-special handling ("Ferris cannot be used as an identifier").

Another case is, as mentioned prior, in lifetime names (as far as I'm aware, this is the same underlying cause: the emoji causes the token to be a character literal):

fn foo<'🐱>() -> &'🐱 () {
   &()
}

which gives 2 errors:

error: character literal may only contain one codepoint
 --> src/lib.rs:1:8
  |
1 | fn foo<'🐱>() -> &'🐱 () {
  |        ^^^^^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
1 | fn foo<"🐱>() -> &"🐱 () {
  |        ~~~~~~~~~~~~

error: expected one of `#`, `>`, `const`, identifier, or lifetime, found `'🐱>() -> &'`
 --> src/lib.rs:1:8
  |
1 | fn foo<'🐱>() -> &'🐱 () {
  |        ^^^^^^^^^^^^ expected one of `#`, `>`, `const`, identifier, or lifetime

The following sample also has very different output (and probably closer to the expected output, although it's not without its own weirdness):

fn bar() {
    'a🐱: loop {}
}

error: malformed loop label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^ help: use the correct loop label format: `'🐱`

error: expected `while`, `for`, `loop` or `{` after a label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^ expected `while`, `for`, `loop` or `{` after a label
  |
help: consider removing the label
  |
6 -     'a🐱: loop {}
6 +     🐱: loop {}
  |

error: labeled expression must be followed by `:`
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |     ---^^^^^^^^^^
  |     | |
  |     | help: add `:` after the label
  |     the label
  |
  = note: labels are used before loops and blocks, allowing e.g., `break 'label` to them

error: identifiers cannot contain emoji: `🐱`
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^

warning: unused label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^
  |
  = note: `#[warn(unused_labels)]` on by default

warning: `playground` (lib) generated 1 warning
error: could not compile `playground` due to 4 previous errors; 1 warning emitted

Anything else?

No response

@izik1 izik1 added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 14, 2023
@jieyouxu
Member

This looks like a fun one, would like to give it a try 🥺

@rustbot claim

@jieyouxu
Member

For the '🐱 loop { case, it is incorrectly lexed as a character literal:

DEBUG rustc_parse::lexer next_token: Literal { kind: Char { terminated: false }, suffix_start: 12 }("'🐱 loop {")
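
To illustrate why an emoji after `'` ends up as a character literal, here is a minimal sketch of the lifetime-versus-char decision. It is not the actual rustc_lexer code: `QuoteToken`, `classify_after_quote`, and this `is_id_start` are names made up for the example, and the identifier-start check is approximated with std's `char::is_alphabetic` rather than the Unicode XID_Start tables.

```rust
/// Assumption: stand-in for the lexer's identifier-start check.
/// std's `is_alphabetic` is only an approximation of XID_Start.
fn is_id_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Hypothetical token kinds for something that starts with `'`.
#[derive(Debug, PartialEq)]
enum QuoteToken {
    Lifetime,
    CharLiteral { terminated: bool },
}

/// Classify the token that begins at a `'`.
/// `rest` is the source text immediately after the opening quote.
fn classify_after_quote(rest: &str) -> QuoteToken {
    let mut chars = rest.chars();
    let first = chars.next().unwrap_or('\0');
    let second = chars.next().unwrap_or('\0');

    // `'x'` is surely a char literal; otherwise an identifier start
    // (or a digit, for nicer errors) means "treat it as a lifetime".
    let can_be_lifetime = second != '\'' && (is_id_start(first) || first.is_ascii_digit());
    if can_be_lifetime {
        return QuoteToken::Lifetime;
    }

    // Everything else falls through to the char-literal path.
    // Simplification: it counts as terminated if a closing quote
    // appears anywhere on the same line.
    let line = rest.lines().next().unwrap_or("");
    QuoteToken::CharLiteral { terminated: line.contains('\'') }
}
```

Under these assumptions, `classify_after_quote("🐱 loop {")` takes the char-literal path with no closing quote (matching the unterminated character literal error), while `classify_after_quote("🐱>() -> &'🐱 () {")` finds the second `'` and swallows everything up to it, which lines up with the `'🐱>() -> &'` token in the lifetime example. `'a🐱` still starts with an identifier character, so it takes the lifetime path instead.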

@jieyouxu
Member

Okay, this is really weird: is_emoji returns false for 🥺 but true for 🐱:

[compiler/rustc_lexer/src/lib.rs:640] self.first() = '🥺'
[compiler/rustc_lexer/src/lib.rs:641] unic_emoji_char::is_emoji(self.first()) = false

[compiler/rustc_lexer/src/lib.rs:640] self.first() = '🐱'
[compiler/rustc_lexer/src/lib.rs:641] unic_emoji_char::is_emoji(self.first()) = true
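
For anyone who wants to compare the two characters outside the compiler, a small std-only helper can print their code points. The name `describe` is made up for this example, and the snippet says nothing about why `unic_emoji_char::is_emoji` disagrees on them; it only shows what the lexer is looking at.

```rust
/// Hypothetical helper: show a char with its code point and UTF-8 width.
fn describe(c: char) -> String {
    format!("{} = U+{:04X} ({} bytes in UTF-8)", c, c as u32, c.len_utf8())
}
```

`describe('🥺')` gives `🥺 = U+1F97A (4 bytes in UTF-8)`, the same code point as in the unknown start of token error, and `describe('🐱')` gives `🐱 = U+1F431 (4 bytes in UTF-8)`. The 4-byte width also accounts for `suffix_start: 12` in the debug output above: 1 byte for `'`, 4 for 🐱, and 7 for ` loop {`.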
