Rust 2015 and 2018 allow emoji in identifiers in "Unknown prefix" position #123696

Closed
mattheww opened this issue Apr 9, 2024 · 3 comments · Fixed by #123752
Labels: A-grammar, A-Unicode, C-bug, P-low, regression-from-stable-to-stable, T-compiler

Comments

@mattheww
Contributor

mattheww commented Apr 9, 2024

In the 2015 and 2018 editions, the following compiles (with warnings):

```rust
macro_rules! lexes {($($_:tt)*) => {}}

lexes!(🐛#);
lexes!(🐛"foo");
lexes!(🐛'q');
lexes!(🐛'q);
```

playground

The 🐛 is taken as an identifier, although emoji aren't generally permitted in identifiers in any edition.

I tested with rustc 1.77.1.

I think the underlying problem is that `ident_or_unknown_prefix()` and `fake_ident_or_unknown_prefix()` in `rustc_lexer` distinguish "identifiers" containing emoji (as `InvalidIdent` rather than `Ident`), but have no way to make that distinction for `UnknownPrefix`.
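To make the gap described above concrete, here is a toy classifier, not the actual `rustc_lexer` code. The token kind names come from the report; `classify`, `is_ident_like`, `is_emoji`, and the emoji range are hypothetical stand-ins. It sketches how an emoji run can be flagged as `InvalidIdent` on its own, yet lose that information once it is followed by `#`, `"` or `'`.

```rust
// Toy model only: names mirror the report, but this is not rustc_lexer's code.
#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    InvalidIdent,  // identifier-like run that contains emoji
    UnknownPrefix, // identifier-like run directly followed by #, " or '
}

// Crude stand-in for a real emoji check (assumption); the range covers U+1F41B.
fn is_emoji(c: char) -> bool {
    ('\u{1F300}'..='\u{1FAFF}').contains(&c)
}

// Simplified: a real lexer also distinguishes start and continue characters.
fn is_ident_like(c: char) -> bool {
    c == '_' || c.is_alphanumeric() || is_emoji(c)
}

// Classify the identifier-like run at the start of `input`, if there is one.
fn classify(input: &str) -> Option<TokenKind> {
    let run: String = input.chars().take_while(|&c| is_ident_like(c)).collect();
    if run.is_empty() {
        return None;
    }
    let next = input[run.len()..].chars().next();
    let has_emoji = run.chars().any(is_emoji);
    Some(match next {
        // The prefix check wins, so the emoji information is dropped here.
        Some('#' | '"' | '\'') => TokenKind::UnknownPrefix,
        _ if has_emoji => TokenKind::InvalidIdent,
        _ => TokenKind::Ident,
    })
}
```

In this model `classify("🐛 x")` yields `InvalidIdent`, while `classify("🐛#")` yields a plain `UnknownPrefix`, the same kind an ordinary prefix such as `foo#` would get.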

@mattheww added the C-bug label Apr 9, 2024
@rustbot added the needs-triage label Apr 9, 2024
@fmease added the A-grammar and T-compiler labels and removed the needs-triage label Apr 9, 2024
@fmease
Member

fmease commented Apr 9, 2024

Regression from 1.57 to 1.58 (stable to stable). Very likely in #88781, cc @estebank.
Regression from error to pass. Previous error: unknown start of token: \u{1f41b}.

@fmease added the regression-from-stable-to-stable label Apr 9, 2024
@rustbot added the I-prioritize label Apr 9, 2024
@fmease added the A-Unicode label Apr 9, 2024
@apiraino
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

@rustbot added the P-low label and removed the I-prioritize label Apr 10, 2024
@estebank self-assigned this Apr 10, 2024
estebank added a commit to estebank/rust that referenced this issue Apr 10, 2024
Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes.

Fix rust-lang#123696.
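The idea of gating at lex time, as the commit message describes, can be sketched with a toy model. This is an illustration under stated assumptions, not the code from the pull request: `InvalidPrefix` is a hypothetical extra token kind here, and `classify`, `is_emoji`, and `is_hard_error` are made-up helpers.

```rust
// Toy model only; not the code from the pull request.
#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    InvalidIdent,
    UnknownPrefix,
    // Hypothetical extra kind: a prefix containing emoji, rejected in every edition.
    InvalidPrefix,
}

// Crude stand-in for a real emoji check (assumption); the range covers U+1F41B.
fn is_emoji(c: char) -> bool {
    ('\u{1F300}'..='\u{1FAFF}').contains(&c)
}

// `run` is the identifier-like text already consumed; `next` is the character after it.
fn classify(run: &str, next: Option<char>) -> TokenKind {
    let has_emoji = run.chars().any(is_emoji);
    let is_prefix = matches!(next, Some('#' | '"' | '\''));
    match (is_prefix, has_emoji) {
        (true, true) => TokenKind::InvalidPrefix,
        (true, false) => TokenKind::UnknownPrefix,
        (false, true) => TokenKind::InvalidIdent,
        (false, false) => TokenKind::Ident,
    }
}

// Lex-time gate: kinds treated as errors regardless of edition or macro context.
fn is_hard_error(kind: &TokenKind) -> bool {
    matches!(kind, TokenKind::InvalidIdent | TokenKind::InvalidPrefix)
}
```

With a separate kind, the emoji case no longer shares a token kind with ordinary unknown prefixes such as `foo"bar"`, so it can be rejected even when the tokens are only passed to a macro that discards them.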
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 10, 2024
Properly handle emojis as literal prefix in macros

@estebank
Contributor

@mattheww thank you for the detailed report. #123752 will fix this.

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 10, 2024
Properly handle emojis as literal prefix in macros

Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro pre-expansion of literal prefixes.

Fix rust-lang#123696.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Apr 18, 2024
Properly handle emojis as literal prefix in macros

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Apr 18, 2024
Properly handle emojis as literal prefix in macros

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Apr 18, 2024
Properly handle emojis as literal prefix in macros

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Apr 18, 2024
Properly handle emojis as literal prefix in macros

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Apr 19, 2024
Properly handle emojis as literal prefix in macros

@bors bors closed this as completed in 19821ad Apr 19, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Apr 19, 2024
Rollup merge of rust-lang#123752 - estebank:emoji-prefix, r=wesleywiser

mattheww added a commit to mattheww/lexeywan that referenced this issue Sep 10, 2024