Fix unhandled cases for exotic idents #6658

cometkim · 2024-02-27T22:00:52Z

Changes

Exotic idents with an empty string are now banned at the syntax level
Fixes "Unexpected compiler error occurs on PascalCase type names" (#6539), allowing uppercase lident

Note that this is an important feature for Gentype (#6196) and Library Mode (#6210), which are expected to be consumed by users on the JS side.

Rationale

In the OCaml AST, there are two separate identifier classes, Lident (lowercase ident) and Uident (uppercase ident), depending on whether the first letter of the identifier is lowercase or uppercase.

This is useful internally to distinguish whether the identifier is for a module or a variable/type binding.

e.g. https://github.com/rescript-lang/rescript-compiler/blob/17d383f/jscomp/ml/path.ml#L87-L91

let is_uident s =
  assert (s <> "");
  match s.[0] with
  | 'A'..'Z' -> true
  | _ -> false

ReScript supports exotic labels for defining bindings starting with uppercase letters. The ReScript parser removes the surrounding wrapper when parsing exotic labels and forces it to be a Lident node.

let \"Upper" = None

So this produces a Lident(name = "Upper") node in the OCaml AST.

However, the OCaml type system uses the previously mentioned is_uident for path analysis. As a result, exotic idents that start with an uppercase letter are unintentionally treated as modules internally. OCaml does not anticipate these cases, and it ends up raising invariant errors deep in the compiler. (#6539)
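To make the misclassification concrete, here is a minimal sketch that reuses the is_uident check quoted above. The `classify` helper is hypothetical and exists only for illustration; it is not a function in the compiler.

```ocaml
(* Same check as the is_uident quoted from path.ml above. *)
let is_uident s =
  assert (s <> "");
  match s.[0] with
  | 'A' .. 'Z' -> true
  | _ -> false

(* Hypothetical helper, for illustration only: how a name looks to
   code that relies on is_uident alone. *)
let classify name = if is_uident name then "module-like" else "value-like"

let () =
  (* The parser strips the \"...\" wrapper and stores the bare name
     "Upper" in a Lident node, so the name alone looks like a module. *)
  Printf.printf "Upper -> %s\n" (classify "Upper");
  (* An ordinary lowercase binding is unaffected. *)
  Printf.printf "lower -> %s\n" (classify "lower")
```

The Lident node and the first-letter check disagree about "Upper", which is the kind of inconsistency reported in #6539.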

OCaml itself has no knowledge of exotic ident, so it cannot determine whether the original declaration was exotic or not. For compatibility reasons, the AST cannot be changed to support it.

In the compiler, knowledge of exotic idents existed only in the printer. Here again, since there is no information passed from the grammar, some heavy logic (e.g. printIdentLike) is needed everywhere to recheck whether a label should be treated as exotic or not.

This change solves the problem by introducing several assumptions.

When parsing exotic idents, the parser passes the surrounding chars as-is. So OCaml will treat it as a Lident too (because '\' <> A .. Z), and we can easily recheck whether an ident name is exotic by checking whether the first letter is the backslash.
Exotic uident is explicitly not supported (e.g. module \"lowercase"). There is no way to distinguish between an uppercase lident and a lowercase uident without a breaking change to the AST.
It's not the parser's job to provide "pretty labels", so we'll have to do that elsewhere where we need it. For example, the printer normalizes names using the ext_ident.unwrap_exotic utility where necessary.
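The assumptions above can be sketched as follows. This is an illustration only: `is_exotic` and `unwrap_exotic_sketch` are hypothetical names, and the actual ext_ident.unwrap_exotic utility may be implemented differently.

```ocaml
(* Sketch only: the real unwrap_exotic utility may differ. *)

(* Under the proposed convention an exotic ident keeps its \"...\"
   wrapper, so its first character is a backslash. *)
let is_exotic name = String.length name > 0 && name.[0] = '\\'

(* Strip the leading \" and the trailing " when both are present;
   otherwise return the name unchanged. *)
let unwrap_exotic_sketch name =
  let len = String.length name in
  if len >= 3 && name.[0] = '\\' && name.[1] = '"' && name.[len - 1] = '"'
  then String.sub name 2 (len - 3)
  else name

let () =
  (* The ident written as \"Upper" in ReScript source. *)
  let wrapped = "\\\"Upper\"" in
  Printf.printf "%s exotic=%b pretty=%s\n" wrapped (is_exotic wrapped)
    (unwrap_exotic_sketch wrapped)
```

Because the wrapped name starts with a backslash, a first-letter check such as is_uident classifies it as lowercase, and only consumers that need a pretty label, such as the printer, pay for the unwrapping.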

jscomp/test/key_word_property2.js

jscomp/syntax/src/res_core.ml

cometkim · 2024-04-02T13:22:57Z

@zth @cristianoc can you review this? These changes have quite a broad impact on other modules and may need to be understood by other compiler devs.

mununki · 2024-04-02T13:38:42Z

jscomp/syntax/src/res_scanner.ml

  let startPos = position scanner in
+  let startOff = scanner.offset in
+  let closed = ref false in


Is it really necessary to implement with closed? It seems like I only need to wrap \" and " at the end of my existing implementation if it's not an infix operator.

This makes a path to skip when a token seq like \\\"\n is entered.

Isn't \\\"\n caught by this existing line?

| '\n' | '\r' ->
  (* line break *)
  let endPos = position scanner in
  scanner.err ~startPos ~endPos
    (Diagnostics.message "A quoted identifier can't contain line breaks.");
  next scanner

Then it will possibly be unsafe to substring

It could obviously be replaced with unwrap_ident, but that has the same kind of assertion in it. I did it this way because it is simpler. There is no reason to make the scanner less efficient.
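For readers following the thread, the concern about taking a substring of an unterminated quoted identifier can be sketched like this. It is a simplified, hypothetical scanner over a plain string, not the code in res_scanner.ml; `scan_exotic` and its result type are assumptions, and it uses a result value where the diff uses a `closed` flag.

```ocaml
(* Hypothetical, simplified scanner: [src] is the input and [start] is
   the offset of the backslash in \"...". Not the res_scanner.ml code. *)
let scan_exotic src start =
  let len = String.length src in
  (* Look for the closing quote, rejecting line breaks on the way. *)
  let rec find_close i =
    if i >= len then Error "unterminated quoted identifier"
    else
      match src.[i] with
      | '"' -> Ok i
      | '\n' | '\r' -> Error "A quoted identifier can't contain line breaks."
      | _ -> find_close (i + 1)
  in
  if start + 1 >= len || src.[start] <> '\\' || src.[start + 1] <> '"' then
    Error "not a quoted identifier"
  else
    match find_close (start + 2) with
    | Error msg -> Error msg
    | Ok close ->
      (* The closing quote was found, so this range is in bounds. *)
      Ok (String.sub src start (close - start + 1))
```

The slice is taken only on the path where the closing quote was seen, which plays the same role as checking a closed flag before calling String.sub.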

zth · 2024-04-29T12:00:50Z

I don't have bandwidth to review this right now, sorry for keeping you waiting. @cristianoc you got time to review this?

Since infix operators have weird constraints, the scanner shouldn't touch them. Therefore, the printer still needs to distinguish whether a given ident is infix-like or not.

zth · 2024-05-25T09:42:17Z

cc @cristianoc we need your eyes on this.

cometkim · 2024-05-25T11:01:04Z

No worries, I just requested his review in person

@cristianoc

Duplicates rescript-lang#6658, but with much smaller changes. Reflecting on @cristianoc's review, I tried to reduce the possibly affected surface to only exotic uident.

cometkim · 2024-05-25T14:34:32Z

Closing this in favor of #6777

That said, avoiding printIdentLike can still be an additional optimization (it improves the parser's speed by about 5~10%).
Maybe I'll introduce it in another PR

cometkim force-pushed the exotic-ident branch from e875735 to b7269a6 Compare February 28, 2024 20:13

cometkim commented Mar 15, 2024

jscomp/test/key_word_property2.js (outdated, resolved)

cometkim force-pushed the exotic-ident branch 5 times, most recently from 406746c to c582cf4 Compare March 26, 2024 18:19

cometkim commented Mar 27, 2024

jscomp/syntax/src/res_core.ml (outdated, resolved)

cometkim mentioned this pull request Apr 2, 2024

Umbrella: Library Mode #6210

Open

6 tasks

cometkim changed the title ~~properly handle exotic ident~~ Fix unhandled cases for exotic idents Apr 2, 2024

cometkim marked this pull request as ready for review April 2, 2024 13:11

mununki reviewed Apr 2, 2024

cometkim force-pushed the exotic-ident branch from 21d5974 to b781d6a Compare April 29, 2024 11:53

cometkim force-pushed the exotic-ident branch 2 times, most recently from 7f35700 to 5d38f55 Compare May 23, 2024 21:35

cometkim mentioned this pull request May 25, 2024

enable CI cache for NPM #6767

Closed

cometkim added 11 commits May 25, 2024 18:38

ban empty ident in syntax-level

cdb4d1f

handle exotic ident properly

374be35

fix a crash

d92da3c

fix printer

1c1a353

fix jsx printers

5f862d3

dedupe snippets

be15881

organized reserved word checking

a645dc8

fix res_ast_debugger to unwrap exotic ident names

ad3652f

wip

0b9ce87

move back to previous strategy

496def9

fix gentype

e9343e9

changelog

06ec128

cometkim force-pushed the exotic-ident branch from 5d38f55 to 06ec128 Compare May 25, 2024 09:38

cometkim force-pushed the exotic-ident branch 3 times, most recently from cbc01bd to 81f544b Compare May 25, 2024 10:30

cometkim force-pushed the exotic-ident branch from 81f544b to 80a9fea Compare May 25, 2024 12:09

revert changes that unrelated to the main issue

2737656

cometkim force-pushed the exotic-ident branch from 80a9fea to 2737656 Compare May 25, 2024 12:15

cometkim mentioned this pull request May 25, 2024

Fix unhandled cases for exotic idents #6777

Merged

cometkim closed this May 25, 2024

cristianoc pushed a commit that referenced this pull request May 25, 2024

Fix unhandled cases for exotic idents

20d2542

Duplicates #6658, but with much smaller changes. Reflecting on @cristianoc's review, I tried to reduce the possibly affected surface to only exotic uident.

cometkim deleted the exotic-ident branch May 25, 2024 15:01

cometkim mentioned this pull request May 26, 2024

refactor: clarify uppercase exotic ident path #6779

Merged

cometkim mentioned this pull request Jun 26, 2024

fix reserved words #6831

Merged
