This repository was archived by the owner on Jan 25, 2022. It is now read-only.

Numeric separators allowed in extended Unicode escape sequences? #25

Closed
anba opened this issue Sep 5, 2017 · 15 comments

@anba

anba commented Sep 5, 2017

The current proposal allows "\u{10_ffff}". Is this intentional?

@mathiasbynens
Member

Not intentional, per September 2017 TC39 meeting.

rwaldron added a commit that referenced this issue Nov 27, 2017
https://tc39.github.io/ecma262/#sec-static-semantics-sv

> The SV of UnicodeEscapeSequence::u{CodePoint} is the UTF16Encoding of the MV of CodePoint.

And CodePoint is:

CodePoint ::
  HexDigits (but only if MV of HexDigits ≤ 0x10FFFF)

-------------------------------------------------------------

This patch explicitly defines the MV of HexDigits as:

- The MV of HexDigits :: HexDigits NumericLiteralSeparator HexDigit is (the MV of HexDigits * 16) plus the MV of HexDigit.

It also adds similar definitions for DecimalDigits, BinaryDigits and OctalDigits.
@rwaldron
Collaborator

@anba @mathiasbynens 39d0bc0

The solution I chose is to explicitly define the MV such that NLS has no impact on the resulting value.

@mathiasbynens
Member

With the change, e.g. '\u{10_FFFF}' is now explicitly allowed and equivalent to '\u{10FFFF}'.

I’d have preferred to disallow it in escape sequences, following the “no real benefit there” logic in https://github.com/tc39/proposal-numeric-separator#octal-literal, but don’t feel strongly about it. What do you think, @anba?

@rwaldron
Collaborator

rwaldron commented Nov 29, 2017

More on this:

  • The SV of UnicodeEscapeSequence :: u { CodePoint } is the UTF16Encoding of the MV of CodePoint.

CodePoint ::
  HexDigits but only if MV of HexDigits ≤ 0x10FFFF

HexDigits ::
  HexDigit
  HexDigits NumericLiteralSeparator_opt HexDigit

And the NLS spec adds:

  • The NumericLiteralSeparator :: _ has no MV.
  • The MV of HexDigits :: HexDigits NumericLiteralSeparator HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
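
As a rough illustration of the MV rules quoted above, here is a minimal sketch of how the value could be computed so that the separator contributes nothing. `mvOfHexDigits` and the regular expression are hypothetical names made up for this example; they are not part of the proposal or of any engine, and the regex is only an approximation of the `HexDigits` grammar (one optional `_` between two digits).

```javascript
// Hypothetical helper for illustration only; not spec text.
// Mirrors: HexDigits :: HexDigit | HexDigits NumericLiteralSeparator_opt HexDigit
const HEX_DIGITS_WITH_SEPARATORS = /^[0-9a-fA-F](?:_?[0-9a-fA-F])*$/;

function mvOfHexDigits(source) {
  if (!HEX_DIGITS_WITH_SEPARATORS.test(source)) {
    throw new SyntaxError(`Not a valid HexDigits sequence: ${source}`);
  }
  let mv = 0;
  for (const ch of source) {
    // The NumericLiteralSeparator has no MV, so it is skipped.
    if (ch === "_") continue;
    // (the MV of HexDigits × 16) plus the MV of HexDigit
    mv = mv * 16 + Number.parseInt(ch, 16);
  }
  return mv;
}

console.log(mvOfHexDigits("10_ffff") === mvOfHexDigits("10ffff"));
```

Under these rules `10_ffff` and `10ffff` get the same MV, which is why the escape sequence was accepted at this point in the discussion.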

@rwaldron
Collaborator

To be clear, this was only one of 3 possible solutions that @leobalter and I discussed. We agreed that an explicit Static Semantics could be written as well.

@leobalter
Member

I don't have a strong opinion on whether we should disallow NLS in escape sequences or not. My goal is only to get the spec text right for whichever option ends up being chosen.

One piece of feedback that might be helpful is the implementors' PoV. If disallowing NLS in escape sequences is easier to implement, I'm all in.

I'd be happy to help write an explicit error to disallow it.

@littledan
Member

I don't feel really strongly, but my intuition was the same as @mathiasbynens's: I don't see a strong reason to permit numeric separators in Unicode literals. It wasn't clear to me during the Stage 3 presentation that you were proposing to allow it. I had understood that the fix was to disallow it, and that allowing numeric separators there was an accident.

@ajklein

ajklein commented Nov 29, 2017

FWIW, my understanding of the presentation matched @littledan's.

@rwaldron
Collaborator

Reopening for explicit Static Semantics to produce a SyntaxError whenever _ appears in "\u{10_ffff}"

@littledan
Member

littledan commented Jan 31, 2018

Ping, it would be good to get clarity on this issue before integrating with the BigInt specification (or implementing and shipping).

@leobalter
Member

@littledan I believe the decision is throwing when a Num. separator appears in escape sequences. We might need to update the specs to match that.

@leobalter
Member

leobalter commented Feb 2, 2018

This seems ugly, but we could get a fix to the escape sequence by changing CodePoint to:

CodePoint ::
  CodePointDigits "but only if MV of CodePointDigits ≤ 0x10FFFF"

CodePointDigits ::
  HexDigit
  CodePointDigits HexDigit

or:

CodePoint ::
  HexDigits "but only if MV of HexDigits ≤ 0x10FFFF"

HexIntegerLiteral ::
  0x HexDigits [+NSL]
  0X HexDigits [+NSL]

HexDigits[NSL] ::
  HexDigit
  [~NSL] HexDigits HexDigit
  [+NSL] HexDigits NumericLiteralSeparator_opt HexDigit

I'm not sure if I like adding more grammar production flags.
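
To make the first option above concrete, here is a minimal sketch of a check that accepts only plain hex digits inside `\u{...}` and enforces the `≤ 0x10FFFF` bound. `parseCodePoint` and `CODE_POINT_DIGITS` are hypothetical names for this illustration; this is not how any particular engine implements it.

```javascript
// Hypothetical sketch of option 1; not spec text or engine code.
// Mirrors: CodePointDigits :: HexDigit | CodePointDigits HexDigit
// (no NumericLiteralSeparator anywhere in the production).
const CODE_POINT_DIGITS = /^[0-9a-fA-F]+$/;

function parseCodePoint(body) {
  if (!CODE_POINT_DIGITS.test(body)) {
    throw new SyntaxError(`Invalid code point digits: ${body}`);
  }
  const mv = Number.parseInt(body, 16);
  // "but only if MV of CodePointDigits ≤ 0x10FFFF"
  if (mv > 0x10ffff) {
    throw new SyntaxError(`Code point out of range: ${body}`);
  }
  return mv;
}

console.log(parseCodePoint("10ffff").toString(16));
```

Because the separator never appears in the dedicated production, `10_ffff` is rejected by the grammar itself and no extra parameter on `HexDigits` is needed.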

@rwaldron
Collaborator

rwaldron commented Feb 2, 2018

> I believe the decision is throwing when a Num. separator appears in escape sequences.

Confirmed, these are the desired semantics.

@rwaldron
Collaborator

rwaldron commented Feb 2, 2018

@leobalter I've updated the spec to include your first option.

NumericLiteralSeparator is not allowed

Tests to follow.
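
For readers who want to check the resulting behaviour, a small sketch along these lines can be run in an engine that implements numeric separators (that support is an assumption here, for example a recent Node.js). `parses` is a hypothetical helper written for this example; it only reports whether the source text is accepted by the parser.

```javascript
// Hypothetical helper: returns true if the engine can parse the source text.
// new Function() parses its body without running it.
function parses(sourceText) {
  try {
    new Function(sourceText);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const results = {
  plainEscape: parses('"\\u{10ffff}";'),      // escape without a separator
  separatedEscape: parses('"\\u{10_ffff}";'), // separator inside the escape
  separatedLiteral: parses("0x10_ffff;"),     // separator in a hex numeric literal
};

console.log(results);
```

With the semantics confirmed above, the separator is accepted in the hex numeric literal but rejected inside the escape sequence.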

@rwaldron rwaldron closed this as completed Feb 2, 2018
@littledan
Member

The fix in 06b17eb LGTM. (I was surprised that CodePoints is under the template literal heading, but seems like that's what string literals, RegExps, etc cross-reference).

Does MV need an update to explain what the value of CodePointDigits is, or is that considered clear from analogy with its other productions?
