This repository was archived by the owner on Jan 25, 2022. It is now read-only.

Should we support Unicode escapes in group names, RegExp style? #23

Closed
littledan opened this issue Mar 30, 2017 · 4 comments

@littledan
Member

E.g., see these tests in V8: https://cs.chromium.org/chromium/src/v8/test/mjsunit/harmony/regexp-named-captures.js?q=regexp-named+package:%5Echromium$&l=86 . These should be supported by the standard, as they are analogous to Unicode escapes in identifiers and property names.

@schuay

schuay commented Mar 31, 2017

Is it worth complicating the spec and implementations to support a feature that, most likely, no-one will use?

  • Is there even a valid use-case for something like /(?<\u{03C0}>a)/?
  • How would this interact with unicode mode?
  • Would they also be valid in replacer string capture references? V8 doesn't do this currently and it would complicate things significantly.
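For concreteness, the pattern from the first bullet can be tried directly. This is only a minimal sketch, assuming an engine that implements named capture groups and accepts escapes in group names; the `u` flag is added here so that `\u{...}` is valid RegExp syntax, and the variable names are illustrative.

```javascript
// Group name written with a code point escape (u flag so \u{...} is RegExp syntax).
const escaped = /(?<\u{03C0}>a)/u;
// The same group name written literally.
const literal = /(?<π>a)/u;

const escapedMatch = escaped.exec("a");
const literalMatch = literal.exec("a");

// Both patterns expose the capture under the same property name.
const escapedGroupNames = Object.keys(escapedMatch.groups);
const literalGroupNames = Object.keys(literalMatch.groups);
const capturedText = escapedMatch.groups.π;
```

If the escape is accepted, both spellings produce a group whose name is the single character π, so the escape affects only how the pattern is written, not the resulting `groups` object.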

+1 from me for leaving the spec as-is and removing existing support from V8.

@mathiasbynens
Member

I agree with @schuay. The benefits of matching Identifier do not outweigh the added complexity. Let’s keep it simple for now — we can reconsider allowing this in the future if there is a strong need for it.

@schuay

schuay commented Mar 31, 2017

Dan pointed out that escape sequences in replacer strings would already be handled by regular string literal syntax.
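A small sketch of that point, assuming an engine with named capture groups: the escape in the replacement string is resolved by the string literal itself, so `String.prototype.replace` only ever sees the literal group name. The variable names are illustrative.

```javascript
const pattern = /(?<π>a)/u;

// The string literal resolves \u{03C0} before replace() sees the value.
const replacement = "[$<\u{03C0}>]";
const isSameString = replacement === "[$<π>]";

// $<π> refers to the named group, so the match is wrapped in brackets.
const result = "a".replace(pattern, replacement);
```

No special handling of escapes is needed in the replacement-string grammar for this to work.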

@littledan
Member Author

Actually, I think this already falls out of the current semantics of the specification. The syntax is based on IdentifierName, and taking the StringValue of that production. Both of these explicitly allow Unicode escapes.
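As an illustration of the identifier behaviour being referred to, here is a short sketch in plain JavaScript (outside of any RegExp); the names are illustrative only.

```javascript
// An identifier declared with a Unicode escape; its StringValue is "π".
const \u03C0 = 3.14159;
// The same binding can be referenced by its literal spelling.
const viaLiteralName = π;

// Property names are IdentifierNames too, so escapes are resolved the same way.
const obj = { \u{03C0}: "pi" };
const keys = Object.keys(obj);
```

Both the four-digit form and the braced code point form are accepted here, which is the behaviour a group name based on IdentifierName would inherit.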

An issue with the current spec, though, is that we use UnicodeEscapeSequence rather than RegExpUnicodeEscapeSequence. For example, under RegExpUnicodeEscapeSequence, \u{1234} is allowed only with Unicode mode turned on and disallowed with it turned off, whereas UnicodeEscapeSequence allows it in both cases.
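The mode dependence can be seen in an ordinary pattern body, independent of group names. This sketch assumes a web-compatible engine (Annex B semantics), where a non-Unicode pattern reads `\u{61}` as the letter `u` followed by the quantifier `{61}` rather than as an escape.

```javascript
// With the u flag, \u{61} is a code point escape for "a".
const withUnicodeMode = /^\u{61}$/u.test("a");

// Without the u flag, the same source text is not an escape for "a".
const withoutUnicodeMode = /^\u{61}$/.test("a");

// Under Annex B it is instead "u" repeated 61 times.
const asQuantifier = /^\u{61}$/.test("u".repeat(61));
```

A group-name grammar built on UnicodeEscapeSequence would not have this mode split, which is the inconsistency described above.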

It seems surprising and a little overly complicated to not follow the RegExp syntax for Unicode escapes when in a RegExp. So I think we should split out the grammar for identifiers here, unless we want to fully remove the feature. If we do this splitting out, we need to make sure to maintain the errors for bad identifiers.

@littledan changed the title from "Should support Unicode escapes in group names" to "Should we support Unicode escapes in group names, RegExp style?" on Mar 31, 2017
kisg pushed a commit to paul99/v8mips that referenced this issue Apr 7, 2017
Update docs and tests for recent changes in the spec for unicode escapes
in capture group names.

tc39/proposal-regexp-named-groups#23

BUG=v8:5437

Review-Url: https://codereview.chromium.org/2788423003
Cr-Commit-Position: refs/heads/master@{#44474}