Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES6 regular expressions unicode mode: case insensitivity case folding inconsistency #7708

Closed
xqin opened this issue Jul 13, 2016 · 7 comments
Labels
question Issues that look for answers. v8 engine Issues and PRs related to the V8 dependency.

Comments

@xqin
Copy link

xqin commented Jul 13, 2016

  • Version: v4.1.0
  • Platform: Windows 7 x64
/(\.[\u4e00-\u9fa5A-Z])+/i.test('a.c');  //true

/(\.[A-Z\u4e00-\u9fa5])+/i.test('a.c');   //true
  • Version: v6.3.0
  • Platform: Windows 7 x64
/(\.[\u4e00-\u9fa5A-Z])+/i.test('a.c');  //false

/(\.[A-Z\u4e00-\u9fa5])+/i.test('a.c');   //true

why?

@addaleax addaleax added the v8 engine Issues and PRs related to the V8 dependency. label Jul 13, 2016
@vsemozhetbyt
Copy link
Contributor

It seems here is the turning point:

/(\.[\u00ffA-Z])+/i.test('a.c'); // true
/(\.[\u0100A-Z])+/i.test('a.c'); // false

@mscdex
Copy link
Contributor

mscdex commented Jul 13, 2016

I'm not familiar enough with the ES6 regexp changes, but FWIW adding the new /u flag makes the .test() return true.

@mscdex mscdex added the question Issues that look for answers. label Jul 13, 2016
@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Jul 13, 2016

Well, that is weird. Node.js 6.3.0

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/a([\u0100A-Z])+/i.test('aa'); \\ true

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/(a[\u0100A-Z])/i.test('aa');  \\ true

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/(a[\u00ffA-Z])+/i.test('aa'); \\ true

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/(a[A-Z\u0100])+/i.test('aa'); \\ true

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/(a[\u0100a-z])+/i.test('aa'); \\ true

/(a[\u0100A-Z])+/i.test('aa'); \\ false
/(a[\u0100A-Z])+/i.test('AA'); \\ true

@targos
Copy link
Member

targos commented Jul 13, 2016

Someone reported the bug upstream: https://bugs.chromium.org/p/v8/issues/detail?id=5199

@hashseed
Copy link
Member

Issue has been fixed and ported back to V8 5.3 and 5.2. Please comment to the V8 issue if we need to merge this further back.

@ofrobots
Copy link
Contributor

Thank! The older V8 streams (5.1 and older) are no longer accepting merges upstream, so fixes would have to be floated for master (V8 5.1) and v6.x (V8 5.0). I believe @fhinkel is planning to submit PRs for the floats.

@fhinkel
Copy link
Member

fhinkel commented Jul 22, 2016

Yes, working on it.

fhinkel added a commit to fhinkel/node that referenced this issue Jul 22, 2016
Original commit message:
  [regexp] Fix case-insensitive matching for one-byte subjects.

  The bug occurs because we do not canonicalize character class ranges
  before adding case equivalents. While adding case equivalents, we abort
  early for one-byte subject strings, assuming that the ranges are sorted.
  Which they are not.

  R=marja@chromium.org
  BUG=v8:5199

  Review-Url: https://codereview.chromium.org/2159683002
  Cr-Commit-Position: refs/heads/master@{nodejs#37833}

Fixes: nodejs#7708
fhinkel added a commit to fhinkel/node that referenced this issue Jul 22, 2016
Original commit message:
  [regexp] Fix case-insensitive matching for one-byte subjects.

  The bug occurs because we do not canonicalize character class ranges
  before adding case equivalents. While adding case equivalents, we abort
  early for one-byte subject strings, assuming that the ranges are sorted.
  Which they are not.

  R=marja@chromium.org
  BUG=v8:5199

  Review-Url: https://codereview.chromium.org/2159683002
  Cr-Commit-Position: refs/heads/master@{nodejs#37833}

Fixes: nodejs#7708
ofrobots pushed a commit that referenced this issue Aug 3, 2016
Original commit message:
  [regexp] Fix case-insensitive matching for one-byte subjects.

  The bug occurs because we do not canonicalize character class ranges
  before adding case equivalents. While adding case equivalents, we abort
  early for one-byte subject strings, assuming that the ranges are sorted.
  Which they are not.

  R=marja@chromium.org
  BUG=v8:5199

  Review-Url: https://codereview.chromium.org/2159683002
  Cr-Commit-Position: refs/heads/master@{#37833}

Fixes: #7708
PR-URL: #7834
Ref: #7833
Reviewed-By: targos - Michaël Zasso <mic.besace@gmail.com>
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: jasnell - James M Snell <jasnell@gmail.com>
Reviewed-By: ofrobots - Ali Ijaz Sheikh <ofrobots@google.com>
BethGriggs pushed a commit to ibmruntimes/node that referenced this issue Aug 18, 2016
Original commit message:
  [regexp] Fix case-insensitive matching for one-byte subjects.

  The bug occurs because we do not canonicalize character class ranges
  before adding case equivalents. While adding case equivalents, we abort
  early for one-byte subject strings, assuming that the ranges are sorted.
  Which they are not.

  R=marja@chromium.org
  BUG=v8:5199

  Review-Url: https://codereview.chromium.org/2159683002
  Cr-Commit-Position: refs/heads/master@{#37833}

Fixes: nodejs/node#7708
PR-URL: nodejs/node#7834
Ref: nodejs/node#7833
Reviewed-By: targos - Michaël Zasso <mic.besace@gmail.com>
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: jasnell - James M Snell <jasnell@gmail.com>
Reviewed-By: ofrobots - Ali Ijaz Sheikh <ofrobots@google.com>
ofrobots pushed a commit to ofrobots/node that referenced this issue Aug 25, 2016
Original commit message:
  [regexp] Fix case-insensitive matching for one-byte subjects.

  The bug occurs because we do not canonicalize character class ranges
  before adding case equivalents. While adding case equivalents, we abort
  early for one-byte subject strings, assuming that the ranges are sorted.
  Which they are not.

  R=marja@chromium.org
  BUG=v8:5199

  Review-Url: https://codereview.chromium.org/2159683002
  Cr-Commit-Position: refs/heads/master@{nodejs#37833}

Fixes: nodejs#7708
PR-URL: nodejs#7833
Reviewed-By: targos - Michaël Zasso <mic.besace@gmail.com>
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: ofrobots - Ali Ijaz Sheikh <ofrobots@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Issues that look for answers. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

8 participants