Fix v-flag bugs #85

JLHwung · 2023-09-19T20:33:02Z

In this PR we reuse the Unicode fixtures for the v-flag tests, based on the observation that /.../u and /.../v should yield the same result unless set notation or properties of strings are involved.

We also introduce the matches and nonMatches properties to the v-flag fixture runner: they list the strings that the transpiled regex is supposed to match and reject, respectively. This is useful when the transpiled regex is too verbose to comprehend by reading it.
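A minimal sketch of how such a check could work. It assumes the runner compiles the transpiled source without flags and requires the whole string to match; `checkFixture` and its return shape are hypothetical names for illustration, not the runner's actual code. The fixture values are the ones from the character-class fixture quoted later in this thread.

```javascript
// Hypothetical helper: not the actual fixture runner in this repository.
// Assumption: the transpiled pattern must match the whole string, so it is
// wrapped in ^(?:...)$ before testing.
function checkFixture(fixture, flags = '') {
	const re = new RegExp(`^(?:${fixture.expected})$`, flags);
	const failures = [];
	for (const str of fixture.matches || []) {
		if (!re.test(str)) failures.push({ str, shouldMatch: true });
	}
	for (const str of fixture.nonMatches || []) {
		if (re.test(str)) failures.push({ str, shouldMatch: false });
	}
	return failures;
}

const fixture = {
	matches: ['k', '\u212a', '\u{12345}', '\uDAAA', '\uDDDD'],
	nonMatches: ['K'],
	expected: '(?:[\\0-JL-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF])',
};

// An empty list means every string behaved as the fixture says it should.
const failures = checkFixture(fixture);
console.log(failures.length); // 0
```

Returning the list of failing strings, rather than throwing on the first one, makes it easier to see every string a verbose pattern gets wrong in a single run.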

~~This PR includes commits from #84, I will rebase once that PR is merged.~~

This is a draft PR because I still haven't figured out how to avoid bmp-ifying regex strings twice: in the negative set notation we extract single code points from UNICODE_SET, which yields surrogates in the output, but the output is then bmp-ified again in regenerate, yielding results that are longer than necessary, though they seem correct.

JLHwung · 2023-09-19T20:34:43Z

tests/fixtures/unicode-set.js

@@ -105,6 +105,8 @@ const unicodeSetFixtures = [
 	},
 	{
 		pattern: '[^[a-z][f-h]]',
+		matches: ["A", "\u{12345}"],
+		nonMatches: ["a", "z"],
 		expected: '(?:(?![a-z])[\\s\\S])',


The current transpiled result does not match "\u{12345}".
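One plausible reading of this, assuming the runner tests the transpiled source without the `u` flag and against the whole string (both are assumptions, not confirmed in this thread): without `u`, `[\s\S]` consumes a single UTF-16 code unit, while "\u{12345}" is made of two. The sketch below only illustrates that difference.

```javascript
const source = '(?:(?![a-z])[\\s\\S])';
const astral = '\u{12345}'; // one code point, two UTF-16 code units

// Assumption: the whole string must match, hence the ^...$ anchors.
const withoutU = new RegExp(`^${source}$`);
const withU = new RegExp(`^${source}$`, 'u');

console.log(astral.length);         // 2
console.log(withoutU.test(astral)); // false: [\s\S] consumes only one code unit
console.log(withU.test(astral));    // true: [\s\S] consumes the whole code point
console.log(withoutU.test('A'));    // true
console.log(withoutU.test('a'));    // false
```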

JLHwung · 2023-09-20T19:56:05Z

rewrite-pattern.js

-						);
+						const negativeSet = UNICODE_SET.clone().remove(singleChars);
+						const bmpOnly = regenerateContainsAstral(negativeSet);
+						update(characterClassItem, negativeSet.toString({ bmpOnly: bmpOnly }));


If the regenerate set spans from code points below the surrogate range into the astral range, toString({ bmpOnly: false }) returns a much more verbose result, while toString({ bmpOnly: true }) is already correct. I think this should be fixed in regenerate later.

const regenerate = require('regenerate');
const set = regenerate().addRange(0xd000, 0x10000);
console.log(set.toString());
// [\uD000-\uD7FF\uE000-\uFFFF]|\uD800\uDC00|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]
console.log(set.toString({ bmpOnly: true }));
// [\uD000-\uFFFF]|\uD800\uDC00

The latter appears to be correct, as it matches lone surrogates as well as U+10000. The former looks as if [\uD000-\uFFFF]|\uD800\uDC00 had been passed through the bmp pass again.
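The claim can be checked without regenerate by compiling the two output strings quoted above and testing each against every single code point. This is only a sketch: it assumes both patterns run without the `u` flag and that the whole one-code-point string must match. It does not cover longer inputs, where the lookaround-based alternatives of the verbose form can behave differently.

```javascript
// The two outputs quoted above, copied as regex source strings.
const verbose = '[\\uD000-\\uD7FF\\uE000-\\uFFFF]|\\uD800\\uDC00|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]';
const short = '[\\uD000-\\uFFFF]|\\uD800\\uDC00';

const verboseRe = new RegExp(`^(?:${verbose})$`);
const shortRe = new RegExp(`^(?:${short})$`);

// Count code points where either pattern disagrees with membership in
// the original range U+D000..U+10000 (lone surrogates included).
let disagreements = 0;
for (let cp = 0; cp <= 0x10FFFF; cp++) {
	const str = String.fromCodePoint(cp);
	const inSet = cp >= 0xD000 && cp <= 0x10000;
	if (verboseRe.test(str) !== inSet || shortRe.test(str) !== inSet) {
		disagreements++;
	}
}
console.log(disagreements); // 0
```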

JLHwung · 2023-09-20T19:59:43Z

tests/fixtures/character-class.js

-		expected: '(?:[\\0-JL-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF])',
+		matches: ["k", "\u212a", "\u{12345}", "\uDAAA", "\uDDDD"],
+		nonMatches: ["K"],
+		expected: '(?:[\\0-JL-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF])',


They are now much shorter and easier to reason about. I also added matches tests so that we can be confident that the transpiled result is correct.

nicolo-ribaudo

Awesome!

JLHwung marked this pull request as draft September 19, 2023 20:33

JLHwung marked this pull request as ready for review September 20, 2023 19:42

nicolo-ribaudo requested a review from mathiasbynens September 20, 2023 20:01

mathiasbynens approved these changes Sep 20, 2023

JLHwung added 5 commits September 21, 2023 10:37

replace istanbul ignore comment with node coverage

e30f2eb

support matches and nonMatches in regexp-v-flag tests

d4b5bdb

re-use Unicode fixtures in v-flag tests

2b55fe7

improve handling character class

6cb420e

make node.js 6 happy

8501e5d

JLHwung force-pushed the fix-v-flag-bugs branch from 23feaf3 to 8501e5d Compare September 21, 2023 14:38

JLHwung added 2 commits September 21, 2023 15:57

add dot all flag test cases

d948109

remove duplicate property escapes test

cee1115

nicolo-ribaudo approved these changes Sep 23, 2023

nicolo-ribaudo merged commit 91ee342 into mathiasbynens:main Sep 23, 2023
3 checks passed
