Add special handling for “global” regexes, just like `@@match` has. #29

ljharb · 2017-11-21T04:57:03Z

Fixes #28.

This PR copies the special "global" handling from @@match, and adds tests for non-matching regexes.

It still requires the previousIndex handling: match returns its single result immediately, whereas matchAll still needs to know when it has received the same result again so that it can return "done".

schuay

Thanks for the changes, two comments from the sidelines.

schuay · 2017-11-21T07:23:33Z

spec.emu

 					1. Return ! CreateIterResultObject(_match_, *false*).
+				1. Else,


Do we still need these lastIndex comparisons? Now that global is stored on the iterator, couldn't this branch be simplified to:

if (!global) Return ! CreateIterResultObject(*null*, *true*).

So in total it'd be something like

if (match == null || !global) { Return ! CreateIterResultObject(*null*, *true*). } else { if (global) AdvanceStringIndex. Return ! CreateIterResultObject(_match_, *false*). }

I believe we do, because a non-global RegExp will only return a single match; but it will return it over and over again. In other words, without those checks, you’d get an infinite iterator for the same match rather than an iterator for a single match.
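To illustrate the point about non-global regexes, here is a minimal sketch using only the built-in `RegExp.prototype.exec`. The pattern and input string are made up for illustration and are not taken from the PR's tests. Without the `g` flag, `exec` never advances `lastIndex`, so every call starts at index 0 and reports the same match.

```javascript
// Illustrative values only; not taken from the PR's test suite.
const nonGlobal = /a/;
const input = 'aaa';

const first = nonGlobal.exec(input);
const second = nonGlobal.exec(input);

// Both calls find the match at index 0 and lastIndex never moves,
// so a naive "call exec until it returns null" loop would never end.
console.log(first.index, second.index, nonGlobal.lastIndex);

// With the g flag, lastIndex advances and exec eventually returns null.
const globalRe = /a/g;
let count = 0;
while (globalRe.exec(input) !== null) {
  count += 1;
}
console.log(count);
```

This is why the iterator needs some way to stop after the first result in the non-global case, whether by comparing indices or by tracking a flag.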

My point was that we don't need to compare lastIndex anymore since we now know whether the iterator is global/nonglobal by looking at its flags. So in the non-global case, we could just Return ! CreateIterResultObject(*null*, *true*).

Edit: I just realized my suggestion doesn't cover returning the first result.

In the non-global case, RegExpExec will never return anything but the same match object for a built in regex but this isn’t the case for a regex subclass; I’m pretty sure this check can’t be avoided.

If we still do want to support non-global iteration and not just flip on the global flag, matching split, then could we still simplify the logic along the lines that @schuay points out? We just need [[Done]], not [[PreviousIndex]]. Couldn't we do:

1. If O.[[Done]] is true, (NOTE: I'm not sure why the current spec text keeps calling exec if done was already set)
    1. Return ! CreateIterResultObject(null, true).
1. Let match be RegExpExec(R, S).
1. If match is null,
    1. ... current text...
1. If global,
    1. ...text proposed in this patch...
1. Else,
    1. Set O.[[Done]] to true
    1. Return ! CreateIterResultObject(match, false).
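The steps above can be sketched in plain JavaScript as follows. This is a rough approximation under stated assumptions, not the spec text or the shim's implementation: `createMatchIterator` and `advanceStringIndex` are hypothetical names, the sketch calls `exec` directly instead of going through RegExpExec, and the global branch assumes the patch mirrors what @@match does (advance `lastIndex` past an empty match).

```javascript
// Hypothetical helper modeled on the spec's AdvanceStringIndex operation.
function advanceStringIndex(s, index, unicode) {
  if (!unicode || index + 1 >= s.length) return index + 1;
  const first = s.charCodeAt(index);
  if (first < 0xd800 || first > 0xdbff) return index + 1;
  const second = s.charCodeAt(index + 1);
  if (second < 0xdc00 || second > 0xdfff) return index + 1;
  return index + 2; // skip a full surrogate pair in Unicode mode
}

// Hypothetical factory; the real proposal creates a %RegExpStringIterator%.
function createMatchIterator(regexp, string) {
  // Clone the regex and read the flags once, up front.
  const matcher = new RegExp(regexp, regexp.flags);
  matcher.lastIndex = regexp.lastIndex;
  const global = matcher.global;
  const fullUnicode = matcher.unicode;
  let done = false; // stands in for the [[Done]] internal slot

  return {
    next() {
      if (done) return { value: null, done: true };
      const match = matcher.exec(string);
      if (match === null) {
        done = true;
        return { value: null, done: true };
      }
      if (global) {
        // Assumed to mirror @@match: step past an empty match so the
        // iterator cannot get stuck at the same position.
        if (match[0] === '') {
          matcher.lastIndex = advanceStringIndex(string, matcher.lastIndex, fullUnicode);
        }
        return { value: match, done: false };
      }
      // Non-global: yield the single match, then finish on the next call.
      done = true;
      return { value: match, done: false };
    },
    [Symbol.iterator]() {
      return this;
    },
  };
}

console.log(Array.from(createMatchIterator(/a/g, 'aXaXa'), (m) => m.index));
console.log(Array.from(createMatchIterator(/a/, 'aXaXa'), (m) => m.index));
```

Under these assumptions, no [[PreviousIndex]] comparison is needed: the global flag decides whether iteration continues, and the done flag ends the non-global case after one result.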

With or without this change, the non-global case still seems weird to me, more a bug farm than a feature. It was one thing when it fell out of logic that worked consistently for the global path and the non-global path fell out (as in the current spec text, or in my suggested change); if we had to end up with something like what you have in this patch, I'd be even more skeptical.

Fine for you to merge your patches however you want, but I don't think the logic you have here should be present in the version that gets to stage 3. A custom regexp can add this extra logic itself; no need to build it in already.

The difference between this case and ES2015 RegExp subclassing is that what's added in ES6 is just about exposing logic that works well for the base class. It doesn't expose code paths which are motivated entirely by subclassing, AFAICT.

Everything that looks at Symbol.match or any of the other regex symbols is motivated entirely by subclassing.

That's not what I mean; I mean, it doesn't have particular algorithms which are just there for cases which only other subclasses might want to access.

Per our offline discussion; I've been convinced that implementing this suggestion will not, in fact, preclude either any future built-in "multiple match" flag, nor any current subclassed regex from implementing one. I'll update the PR with that change after verifying that it passes my test cases.

schuay · 2017-11-21T07:28:22Z

spec.emu

 		1. Let _iterator_ be ObjectCreate(<emu-xref href="#%RegExpStringIteratorPrototype%">%RegExpStringIteratorPrototype%</emu-xref>, &laquo; [[IteratingRegExp]], [[IteratedString]], [[PreviousIndex]], [[Done]] &raquo;).
 		1. Set _iterator_.[[IteratingRegExp]] to _R_.
 		1. Set _iterator_.[[IteratedString]] to _S_.
+		1. Set _iterator_.[[Global]] to _global_.


I like this, but we now have to keep in mind later that there might be a mismatch between flags stored on the iterator and regexp instance.

Such a mismatch would only occur if a RegExp changed its flags mid-exec; is that a use case we need to support?

If so, I’d need to fetch the global and unicode flags during every iteration.

IMHO it shouldn't be supported, just wanted to point out the possibility.

For context, this may happen due to RegExp.prototype.compile.

The existing RegExp features are comfortable with reading some flags ahead of time too. For example, [RegExp.prototype[@@replace]](https://tc39.github.io/ecma262/#sec-regexp.prototype-@@replace) reads the global flag once, whereas an overridden RegExp.prototype.exec function could call compile and turn the flag off in the middle of the loop.

Because of that precedent (and because I can't think of a legitimate reason to use the other behavior) I'm happy with pre-reading the flag.
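For readers unfamiliar with the mismatch being discussed, a small sketch: the Annex B method `RegExp.prototype.compile` reinitializes a regex in place, so a flag value read earlier can disagree with the instance afterwards. The variable names and pattern here are illustrative only.

```javascript
// Illustrative only: shows a pre-read flag going stale after compile().
const re = /a/g;

// Read the flag once, as the iterator would at creation time.
const globalAtStart = re.global;

// Annex B: recompile the same object, this time without the g flag.
re.compile('a');

// The cached value and the live instance now disagree.
console.log(globalAtStart, re.global);
```

Pre-reading the flag means the iterator keeps behaving according to `globalAtStart`, regardless of what later happens to the instance.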

mathiasbynens

LGTM % nits

mathiasbynens · 2017-11-22T06:53:39Z

spec.md

+      </tr>
+      <tr>
+        <td>[[Unicode]]</td>
+        <td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full unicode more or not.</td>


Nit: Unicode (capital letter U)

Thanks, fixed!

mathiasbynens · 2017-11-22T06:54:34Z

spec.md

+      </tr>
+      <tr>
+        <td>[[Done]]</td>
+        <td>Boolean value representing whether the iteration is complete or not.</td>


This should follow the sentence format used for [[Global]] etc. above, i.e.

<td>A Boolean value to indicate whether the iteration is complete or not.</td>

Thanks, fixed!

littledan

I'd like the comment I added about complexity to be addressed. Aside from that, a style nit: It was kind of annoying to review this change, digging through the rendered HTML, and just assuming the markdown wasn't there. It's fine to maintain this style here, but I wouldn't recommend this layout for new proposals. It's easier to do reviews if the rendered spec is checked in in another commit.

ljharb · 2017-12-12T01:23:33Z

There's a .gitattributes file that github isn't respecting that should hide the rendered spec from the diff; but either way that's something that should be discussed outside this repo.

https://github.com/tc39/proposal-string-matchall/pull/29/files#diff-d17ba2470b0d23fe2f52dca6b95ce865 - ie, only spec.emu - is the file that's worth reviewing.

ljharb · 2017-12-14T08:10:48Z

Updated. @littledan (@mathiasbynens, @bterlson, @schuay) - please take another look! <3

schuay · 2017-12-14T09:59:03Z

spec.emu

@@ -54,24 +54,29 @@ contributors: Jordan Harband
 		1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
 		1. Let _flags_ be ? ToString(? Get(_R_, *"flags"*)).
 		1. Let _matcher_ be ? Construct(_C_, &laquo; _R_, _flags_ &raquo;).
+		1. Let _global_ be ? ToBoolean(? Get(_matcher_, *"global"*)).
+		1. Let _fullUnicode_ be ? ToBoolean(? Get(_matcher_, *"unicode"*).


Out of curiosity: I've always wondered why it's called fullUnicode vs. just unicode.

i have no idea ¯\_(ツ)_/¯ I just copied the naming from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec

schuay

Thanks, from a quick look this seems fine 👍

mathiasbynens · 2017-12-14T12:18:18Z

spec.md

-        <td>[[PreviousIndex]]</td>
-        <td>The index of the previous yielded match object.</td>
+        <td>[[Unicode]]</td>
+        <td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full Unicode more or not.</td>


s/more/mode/

In relation to @schuay’s comment, I don’t think the word “full” adds any value here either.

whoops, thanks.

I took "full unicode" from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec ; i'm fine to remove the terminology from this spot :-)

Agree, s/full//

Done. @bterlson, should I avoid using the variable name fullUnicode as well, and perhaps change https://tc39.github.io/ecma262/#sec-regexpbuiltinexec along with it?

littledan

I really like the new version--much more straightforward!

ljharb · 2017-12-14T22:07:25Z

I agree, thanks for suggesting it and bearing with the discussion :-)

Add special handling for “global” regexes, just like @@match has.

08f6369

Fixes #28.

ljharb added committee feedback spec text labels Nov 21, 2017

ljharb requested review from bterlson, mathiasbynens and littledan November 21, 2017 04:57

schuay reviewed Nov 21, 2017

mathiasbynens approved these changes Nov 22, 2017

ljharb force-pushed the handle_nonmatching_regexen branch from fdca2a2 to d5d14ca Compare November 22, 2017 16:25

zloirock mentioned this pull request Nov 27, 2017

core-js@3 zloirock/core-js#325

Merged

littledan suggested changes Dec 11, 2017

ljharb mentioned this pull request Dec 12, 2017

Should the string case be global? #30

Closed

Remove PreviousIndex checking; ensure only global creates multiple …

af51902

…matches Per #29 (comment)

ljharb force-pushed the handle_nonmatching_regexen branch from 3d4ebbb to a95b607 Compare December 14, 2017 08:09

ljharb added a commit that referenced this pull request Dec 14, 2017

Remove PreviousIndex checking; ensure only global creates multiple …

a95b607

…matches Per #29 (comment)

schuay reviewed Dec 14, 2017

schuay approved these changes Dec 14, 2017

mathiasbynens reviewed Dec 14, 2017

ljharb force-pushed the handle_nonmatching_regexen branch from a95b607 to af51902 Compare December 14, 2017 17:19

littledan approved these changes Dec 14, 2017

bterlson approved these changes Jan 8, 2018

ljharb merged commit 1484928 into master Jan 8, 2018

ljharb deleted the handle_nonmatching_regexen branch January 8, 2018 23:25

ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018

First round of changes for tc39/proposal-string-matchall#29

341442a

ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018

Second round of changes for tc39/proposal-string-matchall#29

fcb824d

ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018

Merge branch 'handle_nonmatching_regexen'

87251b8

Per tc39/proposal-string-matchall#29
