Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Add special handling for “global” regexes, just like @@match has. #29

Merged
merged 2 commits into from
Jan 8, 2018

Conversation

ljharb
Copy link
Member

@ljharb ljharb commented Nov 21, 2017

Fixes #28.

cc @schuay

This PR copies the special "global" handling from @@match, and adds test for non-matching regexes.

It still requires the previousIndex handling, because match returns one result immediately, and matchAll still needs to know when it's received the same result, and to return "done".

Copy link

@schuay schuay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, two comments from the sidelines.

spec.emu Outdated
1. Return ! CreateIterResultObject(_match_, *false*).
1. Else,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these lastIndex comparisons? Now that global is stored on the iterator, couldn't this branch be simplified to:

if (!global) Return ! CreateIterResultObject(*null*, *true*).

So in total it'd be something like

if (match == null || !global) {
  Return ! CreateIterResultObject(*null*, *true*).
} else {
  if (global) AdvanceStringIndex.
  Return ! CreateIterResultObject(_match_, *false*).
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we do, because a non-global RegExp will only return a single match; but it will return it over and over again. In other words, without those checks, you’d get an infinite iterator for the same match rather than an iterator for a single match.

Copy link

@schuay schuay Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that we don't need to compare lastIndex anymore since we now know whether the iterator is global/nonglobal by looking at its flags. So in the non-global case, we could just Return ! CreateIterResultObject(*null*, *true*)..

Edit: I just realized my suggestion doesn't cover returning the first result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-global case, RegExpExec will never return anything but the same match object for a built in regex but this isn’t the case for a regex subclass; I’m pretty sure this check can’t be avoided.

Copy link
Member

@littledan littledan Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we still do want to support non-global iteration and not just flip on the global flag, matching split, then could we still simplify the logic along the lines that @schuay points out?: We just need [[Done]], not [[PreviousIndex]]. Couldn't we do,

  1. If O.[[Done]] is true, NOTE: I'm not sure why the current spec text keeps calling exec if done was already set
    1. Return ! CreateIterResultObject(null, true).
  2. Let match be RegExpExec(R, S).
  3. If match is null,
    1. ... current text...
  4. If global,
    1. ...text proposed in this patch...
  5. Else,
    1. Set O.[[Done]] to true
    2. Return ! CreateIterResultObject(match, false).

With or without this change, the non-global case still seems weird to me, more a bug farm than a feature. It was one thing when it fell out of logic that worked consistently for the global path and the non-global path fell out (as in the current spec text, or in my suggested change); if we had to end up with something like what you have in this patch, I'd be even more skeptical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for you to merge your patches however you want, but I don't think the logic you have here should be present in the version that gets to stage 3. A custom regexp can add this extra logic itself; no need to build it in already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between this case and ES2015 RegExp subclassing is that what's added in ES6 is just about exposing logic that works well for the base class. It doesn't expose code paths which are motivated entirely by subclassing, AFAICT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything that looks at Symbol.match or any of the other regex symbols is motivated entirely by subclassing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I mean; I mean, it doesn't have particular algorithms which are just there for cases which only other subclasses might want to access.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our offline discussion; I've been convinced that implementing this suggestion will not, in fact, preclude either any future built-in "multiple match" flag, nor any current subclassed regex from implementing one. I'll update the PR with that change after verifying that it passes my test cases.

spec.emu Outdated
1. Let _iterator_ be ObjectCreate(<emu-xref href="#%RegExpStringIteratorPrototype%">%RegExpStringIteratorPrototype%</emu-xref>, &laquo; [[IteratingRegExp]], [[IteratedString]], [[PreviousIndex]], [[Done]] &raquo;).
1. Set _iterator_.[[IteratingRegExp]] to _R_.
1. Set _iterator_.[[IteratedString]] to _S_.
1. Set _iterator_.[[Global]] to _global_.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, but we we now have to keep in mind later that there might be a mismatch between flags stored on the iterator and regexp instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a mismatch would only occur if a RegExp changed its flags mid-exec; is that a use case we need to support?

If so, I’d need to fetch the global and unicode flags during every iteration.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it shouldn't be supported, just wanted point out the possibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, this may happen due to RegExp.prototype.compile.

The existing RegExp features are comfortable with reading some flags ahead of time too. For example, [RegExp.prototype[@@replace]](https://tc39.github.io/ecma262/#sec-regexp.prototype-@@replace) reads the global flag once, whereas an overridden RegExp.prototype.exec function could call compile and turn the flag off in the middle of the loop.

Because of that precedent (and because I can't think of a legitimate reason to use the other behavior) I'm happy with pre-reading the flag.

Copy link
Member

@mathiasbynens mathiasbynens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits

spec.md Outdated
</tr>
<tr>
<td>[[Unicode]]</td>
<td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full unicode more or not.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Unicode (capital letter U)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!

spec.md Outdated
</tr>
<tr>
<td>[[Done]]</td>
<td>Boolean value representing whether the iteration is complete or not.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow the sentence format used for [[Global]] etc. above, i.e.

<td>A Boolean value to indicate whether the iteration is complete or not.</td>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!

@ljharb ljharb force-pushed the handle_nonmatching_regexen branch from fdca2a2 to d5d14ca Compare November 22, 2017 16:25
Copy link
Member

@littledan littledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like the comment I added about complexity to be addressed. Aside from that, a style nit: It was kind of annoying to review this change, digging through the rendered HTML, and just assuming the markdown wasn't there. It's fine to maintain this style here, but I wouldn't recommend this layout for new proposals. It's easier to do reviews if the rendered spec is checked in in another commit.

@ljharb
Copy link
Member Author

ljharb commented Dec 12, 2017

There's a .gitattributes file that github isn't respecting that should hide the rendered spec from the diff; but either way that's something that should be discussed outside this repo.

https://github.com/tc39/proposal-string-matchall/pull/29/files#diff-d17ba2470b0d23fe2f52dca6b95ce865 - ie, only spec.emu - is the file that's worth reviewing.

@ljharb ljharb force-pushed the handle_nonmatching_regexen branch from 3d4ebbb to a95b607 Compare December 14, 2017 08:09
@ljharb
Copy link
Member Author

ljharb commented Dec 14, 2017

Updated. @littledan (@mathiasbynens, @bterlson, @schuay) - please take another look! <3

@@ -54,24 +54,29 @@ contributors: Jordan Harband
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
1. Let _flags_ be ? ToString(? Get(_R_, *"flags"*)).
1. Let _matcher_ be ? Construct(_C_, &laquo; _R_, _flags_ &raquo;).
1. Let _global_ be ? ToBoolean(? Get(_matcher_, *"global"*)).
1. Let _fullUnicode_ be ? ToBoolean(? Get(_matcher_, *"unicode"*).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: I've always wondered why it's called fullUnicode vs. just unicode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea ¯\_(ツ)_/¯ I just copied the naming from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec

Copy link

@schuay schuay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, from a quick look this seems fine 👍

spec.md Outdated
<td>[[PreviousIndex]]</td>
<td>The index of the previous yielded match object.</td>
<td>[[Unicode]]</td>
<td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full Unicode more or not.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/more/mode/

In relation to @schuay’s comment, I don’t think the word “full” adds any value here either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, thanks.

I took "full unicode" from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec ; i'm fine to remove the terminology from this spot :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, s/full//

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. @bterlson, should I avoid using the variable name fullUnicode as well, and perhaps change https://tc39.github.io/ecma262/#sec-regexpbuiltinexec along with it?

@ljharb ljharb force-pushed the handle_nonmatching_regexen branch from a95b607 to af51902 Compare December 14, 2017 17:19
Copy link
Member

@littledan littledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the new version--much more straightforward!

@ljharb
Copy link
Member Author

ljharb commented Dec 14, 2017

I agree, thanks for suggesting it and bearing with the discussion :-)

@ljharb ljharb merged commit 1484928 into master Jan 8, 2018
@ljharb ljharb deleted the handle_nonmatching_regexen branch January 8, 2018 23:25
ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018
ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018
ljharb added a commit to es-shims/String.prototype.matchAll that referenced this pull request Jan 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants