-
Notifications
You must be signed in to change notification settings - Fork 12
Add special handling for “global” regexes, just like @@match
has.
#29
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, two comments from the sidelines.
spec.emu
Outdated
1. Return ! CreateIterResultObject(_match_, *false*). | ||
1. Else, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need these lastIndex comparisons? Now that global
is stored on the iterator, couldn't this branch be simplified to:
if (!global) Return ! CreateIterResultObject(*null*, *true*).
So in total it'd be something like
if (match == null || !global) {
Return ! CreateIterResultObject(*null*, *true*).
} else {
if (global) AdvanceStringIndex.
Return ! CreateIterResultObject(_match_, *false*).
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we do, because a non-global RegExp will only return a single match; but it will return it over and over again. In other words, without those checks, you’d get an infinite iterator for the same match rather than an iterator for a single match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that we don't need to compare lastIndex
anymore since we now know whether the iterator is global/nonglobal by looking at its flags. So in the non-global case, we could just Return ! CreateIterResultObject(*null*, *true*).
.
Edit: I just realized my suggestion doesn't cover returning the first result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the non-global case, RegExpExec will never return anything but the same match object for a built in regex but this isn’t the case for a regex subclass; I’m pretty sure this check can’t be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we still do want to support non-global iteration and not just flip on the global flag, matching split, then could we still simplify the logic along the lines that @schuay points out?: We just need [[Done]], not [[PreviousIndex]]. Couldn't we do,
- If O.[[Done]] is true, NOTE: I'm not sure why the current spec text keeps calling exec if done was already set
- Return ! CreateIterResultObject(null, true).
- Let match be RegExpExec(R, S).
- If match is null,
- ... current text...
- If global,
- ...text proposed in this patch...
- Else,
- Set O.[[Done]] to true
- Return ! CreateIterResultObject(match, false).
With or without this change, the non-global case still seems weird to me, more a bug farm than a feature. It was one thing when it fell out of logic that worked consistently for the global path and the non-global path fell out (as in the current spec text, or in my suggested change); if we had to end up with something like what you have in this patch, I'd be even more skeptical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for you to merge your patches however you want, but I don't think the logic you have here should be present in the version that gets to stage 3. A custom regexp can add this extra logic itself; no need to build it in already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between this case and ES2015 RegExp subclassing is that what's added in ES6 is just about exposing logic that works well for the base class. It doesn't expose code paths which are motivated entirely by subclassing, AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything that looks at Symbol.match
or any of the other regex symbols is motivated entirely by subclassing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I mean; I mean, it doesn't have particular algorithms which are just there for cases which only other subclasses might want to access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our offline discussion; I've been convinced that implementing this suggestion will not, in fact, preclude either any future built-in "multiple match" flag, nor any current subclassed regex from implementing one. I'll update the PR with that change after verifying that it passes my test cases.
spec.emu
Outdated
1. Let _iterator_ be ObjectCreate(<emu-xref href="#%RegExpStringIteratorPrototype%">%RegExpStringIteratorPrototype%</emu-xref>, « [[IteratingRegExp]], [[IteratedString]], [[PreviousIndex]], [[Done]] »). | ||
1. Set _iterator_.[[IteratingRegExp]] to _R_. | ||
1. Set _iterator_.[[IteratedString]] to _S_. | ||
1. Set _iterator_.[[Global]] to _global_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, but we we now have to keep in mind later that there might be a mismatch between flags stored on the iterator and regexp instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a mismatch would only occur if a RegExp changed its flags mid-exec; is that a use case we need to support?
If so, I’d need to fetch the global and unicode flags during every iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it shouldn't be supported, just wanted point out the possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, this may happen due to RegExp.prototype.compile
.
The existing RegExp features are comfortable with reading some flags ahead of time too. For example, [RegExp.prototype[@@replace]](https://tc39.github.io/ecma262/#sec-regexp.prototype-@@replace) reads the global flag once, whereas an overridden RegExp.prototype.exec
function could call compile
and turn the flag off in the middle of the loop.
Because of that precedent (and because I can't think of a legitimate reason to use the other behavior) I'm happy with pre-reading the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nits
spec.md
Outdated
</tr> | ||
<tr> | ||
<td>[[Unicode]]</td> | ||
<td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full unicode more or not.</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Unicode (capital letter U)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed!
spec.md
Outdated
</tr> | ||
<tr> | ||
<td>[[Done]]</td> | ||
<td>Boolean value representing whether the iteration is complete or not.</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should follow the sentence format used for [[Global]]
etc. above, i.e.
<td>A Boolean value to indicate whether the iteration is complete or not.</td>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed!
fdca2a2
to
d5d14ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like the comment I added about complexity to be addressed. Aside from that, a style nit: It was kind of annoying to review this change, digging through the rendered HTML, and just assuming the markdown wasn't there. It's fine to maintain this style here, but I wouldn't recommend this layout for new proposals. It's easier to do reviews if the rendered spec is checked in in another commit.
There's a https://github.com/tc39/proposal-string-matchall/pull/29/files#diff-d17ba2470b0d23fe2f52dca6b95ce865 - ie, only |
3d4ebbb
to
a95b607
Compare
Updated. @littledan (@mathiasbynens, @bterlson, @schuay) - please take another look! <3 |
@@ -54,24 +54,29 @@ contributors: Jordan Harband | |||
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%). | |||
1. Let _flags_ be ? ToString(? Get(_R_, *"flags"*)). | |||
1. Let _matcher_ be ? Construct(_C_, « _R_, _flags_ »). | |||
1. Let _global_ be ? ToBoolean(? Get(_matcher_, *"global"*)). | |||
1. Let _fullUnicode_ be ? ToBoolean(? Get(_matcher_, *"unicode"*). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: I've always wondered why it's called fullUnicode
vs. just unicode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have no idea ¯\_(ツ)_/¯ I just copied the naming from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, from a quick look this seems fine 👍
spec.md
Outdated
<td>[[PreviousIndex]]</td> | ||
<td>The index of the previous yielded match object.</td> | ||
<td>[[Unicode]]</td> | ||
<td>A Boolean value to indicate whether the [[IteratingRegExp]] is in full Unicode more or not.</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/more/mode/
In relation to @schuay’s comment, I don’t think the word “full” adds any value here either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, thanks.
I took "full unicode" from https://tc39.github.io/ecma262/#sec-regexpbuiltinexec ; i'm fine to remove the terminology from this spot :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, s/full//
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. @bterlson, should I avoid using the variable name fullUnicode
as well, and perhaps change https://tc39.github.io/ecma262/#sec-regexpbuiltinexec along with it?
a95b607
to
af51902
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the new version--much more straightforward!
I agree, thanks for suggesting it and bearing with the discussion :-) |
Fixes #28.
cc @schuay
This PR copies the special "global" handling from
@@match
, and adds test for non-matching regexes.It still requires the previousIndex handling, because
match
returns one result immediately, andmatchAll
still needs to know when it's received the same result, and to return "done".