Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: IsRegExp: make Symbol.match not be uniquely special #1318

Closed
wants to merge 1 commit into from

Conversation

ljharb
Copy link
Member

@ljharb ljharb commented Sep 29, 2018

Either IsRegExp should check all 4 (soon 5) of the regex symbols, or it should check none.

Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the RegExp constructor to decide whether to extract .constructor/.source/.flags, or to naively stringify.

This will only impact objects that do not use extends RegExp/super to create a "RegExp-like" object.

Some examples of the current inconsistency:

class MyRE extends RegExp {}

// similar results for startsWith and endsWith
'a'.includes({ [Symbol.split]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.search]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.replace]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.match]: true, toString() { return 'a'; } }); // throws
'a'.includes(new MyRE('a', 'g')); // throws

new RegExp(/b/g); // `/b/g`
new RegExp(new MyRE('a', 'g')); // `/a/g`, a regular regex, not an instanceof MyRE
new RegExp({ [Symbol.split]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.search]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.replace]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.match]: true, toString() { return 'a'; }, source: 'b', flags: 'g' }); // `/b/g`

This PR changes the results for an object that defines Symbol.match so that they perform the same as an object defining any of the other regex-related well-known Symbols. It does not impact any operations with regex literals, instances of RegExp, or instances of any subclasses of RegExp.

This has a secondary benefit of making IsRegExp no longer an observable operation.

It is highly unlikely that there is code on the web relying on this behavior; we'd need to determine if that is the case before merging this.

After this PR, these would be the above results:

class MyRE extends RegExp {}

// similar results for startsWith and endsWith
'a'.includes({ [Symbol.split]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.search]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.replace]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.match]: true, toString() { return 'a'; } }); // `true`
'a'.includes(new MyRE('a', 'g')); // throws

new RegExp(/b/g); // `/b/g`
new RegExp(new MyRE('a', 'g')); // `/a/g`, a regular regex, not an instanceof MyRE
new RegExp({ [Symbol.split]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.search]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.replace]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.match]: true, toString() { return 'a'; }, source: 'b', flags: 'g' }); // `/a/`

Either `IsRegExp` should check all 4 (soon 5) of the regex symbols, or
it should check none.

Currently this check is only used in 4 places: 3 String methods, to
throw a TypeError to indicate that they only take a string and not a
regex; and in the `RegExp` constructor to decide whether to extract
`.constructor`/`.source`/`.flags`, or to naively stringify.

This will only impact objects that do not use `extends RegExp`/`super`
to create a "RegExp-like" object.
@ljharb ljharb added normative change Affects behavior required to correctly evaluate some ECMAScript source text needs consensus This needs committee consensus before it can be eligible to be merged. labels Sep 29, 2018
Copy link
Member

@littledan littledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice simplification; the use cases for needing this behavior from RegExp-like classes which don't inherit from RegExp is unclear to me. The PR might be a small optimization. I am not as concerned as @ljharb about the semantics without this patch making Symbol.match "special", however. Cc @gsathya @schuay

Copy link
Member

@mathiasbynens mathiasbynens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why @@match was previously made to be special? The PR makes a lot more sense to me.

(Btw, thanks for including those before/after code snippets! It's often tricky to imagine the complete set of observable effects of any given spec change, and having them included in the description really helps.)

cc @hashseed

@hashseed
Copy link

hashseed commented Oct 1, 2018

I didn't even know about this special behavior or @@match. PR lgtm.

@nicolo-ribaudo
Copy link
Member

nicolo-ribaudo commented Oct 1, 2018

Does this code still work? (It's kinda like thenable for promises)

var re = {
   [Symbol.match]() {
               // Some matching conditions which are easier to write using code than using a RegExp
   	return ["foo", "bar"]
   }
}

console.log("xx".match(re))

@hashseed
Copy link

hashseed commented Oct 1, 2018

Does this code still work?

Yes it should. The spec text for String.prototype.match doesn't check for IsRegExp.

@claudepache
Copy link
Contributor

I think this is going backwards. Efforts were made in the conception of ES6 in order make interfaces more generic and less dependent of internal slots like [[RegExpMatcher]].

According to the spec:

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions.

If there were no BC constraints, the IsRegExp() abstract operation would only check the presence of the @@match property, not the [[RegExpMatcher]] internal slot.

That said, IsRegExp() is wrongly named, especially when its functionality is compared with the one of IsArray(). It should have been called IsRegExpLike(), or IsStringMatcher().


As a concrete example, consider a class that does some sort of pattern matching, but is not based on traditional perl-like regexps, and thus is not a subclass of RegExp. E.g.:

class FileMatcher { /* impl */ }
let rx = new FileMatcher("*.js")

For ease of integration, FileMatcher instances may be use as drop-in replacement of RegExp instances for use in String.prototype.match, which is what @@match is for. As it does not make sense to use FileMatcher instances in String.prototype.split, etc., methods named @@split, etc. are not implemented by FileMatcher.


Now, to the point:

Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the RegExp constructor to decide whether to extract .constructor/.source/.flags, or to naively stringify.

A. For String methods such as:

"foo.js".startsWith(rx)

if rx is a RegExp instance, a TypeError is thrown in order to protect the user from an evident footgun. For the same reason, the TypeError must be thrown for a RegExp-like objects that implement a @@match method, such as instances of FileMatcher as above. Use of the (wrongly-named) IsRegExp() check is perfectly justified.

In that particular case, however, it suffices to check the presence (or rather the absence) of a @@match method; there is no BC constraint for being over-cautious and looking for [[RegExpMatcher]]. In other words, the IsRegExp() negative check may be replaced by a @@match negative check.


B. For use in the RegExp constructor, the use of IsRegExp() is probably less justified. Since we are inside with the real RegExp constructor, maybe we could just check for a real RegExp instance, and look only the [[RegExpMatcher]] internal slot? (I’ve not fully analysed that case.)

@allenwb
Copy link
Member

allenwb commented Oct 1, 2018

@claudepache is exactly correct WRT to his analysis, particularly the A cases. This was discussed at TC39 meetings and the consensus was that (except for BC constraints) the existence of a @@match method would be the defining characteristic of an "regular expression".

IsRegExp is used in startsWith and other places to allow for the specification to be extended in the future to allow the use of regular expressions as arguments to such methods.

Regarding the B case. You have to take into account that the argument pattern may or may not be a built-in RegExp object and that you may be inside of a super call to RegExp. Step 5 is what should be the default case but step 4 is needed to account for possible BC usages. I need to think a bit more about the step 2 usage.

@ljharb
Copy link
Member Author

ljharb commented Oct 1, 2018

Regardless of prior consensus, I'm not sure why we'd want to support "Foo-like" patterns that aren't "extends Foo", now that class exists.

Regardless, if we wanted to keep the Symbol.match check, the current implementation is incorrect - it should be checking for matchers, splitters, replacers, or searchers - meaning, this operation should be checking any of the regex well-known Symbols (if the motivation is a "regex-like object"

@devsnek
Copy link
Member

devsnek commented Oct 1, 2018

I would expect match to check for @@match, search to check for @@search, etc. @claudepache's post, especially the point about matching/searching that doesn't behave like perl regexp, seems like a very strong argument for this behaviour to me.

Perhaps having IsMatcher, isSearcher (name needs bikeshedding) abstract ops would be better?

@ljharb
Copy link
Member Author

ljharb commented Oct 1, 2018

@devsnek those 4 operations already do check their symbols; the issue is that there's 3 unrelated String methods that check "match" to throw a "do not use a regex" error; and that the RegExp constructor uses only one of the four special symbols to determine "this is a regex-like".

My intention is to remove the magical nature of the "match" symbol - either by removing the checks for it, or by additionally checking for all 4 (soon 5) regex symbols.

@allenwb
Copy link
Member

allenwb commented Oct 1, 2018

Regardless of prior consensus, I'm not sure why we'd want to support "Foo-like" patterns that aren't "extends Foo", now that class exists.

  1. Because changing the results of prior consensus that has observable specified behavior is a breaking change. "Don't break the web"

  2. Because "Foo-like" behavior (ie duck typing) has nothing to do with how a "Foo-like" object is implemented. RegExp defines both the contract of an ES "regular expression" and provides a specific implementation that conforms to that contract. The ES6 design of RegExp makes it easy to extend the built-in implementation via subclassing but there is no particular reason that a completely new regular expression engine that does not use the built-in engine would need or want to inherit from RegExp.

@ljharb
Copy link
Member Author

ljharb commented Oct 1, 2018

It remains to be seen if there is any code on the web that relies on this particular quirk of Symbol.match on an object that doesn't extends RegExp; obviously if there is, and it can't be evangelized (which is likely possible since it would have been code written in the last 3 years), then we wouldn't be able to make the change.

@allenwb regarding number 2, why is "match" special here and not the other three symbols? What if someone wants a RegExp-like object that's "searchable" or "replaceable" or "splittable" but not "matchable" - why would they be forced to implement "match", and why is that any different than forcing them to extend RegExp?

@allenwb
Copy link
Member

allenwb commented Oct 1, 2018

Regardless, if we wanted to keep the Symbol.match check, the current implementation is incorrect -

No, it is not incorrect. It is by design and in accordance to the TC39 consensus that the implementation of @@match was a sufficient duck type test for an object to be considered a "regular expression" for "string or regular expression overloads". Every "regular expression" object needs to provide a @@match but not necessarily a @@replace. You can define a new kind of regular expression that works with "foo".split but which will not work with "foo".replace.

One of the reasons for only testing @@match is to avoid having to do a multiple property accesses to determine that an object pass to a String/RegExp overload is actually intended to be treated as an regular expression.

Finally, there are other subtilties in IsRegExp. Note that it has a fall back brand test of [[RegExpMatcher]]. that's there for backwards compatibility to with web reality ES3-5 where somebody could have changed the prototype of some RegExp instances to be something other than %RegExp_prototype% but still used them as valid regular expressions by copying the exec method to the alternative prototype. ES5 code that did that would not know that it would also need to copy @@match.

Also note that it is intentional that the test for [[RegExpMatcher]] occurs after the test for @@match. This allows an object to define @@match as a falsy value to avoid doing the legacy [[RegExpMatcher]] brand check.

@ljharb
Copy link
Member Author

ljharb commented Oct 1, 2018

Why Symbol.match and not .exec?

Multiple property accesses would only have to occur if one of them was missing - I'm not suggesting that the presence of all 4 would mean "regex-like", but the presence of any of them. Something that omits Symbol.match but includes Symbol.split still is treated like a regex in String.prototype.split, so it should get the same treatment in the RegExp constructor.

@allenwb
Copy link
Member

allenwb commented Oct 2, 2018

Why Symbol.match and not .exec?

Because it was too likely that exec would be used as a method name in a class that has nothing to do with regular expressions and hence yield false positives. A better test would have been to have a @@exec method and for @@match and friends to be defined in terms of it. But, the legacy definition of those methods explicit invoke exec and (my recollection) there was TC39 resistance to adding an extra level of property lookup/invocation to the built-in RegExp exec method. So we agreed to use @@match for the IsRegExp test because it was the simplest of the RegExp @@methods. (there was also concerns expressed about adding the extra level of indirection from the string methods to the RegExp @@method but we got consensus on those. Basically that sort of indirection is necessary if you are going to allow for multiple work-alike regular expression implementations.

It been well documented starting in ES6 that the IsRegExp test was based upon @@match:

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.

It is not clear why checking for any of the @@methods is a better test then the above well specified meaning.

Something that omits Symbol.match but includes Symbol.split still is treated like a regex in String.prototype.split, so it should get the same treatment in the RegExp constructor.

Well, not exactly. String split doesn't care whether an object with a @@split method behaves as a regular expression or not it just causes String split to delegate to the @@split method.

As mentioned previously, the primary use of IsRegExp is to reserve for possible future additions of more RegExp overloads on other string methods. It would have been a step too far to reserve and test for things like @@startsWith and that is probably not even how we should implement the most likely extensions. More likely , since they wouldn't have legacy dependencies upon exec they should just use @@match. (BTW, for this very reason I've always been a bit uncomfortable with matchAll being defined in terms of @@matchall. I would have preferred for it to have the algorithm defined within matchAll using @@match.

@ljharb
Copy link
Member Author

ljharb commented Oct 2, 2018

The match algorithm both doesn't provide a way to get one match at a time, and also does not have the proper semantics for a global regex, so that wasn't really an option.

tbh I'd really like to see some concrete use cases in use on production sites - not contrived ones, but actual libraries in common usage - that either subclass RegExp at all, let alone have a RegExp-like non-subclass object - that might help understand why making Symbol.match special is desirable, whether it got consensus in the pre-stage-process or not.

@ljharb ljharb added needs data This PR needs more information; such as web compatibility data, “web reality” (what all engines do)… needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 has consensus This has committee consensus. and removed needs consensus This needs committee consensus before it can be eligible to be merged. labels Nov 29, 2018
@ljharb
Copy link
Member Author

ljharb commented Nov 29, 2018

This PR achieved consensus as-is, however, more investigation is required to ensure web compatibility before merging it.

guybedford pushed a commit to guybedford/v8 that referenced this pull request Dec 10, 2018
A spec change to simplify IsRegExp has been proposed:

tc39/ecma262#1318

This CL adds use counters for cases in which the spec change would
alter behavior:

1. o[@@match] is trueish but o is not a JSRegExp
2. o[@@match] is falseish (but not undefined) and o is a JSRegExp

This is the V8 side of required changes.
The Chromium-side CL: https://crrev.com/c/1360730

Drive-by: TNodeify IsRegExp.

Tbr: yangguo@chromium.org
Bug: v8:8522
Change-Id: I3766e02977f256a80d0e59472d3bafa9c692af9e
Reviewed-on: https://chromium-review.googlesource.com/c/1360630
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#58064}
aarongable pushed a commit to chromium/chromium that referenced this pull request Dec 11, 2018
A spec change to simplify IsRegExp has been proposed:

tc39/ecma262#1318

This CL adds use counters for cases in which the spec change would
alter behavior:

1. o[@@match] is trueish but o is not a JSRegExp
2. o[@@match] is falseish (but not undefined) and o is a JSRegExp

This is the Chromium side of required changes.
The V8-side CL: https://crrev.com/c/1360630

Bug: v8:8522
Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd
Reviewed-on: https://chromium-review.googlesource.com/c/1360730
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#615507}
SergioOpenPeer pushed a commit to webrtc-uwp/chromium-tools that referenced this pull request May 10, 2019
A spec change to simplify IsRegExp has been proposed:

tc39/ecma262#1318

This CL adds use counters for cases in which the spec change would
alter behavior:

1. o[@@match] is trueish but o is not a JSRegExp
2. o[@@match] is falseish (but not undefined) and o is a JSRegExp

This is the Chromium side of required changes.
The V8-side CL: https://crrev.com/c/1360630

Bug: v8:8522
Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd
Reviewed-on: https://chromium-review.googlesource.com/c/1360730
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#615507}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 66308504a8e28cd4e6e597842af202a342c71ffc
@ljharb
Copy link
Member Author

ljharb commented Aug 29, 2019

Although I have absolutely no idea why the use counters were this high, based on the results in https://bugs.chromium.org/p/v8/issues/detail?id=8522#c12, this needs to be closed.

If in the future we figure out what's causing the high usage, and can try another use counter, or evangelize away the pattern, we can clean this up, but otherwise it'll have to stay as it is.

@ljharb ljharb closed this Aug 29, 2019
@ljharb ljharb deleted the is_regexp branch August 29, 2019 17:28
@zloirock
Copy link

@ljharb it can be just a core-js feature detection.

@ljharb
Copy link
Member Author

ljharb commented Aug 29, 2019

ah, that makes a ton of sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has consensus This has committee consensus. needs data This PR needs more information; such as web compatibility data, “web reality” (what all engines do)… needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 normative change Affects behavior required to correctly evaluate some ECMAScript source text
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants