Normative: IsRegExp: make `Symbol.match` not be uniquely special #1318

ljharb · 2018-09-29T21:58:18Z

Either IsRegExp should check all 4 (soon 5) of the regex symbols, or it should check none.

Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the RegExp constructor to decide whether to extract .constructor/.source/.flags, or to naively stringify.

This will only impact objects that do not use extends RegExp/super to create a "RegExp-like" object.

Some examples of the current inconsistency:

class MyRE extends RegExp {}

// similar results for startsWith and endsWith
'a'.includes({ [Symbol.split]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.search]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.replace]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.match]: true, toString() { return 'a'; } }); // throws
'a'.includes(new MyRE('a', 'g')); // throws

new RegExp(/b/g); // `/b/g`
new RegExp(new MyRE('a', 'g')); // `/a/g`, a regular regex, not an instanceof MyRE
new RegExp({ [Symbol.split]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.search]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.replace]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.match]: true, toString() { return 'a'; }, source: 'b', flags: 'g' }); // `/b/g`

This PR changes the results for an object that defines Symbol.match so that they perform the same as an object defining any of the other regex-related well-known Symbols. It does not impact any operations with regex literals, instances of RegExp, or instances of any subclasses of RegExp.

This has a secondary benefit of making IsRegExp no longer an observable operation.

It is highly unlikely that there is code on the web relying on this behavior; we'd need to determine if that is the case before merging this.

After this PR, these would be the above results:

class MyRE extends RegExp {}

// similar results for startsWith and endsWith
'a'.includes({ [Symbol.split]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.search]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.replace]: true, toString() { return 'a'; } }); // `true`
'a'.includes({ [Symbol.match]: true, toString() { return 'a'; } }); // `true`
'a'.includes(new MyRE('a', 'g')); // throws

new RegExp(/b/g); // `/b/g`
new RegExp(new MyRE('a', 'g')); // `/a/g`, a regular regex, not an instanceof MyRE
new RegExp({ [Symbol.split]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.search]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.replace]: true, toString() { return 'a'; }, source: 'b', flags: 'g' });  // `/a/`
new RegExp({ [Symbol.match]: true, toString() { return 'a'; }, source: 'b', flags: 'g' }); // `/a/`

Either `IsRegExp` should check all 4 (soon 5) of the regex symbols, or it should check none. Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the `RegExp` constructor to decide whether to extract `.constructor`/`.source`/`.flags`, or to naively stringify. This will only impact objects that do not use `extends RegExp`/`super` to create a "RegExp-like" object.

littledan

This seems like a nice simplification; the use cases for needing this behavior from RegExp-like classes which don't inherit from RegExp is unclear to me. The PR might be a small optimization. I am not as concerned as @ljharb about the semantics without this patch making Symbol.match "special", however. Cc @gsathya @schuay

mathiasbynens

Do we know why @@match was previously made to be special? The PR makes a lot more sense to me.

(Btw, thanks for including those before/after code snippets! It's often tricky to imagine the complete set of observable effects of any given spec change, and having them included in the description really helps.)

cc @hashseed

hashseed · 2018-10-01T07:20:17Z

I didn't even know about this special behavior or @@match. PR lgtm.

nicolo-ribaudo · 2018-10-01T07:24:46Z

Does this code still work? (It's kinda like thenable for promises)

var re = {
   [Symbol.match]() {
               // Some matching conditions which are easier to write using code than using a RegExp
   	return ["foo", "bar"]
   }
}

console.log("xx".match(re))

hashseed · 2018-10-01T07:28:32Z

Does this code still work?

Yes it should. The spec text for String.prototype.match doesn't check for IsRegExp.

claudepache · 2018-10-01T09:40:19Z

I think this is going backwards. Efforts were made in the conception of ES6 in order make interfaces more generic and less dependent of internal slots like [[RegExpMatcher]].

According to the spec:

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions.

If there were no BC constraints, the IsRegExp() abstract operation would only check the presence of the @@match property, not the [[RegExpMatcher]] internal slot.

That said, IsRegExp() is wrongly named, especially when its functionality is compared with the one of IsArray(). It should have been called IsRegExpLike(), or IsStringMatcher().

As a concrete example, consider a class that does some sort of pattern matching, but is not based on traditional perl-like regexps, and thus is not a subclass of RegExp. E.g.:

class FileMatcher { /* impl */ }
let rx = new FileMatcher("*.js")

For ease of integration, FileMatcher instances may be use as drop-in replacement of RegExp instances for use in String.prototype.match, which is what @@match is for. As it does not make sense to use FileMatcher instances in String.prototype.split, etc., methods named @@split, etc. are not implemented by FileMatcher.

Now, to the point:

Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the RegExp constructor to decide whether to extract .constructor/.source/.flags, or to naively stringify.

A. For String methods such as:

"foo.js".startsWith(rx)

if rx is a RegExp instance, a TypeError is thrown in order to protect the user from an evident footgun. For the same reason, the TypeError must be thrown for a RegExp-like objects that implement a @@match method, such as instances of FileMatcher as above. Use of the (wrongly-named) IsRegExp() check is perfectly justified.

In that particular case, however, it suffices to check the presence (or rather the absence) of a @@match method; there is no BC constraint for being over-cautious and looking for [[RegExpMatcher]]. In other words, the IsRegExp() negative check may be replaced by a @@match negative check.

B. For use in the RegExp constructor, the use of IsRegExp() is probably less justified. Since we are inside with the real RegExp constructor, maybe we could just check for a real RegExp instance, and look only the [[RegExpMatcher]] internal slot? (I’ve not fully analysed that case.)

allenwb · 2018-10-01T18:32:32Z

@claudepache is exactly correct WRT to his analysis, particularly the A cases. This was discussed at TC39 meetings and the consensus was that (except for BC constraints) the existence of a @@match method would be the defining characteristic of an "regular expression".

IsRegExp is used in startsWith and other places to allow for the specification to be extended in the future to allow the use of regular expressions as arguments to such methods.

Regarding the B case. You have to take into account that the argument pattern may or may not be a built-in RegExp object and that you may be inside of a super call to RegExp. Step 5 is what should be the default case but step 4 is needed to account for possible BC usages. I need to think a bit more about the step 2 usage.

ljharb · 2018-10-01T18:36:05Z

Regardless of prior consensus, I'm not sure why we'd want to support "Foo-like" patterns that aren't "extends Foo", now that class exists.

Regardless, if we wanted to keep the Symbol.match check, the current implementation is incorrect - it should be checking for matchers, splitters, replacers, or searchers - meaning, this operation should be checking any of the regex well-known Symbols (if the motivation is a "regex-like object"

devsnek · 2018-10-01T19:48:18Z

I would expect match to check for @@match, search to check for @@search, etc. @claudepache's post, especially the point about matching/searching that doesn't behave like perl regexp, seems like a very strong argument for this behaviour to me.

Perhaps having IsMatcher, isSearcher (name needs bikeshedding) abstract ops would be better?

ljharb · 2018-10-01T20:59:41Z

@devsnek those 4 operations already do check their symbols; the issue is that there's 3 unrelated String methods that check "match" to throw a "do not use a regex" error; and that the RegExp constructor uses only one of the four special symbols to determine "this is a regex-like".

My intention is to remove the magical nature of the "match" symbol - either by removing the checks for it, or by additionally checking for all 4 (soon 5) regex symbols.

allenwb · 2018-10-01T22:44:56Z

Regardless of prior consensus, I'm not sure why we'd want to support "Foo-like" patterns that aren't "extends Foo", now that class exists.

Because changing the results of prior consensus that has observable specified behavior is a breaking change. "Don't break the web"
Because "Foo-like" behavior (ie duck typing) has nothing to do with how a "Foo-like" object is implemented. RegExp defines both the contract of an ES "regular expression" and provides a specific implementation that conforms to that contract. The ES6 design of RegExp makes it easy to extend the built-in implementation via subclassing but there is no particular reason that a completely new regular expression engine that does not use the built-in engine would need or want to inherit from RegExp.

ljharb · 2018-10-01T22:53:45Z

It remains to be seen if there is any code on the web that relies on this particular quirk of Symbol.match on an object that doesn't extends RegExp; obviously if there is, and it can't be evangelized (which is likely possible since it would have been code written in the last 3 years), then we wouldn't be able to make the change.

@allenwb regarding number 2, why is "match" special here and not the other three symbols? What if someone wants a RegExp-like object that's "searchable" or "replaceable" or "splittable" but not "matchable" - why would they be forced to implement "match", and why is that any different than forcing them to extend RegExp?

allenwb · 2018-10-01T23:14:04Z

Regardless, if we wanted to keep the Symbol.match check, the current implementation is incorrect -

No, it is not incorrect. It is by design and in accordance to the TC39 consensus that the implementation of @@match was a sufficient duck type test for an object to be considered a "regular expression" for "string or regular expression overloads". Every "regular expression" object needs to provide a @@match but not necessarily a @@replace. You can define a new kind of regular expression that works with "foo".split but which will not work with "foo".replace.

One of the reasons for only testing @@match is to avoid having to do a multiple property accesses to determine that an object pass to a String/RegExp overload is actually intended to be treated as an regular expression.

Finally, there are other subtilties in IsRegExp. Note that it has a fall back brand test of [[RegExpMatcher]]. that's there for backwards compatibility to with web reality ES3-5 where somebody could have changed the prototype of some RegExp instances to be something other than %RegExp_prototype% but still used them as valid regular expressions by copying the exec method to the alternative prototype. ES5 code that did that would not know that it would also need to copy @@match.

Also note that it is intentional that the test for [[RegExpMatcher]] occurs after the test for @@match. This allows an object to define @@match as a falsy value to avoid doing the legacy [[RegExpMatcher]] brand check.

ljharb · 2018-10-01T23:21:18Z

Why Symbol.match and not .exec?

Multiple property accesses would only have to occur if one of them was missing - I'm not suggesting that the presence of all 4 would mean "regex-like", but the presence of any of them. Something that omits Symbol.match but includes Symbol.split still is treated like a regex in String.prototype.split, so it should get the same treatment in the RegExp constructor.

allenwb · 2018-10-02T00:12:37Z

Why Symbol.match and not .exec?

Because it was too likely that exec would be used as a method name in a class that has nothing to do with regular expressions and hence yield false positives. A better test would have been to have a @@exec method and for @@match and friends to be defined in terms of it. But, the legacy definition of those methods explicit invoke exec and (my recollection) there was TC39 resistance to adding an extra level of property lookup/invocation to the built-in RegExp exec method. So we agreed to use @@match for the IsRegExp test because it was the simplest of the RegExp @@methods. (there was also concerns expressed about adding the extra level of indirection from the string methods to the RegExp @@method but we got consensus on those. Basically that sort of indirection is necessary if you are going to allow for multiple work-alike regular expression implementations.

It been well documented starting in ES6 that the IsRegExp test was based upon @@match:

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.

It is not clear why checking for any of the @@methods is a better test then the above well specified meaning.

Something that omits Symbol.match but includes Symbol.split still is treated like a regex in String.prototype.split, so it should get the same treatment in the RegExp constructor.

Well, not exactly. String split doesn't care whether an object with a @@split method behaves as a regular expression or not it just causes String split to delegate to the @@split method.

As mentioned previously, the primary use of IsRegExp is to reserve for possible future additions of more RegExp overloads on other string methods. It would have been a step too far to reserve and test for things like @@startsWith and that is probably not even how we should implement the most likely extensions. More likely , since they wouldn't have legacy dependencies upon exec they should just use @@match. (BTW, for this very reason I've always been a bit uncomfortable with matchAll being defined in terms of @@matchall. I would have preferred for it to have the algorithm defined within matchAll using @@match.

ljharb · 2018-10-02T00:15:46Z

The match algorithm both doesn't provide a way to get one match at a time, and also does not have the proper semantics for a global regex, so that wasn't really an option.

tbh I'd really like to see some concrete use cases in use on production sites - not contrived ones, but actual libraries in common usage - that either subclass RegExp at all, let alone have a RegExp-like non-subclass object - that might help understand why making Symbol.match special is desirable, whether it got consensus in the pre-stage-process or not.

ljharb · 2018-11-29T00:17:54Z

This PR achieved consensus as-is, however, more investigation is required to ensure web compatibility before merging it.

@match

A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the V8 side of required changes. The Chromium-side CL: https://crrev.com/c/1360730 Drive-by: TNodeify IsRegExp. Tbr: yangguo@chromium.org Bug: v8:8522 Change-Id: I3766e02977f256a80d0e59472d3bafa9c692af9e Reviewed-on: https://chromium-review.googlesource.com/c/1360630 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org> Reviewed-by: Mathias Bynens <mathias@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#58064}

@match

A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the Chromium side of required changes. The V8-side CL: https://crrev.com/c/1360630 Bug: v8:8522 Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd Reviewed-on: https://chromium-review.googlesource.com/c/1360730 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Marja Hölttä <marja@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#615507}

@match

A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the Chromium side of required changes. The V8-side CL: https://crrev.com/c/1360630 Bug: v8:8522 Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd Reviewed-on: https://chromium-review.googlesource.com/c/1360730 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Marja Hölttä <marja@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#615507} Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src Cr-Mirrored-Commit: 66308504a8e28cd4e6e597842af202a342c71ffc

ljharb · 2019-08-29T17:28:30Z

Although I have absolutely no idea why the use counters were this high, based on the results in https://bugs.chromium.org/p/v8/issues/detail?id=8522#c12, this needs to be closed.

If in the future we figure out what's causing the high usage, and can try another use counter, or evangelize away the pattern, we can clean this up, but otherwise it'll have to stay as it is.

zloirock · 2019-08-29T17:38:29Z

@ljharb it can be just a core-js feature detection.

ljharb · 2019-08-29T17:55:55Z

ah, that makes a ton of sense.

ljharb added normative change Affects behavior required to correctly evaluate some ECMAScript source text needs consensus This needs committee consensus before it can be eligible to be merged. labels Sep 29, 2018

ljharb requested review from mathiasbynens and littledan September 29, 2018 21:58

littledan approved these changes Sep 30, 2018

View reviewed changes

mathiasbynens approved these changes Oct 1, 2018

View reviewed changes

ljharb mentioned this pull request Nov 6, 2018

Committee feedback: fallback behavior tc39/proposal-string-matchall#39

Closed

ljharb closed this Aug 29, 2019

ljharb deleted the is_regexp branch August 29, 2019 17:28

ljharb mentioned this pull request Jun 30, 2020

Taking @@matchAll into account for isRegExp as well? #2073

Closed

anba mentioned this pull request Apr 12, 2023

ObjectImplementsTemporalCalendarProtocol: Excessive checks of properties? tc39/proposal-temporal#2552

Closed

Normative: IsRegExp: make Symbol.match not be uniquely special #1318

Normative: IsRegExp: make Symbol.match not be uniquely special #1318

Conversation

ljharb commented Sep 29, 2018

Uh oh!

littledan left a comment

Choose a reason for hiding this comment

Uh oh!

mathiasbynens left a comment

Choose a reason for hiding this comment

Uh oh!

hashseed commented Oct 1, 2018

Uh oh!

nicolo-ribaudo commented Oct 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hashseed commented Oct 1, 2018

Uh oh!

claudepache commented Oct 1, 2018

Uh oh!

allenwb commented Oct 1, 2018

Uh oh!

ljharb commented Oct 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devsnek commented Oct 1, 2018

Uh oh!

ljharb commented Oct 1, 2018

Uh oh!

allenwb commented Oct 1, 2018

Uh oh!

ljharb commented Oct 1, 2018

Uh oh!

allenwb commented Oct 1, 2018

Uh oh!

ljharb commented Oct 1, 2018

Uh oh!

allenwb commented Oct 2, 2018

Uh oh!

ljharb commented Oct 2, 2018

Uh oh!

ljharb commented Nov 29, 2018

Uh oh!

ljharb commented Aug 29, 2019

Uh oh!

zloirock commented Aug 29, 2019

Uh oh!

ljharb commented Aug 29, 2019

Uh oh!

Uh oh!

Normative: IsRegExp: make `Symbol.match` not be uniquely special #1318

Normative: IsRegExp: make `Symbol.match` not be uniquely special #1318

nicolo-ribaudo commented Oct 1, 2018 •

edited

Loading

ljharb commented Oct 1, 2018 •

edited

Loading