-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normative: IsRegExp: make Symbol.match
not be uniquely special
#1318
Conversation
Either `IsRegExp` should check all 4 (soon 5) of the regex symbols, or it should check none. Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the `RegExp` constructor to decide whether to extract `.constructor`/`.source`/`.flags`, or to naively stringify. This will only impact objects that do not use `extends RegExp`/`super` to create a "RegExp-like" object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a nice simplification; the use cases for needing this behavior from RegExp-like classes which don't inherit from RegExp is unclear to me. The PR might be a small optimization. I am not as concerned as @ljharb about the semantics without this patch making Symbol.match
"special", however. Cc @gsathya @schuay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why @@match
was previously made to be special? The PR makes a lot more sense to me.
(Btw, thanks for including those before/after code snippets! It's often tricky to imagine the complete set of observable effects of any given spec change, and having them included in the description really helps.)
cc @hashseed
I didn't even know about this special behavior or |
Does this code still work? (It's kinda like thenable for promises) var re = {
[Symbol.match]() {
// Some matching conditions which are easier to write using code than using a RegExp
return ["foo", "bar"]
}
}
console.log("xx".match(re)) |
Yes it should. The spec text for |
I think this is going backwards. Efforts were made in the conception of ES6 in order make interfaces more generic and less dependent of internal slots like [[RegExpMatcher]]. According to the spec:
If there were no BC constraints, the IsRegExp() abstract operation would only check the presence of the @@match property, not the [[RegExpMatcher]] internal slot. That said, IsRegExp() is wrongly named, especially when its functionality is compared with the one of IsArray(). It should have been called IsRegExpLike(), or IsStringMatcher(). As a concrete example, consider a class that does some sort of pattern matching, but is not based on traditional perl-like regexps, and thus is not a subclass of class FileMatcher { /* impl */ }
let rx = new FileMatcher("*.js") For ease of integration, FileMatcher instances may be use as drop-in replacement of RegExp instances for use in Now, to the point:
A. For String methods such as: "foo.js".startsWith(rx) if In that particular case, however, it suffices to check the presence (or rather the absence) of a @@match method; there is no BC constraint for being over-cautious and looking for [[RegExpMatcher]]. In other words, the IsRegExp() negative check may be replaced by a @@match negative check. B. For use in the RegExp constructor, the use of IsRegExp() is probably less justified. Since we are inside with the real RegExp constructor, maybe we could just check for a real RegExp instance, and look only the [[RegExpMatcher]] internal slot? (I’ve not fully analysed that case.) |
@claudepache is exactly correct WRT to his analysis, particularly the A cases. This was discussed at TC39 meetings and the consensus was that (except for BC constraints) the existence of a @@match method would be the defining characteristic of an "regular expression". IsRegExp is used in Regarding the B case. You have to take into account that the argument pattern may or may not be a built-in RegExp object and that you may be inside of a super call to RegExp. Step 5 is what should be the default case but step 4 is needed to account for possible BC usages. I need to think a bit more about the step 2 usage. |
Regardless of prior consensus, I'm not sure why we'd want to support "Foo-like" patterns that aren't "extends Foo", now that Regardless, if we wanted to keep the |
I would expect match to check for Perhaps having |
@devsnek those 4 operations already do check their symbols; the issue is that there's 3 unrelated String methods that check "match" to throw a "do not use a regex" error; and that the RegExp constructor uses only one of the four special symbols to determine "this is a regex-like". My intention is to remove the magical nature of the "match" symbol - either by removing the checks for it, or by additionally checking for all 4 (soon 5) regex symbols. |
|
It remains to be seen if there is any code on the web that relies on this particular quirk of Symbol.match on an object that doesn't @allenwb regarding number 2, why is "match" special here and not the other three symbols? What if someone wants a RegExp-like object that's "searchable" or "replaceable" or "splittable" but not "matchable" - why would they be forced to implement "match", and why is that any different than forcing them to extend RegExp? |
No, it is not incorrect. It is by design and in accordance to the TC39 consensus that the implementation of @@match was a sufficient duck type test for an object to be considered a "regular expression" for "string or regular expression overloads". Every "regular expression" object needs to provide a @@match but not necessarily a @@replace. You can define a new kind of regular expression that works with "foo".split but which will not work with "foo".replace. One of the reasons for only testing @@match is to avoid having to do a multiple property accesses to determine that an object pass to a String/RegExp overload is actually intended to be treated as an regular expression. Finally, there are other subtilties in IsRegExp. Note that it has a fall back brand test of [[RegExpMatcher]]. that's there for backwards compatibility to with web reality ES3-5 where somebody could have changed the prototype of some RegExp instances to be something other than %RegExp_prototype% but still used them as valid regular expressions by copying the Also note that it is intentional that the test for [[RegExpMatcher]] occurs after the test for @@match. This allows an object to define @@match as a falsy value to avoid doing the legacy [[RegExpMatcher]] brand check. |
Why Multiple property accesses would only have to occur if one of them was missing - I'm not suggesting that the presence of all 4 would mean "regex-like", but the presence of any of them. Something that omits |
Because it was too likely that It been well documented starting in ES6 that the IsRegExp test was based upon @@match:
It is not clear why checking for any of the @@methods is a better test then the above well specified meaning.
Well, not exactly. String As mentioned previously, the primary use of IsRegExp is to reserve for possible future additions of more RegExp overloads on other string methods. It would have been a step too far to reserve and test for things like @@startsWith and that is probably not even how we should implement the most likely extensions. More likely , since they wouldn't have legacy dependencies upon |
The match algorithm both doesn't provide a way to get one match at a time, and also does not have the proper semantics for a global regex, so that wasn't really an option. tbh I'd really like to see some concrete use cases in use on production sites - not contrived ones, but actual libraries in common usage - that either subclass RegExp at all, let alone have a RegExp-like non-subclass object - that might help understand why making |
This PR achieved consensus as-is, however, more investigation is required to ensure web compatibility before merging it. |
A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the V8 side of required changes. The Chromium-side CL: https://crrev.com/c/1360730 Drive-by: TNodeify IsRegExp. Tbr: yangguo@chromium.org Bug: v8:8522 Change-Id: I3766e02977f256a80d0e59472d3bafa9c692af9e Reviewed-on: https://chromium-review.googlesource.com/c/1360630 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org> Reviewed-by: Mathias Bynens <mathias@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#58064}
A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the Chromium side of required changes. The V8-side CL: https://crrev.com/c/1360630 Bug: v8:8522 Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd Reviewed-on: https://chromium-review.googlesource.com/c/1360730 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Marja Hölttä <marja@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#615507}
A spec change to simplify IsRegExp has been proposed: tc39/ecma262#1318 This CL adds use counters for cases in which the spec change would alter behavior: 1. o[@@match] is trueish but o is not a JSRegExp 2. o[@@match] is falseish (but not undefined) and o is a JSRegExp This is the Chromium side of required changes. The V8-side CL: https://crrev.com/c/1360630 Bug: v8:8522 Change-Id: I2ec9b1a5f54f8e70a02e48f280a548871990aabd Reviewed-on: https://chromium-review.googlesource.com/c/1360730 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Marja Hölttä <marja@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#615507} Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src Cr-Mirrored-Commit: 66308504a8e28cd4e6e597842af202a342c71ffc
Although I have absolutely no idea why the use counters were this high, based on the results in https://bugs.chromium.org/p/v8/issues/detail?id=8522#c12, this needs to be closed. If in the future we figure out what's causing the high usage, and can try another use counter, or evangelize away the pattern, we can clean this up, but otherwise it'll have to stay as it is. |
@ljharb it can be just a |
ah, that makes a ton of sense. |
Either
IsRegExp
should check all 4 (soon 5) of the regex symbols, or it should check none.Currently this check is only used in 4 places: 3 String methods, to throw a TypeError to indicate that they only take a string and not a regex; and in the
RegExp
constructor to decide whether to extract.constructor
/.source
/.flags
, or to naively stringify.This will only impact objects that do not use
extends RegExp
/super
to create a "RegExp-like" object.Some examples of the current inconsistency:
This PR changes the results for an object that defines
Symbol.match
so that they perform the same as an object defining any of the other regex-related well-known Symbols. It does not impact any operations with regex literals, instances of RegExp, or instances of any subclasses of RegExp.This has a secondary benefit of making
IsRegExp
no longer an observable operation.It is highly unlikely that there is code on the web relying on this behavior; we'd need to determine if that is the case before merging this.
After this PR, these would be the above results: