Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting Variable/Extractor matcher hook? #303

Closed
tabatkins opened this issue Oct 2, 2023 · 8 comments · Fixed by #293
Closed

Splitting Variable/Extractor matcher hook? #303

tabatkins opened this issue Oct 2, 2023 · 8 comments · Fixed by #293

Comments

@tabatkins
Copy link
Collaborator

tabatkins commented Oct 2, 2023

Right now, we're using a single Symbol.customMatcher method to handle both Variable Patterns Foo and Extractor Patterns Foo(...). The former can just return true/false, but the latter needs returns either an iterable or false. (And true/iterable both do the obvious thing when used in the opposite context.)

Matthew brought up in the champion meeting today that he was somewhat concerned about the perf impact of this in variable pattern cases - String just checks if your value is a string (or a String), while String(...) will run the pattern on the (primitive, unwrapped if necessary) string, so the String[Symbol.customMatcher] method has to be defined to essentially be:

String[Symbol.customMatcher] = function(subject) {
	if( typeof subject == "string" ) return [subject];
	if( subject instanceof String ) return [String(subject)];
	return false;
}

And incurring the cost of a temp array for the likely much more common String variable matcher didn't seem very great. We resolved in the meeting to make sure that we spec things such that, when used in the pattern, the built-ins' return values are unobservable, so the temp array can be elided entirely by codegen. (You'd only see it if you invoked the method directly.)

But this doesn't help authors, who have to write the less efficient code all the time. Maybe we can solve this more directly, by splitting the hooks into two:

  • Symbol.customMatcher, which is invoked by the variable pattern syntax, and needs to return true or false.
  • Symbol.customExtractor, which is invoked by the extractor pattern syntax, and needs to return an iterable or false.

We check for both hooks on an object, in both cases, but check for the appropriate one first. That is, Foo will invoke customMatcher if it exists; if it doesn't, but customExtractor does, it invokes that (and treats an iterable as a true result); if neither exists, it's just equality-checked, as normal for variables. Similarly, Foo(...) will invoke customExtractor if it exists; if it doesn't, but customMatcher does, it invokes that (and treats true as an empty iterable); if neither exists it's a TypeError.

So that would mean that String would be instead defined as:

String[Symbol.customMatcher] = function(subject) {
	return typeof subject == "string" || subject instanceof String;
}
String[Symbol.customExtractor] = function(subject) {
	if( typeof subject == "string" ) return [subject];
	if( subject instanceof String ) return [String(subject)];
	return false;
}

This lets us avoid having to be clever in codegen for built-ins and also lets authors receive the same benefits - Option.Some can just return a bool. In more complex cases where returning the values for an extractor might be expensive, you can avoid quite a bit of work when not necessary.


For the "auto-create a custom matcher if you didn't provide one in the class block" behavior, I presume we'd only auto-generate the customMatcher one; we can't assume anything meaningful for extractors by default. But if you only provide a customExtractor, should we still auto-define the customMatcher, or just defer to the author's customExtractor? I'm leaning toward no - if the author defines either, we don't generate anything, because they know what they're doing.

@Jack-Works
Copy link
Member

if Function.prototype.customMatcher is frozen, we will have a new fail point for override mistakes!

I support split it into two entry points.

@ljharb
Copy link
Member

ljharb commented Oct 3, 2023

This seems like wildly overcomplicating the proposal; iterating over an unmodified array should be very cheap with current optimizations, and we create extra throwaway containers constantly (including in the entire iterator protocol).

We shouldn't be concerned with potential performance issues, I think, until an implementor has concretely indicated such a concern.

I think one protocol hook point is more than sufficient, and I suspect if we tried to introduce two we'd get a lot of pushback. I'm also not sure why we'd freeze anything.

@tabatkins
Copy link
Collaborator Author

As was noted in the chatroom, the concern I'm quoting does come from a (Firefox) implementor (and Shu indicated a very similar concern in #253, so that's two browsers). Outside of the built-ins, I can also very easily imagine that constructing the iterator contents (beyond the iterator itself) might be a relatively expensive operation for user-defined extractors, and something they might want to avoid when users are just invoking Foo to get a typecheck, which I imagine will be very common.

@ljharb
Copy link
Member

ljharb commented Oct 4, 2023

I'm sure there's many other API designs that don't require creating a second protocol, which is an exceedingly costly thing to do from a language surface area standpoint.

@tabatkins
Copy link
Collaborator Author

I mean, if you would like to suggest one, feel free. The best I could imagine would be to have matchers return a function on success, which is invoked to return the iterable. But that's somewhat awkward (and still costs an arrow function).

@ljharb
Copy link
Member

ljharb commented Oct 5, 2023

I will certainly do so after I'm done traveling post-TC39.

@tabatkins
Copy link
Collaborator Author

Another option, instead of two hooks, is to pass a context argument to the function as a second argument. Probably an options bag, to allow for future expansion. Currently it would just tell you whether you're being invoked with or without an arglist, so you can decide between doing a cheap boolean test or constructing an iterator.

Like:

String[Symbol.customMatcher] = function(subject, {matchType}) {
	return match(matchType) {
		when "no-args" and if(typeof subject == "string" || subject instanceof String): true;
		when "with-args" and if(typeof subject == "string"): [subject];
		when "with-args" and if(subject instanceof String): [String(subject)];
		default: false;
	};
}

Functions that don't care can just not pay attention to the argument; we'll still do the true <-> iterator conversion as necessary.

@ljharb
Copy link
Member

ljharb commented Oct 16, 2023

That seems much simpler to me than a second symbol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants