Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does / need to be escaped? #12

Closed
allenwb opened this issue Jun 13, 2015 · 18 comments
Closed

Does / need to be escaped? #12

allenwb opened this issue Jun 13, 2015 · 18 comments

Comments

@allenwb
Copy link
Member

allenwb commented Jun 13, 2015

/ is not a character matched by the RegExp grammar SyntaxCharacter production. However, it seems like it might be a character that should be escaped. Particularly, if a string is being converted to a RegExp literal:

let str = "3/4/1972";
let rx = eval("/"+RegExp.escape(str)+"/");
@benjamingr
Copy link
Collaborator

Can you think of example code that does not involve eval or find one involves eval that cannot be better expressed?

As far as I'm aware (please correct me if I'm wrong) this is not a problem if instead you did:

let rx = new RexExp(str);

@allenwb
Copy link
Member Author

allenwb commented Jun 13, 2015

We can't ignore the existence of eval or denigrate use cases that use eval. Instead, we need to be thinking:

  • The langauge includes eval and some people use it?.
  • If we provide RegExp.escape will some of these people try to use it in combination with eval?
  • If so, how is it likely to be used.
  • and do those use cases introduce additional requirement that we haven't yet considered

Finally, note that use of eval isn't the only use case for escaping /. For example the implementation of RegExp.prototype.source requires the escaping to / in order to support RegExp.prototype.toString.

In fact, looking at this suggests to me, that there may be additional work that is needed to ensure that EscapeRegExpPattern and RegExp.escape have consistent behaviors. (BTW, this sort of thing is one of the things we mean when we talk about "cross-cutting concerns".)

@benjamingr
Copy link
Collaborator

We're not ignoring the existence of eval but we don't have to explicitly support using it this way. Not because I think I know better than users - but because I'm not convinced this is the way people use it.

If so, how is it likely to be used.

Probably not with / - for the same reason we don't explicitly think how to support eval for property access when adding symbols - it's just not a very convincing use case in my opinion (eval'ing the result of .escape) namely because:

  • Doing this sort of eval is considered an anti-pattern even today.
  • There is an "obvious" alternative (the regexp constructor)
  • The only places I could find that do it is arcane tutorials (like this) - all the resources I'm aware of do not do this.

So, I'm definitely against escaping / for that use case - I am however definitely interested in hearing about cases that are not eval("/" + RegExp.escape(str) + "/") or data indicating that this use case is common enough to consider. If you have conflicting data (showing that the / character is commonly used in these scenarios) I'd definitely like to see it.

@benjamingr
Copy link
Collaborator

In fact, looking at this suggests to me, that there may be additional work that is needed to ensure that EscapeRegExpPattern and RegExp.escape have consistent behaviors. (BTW, this sort of thing is one of the things we mean when we talk about "cross-cutting concerns".)

I have looked at EscapeRegExpPattern as the first option - even considering using it as the base of the algorithm or considering writing .escape in a way that would allow unifying them. However - EscapeRegExpPattern has special cases for LineTerminator, concatenation and other concerns that are not aligned.

If you have any idea how to unify them or work on them together I'd love to hear it. Do you think I should open a separate issue for this?

@zloirock
Copy link

BTW / escaped in polyfill.js. I think, be better escape it for this case.

@benjamingr
Copy link
Collaborator

@zloirock thanks, I wrote the polyfill and then changed my mind about this at one point. I'll update either the spec or polyfill.js as soon as we reach agreement about this.

@zloirock
Copy link

I think it is necessary at first not for eval("/"+RegExp.escape(str)+"/"), but for compilers / generation of code.

@benjamingr
Copy link
Collaborator

Do you have a convincing use case for me illustrating it?

By the way - different languages with eval deal with this differently -
Python escapes every non-alpha character and PHP does not escape it.

On Sun, Jun 14, 2015 at 10:36 AM, Denis Pushkarev notifications@github.com
wrote:

I think it is necessary at first not for eval, but for compilers /
generation of code.


Reply to this email directly or view it on GitHub
#12 (comment)
.

@zloirock
Copy link

I once had a ugly necessity to generate regular expressions on serverside (node) and send it to client as code :) Or, for example, writing plugins for Babel / macros for Sweet.js / something else.

@benjamingr
Copy link
Collaborator

@allenwb re-reading my previous comments I don't think they're convincing. I'll see how much real data I can dig around and come back in a day or two with actual data rather than my own private speculation about what people actually do and don't do.

Any help from your experience about how to go on about this sort of research would be appreciated - right now I'm going through GitHub and looking for how people are using eval with strings that contain "/".

@benjamingr
Copy link
Collaborator

@allenwb so far I'm fairly optimistic about not including "/" but I will have more data on Thursday. You can see the current research and progress at https://github.com/benjamingr/RexExp.escape/blob/master/data/README.md

@Zirak
Copy link

Zirak commented Jun 14, 2015

You might want to approach this from a purely cost effectiveness standpoint: If there exist such things out in the wild, no matter their diversity or significance, then you know the premise of @allenwb's suggestion stands.

Isn't your time better spent doing other things than researching just how much this construct is used?

@benjamingr
Copy link
Collaborator

No, deciding on the escaped character set is a big deal in my opinion and
we should be very careful in what we choose to force the implementations to
escape. So far it looks like uses of eval("..../" + x + and similar
constructs are extremely rare.

I take not breaking existing code and staying compatible very seriously but
I don't want to add an escaped character if I can help it. I'll be smarter
on Wednesday. If you're worried about my time you're welcome to help with
the proposal :)

On Sun, Jun 14, 2015 at 9:32 PM, Zirak notifications@github.com wrote:

You might want to approach this from a purely cost effectiveness
standpoint: If there exist such things out in the wild, no matter their
diversity or significance, then you know the premise @allenwb
https://github.com/allenwb's suggestion stands.

Isn't your time better spent doing other things than researching just how
much this construct is used?


Reply to this email directly or view it on GitHub
#12 (comment)
.

@benjamingr
Copy link
Collaborator

@allenwb I've not found significant evidence of this pattern on GH, 10000 biggest websites according to the alexa list or npm. The usages are sporadic and rare on the web. I'm closing this and removing / from the polyfill - if you still think that it should be escaped (For future use) let me know and we can continue this discussion.

@allenwb
Copy link
Member Author

allenwb commented Jun 16, 2015

@benjamingr I think this is a topic that should be presented to TC39 as an issue during the review of your initial proposal .

I'm particularly concerned that with your proposal we would now have to different forms of regexp escaping in the spec: that performed by RegExp.escape and that performed by RegExp.protoype.toString/EscapeRegExpPattern. This sort of almost, but not quite, duplicate behavior is a "code smell" for language feature design. Note that one fix may be to change EscapeRegExpPattern since its current spec. doesn't require string identical results across implementations.

Regardless, this is the sort of issue that TC39 likes to hash out and rtry to reach a consensus about.

@benjamingr
Copy link
Collaborator

@allenwb definitely agreed. EscapeRegExpPattern looks very under specified.

Note that unless we end up escaping all alphanumeric characters (as per the alternative here #15 ) they are pretty close - and I think we might be able to get EscapeRegExpPattern to reuse the logic specified here (which would also mean it would be guaranteed to produce the same results in all engines)

@benjamingr benjamingr reopened this Jun 16, 2015
@benjamingr
Copy link
Collaborator

@allenwb we now have more data points about EscapeRegExpPattern https://github.com/benjamingr/RegExp.escape/blob/master/data/EscapeRegExpPattern.md , we're currently looking at what engines do about it (In Firefox, it/s very literal for example).

@EladRK is collecting this data and is working on it, we'll have final data soon. Any comments would be very appreciate.

@benjamingr
Copy link
Collaborator

Superseding this with #29

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants