Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly escape closing square brackets in two regular expression literals #141

Merged

Conversation

jfahrenkrug
Copy link
Contributor

The Duktape JavaScript engine is very strict when it comes to regex literals.
It was choking on two regexs in the XRegExp source. This commit correctly escapes
two closing square brackets that should be escaped because they don't have an opening
counterpart and they should be matched as a literal character.

…terals.

The Duktape JavaScript engine is very strict when it comes to regex literals.
It was choking on two regexs in the XRegExp source. This commit correctly escapes
two closing square brackets that should be escaped because they don't have an opening
counterpart and they should be matched as a literal character.
@mathiasbynens
Copy link
Collaborator

mathiasbynens commented Jul 19, 2016

The commit message makes it sound like the ] in this scenario MUST be escaped, which is not the case.

E.g. /]/ is a perfectly valid regular expression in JavaScript:

/]/.test(']')
// → should return `true`

If the Duktape engine cannot handle it, that’s a bug in Duktape.

@svaarala
Copy link

@mathiasbynens Ecmascript E5 doesn't actually allow that regexp format but it's commonly accepted by Javascript engines. ES6 adds unescaped brackets in its Appendix B: http://www.ecma-international.org/ecma-262/6.0/#sec-regular-expressions-patterns.

@svaarala
Copy link

Here's the relevant E5 production from E5 Section 15.10.1:

PatternCharacter ::
    SourceCharacter but not one of
        ^ $ \ . * + ? ( ) [ ] { } |

@mathiasbynens
Copy link
Collaborator

ES6 adds unescaped brackets in its Appendix B: http://www.ecma-international.org/ecma-262/6.0/#sec-regular-expressions-patterns.

Exactly!

@svaarala
Copy link

@mathiasbynens Note that ES6 Annex B is optional and not really strongly recommended:

These features are not considered part of the core ECMAScript language. Programmers should not use or assume the existence of these features and behaviours when writing new ECMAScript code. ECMAScript implementations are discouraged from implementing these features unless the implementation is part of a web browser or is required to run the same legacy ECMAScript code that web browsers encounter.

But I don't mean to be a language lawyer here, just pointing out what the ES5/ES6 specifications say :)

Duktape is supporting the out-in-the-wild non-compliant regexps piece by piece (this one too soon) because there's a lot of code that's almost ES5 but only relies on non-standard regexps - and it's quite difficult to notice because most engines support a lot of regexp formats beyond ES5.

@mathiasbynens
Copy link
Collaborator

I know what the “Annex B” label is supposed to mean based on the spec :) I also know that in the real world, engines implement it whether they’re part of a browser or not.

Since portability is one of Duktape’s goals, IMHO it would make sense to support /]/ as well as other Annex B features.

[…] required to run the same legacy ECMAScript code that web browsers encounter […]

@fatcerberus
Copy link

To be fair, Duktape is not (yet) an Ecmascript 6 engine. It was developed against ES5, where the regex described is indeed invalid according to the spec--and so this technically not a bug. That said, ES6 support is on the horizon, hopefully sooner rather than later ;).

@jfahrenkrug
Copy link
Contributor Author

I didn't mean to make it sounds like it's a bug, I'm sorry about that. If escaping these two square brackets makes it work in the current version of Duktape, I think it would be worth considering. It doesn't hurt to escape those two closing square brackets. IMHO, it's easier to read when they are escaped: We want to match a literal ], so seeing \] tells me right away: "we need to match a literal ]". When I see an unescaped ], I first have to look through the rest of the regex to see if there is a an opening [ counterpart.

@slevithan
Copy link
Owner

Seems reasonable to me. Thanks for the patch and the details in the discussion. I think XRegExp also uses some unescaped {s and }s, but if that's not causing problems then cool.

@slevithan slevithan merged commit ac800e6 into slevithan:master Jul 20, 2016
@jfahrenkrug
Copy link
Contributor Author

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants