Why not just escape every character? #15

domenic · 2015-06-16T16:51:06Z

Is there any reason to only escape a specific subset? It's harmless to add slashes, right?

benjamingr · 2015-06-16T16:55:17Z

It makes the resulting string longer; other than that, it's harmless.

Some programming languages do this (Python escapes all non-alphanumeric characters), whereas others escape only a strict set (like C#).
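A minimal sketch of the two strategies, for illustration only. The function names are hypothetical, and the strict set below is assumed to be the ECMAScript `SyntaxCharacter` set; the set the proposal finally settles on may differ.

```python
import re
import string

# Assumption: ASCII letters and digits are the only characters left unescaped
# by the "escape everything" strategy.
ALPHANUMERIC = set(string.ascii_letters + string.digits)

# Assumption: the strict set is the ECMAScript SyntaxCharacter set.
SYNTAX_CHARACTERS = set("^$\\.*+?()[]{}|")


def escape_all_non_alphanumeric(text: str) -> str:
    """Prefix every character that is not an ASCII letter or digit with a backslash."""
    return "".join(ch if ch in ALPHANUMERIC else "\\" + ch for ch in text)


def escape_strict_set(text: str) -> str:
    """Prefix only the characters in the fixed strict set with a backslash."""
    return "".join("\\" + ch if ch in SYNTAX_CHARACTERS else ch for ch in text)


sample = "a.b-c d"
broad = escape_all_non_alphanumeric(sample)   # escapes '.', '-' and ' '
strict = escape_strict_set(sample)            # escapes only '.'

# Both results match the original text literally; the broad one is just longer.
assert re.fullmatch(broad, sample) and re.fullmatch(strict, sample)
print(len(sample), len(broad), len(strict))
```

The only difference visible here is length: the broad strategy adds a backslash for every punctuation or whitespace character, while the strict one adds it only where the character has regex meaning.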

domenic · 2015-06-16T16:57:42Z

Might be worth mentioning this as a design alternative in the readme, with the pro that it's more future-proof.

benjamingr · 2015-06-16T17:01:23Z

Good idea, I'll add that when I'm in front of a computer :) (you're welcome to if you'd like of course).

benjamingr · 2015-06-17T06:08:59Z

Updated the README. I'll leave this open for a week to see if anyone has any further input on it.

benjamingr · 2015-06-19T14:21:40Z

Following the research (https://github.com/benjamingr/RegExp.escape/blob/master/data/other_languages/discussions.md), it appears that other languages that used to escape every character have either made exceptions (like Python) or changed their behavior (like Perl). The discussion notes contain links to posts explaining why the changes were made.

mjpieters · 2015-06-19T15:25:48Z

Python's new regex engine (under development) gives you a choice: escape all non-alphanumerics, or only metacharacters (and NUL). See https://bitbucket.org/mrabarnett/mrab-regex/src/6193ea4246da272cf18a190c46aa116737067780/regex_3/Python/regex.py?at=default#cl-342

In your discussion you mentioned a problem with wide characters; you ran into the Python re limitations with UCS-2 vs. UCS-4 builds (all Python versions up to 3.2 use one or the other based on a compile-time switch). The regular expression engine does not handle code points but _code units_, which in a UCS-2 build means two per non-BMP character. The escaping is correct for their respective builds.
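To make the code-unit point concrete, here is a small sketch that runs on a current Python 3, where strings are indexed by code point. It simulates what a UCS-2 build would see by encoding to UTF-16; the helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text, as a UCS-2 build would store them."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


bmp_char = "a"               # U+0061, inside the Basic Multilingual Plane
non_bmp_char = "\U0001F600"  # U+1F600, outside the BMP

# One code point each on a modern build...
print(len(bmp_char), len(non_bmp_char))

# ...but the non-BMP character needs a surrogate pair, i.e. two code units.
print([hex(unit) for unit in utf16_code_units(bmp_char)])
print([hex(unit) for unit in utf16_code_units(non_bmp_char)])
```

An escaper that works per code unit would therefore treat the non-BMP character as two separate items, which is consistent with the per-build behavior described above.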

benjamingr · 2015-06-23T17:10:08Z

I think we're good with not escaping every character. I want to focus on the discussion about big set vs readable set.

benjamingr mentioned this issue Jun 16, 2015

Does / need to be escaped? #12

Closed

benjamingr added the discussion appreciated label Jun 18, 2015

domenic mentioned this issue Jun 19, 2015

Interaction with backreferences / variable-width escape sequences #17

Closed

benjamingr closed this as completed Jun 23, 2015

benjamingr mentioned this issue Jan 29, 2021

RegExp.escape escaping SyntaxCharacter alone is insufficient #48

Closed
