-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex Escaping #6124
Comments
You can use |
I guess we will go with that solution; we can reopen if we decide to add this function. |
@StefanKarpinski The following code
finds a match, however it should not as we wanted to look for \E. Maybe it is worth to consider reopening this issue and adding a function that escapes string for use in regex? |
Yeah, this could use a more complete solution. It seems to be strictly a feature addition, however. |
Didn't want to go full PR on this (yet), but it seems there are semi-standard ways of doing this. My implementations of three canonical-ish versions follow. First, the Perl version, which escapes everything except word characters: # Perl version:
quotemeta(s::AbstractString) = replace(s, r"(\W)" => s"\\\1") As the docs say, this is the internal version behind the This is simple enought it might simply be a recipe in the docs, I guess. The C++ wrapper of PCRE has an implementation of this (which suggests that it's not available directly from PCRE), and it does the same thing, except it also replaces # pcrecpp version:
function quotemeta(s::AbstractString)
res = replace(s, r"(\W)" => s"\\\1")
replace(res, "\0" => "0")
end Finally, there's the more conservative PHP version, which only escapes the special characters # PHP version:
function quotemeta(s::AbstractString; delim=nothing)
res = replace(s, r"([.\\+*?[^\]$(){}=!<>|:-])" => s"\\\1")
delim ≢ nothing ? replace(res, delim => "\\$delim") : res
end I do think escaping # Julia version?
function quotemeta(s::AbstractString)
res = replace(s, r"([.\\+*?[^\]$(){}=!<>|:-])" => s"\\\1")
replace(res, "\0" => "\\0")
end |
The Python version escapes |
Escaping whitespace does seem to make sense (though we might as well use |
So, including these extra precautionary characters from the Python version (but escaping all whitespace, not just the characters listed): function quotemeta(s::AbstractString)
res = replace(s, r"([()[\]{}?*+\-|^\$\\.&~#\s=!<>|:])" => s"\\\1")
replace(res, "\0" => "\\0")
end |
… though I guess if PCRE doesn't ignore other whitespace than the above in |
@mlhetland Did you want to make a PR for this? If not, I can. Cheers! |
Feel free to!-) I’d suggest looking into that last issue – i.e. what PCRE ignores in x-mode vs what is matched by |
Ping! Why is the PR still on hold? |
See also: #29643 |
Closing as we have a version of this (though perhaps not the final version, per that PR) |
I don't understand the practice of closing an issue before the PR has been merged. |
I think the point is that there is another function (in Base) that solves the problem, even though it's up in the air whether the one from the PR should be added in addition. |
Se, specifically, this comment, about |
That comment was left two days ago, for a PR that's been open since 2018. Does It's another thing if you said "we don't care about this feature." Then it would make more sense to me. |
I have nothing to do with the decisions, here. I just implemented the function in the PR, because I also needed it (two years ago). Just tried to explain my impression of why the issue was closed – but I agree that until As for what |
I can't find a function to escape a string for use in regular expression matching. That is: f such that match(Regex(f(s1)),s2)!=nothing iff s1==s2.
Python has such a function (re.escape) as do Ruby (Regexp.escape) and MATLAB (regextranslate). R, on the the other hand does not.
Assuming the community decides to add, escape_regex seems like a clear name but bothers me slightly because it's acting on a string rather than a Regex. escape_regexstring seems more accurate but needlessly verbose.
I'm happy to contribute the patch if the community agrees this is worthwhile.
Thoughts?
The text was updated successfully, but these errors were encountered: