Add a function to escape strings for use in regular expressions #29643
base: master
I'm pretty sure that you can't do correct escaping with regex replacement. I could be wrong, but I suspect you need a state machine to do this fully correctly.
Regexen and state machines are equivalent, of course, but I guess your objection goes beyond that. However, look at (e.g.) the Python implementation – it just uses straight-up character substitution:
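For readers following along, the character-substitution idea can be sketched in Julia roughly as below. This is a minimal illustration under stated assumptions, not the code from this PR and not a copy of Python's implementation: the name `escape_nonword` is hypothetical, and the rule it applies (put a backslash before every ASCII character that is not a letter, digit, or underscore) is chosen only because it is simple to reason about.

```julia
# Hypothetical helper, not part of Base: escape by plain character substitution.
# Assumption: a backslash followed by an ASCII non-alphanumeric character is
# treated by PCRE as that literal character.
function escape_nonword(s::AbstractString)
    io = IOBuffer()
    for c in s
        # Escape ASCII punctuation and whitespace; pass everything else through.
        if isascii(c) && !(isletter(c) || isdigit(c) || c == '_')
            print(io, '\\')
        end
        print(io, c)
    end
    return String(take!(io))
end

# The result is still a plain string, so it can be embedded in a larger pattern.
escaped = escape_nonword("1+1=2?")
pattern = Regex("^" * escaped * "\$")
```

No state is carried from one character to the next here, which is why a single pass (or a single `replace` call) is enough for this style of escaping.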
I think the Travis failure is unrelated -- maybe some temporary network issue, since the failing tests are related to Sockets? We've been using a slightly simpler version (almost exactly what Python does) in an internal project without issues for a while. Are there any other inputs we should test this with? I threw a few strings containing metacharacters at it, as well as some strange stuff from BLoNS, and it seems to be working.
Hey guys, what's holding this up? It seems to have had popular demand, and I found myself re-implementing this lately. Also, wouldn't a better name be
Needs to be rebased (it's quite conflicted at this point) and reviewed. Note, though, that a different escaping approach was used in #23422, and we should probably use the same approach for this kind of functionality.
It's about 20 lines of code, including documentation, so it should be fairly easy to merge or just rewrite. I would do it myself, but I don't know much about Julia's conventions yet (I'm willing to do it with guidance, though).
It seems it might need to be replaced, if it is to conform to #23422?
There’s also the issue I mentioned with my original code: We might not want to use
## escaping ##
"""
    regex_escape(s::AbstractString)
Probably better to call this `escape_regex`, for consistency with `escape_string`?
The `regex_escape` function allows you to escape a string for use in constructing a regular
expression. All whitespace and PCRE metacharacters are escaped.

```julia-repl
could this be a doctest?
I think #31989 is further along now, so I'll close this PR in favor of that.
One disadvantage of dropping this for #31989 is that we no longer cover the original functionality of producing a string in which the regex metacharacters are escaped. It only covers the cases where you want a fully formed regex, which isn’t what I needed, and which isn’t what other languages do. The new functionality for composing regexen helps, of course, but there may still be cases where you simply want to escape the various character sequences without actually producing a regex object, in which case you’d have to reimplement this.
@mlhetland Good point -- I should have looked more closely at that PR.
Do we still want this, given that we also now have (undocumented) `Base.wrap_string`?
```
"""
function regex_escape(s::AbstractString)
    res = replace(s, r"([()[\]{}?*+\-|^\$\\.&~#\s=!<>|:])" => s"\\\1")
Why `=!<>|:` (which aren't in the Python list)?
I believe my original was based on PHP, but adding some precautionary characters from Python, and escaping whitespace.
(But I haven’t looked at `wrap_string`. Documenting that might be a reasonable alternative. 🤷♂️)
(My comments end with this version, BTW.)
Seems like `wrap_string` uses the Perl strategy (with `\Q`)? I just thought that was overkill, but given that it’s already used, that might be the way to go, dropping this version. (Just my two cents.)
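To make the Perl-style alternative concrete, here is a rough sketch of quoting with `\Q...\E`. It is an assumption-laden illustration, not the body of `Base.wrap_string`: the name `quote_literal` is hypothetical, and the only subtlety it handles is an embedded `\E`, which would otherwise end the quoted span early.

```julia
# Hypothetical helper, not part of Base: quote a string with PCRE's \Q...\E.
# An embedded "\E" is handled by closing the quoted span, emitting an escaped
# backslash plus a literal E, and then reopening the quoted span.
function quote_literal(s::AbstractString)
    return "\\Q" * replace(s, raw"\E" => raw"\E\\E\Q") * "\\E"
end

# Anchor the quoted text so that only the exact input string matches.
literal = raw"a.b\E*"
pattern = Regex("^" * quote_literal(literal) * "\$")
```

Compared with per-character escaping, this needs no list of metacharacters at all; the trade-off is that the output is less readable and depends on the engine supporting `\Q...\E`.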
Any updates on this? Very short function, but nontrivial to get it right.
This uses the implementation from @mlhetland that was mentioned in #6124.