SubstitutionString can't contain escape sequences \n, \t, etc. during replace #27125
Comments
The problem is that

julia> s"\n".string == "\\n"
true

The same thing applies to regexes, but it's less visible since PCRE interprets escapes like
How about calling
--- a/base/strings/io.jl
+++ b/base/strings/io.jl
@@ -316,19 +316,21 @@ end
 General unescaping of traditional C and Unicode escape sequences. Reverse of
 [`escape_string`](@ref).
 """
-unescape_string(s::AbstractString) = sprint(unescape_string, s, sizehint=lastindex(s))
+unescape_string(s::AbstractString, keep_esc::AbstractArray{<:AbstractChar}=Char[]) = sprint(unescape_string, s, keep_esc; sizehint=lastindex(s))
 
 """
     unescape_string(io, str::AbstractString) -> Nothing
 
 Unescapes sequences and prints result to `io`. See also [`escape_string`](@ref).
 """
-function unescape_string(io, s::AbstractString)
+function unescape_string(io, s::AbstractString, keep_esc::AbstractArray{<:AbstractChar}=Char[])
     a = Iterators.Stateful(s)
     for c in a
         if !isempty(a) && c == '\\'
             c = popfirst!(a)
-            if c == 'x' || c == 'u' || c == 'U'
+            if c in keep_esc
+                print(io, '\\', c)
+            elseif c == 'x' || c == 'u' || c == 'U'
                 n = k = 0
                 m = c == 'x' ? 2 :
                     c == 'u' ? 4 : 8
--- a/base/regex.jl
+++ b/base/regex.jl
@@ -254,7 +254,8 @@ function _replace(io, repl_s::SubstitutionString, str, r, re)
     GROUP_CHAR = 'g'
     LBRACKET = '<'
     RBRACKET = '>'
-    repl = repl_s.string
+    keep_esc = [SUB_CHAR, GROUP_CHAR, collect('0':'9')...]
+    repl = unescape_string(repl_s.string, keep_esc)
     i = firstindex(repl)
     e = lastindex(repl)
     while i <= e

There's a change in behaviour due to this though: invalid single letter sequences like
Spot on: a custom unescaping pass is what is needed to make this work correctly.
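For illustration, here is a minimal sketch of what such a custom unescaping pass could look like. The helper name `unescape_keeping` and its small escape table are hypothetical and not part of Base. The sketch only handles `\n`, `\t` and `\r`, and ignores the `\x`, `\u`, `\U` and octal forms that a real implementation would need to consider.

```julia
# Hypothetical helper (not part of Base): translate a few C-style escapes,
# but leave any backslash sequence whose second character is in `keep`
# untouched so that a later capture-group parser can still see it.
function unescape_keeping(s::AbstractString, keep)
    simple = Dict('n' => '\n', 't' => '\t', 'r' => '\r')
    io = IOBuffer()
    chars = Iterators.Stateful(s)
    for c in chars
        if c == '\\' && !isempty(chars)
            d = popfirst!(chars)
            if d in keep
                print(io, '\\', d)       # keep the sequence as written
            elseif haskey(simple, d)
                print(io, simple[d])     # translate the escape
            else
                throw(ArgumentError("invalid escape sequence \\$d"))
            end
        else
            print(io, c)                 # ordinary character (or a trailing backslash)
        end
    end
    return String(take!(io))
end

# Characters that introduce substitution syntax: backslash, 'g', and the digits.
keep = ['\\', 'g', ('0':'9')...]

result = unescape_keeping(raw"\1\n\g<x>", keep)
```

With this `keep` list, `\1` and `\g<x>` pass through unchanged while `\n` becomes a real newline, which is the split of responsibilities the patch above aims for.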
Note for future: even after other escape sequences are allowed in a SubstitutionString, octal sequences still can't be used since

Related, if the user wants to use the second capture group and then have "45" in the output string, there doesn't currently seem to be a way to do that:

To avoid ambiguity then, named capture groups should disallow numerical names, like
Noting that a way to get around this for \n is to manually enter a line break, like so:
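A sketch of this workaround, under the assumption that a line break typed inside the `s"..."` literal is stored as an actual newline character. The second form, which passes an ordinary string to the `SubstitutionString` constructor, is an alternative not mentioned in the thread and should behave the same way.

```julia
# A literal line break inside s"..." is stored as a newline character,
# so no backslash ever reaches the substitution parser.
sub = s"
"
out1 = replace("a b", r" " => sub)

# Alternative (an assumption, not from the thread): build the
# SubstitutionString from a normal string, where "\n" is already unescaped.
sub2 = Base.SubstitutionString("\n")
out2 = replace("a b", r" " => sub2)
```

Both replacement strings contain a single newline character, so neither depends on how backslash escapes are handled during replacement.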
While trying to create documentation for SubstitutionString (#26497), I found that replace using SubstitutionStrings doesn't work if the substitution string has a newline or other escape sequence in it.

Looking at the code, when the _replace function (in regex.jl) sees a backslash, it expects another backslash, a digit or a 'g', and throws an error with replace_err if it doesn't see one of those. Another consequence of this is that the replacement string can't contain things like Unicode codepoint sequences (\u0B85).
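For reference, a short example of the backslash forms that the substitution parser does accept, namely a digit for a numbered capture group and `g<name>` for a named one. The input string and group names are made up for illustration.

```julia
# Backslash followed by a digit refers to a numbered capture group.
swapped = replace("key=value", r"(\w+)=(\w+)" => s"\2=\1")

# Backslash followed by g<name> refers to a named capture group.
swapped_named = replace("key=value", r"(?<k>\w+)=(?<v>\w+)" => s"\g<v>=\g<k>")
```

Both calls swap the two sides of the `=` sign. Any other character after the backslash, such as the `n` in `s"\n"`, is what triggers the error described above.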