Refactor shell_escape_winsomely() and add escaping function for CMD.EXE syntax #34111
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -255,60 +255,101 @@ shell_escape_posixly(args::AbstractString...) = | |||||
sprint(print_shell_escaped_posixly, args...) | ||||||
|
||||||
|
||||||
function print_shell_escaped_winsomely(io::IO, args::AbstractString...) | ||||||
first = true | ||||||
for arg in args | ||||||
first || write(io, ' ') | ||||||
first = false | ||||||
# Quote any arg that contains a whitespace (' ' or '\t') or a double quote mark '"'. | ||||||
# It's also valid to quote an arg with just a whitespace, | ||||||
# but the following may be 'safer', and both implementations are valid anyways. | ||||||
quotes = any(c -> c in (' ', '\t', '"'), arg) || isempty(arg) | ||||||
quotes && write(io, '"') | ||||||
backslashes = 0 | ||||||
for c in arg | ||||||
if c == '\\' | ||||||
backslashes += 1 | ||||||
""" | ||||||
shell_escape_wincmd(s::AbstractString) | ||||||
shell_escape_wincmd(io::IO, s::AbstractString) | ||||||
|
||||||
The unexported `shell_escape_wincmd` function escapes Windows | ||||||
`cmd.exe` shell meta characters. It escapes `()!^<>&|` by placing a | ||||||
`^` in front. An `@` is only escaped at the start of the string. Pairs | ||||||
of `"` characters and the strings they enclose are passed through | ||||||
unescaped. Any remaining `"` is escaped with `^` to ensure that the | ||||||
number of unescaped `"` characters in the result remains even. | ||||||
|
||||||
Since `cmd.exe` substitutes variable references (like `%USER%`) | ||||||
_before_ processing the escape characters `^` and `"`, this function | ||||||
makes no attempt to escape the percent sign (`%`). | ||||||
|
||||||
Input strings with ASCII control characters that cannot be escaped | ||||||
(NUL, CR, LF) will cause an `ArgumentError` exception. | ||||||
|
||||||
With an I/O stream parameter `io`, the result will be written there, | ||||||
rather than returned as a string. | ||||||
|
||||||
See also: [`escape_microsoft_c_args`](@ref), [`shell_escape_posixly`](@ref) | ||||||
|
||||||
# Example | ||||||
```jldoctest | ||||||
julia> Base.shell_escape_wincmd("a^\\"^o\\"^u\\"") | ||||||
"a^^\\"^o\\"^^u^\\"" | ||||||
``` | ||||||
""" | ||||||
function shell_escape_wincmd(io::IO, s::AbstractString) | ||||||
# https://stackoverflow.com/a/4095133/1990689 | ||||||
|
||||||
occursin(r"[\r\n\0]", s) && | ||||||
throw(ArgumentError("control character unsupported by CMD.EXE")) | ||||||
i = 1 | ||||||
len = ncodeunits(s) | ||||||
if len > 0 && s[1] == '@' | ||||||
write(io, '^') | ||||||
end | ||||||
while i <= len | ||||||
c = s[i] | ||||||
if c == '"' && (j = findnext('"', s, nextind(s,i))) !== nothing | ||||||
write(io, SubString(s,i,j)) | ||||||
i = j | ||||||
else | ||||||
if c in ('"', '(', ')', '!', '^', '<', '>', '&', '|') | ||||||
I just realized that

I hadn't spotted that list before, but it also seems wrong, at least for my use case of command invocation. For example: typing into the command line
Reading the text before, it may well be meant in the context of the “completion” function, i.e. it may be a list of characters that the authors of the completion function worried about, and that function may work differently in different contexts. That may not be the same list of characters that we need to escape in command invocation, i.e. when sending strings such as

Yes, but note that the list is intentionally incomplete with respect to our usage since

The Jan Erik / Dave Benham description of the CMD.EXE parser does say that under certain circumstances in Phase 7 if the quote flag is off, then when testing if a command token is an internal command, it “break[s] the command token before the first occurrence of
[Note that my original approach (which did not factor Microsoft C ARGV quoting and CMD.EXE quoting into separate functions) would have been able to use quotes more aggressively (i.e., quote something for the C library to unquote because CMD.EXE might interpret it as a meta character, not because the C library might), and thereby guarantee far more easily that none of these contexts are ever reached. But I think we are fine. Can you construct any use case where my proposed current implementation is not already doing the right thing?] |
||||||
write(io, '^', c) | ||||||
else | ||||||
# escape all backslashes and the following double quote | ||||||
c == '"' && (backslashes = backslashes * 2 + 1) | ||||||
for j = 1:backslashes | ||||||
# backslashes aren't special here | ||||||
write(io, '\\') | ||||||
end | ||||||
backslashes = 0 | ||||||
write(io, c) | ||||||
end | ||||||
end | ||||||
# escape all backslashes, letting the terminating double quote we add below to then be interpreted as a special char | ||||||
quotes && (backslashes *= 2) | ||||||
for j = 1:backslashes | ||||||
write(io, '\\') | ||||||
end | ||||||
quotes && write(io, '"') | ||||||
i = nextind(s,i) | ||||||
end | ||||||
return nothing | ||||||
end | ||||||
|
||||||
shell_escape_wincmd(s::AbstractString) = sprint(shell_escape_wincmd, s; | ||||||
sizehint = 2*sizeof(s)) | ||||||
|
||||||
""" | ||||||
shell_escaped_winsomely(args::Union{Cmd,AbstractString...})::String | ||||||
|
||||||
Convert the collection of strings `args` into single string suitable for passing as the argument | ||||||
string for a Windows command line. Windows passes the entire command line as a single string to | ||||||
the application (unlike POSIX systems, where the list of arguments are passed separately). | ||||||
Many Windows API applications (including julia.exe), use the conventions of the [Microsoft C | ||||||
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) to | ||||||
split that command line into a list of strings. This function implements the inverse of such a | ||||||
C runtime command-line parser. It joins command-line arguments to be passed to a Windows console | ||||||
application into a command line, escaping or quoting meta characters such as space, | ||||||
double quotes and backslash where needed. This may be useful in concert with the `windows_verbatim` | ||||||
flag to [`Cmd`](@ref) when constructing process pipelines. | ||||||
escape_microsoft_c_args(args::Union{Cmd,AbstractString...}) | ||||||
escape_microsoft_c_args(io::IO, args::Union{Cmd,AbstractString...}) | ||||||
|
||||||
# Example | ||||||
```jldoctest | ||||||
julia> println(shell_escaped_winsomely("A B\\", "C")) | ||||||
"A B\\" C | ||||||
Convert a collection of string arguments into a string that can be | ||||||
passed to many Windows command-line applications. | ||||||
|
||||||
Microsoft Windows passes the entire command line as a single string to | ||||||
the application (unlike POSIX systems, where the shell splits the | ||||||
command line into a list of arguments). Many Windows API applications | ||||||
(including julia.exe), use the conventions of the [Microsoft C/C++ | ||||||
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) | ||||||
to split that command line into a list of strings. | ||||||
|
||||||
This function implements an inverse for a parser compatible with these rules. | ||||||
It joins command-line arguments to be passed to a Windows | ||||||
C/C++/Julia application into a command line, escaping or quoting the | ||||||
meta characters space, TAB, double quote and backslash where needed. | ||||||
|
||||||
See also: [`shell_escape_wincmd`](@ref), [`escape_raw_string`](@ref) | ||||||
""" | ||||||
shell_escape_winsomely(args::AbstractString...) = | ||||||
sprint(print_shell_escaped_winsomely, args..., sizehint=(sum(length, args)) + 3*length(args)) | ||||||
function escape_microsoft_c_args(io::IO, args::AbstractString...) | ||||||
# http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES | ||||||
first = true | ||||||
for arg in args | ||||||
if first | ||||||
first = false | ||||||
else | ||||||
write(io, ' ') # separator | ||||||
end | ||||||
if isempty(arg) || occursin(r"[ \t\"]", arg) | ||||||
Suggested change (faster)

Depends on the length. On my computer, if
For
For
So overall, I still prefer using a regular expression here (and I suspect they might be beneficially used in some of the other shell escaping functions as well).

Isn't it more common to have

We are talking a maximum of ~50 ns benchmark overhead here on my 8-year-old PC (the time in which a bit travels 10 metres on a cable), which is many orders of magnitude less than the overhead of invoking a new process.

Either is fine with me. |
||||||
# Julia raw strings happen to use the same escaping convention | ||||||
# as the argv[] parser in Microsoft's C runtime library. | ||||||
escape_raw_string(io, arg) | ||||||
else | ||||||
write(io, arg) | ||||||
end | ||||||
end | ||||||
end | ||||||
escape_microsoft_c_args(args::AbstractString...) = | ||||||
sprint(escape_microsoft_c_args, args...; | ||||||
sizehint = (sum(sizeof.(args)) + 3*length(args))) |
(adopting some of the advisory text from #33474):
julia/base/shell.jl
Lines 324 to 345 in 09c1d61

I'm afraid I don't quite understand that suggested addition yet, and I am not yet familiar with all the behind-the-scenes processing that may go on during `run(Cmd(Cmd(["cmd /c \"...` on a Windows machine. My own use case and expertise is preparing command lines that get sent via ssh to a Windows server running OpenSSH sshd (e.g., in Distributed), where no libuv processing of command-line arguments passed to Windows processes is involved. I haven't been able to reproduce the effect referred to in this suggestion myself, and would therefore be more comfortable if we left this to a separate PR.

There is none. That's the point of using that syntax and flag: to disable all helpers and get the raw string without libuv processing or such.
Though I would perhaps further clarify here that it may be best passed as `CMD.exe /S /C " prog args "` (with the '/S' and surrounding " pair).
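
For illustration only, here is a minimal sketch of how the two escaping functions from this PR might be chained and wrapped in the `/S /C " ... "` form mentioned above. It assumes a Julia build that contains the unexported `Base.escape_microsoft_c_args` and `Base.shell_escape_wincmd` with the behaviour documented in this diff; `prog.exe` and its arguments are made-up placeholders, not anything taken from the PR.

```julia
# Sketch only: assumes Base.escape_microsoft_c_args and Base.shell_escape_wincmd
# exist in the running Julia and behave as documented in this PR.
# "prog.exe" and its arguments are hypothetical placeholders.
args = ["prog.exe", "a&b", "x y"]

# Step 1: join the arguments following the Microsoft C runtime argv rules.
# Only "x y" contains a space, so only that argument gets double quotes.
argv = Base.escape_microsoft_c_args(args...)

# Step 2: escape CMD.EXE meta characters outside double-quoted sections.
# The unquoted '&' receives a '^' prefix; the quoted "x y" passes through.
inner = Base.shell_escape_wincmd(argv)

# Step 3: wrap the result in the /S /C " ... " form.
cmdline = "/S /C \" $inner \""

# windows_verbatim=true asks Julia to pass the string on without re-quoting it.
cmd = Cmd(Cmd(["cmd.exe", cmdline]); windows_verbatim = true)

println(argv)
println(inner)
# run(cmd)  # only meaningful on a Windows machine
```

Constructing the `Cmd` does not spawn anything, so the two `println` calls can be used to inspect the escaping on any platform before trying `run(cmd)` on Windows.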