Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Cmd interpolation with environments #37070

Closed
staticfloat opened this issue Aug 17, 2020 · 14 comments
Closed

Fix Cmd interpolation with environments #37070

staticfloat opened this issue Aug 17, 2020 · 14 comments

Comments

@staticfloat
Copy link
Member

I think we can make some improvements to Cmd and environment handling.

Problems:

  • setenv(cmd, "FOO" => "BAR") will unconditionally overwrite all environment settings already set within cmd.

  • `$foo $bar` where foo and bar are both cmd objects with environment variables embedded throws an error.

I propose the following:

  • Change Cmd(cmd::Cmd, env=env) and setenv(cmd::Cmd, env) to merge environments rather than replace them. Functionally, merging can be as simple as vcat(old_env, new_env), since the OS will replace overlapping environments properly. This would act a little strangely with hashing, though, as hash(c1.env) != hash(c2.env) even though dumping the environment of the eventual process would show the same environment. This might be a feature though, as perhaps there are OS bugs where the environment block is not properly parsed or something. We probably shouldn't pretend that a cmd.env is a dictionary when it really isn't, despite most users using it like one.

  • Change cmd_gen() (the Cmd interpolation argument processor) to no longer special-case the first Cmd object it finds, and to instead concatenate environment blocks from all Cmd objects. This would allow you to `$foo $bar` with two different sets of environment variables set. Of course, if foo sets PATH to something, and bar sets it to something else, it would not intelligently combine the two, you'd just get bar's PATH, but that is only to be expected.

These changes will allow us to stop using withenv() in JLL packages, which is thread-unsafe. Right now, the ENV block is the only realistic method to cooperatively layer environments, as it allows a natural way to set default environment values that can be overridden by the user. However, due to its thread-unsafety, it's ultimately undesirable. It would be much better to change the current API from:

# Git_jll.git() sneakily wraps all user code in a `with_env()` so that it has the proper PATH, LD_LIBRARY_PATH, etc...
Git_jll.git() do git_path
    run(setenv(`$git_path ...`, "USER_SPECIFIC_GIT_OPTION" => "1")
end

to something more like:

run(setenv(`$(Git_jll.git()) ...`, "USER_SPECIFIC_GIT_OPTION" => "1")

where the git() function would return a Cmd object with all necessary environs set within it. Without solving this issue, the second syntax will result in an error, as git will be unable to find its dependent libraries.

@staticfloat staticfloat added the triage This should be discussed on a triage call label Aug 17, 2020
@StefanKarpinski
Copy link
Member

I wonder if this should still error if they set the same environment variable with different values. There's no clearly correct ordering in that case.

@staticfloat
Copy link
Member Author

We could do that, which would involved parsing the environment blocks, which I'm fine with. But I also think it should be fine to layer them from left to right, so that the last added one takes precedence over all previous one.

@staticfloat
Copy link
Member Author

Alright, I have addressed one half of this with #37244, however I was stymied in fixing the second half of this due to the following very surprising behavior:

julia> cmd_foo = `echo foo`
       cmd_bar = `echo bar`
       `sh -c "$cmd_foo; $cmd_bar"`
`sh -c 'echo; echo' 'echo; bar' 'foo; echo' 'foo; bar'`

It's doing an outer product between the values within the Cmd objects. Is this really intended, or is this some kind of horrible parsing bug?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 28, 2020

If you splice two or more arrays of strings into the same word of a command, it produces the outer product of the words like the shell does with { and }. E.g.

julia> names = ["foo", "bar", "baz"]
3-element Vector{String}:
 "foo"
 "bar"
 "baz"

julia> exts = ["txt", "md", "tex"]
3-element Vector{String}:
 "txt"
 "md"
 "tex"

julia> `rm -f $names.$exts`
`rm -f foo.txt foo.md foo.tex bar.txt bar.md bar.tex baz.txt baz.md baz.tex`

Now a Cmd object itself is just a special kind of vector of words, so the same thing applies to splicing two Cmd objects into the same word of another command object like you're doing here. Indeed, when you splice one command into another, you typically want it to be spliced as an array, not a scalar:

julia> opts = `-r -f`
`-r -f`

julia> files = `foo bar baz`
`foo bar baz`

julia> `rm $opts $files`
`rm -r -f foo bar baz`

In this situation you want to splice the commands into another command as scalars, not as arrays of words because they're going into a shell-quoted context. It seems like you should be shell escaping them in order to do that, no?

julia> `sh -c "$(Base.shell_escape(cmd_foo)); $(Base.shell_escape(cmd_bar))"`
`sh -c 'echo foo; echo bar'`

@StefanKarpinski
Copy link
Member

I do agree that the cartesian product behavior is overly featurey and should be removed in 2.0.

@staticfloat
Copy link
Member Author

Ah, I didn't realize the quotes had such an impact. That makes sense then, and I agree that it should be removed in 2.0, but it's easy enough to work around for now. :)

@staticfloat
Copy link
Member Author

I've given up on Cmd interpolation because it seems very difficult to interpolate Cmds within other Cmds. I'm happy with just add addenv() and calling it good.

@StefanKarpinski
Copy link
Member

If you want to interpolate a command into a shell call like this, then you need to shell_escape it first, which converts it to a single string. If you're injecting a command into a another command, it acts like an array because it is one.

@staticfloat
Copy link
Member Author

Right, that makes it difficult to do things like sudo -k /bin/sh -c "$cmd1; $cmd2" where you want the environments of cmd1 and cmd2 to effect the overall Cmd object. While that is a niche use case, it's also the whole point of trying to get interpolation of Cmd objects to bring along their environment, so I'm willing to wait until the array semantics go away before trying to implement this feature.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 30, 2020

The fact that command objects interpolated into other command objects in a scalar context like this does the Cartesian word generation thing is a pretty much an accidental "feature" due to Cmd objects becoming more array-like over time. I highly doubt that anyone is relying on it, and unlike the same behavior for normal vectors of strings, it isn't documented, so we could probably change it.

What you seem to want here is for interpolation of a command object into a scalar command context to automatically quote the interpolated command and keep the environment associated with it. Shell quoting is indeed one way to scalarize a command object, and it would be convenient here, but I'm a little concerned that it's a bit too magical and may not be the "one true way" to handle this situation.

For example, what if you did this instead?

`rm -f $cmd.txt`

Would shell quoting cmd and merging its environment into the larger command's environment and appending .txt to the end of the quoted command text be the right thing to do here? Of course, listing all the words in the command and appending .txt to each of them is also not obviously the right thing to do, but I'd argue that they're both equally un/reasonable here.

@staticfloat
Copy link
Member Author

Would shell quoting cmd and merging its environment into the larger command's environment and appending .txt to the end of the quoted command text be the right thing to do here?

I do think that's the least-surprising thing to do. I don't know any other tool that does the kind of outer product thing that our Cmd objects do, but perhaps if I was more used to that I'd find it more natural.

@StefanKarpinski
Copy link
Member

Shells do it:

$ echo {foo,bar,baz}.{txt,tex,md}
foo.txt foo.tex foo.md bar.txt bar.tex bar.md baz.txt baz.tex baz.md

@staticfloat
Copy link
Member Author

But that's using the wildcard syntax that only does exactly that behavior; you can take arrays and insert them without this behavior as long as you don't use the {} syntax that explicitly does it:

$ X=("foo" "bar" "baz")
$ Y=("txt" "tex" "md")
$ echo "${X[@]}.${Y[@]}"
foo bar baz.txt tex md

The issue is that instead of explicitly requesting combinations, it's happening any time I use a Cmd.

@StefanKarpinski
Copy link
Member

That's just a manifestation of the shell's weak distinction between arrays and strings with spaces. I don't think that what the shell is doing there actually makes much sense to be honest.

@vtjnash vtjnash removed the triage This should be discussed on a triage call label Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants