[feature request] system(command; args) filter #1614

Open
dkrieger opened this issue Mar 4, 2018 · 4 comments
@dkrieger

dkrieger commented Mar 4, 2018

It would be nice to allow external commands that take JSON in and return JSON out to be embedded in a pipeline of jq filters. An example use case would be mapping over an array of objects, replacing one member of each object with the element of a reference array that an external command selects by longest-common-substring matching, and resuming jq execution with that output.

"command" would be a string giving the name or path of the command.
"args" would be an object representing the arguments to the command.

Demonstration of the parsing semantics:

echo '[10,20,30,40]' | jq 'map(system(mylookup; {"v": false, "f": true, "dict": "mydict.tsv", "o": "json"}))'

This would feed each item of the input array to stdin of mylookup -f --dict="mydict.tsv" -o "json" (the false-valued "v" is omitted).

Assumptions of POSIX compliance could be overridden by prefixing a key with '-', for example to specify args to find. To handle more significant deviations, or even the one just mentioned, users could use string interpolation to build the whole call in 'command', leaving args blank.
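To make the proposed mapping concrete, here is a rough Python sketch of how an args object might be turned into an argument vector. The function name build_argv and the exact rules (single-letter keys become short options, longer keys become --key=value, false values are dropped, keys starting with '-' are passed through verbatim) are assumptions read off the example above, not anything jq implements.

```python
def build_argv(command, args):
    """Translate an args object into an argv list (hypothetical mapping)."""
    argv = [command]
    for key, value in args.items():
        if value is False or value is None:
            continue  # a false flag is simply omitted
        if key.startswith("-"):
            flag, joined = key, False          # verbatim override, e.g. for find
        elif len(key) == 1:
            flag, joined = "-" + key, False    # short option: -o json
        else:
            flag, joined = "--" + key, True    # long option: --dict=mydict.tsv
        if value is True:
            argv.append(flag)
        elif joined:
            argv.append(f"{flag}={value}")
        else:
            argv.extend([flag, str(value)])
    return argv


print(build_argv("mylookup", {"v": False, "f": True, "dict": "mydict.tsv", "o": "json"}))
# ['mylookup', '-f', '--dict=mydict.tsv', '-o', 'json']
```

Because the result is a list rather than a single string, it could be handed to the OS without any shell quoting step.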

I may be in the minority, but I find the jq syntax for transformations and control flow to be very elegant and easily understandable. I'd rather call complicated routines from jq than vice versa in many cases. It would be trivial to wrap a system filter in a function, then include that as a module.

@charles-dyfis-net

charles-dyfis-net commented Mar 31, 2019

I would really hope that we wouldn't support system(cmd) as a conventional single-string-argument call that just invokes ['sh', '-c', cmd]; such a function is almost impossible to use securely.

At bare minimum, any such interface should allow multiple arguments so the OS-level execve() syscall can be directly controlled. To extend the example given in the ticket, consider 'map(execpipe(["bash", "-c", "mylookup \"$1\" | whatever", "_", .]))' -- this way:

  • The user can directly select the shell (or use no shell at all)
  • The user has the ability (which hopefully the documentation should strongly encourage) to, when invoking a shell, pass data (in this case the .) out-of-band from code (in this case, the mylookup "$1" | whatever).
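As an illustration of the out-of-band point, here is a small Python sketch using subprocess, assuming a POSIX sh on PATH. The helper names are made up for the example, and printf stands in for mylookup. The first function passes the data as $1, the second splices it into the script text.

```python
import subprocess


def run_with_shell_safely(script, data):
    """Run `script` under sh, passing `data` as $1 instead of splicing it in."""
    argv = ["sh", "-c", script, "_", data]  # "_" fills $0; data becomes $1
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def run_with_shell_unsafely(data):
    """Anti-pattern: data is pasted into the code string the shell parses."""
    return subprocess.run(
        f'printf "%s" "{data}"', shell=True,
        capture_output=True, text=True, check=True,
    ).stdout


hostile = '"; echo injected; echo "'
print(repr(run_with_shell_safely('printf "%s" "$1"', hostile)))
print(repr(run_with_shell_unsafely(hostile)))
```

In the first call the shell never parses the data, so it comes back unchanged. In the second the embedded echo commands are executed as code.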

Even then, since jq code has been written historically by folks who assume that the language isn't capable of security-sensitive operations, I'd hope for an explicit --unsafe argument to be required before this functionality or any equivalent is exposed.
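One way such gating could look, sketched in Python rather than in jq's C source: the process-spawning builtin is only registered when the flag is given, so existing programs cannot reach it by accident. The names execpipe, available_builtins and the builtin table are hypothetical.

```python
import argparse
import subprocess


def execpipe(argv, stdin_text=""):
    """Hypothetical builtin: run argv with no shell and return its stdout."""
    done = subprocess.run(argv, input=stdin_text, capture_output=True,
                          text=True, check=True)
    return done.stdout


def available_builtins(cli_args):
    """Return the builtin table, adding execpipe only under --unsafe."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--unsafe", action="store_true")
    opts = parser.parse_args(cli_args)
    builtins = {"length": len}  # stand-in for the side-effect-free builtins
    if opts.unsafe:
        builtins["execpipe"] = execpipe
    return builtins


print(sorted(available_builtins([])))
print(sorted(available_builtins(["--unsafe"])))
```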

@nicowilliams
Contributor

Although I agree that system(3) is an awful interface, it's also trivial to add FFIs for, so as it happens I have that already in PR #1843. Help me kick the tires on that!

(I'll see about not using system(3), instead using _spawn() on Windows, or posix_spawn() on Unix. But that's quite a bit more work.)
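For anyone unfamiliar with the spawn-style interface, here is a minimal Python sketch using os.posix_spawnp (Unix only, Python 3.9+ for os.waitstatus_to_exitcode). It only shows the general shape of spawning from an argv list with no shell involved. The helper name spawn_capture is made up, and this says nothing about how the PR would actually do it.

```python
import os


def spawn_capture(argv):
    """Spawn argv without a shell and return (exit_code, stdout_bytes)."""
    read_fd, write_fd = os.pipe()
    file_actions = [
        (os.POSIX_SPAWN_CLOSE, read_fd),      # child does not need the read end
        (os.POSIX_SPAWN_DUP2, write_fd, 1),   # child's stdout goes to the pipe
        (os.POSIX_SPAWN_CLOSE, write_fd),
    ]
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=file_actions)
    os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd, "rb") as out:
        data = out.read()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), data


print(spawn_capture(["echo", "hi"]))
```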

@nichtich

nichtich commented May 5, 2019

In particular, this feature would also make it possible to collect and merge data from web APIs with curl. The --unsafe option is another issue; I'd prefer a flag to enable or disable all external communication via side channels (including $ENV, env, now, and strptime).

@nicowilliams
Contributor

@nichtich yes, the thought has occurred :)
