Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add convenience API to std::process::Command #89004

Closed
wants to merge 1 commit into from

Conversation

matklad
Copy link
Member

@matklad matklad commented Sep 16, 2021

This is similar in spirit to fs::read_to_string -- not strictly
necessary, but very convenient for many use-cases.

Often you need to write essentially a "bash script" in Rust which spawns
other processes. This usually happens in build.rs, tests, or other
auxiliary code. Using std's Command API for this is inconvenient,
primarily because checking for exit code (and utf8-validity, if you need
output) turns one-liner into a paragraph of boilerplate.

While there are crates.io crates to help with this, using them often is
not convenient. In fact, I maintain one such crate (xshell) and, while
I do use it when I am writing something substantial, for smaller things
I tend to copy-paste this std-only snippet:

https://github.com/rust-analyzer/rust-analyzer/blob/ae36af2bd43906ddb1eeff96d76754f012c0a2c7/crates/rust-analyzer/build.rs#L61-L73

So, this PR adds two convenience functions to cover two most common
use-cases:

  • run is like status, except that it check that status is zero
  • read_stdout is like output, except that it checks status and
    decodes to String. It also chomps the last newline, which is modeled
    after shell substitution behavior (echo -n "$(git rev-parse HEAD)")
    and Julia's readchomp

Note that this is not the ideal API. In particular, error messages do
not include the command line or stderr. So, for user-facing commands,
you'd have to write you own code or use a process-spawning library like
duct.

@matklad matklad added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 16, 2021
@rust-highfive
Copy link
Collaborator

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 16, 2021
@matklad
Copy link
Member Author

matklad commented Sep 16, 2021

Will file a tracking issue if this is something we'd want to have at all!

@rust-log-analyzer

This comment has been minimized.

@est31
Copy link
Member

est31 commented Sep 16, 2021

cc #84908

@est31
Copy link
Member

est31 commented Sep 16, 2021

What about renaming read_stdout to read_stdout_to_string or something? Because in std::fs, there is a read function, which outputs a binary Vec<u8>, and a read_to_string function which outputs an UTF-8 encoded String. On the other hand, I think it's way easier to retrieve binary outputs of a process than it is to read a file into a Vec<u8> without using fs::read, so maybe there wouldn't be much reason to add read_stdout ever.

@matklad
Copy link
Member Author

matklad commented Sep 17, 2021

Yeah, the name/API here is definitely not "orthogonal". For full generality, we'd need a matrix with binary/text stdour/stderr/both. But, as this is a convenience API, I figured it's better to make the common case short. We have the precedent in missing "_to_string": we have read_line rather than read_line_to_string.

Another thing which is reasonable to have is fn checked_output(&mut) -> io::Result<(Vec<u8>, Vec<u8>)>, but then perhaps we need to add checked_ prefix to read_stdout.

Overall, I feel that there's certainly an unfortunately uncomfortably large design space here, and it's unclear how to best explore it. So, on the meta level, I propose reaching consensus in the PR on that:

  • this is a problem worth solving (I became convinced of that after realising that the status quo hurts versatility quite a bit)
  • the API is of the right general shape (maybe it is not, Tracking Issue for ExitStatusError #84908 is a plausible alternative)

After that, we can merge and iterate on the tracking issue.

@the8472
Copy link
Member

the8472 commented Sep 17, 2021

I think these methods would be better on Output. That way you don't throw away information you might want to check when encountering an non-zero exit code.

@matklad
Copy link
Member Author

matklad commented Sep 19, 2021

I don't think run will work with .output, as it, like .status inherits streams. We can add this to ExitStatus, and that'll give us #84908.

I think I still prefer the API to be available on the command directly. While #84908 is a bit more general, we already have a fully general API for arbitrary custom error management. What we don't have is a sane-defaults one-liner.

@the8472
Copy link
Member

the8472 commented Sep 19, 2021

I don't think run will work with .output, as it, like .status inherits streams.

The it is quite ambiguous here, I had to look at the methods to get what you actually mean 😅

Ok, run() makes sense as-is since it inherits the streams since you don't lose any information. But read_stdout would still make sense on Output so you can split it off to a new line if you also want to get the stderr.

@yaahc
Copy link
Member

yaahc commented Sep 20, 2021

Overall, I feel that there's certainly an unfortunately uncomfortably large design space here, and it's unclear how to best explore it. So, on the meta level, I propose reaching consensus in the PR on that:

  • this is a problem worth solving
  • the API is of the right general shape

I agree that this is a good place to start building consensus. My opinion is that this is a problem worth solving. I've built similar abstractions on top of the Command APIs quite a few times, including most recently for this test suite. However, in these I'm generally quite careful to preserve the full context of the command that was run, why it failed, and all of its output. I see the importance of having some of these APIs be more convenient but I want to make sure we don't oversimplify.

Regarding the currently suggested API's shape:

My first issue is that run seems to be almost identical to .status()?.exit_ok()?; except that it uses a custom error message with less information than ExitStatusError and combines that into an io::Error. I don't think run is appreciably more convenient to use than this alternative, and between the two I'd prefer to keep the error types distinct than merge them.

Second, I agree with @the8472, read_line should be a method on Output not on Command, similar to how I prefer the exit_ok method on ExitStatus to run on Command.

Overall read_line seems more strongly motivated than run, though I really don't love the idea of an API that encourages users to throw away so much context. I worry that throwing away stderr as a convenient default would lead to a bunch of rust CLI tools that have subpar error reporting whenever they interact with subprocesses. My feeling is that if we're going to provide a convenience API like this is should be proactive and opinionated in a way that preserves as much context about errors as possible.

On that note, I'm curious how other language's convenience APIs for capturing the output of a subprocess handle stderr. Does Julia's readchomp throw away stderr? Does this change if the exit status indicates an error? Do they discard the stderr or forward it or something else entirely?

@matklad
Copy link
Member Author

matklad commented Sep 21, 2021

I don't think run is appreciably more convenient to use than this alternative

Yeah, as a designer of the API, I also see how two calls isn't really a big deal, and how this is cleaner and more orthogonal.

However, from an API user point of view, the difference seem rather meaningful. As a user of the API, I want to "run the command", or "get the command's output". It's a single atomic operation. As a metaphor, when running commands in a terminal, I don't think whether I misstyped the command name or the argument name, both are just "command failed to run":

$ cargo tst
error: no such subcommand: `tst`

	Did you mean `test`?

$ carg test
carg: command not found

For use-case I have in mind, one would always just write .status()?.exit_ok()?;, and that would be just boilerplate and noise.

It's not that big deal (although the extra possibility of forgetting to check the exit status is a bit concerning), but, still, it would be odd if shortcut API included more than strictly necessary.

Regarding the context, one interesting bit here is that, by letting the API hang off Command itself, we actually leave ourselves room to add more context later. In particular, we can add the most crucial bit of info, the command line itself, into the error message. In the Output or ExitStatus we don't have access to the Command.

This is similar to fs::read_to_string -- that also conflates three separate error cases (can't open path, can't read from fd, can't decode to str) into just one io::Error, but that is exactly what you need when you operate at a high level of abstraction, where you want to "just read the file".

I sort-of feel that we are conflating three use-cases here:

  • Alice writes a CLI utility to orchestrate processes and wants top-notch error reporting from subprocesses.
  • Bob migrates ad-hoc automation scripts from bash to Rust.
  • Carol writes a generic application and just wants all errors to be maximally useful by default.

I feel that Alice is rather happy -- the current API already allows to do fine-grained error reporting. She only needs read2 stabilized in some form, as she is tired from copy-pasting this bit of tribal knowledge.

Bob has become considerably happier since the introduction of fs::read_to_string, but he cringes every time he needs to shell out to git and do the usual output, check exit status, decode stdout dance.

Carol feels that the language is a joke: seriously, upon failing to read a file you get neither the filename nor backtrace by default?

The proposed API helps Bob a lot. If at some point we'll learn how to add more context to io::Errors out of the box, Carol will also benefit from it. Orthogonality and context factoring apply to Alice use-case, but it is already served by exiting APIs.

On that note, I'm curious how other language's convenience APIs for capturing the output of a subprocess handle stderr.

Good call! It's interesting that Python's check_output and check_call both just inherit stderr. Julia is still compiling Julia behaves the same. And Ruby as well. That's an argument for actually changing the semantics of read_stdout to not just forward to .output but to use a different streams setup:

  • output defaults to capturing both stout&stderr
  • read_stdout should probably default to capturing stdout and inhereting stderr

@matklad
Copy link
Member Author

matklad commented Sep 21, 2021

It's not that big deal

I've realized that that's actually a bigger deal:

use anyhow

// V1
let stdout = command.read_stdout()
    .with_context(|| format!("failed to run {:?}", comand))?;

// V2
let stdout = command.output()
    .with_context(|| format!("failed to run {:?}", comand))?
    .utf8_stdout()
    .with_context(|| format!("failed to run {:?}", comand))?;

@yaahc
Copy link
Member

yaahc commented Oct 5, 2021

The V2 version isn't as bad if you use try blocks but yea I see what you're saying, particularly the bit about wanting errors to be maximally useful by default. I worry about being able to pick a format that actually works for everyone. My fear being that it will be too long or will be missing env information that is relevant, but I think it's worth a try as long as it doesn't affect reporting of existing usages of Command.

If you're going to go with this argument though I don't think this PR should be merged as a half measure. I'd want to see this additional reporting of the command itself added to the new run method before landing it, along with it having read_stdout inherit stderr as you suggested.

This is similar in spirit to `fs::read_to_string` -- not strictly
necessary, but very convenient for many use-cases.

Often you need to write essentially a "bash script" in Rust which spawns
other processes. This usually happens in `build.rs`, tests, or other
auxiliary code. Using std's Command API for this is inconvenient,
primarily because checking for exit code (and utf8-validity, if you need
output) turns one-liner into a paragraph of boilerplate.

While there are crates.io crates to help with this, using them often is
not convenient. In fact, I maintain one such crate (`xshell`) and, while
I do use it when I am writing something substantial, for smaller things
I tend to copy-paste this *std-only* snippet:

https://github.com/rust-analyzer/rust-analyzer/blob/ae36af2bd43906ddb1eeff96d76754f012c0a2c7/crates/rust-analyzer/build.rs#L61-L73

So, this PR adds two convenience functions to cover two most common
use-cases:

* `run` is like `status`, except that it check that `status` is zero
* `read_stdout` is like `output`, except that it checks status and
  decodes to `String`. It also chomps the last newline, which is modeled
  after shell substitution behavior (`echo -n "$(git rev-parse HEAD)"`)
  and Julia's [`readchomp`](https://docs.julialang.org/en/v1/manual/running-external-programs/)

Note that this is not the ideal API. In particular, error messages do
not include the command line or stderr. So, for user-facing commands,
you'd have to write you own code or use a process-spawning library like
`duct`.
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-llvm-10 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
Successfully built 7a97eca47e99
Successfully tagged rust-ci:latest
Built container sha256:7a97eca47e9905805142b5eff36151ecd31e4c9375040173447e068278ecdab9
Uploading finished image to https://ci-caches.rust-lang.org/docker/dfd7203a0b015711c96f25420d9cb51dd6d72a416dd27c32932eb6b4d7efea10392bba63f0eaa514ea019391488096f30c8a7ead06c758f8f033ddf38b7029a7
upload failed: - to s3://rust-lang-ci-sccache2/docker/dfd7203a0b015711c96f25420d9cb51dd6d72a416dd27c32932eb6b4d7efea10392bba63f0eaa514ea019391488096f30c8a7ead06c758f8f033ddf38b7029a7 Unable to locate credentials
[CI_JOB_NAME=x86_64-gnu-llvm-10]
---
   Compiling object v0.26.2
   Compiling std_detect v0.1.5 (/checkout/library/stdarch/crates/std_detect)
   Compiling hashbrown v0.11.0
   Compiling addr2line v0.16.0
error: 1 positional argument in format string, but no arguments were given
     |
     |
1090 |                 format!("command {:?} produced non-UTF-8 output"),

error[E0308]: mismatched types
    --> /checkout/library/alloc/src/macros.rs:112:9
     |
     |
109  | / macro_rules! format {
110  | |     ($($arg:tt)*) => {{
111  | |         let res = $crate::fmt::format($crate::__export::format_args!($($arg)*));
112  | |         res
     | |         ^^^ expected `&&'static str`, found struct `alloc_crate::string::String`
114  | | }
     | |_- in this expansion of `format!`
     |
    ::: library/std/src/process.rs:1090:17
    ::: library/std/src/process.rs:1090:17
     |
1090 |                   format!("command {:?} produced non-UTF-8 output"),

error[E0308]: mismatched types
    --> /checkout/library/alloc/src/macros.rs:112:9
     |
     |
109  | / macro_rules! format {
110  | |     ($($arg:tt)*) => {{
111  | |         let res = $crate::fmt::format($crate::__export::format_args!($($arg)*));
112  | |         res
     | |         ^^^ expected `&&'static str`, found struct `alloc_crate::string::String`
114  | | }
     | |_- in this expansion of `format!`
     |
    ::: library/std/src/process.rs:1103:25
    ::: library/std/src/process.rs:1103:25
     |
1103 |                           format!("command {:?} exited with non zero status ({})", self, code)

error[E0308]: mismatched types
    --> /checkout/library/alloc/src/macros.rs:112:9
     |
     |
109  | / macro_rules! format {
110  | |     ($($arg:tt)*) => {{
111  | |         let res = $crate::fmt::format($crate::__export::format_args!($($arg)*));
112  | |         res
     | |         ^^^ expected `&&'static str`, found struct `alloc_crate::string::String`
114  | | }
     | |_- in this expansion of `format!`
     |
    ::: library/std/src/process.rs:1105:29
    ::: library/std/src/process.rs:1105:29
     |
1105 |                       None => format!("command {:?} was terminated", self),

For more information about this error, try `rustc --explain E0308`.
error: could not compile `std` due to 4 previous errors
Build completed unsuccessfully in 0:00:20

let output = self.output()?;
self.check_status(output.status)?;
let mut stdout = output.stdout;
if stdout.last() == Some(&b'\n') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use strip.suffix instead? let stdout = stdout.strip_suffix(b'\n').unwrap_or(stdout) or something.

Ok(())
} else {
Err(io::Error::new_const(
io::ErrorKind::Uncategorized,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Uncategorized is right. IMO this should get its own ErrorKind. Perhaps you'd like to steal io::ErrorKind::SubprocessFailed from 88528a1 ?

})
}

fn check_status(&self, status: ExitStatus) -> io::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be usefully pub.

/// # Ok::<(), std::io::Error>(())
/// ```
#[unstable(feature = "command_easy_api", issue = "none")]
pub fn read_stdout(&mut self) -> io::Result<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: I suggest get_stdout for the name? To me, read_stdout fails to suggest that it also waits for termination.

@ijackson
Copy link
Contributor

ijackson commented Oct 6, 2021

I think these methods would be better on Output. That way you don't throw away information you might want to check when encountering an non-zero exit code.

Output has a missing-error-check a beartrap and is, I think, unfixable. So IMO .output() needs to be deprecated, or at least have a big health warning. So it needs to be replaced. I don't think it's worth trying toi improve it much. The .check_status() method can't be on Output anyway because Output doesn't have the command and arguments any more.

@ijackson
Copy link
Contributor

ijackson commented Oct 6, 2021

@matklad Thank you so much for this MR. IMO we absolutely definitely need both of the things you are doing here. See my tracking issue #73131.

I wonder if it would be easier to do your two halves in two separate MRs? I'm hoping run is less controversial. The output handling one is complex because of the need to make choices about UTF-8 or bytes, and naming.

I've made a few comments on details. See also #73126

@ijackson
Copy link
Contributor

ijackson commented Oct 6, 2021

My first issue is that run seems to be almost identical to .status()?.exit_ok()?; except that it uses a custom error message with less information than ExitStatusError and combines that into an io::Error. I don't think run is appreciably more convenient to use than this alternative, and between the two I'd prefer to keep the error types distinct than merge them.

I think run()s implementation here isn't quite right. I think it should have its own error type, not just use String. But the snippet .status()?.exit_ok()? is actually wrong. As you note in #88306 (review) it doesn't print the command that failed. What you have to do instead is some kind of .map_err() and .context() and so on - assuming you are using a portmanteau error type like anyhow or eyre. If you're not you don't have many good options.

I found @matklad's use case personae in #89004 (comment) very convincing. I have been (roughly speaking) all of those people.

Alice is well-served apart from the fact that output() has a lost error beartrap (#73131) and she, by her nature, will probably notice that and DTRT.

Bob and Carol are very ill-served. We should provide facilities for them that DTRT conveniently.

I think being able to conveniently bundle a subprocess failure as an io::Error is quite important f0or that. And this MR demonstrates that it can be done without compromising on error message quality. Of course someone like Alice won't want to use a cooked API like run but Alice will realise that and can use status() and so on.

Second, I agree with @the8472, read_line should be a method on Output not on Command, similar to how I prefer the exit_ok method on ExitStatus to run on Command.

See my complaints in #73131 about Output. I don't think it can be fixed without replacing it.

On that note, I'm curious how other language's convenience APIs for capturing the output of a subprocess handle stderr. Does Julia's readchomp throw away stderr? Does this change if the exit status indicates an error? Do they discard the stderr or forward it or something else entirely?

I can't speak to Julia. I have done a lot of this kind of thing in Perl and quite a bit in Tcl, and also some in Python. Frankly, this area is rather subtle and most languages have made a mess of it. Rust's current Command is not as bad as some, at least.

Perl's backquote operator has the command inherit the script's stderr. If you want to do something else you need to do a lot of hand-coding. It reports errors in a funky and not particularly convenient way (but, frankly, that is typical for Perl). It doesn't trim a final newline but Perl has a chomp operator that does that very conveniently.

Tcl's exec captures stderr by default (there are facilities for redirecting or inheriting it), calling any stderr an error (throwing a Tcl exception). It always calls nonzero exit status an error (and there is no easy way to get the stdout separately from the error in that situation). It unconditionally chomps a final newline.

Python3's subprocess.run fails to call nonzero exit status an exception. If you don't write explicit error handling code (highly unusual in Python) you have a lost output an unchecked exit status bug.

What I would like

  • run() roughly like this, only with a custom error type
  • get_output_bytes() gives you a Vec<u8>
  • get_output() converts to UTF-8, does not strip newline
  • get_output_line() calls it an error if the result is not precisely one newline-terminated line.
  • get_output_read() gives you impl Read which will wait on EOF, and then call any nonzero status an error. That lets you do error-checked streaming reading without risk of deadlock.
  • Strong health warning in the docs for output(), maybe even deprecation

@yaahc yaahc added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 24, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 13, 2021
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 10, 2021
@matklad
Copy link
Member Author

matklad commented Jan 19, 2022

Yeah, evidently I've run out of steam on this one, so closing :)

@matklad matklad closed this Jan 19, 2022
@ijackson
Copy link
Contributor

ijackson commented Jan 3, 2023

@matklad thanks for your hard work, and sorry if I contributed to bogging it down.

Please comment on my rust-lang/rfcs#3362

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants