Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardise STDOUT output for CLI commands #22

Closed
emmacasolin opened this issue Feb 17, 2022 · 55 comments
Closed

Standardise STDOUT output for CLI commands #22

emmacasolin opened this issue Feb 17, 2022 · 55 comments
Assignees
Labels
development Standard development epic Big issue with multiple subissues r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@emmacasolin
Copy link

emmacasolin commented Feb 17, 2022

Specification

Our new commandments for a better, brighter, command line:

  • Send output to stdout. The primary output for your command should go to stdout. Anything that is machine readable should also go to stdout—this is where piping sends things by default.
  • Send messaging to stderr. Log messages, errors, and so on should all be sent to stderr. This means that when commands are piped together, these messages are displayed to the user and not fed into the next command.
  • When we are displaying output text to indicate success, we need to be brief. Feedback is always preferred, even though deviating from UNIX conventions.
  • The output from stderr can be redirected to /dev/null, but we're going to provide more verbosity options that suppresses all non-essential text (feedback, warnings, etc.).
  • We should ALWAYS be using camelCase for anything going to stdout. As we prioritize parsability, this needs to be the default. No more manually outputting strings like Name: amy.
  • Nested structures should always be avoided in dicts formats wherever possible. Instead, the nested structure should be spread into the data.
  • Streaming with the JSON format should output the objects in JSONL format. To parse it with jq, one needs to use --slurp

STDOUT

Traditionally, we have constructed human readable dictionaries as such:

process.stdout.write(
  binUtils.outputFormatter({
    type: options.format === 'json' ? 'json' : 'list',
    data: [`Root certificate:\t\t${response.getCert()}`],
  }),
);

This is clunky, inconsistent, worse for parsability and more...

In all cases where the output is "semantically" a list, the output should use either json or list. In all other cases, the list option should be replaced with dict. This should be done like this:

process.stdout.write(
  binUtils.outputFormatter({
    type: options.format === 'json' ? 'json' : 'dict',
    data: { rootCertificate: response.getCert() },
  }),
);

The data parameter must be kept as close to the return value of the RPC call possible. This ensures that our output is predictable. Furthermore

STDERR

Log messages, errors, and so on should all be sent to stderr. This means that when commands are piped together, these messages are displayed to the user and not fed into the next command.

Most STDERR messages should either be using the raw or list format. This is because they are intended to be human readable. Under json format, the data should always be { message: "..." }.

process.stderr.write(
  binUtils.outputFormatter({
    type: 'list',
    data: ['Message 1'],
  }),
);
process.stderr.write(
  binUtils.outputFormatter({
    type: 'raw',
    data: 'Message 1',
  }),
);
process.stderr.write(
  binUtils.outputFormatter({
    type: 'json',
    data: { message: "Message 1" },
  }),
);

STDIN

STDIN should always be checked for interactivity before asking prompts. This means that if a Polykey Client session has not already been opened, we should error before asking for a password prompt on a non-interactive terminal.

If input or output is a file, support - to read from stdin or write to stdout. This lets the output of another command be the input of your command and vice versa, without using a temporary file. For example:

cat secret.txt | polykey secret create - vault:secret

Piping

There is currently issues with piping input into Polykey-CLI: #139

TBD...

Additional context

Tasks

  1. Ensure that the output formatter can correctly format output for all possible format options without modifying content
  2. Replace all usage of the list format option with dict (except in situations where the output is more semantically suited to a list)
  3. Ensure that there are no tabs or spaces in json output (as this needs to be machine readable)
  4. Replace usage of the output formatter with just outputting a single string in cases where this is appropriate
  5. Implement -q argument to suppress non-essential output.
  6. Review commands for new specification changes
  • Agent - 7/7
  • Bootstrap - 1/1
  • Identities - 13/13
  • Keys - 12/12
  • Nodes - 6/6
  • Notifications - 3/3
  • Secrets - 12/12
  • Vaults - 12/12
@emmacasolin emmacasolin added the development Standard development label Feb 17, 2022
@CMCDragonkai
Copy link
Member

CMCDragonkai commented Feb 18, 2022

I was thinking that for commands that output a stream of data, the human readable formats don't currently have anything separating each "unit" of output. That is, for dict, we might see something like:

pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g
pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g
pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g

Which represents 3 units of output.

In JSON, this is distinguished simply by the newline:

{"pid":3497062,"nodeId":"vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g"}
{"pid":3497062,"nodeId":"vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g"}
{"pid":3497062,"nodeId":"vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g"}

So for the human readable output, it may be a good idea to have a double line separation between each unit. The end result should look like:

pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g

pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g

pid	3497062
nodeId	vng8qsa39osoqrj3o1ftnbgig80drej2on6fok35jvsm5457up72g

And this would be similar for other kinds of outputs like list:

a
b
c

a
b
c

a
b
c

But to do this, you'd have to add \n after each process.stdout.write, but only when it's not json output, since json output now already has a \n appended afterwards in the output formatter, and double lines is not valid for jsonlines format convention.

Something like this:

for (const _ of [1,2,3,4]) {
  process.stdout.write(binUtils.outputFormatter({ ... }));
  if (options.format !== 'json') process.stdout.write('\n');
}

To make this more robust, we may add a new output formatter async generator (consumer), that takes input and makes these decisions automatically rather than having to put it in each command.

@CMCDragonkai CMCDragonkai added the epic Big issue with multiple subissues label Feb 18, 2022
@CMCDragonkai
Copy link
Member

According to MatrixAI/Polykey#446 (comment) our dictionary formatter doesn't support recursive dictionaries.

I think we need to special case handling of object values, that is if they don't have a proper toString format, then we should be providing a recursive outputting of that object.

It could work through simply calling the outputFormatter again on that.

@CMCDragonkai
Copy link
Member

Actually to handle the additional indentation, one has to keep track of how many levels of indentation required as a prefix to each line. This may require some additional bookkeeping. A sort of indentation counter.

However if the object has a custom toString defined, we just use that. See: https://stackoverflow.com/a/47903498/582917 for the distinction. That way if a custom object has a toString() then it can be used directly.

So generally we would be sending POJOs to the output formatter.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management label Jul 10, 2023
@CMCDragonkai
Copy link
Member

This issue is to be moved to Polykey-CLI.

@CMCDragonkai CMCDragonkai transferred this issue from MatrixAI/Polykey Aug 22, 2023
@addievo
Copy link
Contributor

addievo commented Oct 26, 2023

table format needs an update for different types of values

For instance for this value:

NodeId vq480ofld7vrol0ura1viif9474pb7ptrqm9s8ld23vn5t92oif4g, Address 127.0.0.1:55551, bucketIndex 249

The table output, outputs like this:

Private Zenhub Image

Which is not ideal.

@CMCDragonkai
Copy link
Member

Prefer to upload images to GH, not zenhub, otherwise the uploads could break.

You should see that table output should always take an array of dictionaries. Then the initial keys of the first dictionary should be used as the columns.

It should be tab separated as per CLI standardisation between each cell.

@CMCDragonkai
Copy link
Member

There is a bit of a problem with maintaining padding though, if you have to loop over all records to find out the maximum size of each value, which is a bit annoying.

Basically I would want the CLI output to be a valid TSV (tab separated values).

However I'm not sure about tabs and newlines within the data field.

See: http://dataprotocols.org/linear-tsv/ and https://github.com/eBay/tsv-utils/blob/master/docs/comparing-tsv-and-csv.md

@CMCDragonkai
Copy link
Member

I think a quoted version of TSV would be suitable. If the field ends up having a tab or newline, it would be wrapped in quotes and printed out with \t or \n. Similar to how the values were printed in the current agent status.

@CMCDragonkai
Copy link
Member

Imagine:

first	second	third
"abc\t\nfoo"	123	lol

Of course you can see that even using tabs is misaligned. Usually the CLI tool column is supposed to be piped to understand it.

@CMCDragonkai
Copy link
Member

You can do a 1 iteration, scanning over all values and keeping a max-counter for the longest field in each column, then apply appropriate space padding whenever you're outputting them in the CLI.

That should be fast enough, printing anything in table format then becomes a O(2n) process.

@CMCDragonkai
Copy link
Member

For streaming outputs you can keep a running max length and adjust accordingly for all subsequent new data.

@CMCDragonkai CMCDragonkai assigned addievo and unassigned tegefaulkes Oct 26, 2023
@addievo addievo mentioned this issue Oct 31, 2023
11 tasks
@addievo
Copy link
Contributor

addievo commented Oct 31, 2023

@CMCDragonkai the issue with stream based padding, i.e. updating counter for every item in stream is that, that would have to be implemented on a case-to-case basis, since the data is fed into output formatter after it has been all collected and added to an array.

@CMCDragonkai
Copy link
Member

I don't believe that is true.

@addievo
Copy link
Contributor

addievo commented Oct 31, 2023

table has been recently modified in #45

To be TSV compatible, allow custom options, like custom handlers, and optional numbered rows. Also, table is now O(2n) for streams.

@CMCDragonkai
Copy link
Member

Now that #50 got merged, @amydevs can you review the OP spec of this issue and report on how it has been closed, and what has been updated?

@CMCDragonkai CMCDragonkai assigned amydevs and unassigned addievo Nov 8, 2023
@CMCDragonkai
Copy link
Member

This is the original issue for standardising the design of the CLI stdout.

See my notes here for the recent changes: #44 (comment)

There's quite a few important points there, that requires updating the OP spec here.

@CMCDragonkai
Copy link
Member

  1. PK commands are likely to be combined in various ways in the shell. This means things like cmd $(pk ...), cmd <(pk ...), pk ... | cmd, cmd | pk ..., pk ... > ./blah, pk ... <<<blahblah. This means at any point in time where this occurs, we don't actually want to quote things unless the expected output in the case of dict and table fundamentally requires disambiguation between the usage of \n and \t formatting characters.
  2. There is only 3 output formats that make use of formatting characters, and that's list, dict and table. In all 3 cases, they \n, and for dict and table they use \t. Our table format should use the Linear TSV specification: http://dataprotocols.org/linear-tsv/. The dict format however does not an equivalent, and we just use \t to separate between keys and values, and to pad the keys, and each line is then terminated with a \n. Quotation is needed for the keys and values if they contain control or non-printable characters.
  3. Beware of unicode non-printable characters, they tend to appear like \u134908. However a question must be answered here, how do we deal with these? In some cases it make sense to always print them with the \u quotation, otherwise they are not printable in any sense.
  4. List output only uses \n as delimiter. What if the list value itself has a \n internally? Then it likely needs to be quoted too. We could use "\n" in that scenario. Consider that when printing out the list of certificates, the certificate PEM string itself may already have \n in it. When printing, do we print one line at a time for each certificate? It would look strange. Do we just print the \n and thus realise that on the input side, there's no guarantee that 1 line is 1 field.

The main issue I'm wrestling with here is that when quotation is used, it impacts piping, composition, as in it complicates it requiring the input side to realise how to deal with the quoted form.

I think the main problem is that we have a CLI with a STDOUT that could be going to the terminal - an interactive thing, or to non-interactive outputs.

We can actually detect if the output fd is interactive or non-interactive. See: https://github.com/sindresorhus/is-interactive and we can make decisions here.

@CMCDragonkai
Copy link
Member

Another thing is that when the format is changed to JSON, then the expected output is 1 JSON object or multiple JSON objects?

See: http://dataprotocols.org/ndjson/

I think: list, table all mean multiple json objects.

Whereas dict is a single json object.

In what sense do some list outputs translate to a JSON output? Needs to be investigated.

@CMCDragonkai
Copy link
Member

So the output formatter can be applied multiple times, so it is possible to have multiple tables.

@amydevs
Copy link
Contributor

amydevs commented Feb 22, 2024

There are also some intermittent failures of an output formatter that was updated recently. You can look into that as well.

@tegefaulkes Was there a specific command that it failed on?

@tegefaulkes
Copy link
Contributor

It may have been a test in tests/utils.test.ts. It will fail sometimes if you keep running the tests.

@amydevs
Copy link
Contributor

amydevs commented Feb 22, 2024

@tegefaulkes I've pushed a commit to staging that fixes it.

The error happened on seed 1318872640 for bin/utils › encodeEscaped should encode all escapable characters.

The reason was that \ characters needed to be encoded as \\

@amydevs
Copy link
Contributor

amydevs commented Feb 22, 2024

We also need to make standardize whether commands that either succeed or fail, should have an output at all. And if they do have an output, what the output should be if the json output format is selected. An example is the identities allow command, where no output is given on success. The identities invite command on the other hand, has a message that is displayed on success.

I am in favor of having feedback on the success of a command, but there will need to be discussion on the standardization of the JSON output.

@CMCDragonkai
Copy link
Member

Usually status reports for to stderr even on success. This is allow automation of the stdout. That's to separate general messages versus an actual output of the command.

@CMCDragonkai
Copy link
Member

So yes we can definitely have stderr report useful messages. However right now it is only used for exception formatting. But you could create one as a general reporting for stderr. And yes we should align that with error reporting formatting.

@amydevs
Copy link
Contributor

amydevs commented Feb 22, 2024

I also think we should more definitively decide whether our CLI is non-interactive, interactive, or both. We should also have a clear spec for this. This also especially relates to #22.

The AWS cli, being a industry standard example of an non-interactive CLI, will not output anything on successful commands. This is inline with any other CLI tool. If I were to run touch text.txt, there would be no output. Authentication is done by setting envvars/configs/running cli commands that set said config. The expected flow, is that the user gets an authentication token from the service, and copies it in. Therefore, we should not only consider if our command is interactive in the sense of whether the environment is an interactive shell, but also whether interactivity is happening.

The GitLab cli, is completely interactive. Every command has feedback, even things like animated loading spinners. This is much like using apt over apt-get.

The Pulumi cli has is able to do both, so that it can both be used as an interactive CLI, and also as non-interactive for CI purposes. An example of authenticating using both CLI modes:

image
image

@CMCDragonkai
Copy link
Member

If the feedback is on stderr, then I think the distinction isn't necessary.

@CMCDragonkai
Copy link
Member

So the idea is to keep it non-interactive as per awscli, but then provide feedback on stderr for both success and fail. Then we use the -v verbose flags to affect client side reporting.

@CMCDragonkai
Copy link
Member

Meaning -v gets used for both controlling the agent reporting level when we do agent start, but for other commands affects the client reporting level.

Of course default level starts at info. We don't have a silent switch to go down 1 level. Could be done with an additional flag.

@amydevs
Copy link
Contributor

amydevs commented Feb 22, 2024

So the idea is to keep it non-interactive as per awscli, but then provide feedback on stderr for both success and fail. Then we use the -v verbose flags to affect client side reporting.

sounds good. So all feedback should always be kept as stderr?

@amydevs
Copy link
Contributor

amydevs commented Feb 26, 2024

Reference for CLI Guidelines:

https://clig.dev/

@amydevs
Copy link
Contributor

amydevs commented Feb 29, 2024

help should never be written to stderr:

https://news.ycombinator.com/item?id=37682859

@amydevs
Copy link
Contributor

amydevs commented Feb 29, 2024

there needs more discussion on the output of identity get command. At the moment, it's a bit confusing what each line refers to, as the output omits the keys of each value. I think we should use a dict format instead and include the keys.

@amydevs
Copy link
Contributor

amydevs commented Feb 29, 2024

@tegefaulkes ^

@amydevs
Copy link
Contributor

amydevs commented Feb 29, 2024

nevermind, i'm changing it over, because i realised that that is what identities search does at the moment, they should be consistent

@amydevs
Copy link
Contributor

amydevs commented Feb 29, 2024

actually there does need to be discussion.

For long running commands under JSON format, like identities search, the output will be lines of json objects like {}\n{}\n{}.... However, for identities get, since it is not long running, and the value is resolved immediately, we could have the output of the identies be just an array, like [{}, {}, {}, etc]. For long running, should we manually add brackets around the output to have an array of objects? It feels like a bit of a hack, but i'm not sure what would be the best solution. @CMCDragonkai @tegefaulkes

@tegefaulkes
Copy link
Contributor

We use a stream json parser for js-rpc. I think it can be configured to work both ways where it could parse [{},{},...] OR {},{},.... So I think both can be valid.I'm not sure which is preferred however.

@CMCDragonkai
Copy link
Member

Json list format is literally newline separated JSON. Are you talking about human format? Or json format? If json format, then yes array output may be expected for batch commands. Stream commands use json list format with newline separation.

@CMCDragonkai
Copy link
Member

help should never be written to stderr:

https://news.ycombinator.com/item?id=37682859

I like the third comment down. But not sure if commander works that way.

@tegefaulkes
Copy link
Contributor

We should look into how we format and add information to the help text as well. There is a commander function addHelpText('after', 'text') but it's pretty simple. Formatting will have to be done ourselves and it's subject to word wrapping in the terminal. So it can be a little hard to read some times.

@amydevs
Copy link
Contributor

amydevs commented Mar 14, 2024

we have commands that will output multiple objects separated by new lines as such {}\n{}\n. jq has a hard time parsing this. I think we should standardize all our streamed outputs to be finalised as [{},{}] so that jq would be able to parse them without rearranging them too much

@amydevs
Copy link
Contributor

amydevs commented Mar 14, 2024

function outputStreamTransformJson(): TransformStream<any, string> {
  let firstChunk = true;
  return new TransformStream({
    start(controller) {
      controller.enqueue('[');
    },
    transform(chunk, controller) {
      const output = outputFormatterJson(chunk, { trailingNewLine: false });
      if (firstChunk) {
        controller.enqueue(output);
        firstChunk = false;
      } else {
        controller.enqueue(',' + output);
      }
    },
    flush(controller) {
      controller.enqueue(']\n');
    },
  });
}

I've written a utility function that creates a transformstream for this purpose.

@amydevs
Copy link
Contributor

amydevs commented Mar 14, 2024

we have commands that will output multiple objects separated by new lines as such {}\n{}\n. jq has a hard time parsing this. I think we should standardize all our streamed outputs to be finalised as [{},{}] so that jq would be able to parse them without rearranging them too much

it turns out, that jq does have a --stream option, but it will output an entirely [key, value] format. This is obviously inconvenient for most users, so most people probably wont use this.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 14, 2024 via email

@CMCDragonkai
Copy link
Member

@amydevs regarding JSONL - did you review my original comment starting here #22 (comment).

There is a need to have JSONL format - it does make sense for commands that considered "streaming commands" as opposed to batch commands - they do work differently.

Consider the MatrixAI/Polykey#237 as well - that has to factor into streaming commands - so that one can convert to paginated results too.

@CMCDragonkai
Copy link
Member

Close this if done @amydevs

@amydevs
Copy link
Contributor

amydevs commented Mar 27, 2024

Close this if done @amydevs

The only part that hasn't been done is the verbosity of feedback messages. I think it's worth it to have a separate issue and PR for it, so I'll close this for now.

@amydevs amydevs closed this as completed Mar 27, 2024
@CMCDragonkai
Copy link
Member

Can you start a new issue for that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development epic Big issue with multiple subissues r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

No branches or pull requests

5 participants