enhance machine-readable logging, for programmatic invocation #6269

Closed
brainchild0 opened this issue Apr 9, 2020 · 23 comments

Comments

@brainchild0

brainchild0 commented Apr 9, 2020

The new support for defaults files greatly improves the feasibility of invoking Pandoc from a programmatic wrapper, as the entirety of the processing directives may be represented robustly, portably, conveniently, and reproducibly in a YAML-serialized data structure.

Invoking Pandoc in a robust programmatic wrapper still presents a substantial obstacle, however, because of the difficulties surrounding processing messages generated by the application, particularly those describing error conditions.

This obstacle might be diminished markedly by introducing support for error streams, defined as an optional mode in which error details, if any, are to be written as a YAML block, suitable for programmatic interpretation by the wrapper, to the standard error stream. Imagine that one of the options provided by the user is given as fuddle. Then consider the following result being written to standard error:

---
info:
  name: pandoc
  version: [2, 8]
  execpath: /usr/bin/pandoc
result:
  statusok: no
  errors:
    -
      type: unknownoption
      details:
        optionname: fuddle
---

Using an output pipe, combined with checking the exit status, the wrapper could process the specific cause of the error using reliable and clear logic, and if desired, generate an error message appropriate for the context and locale.

A wrapper seeking to offer a broad set of features, such as project files, having captured the above stream, might print to its own standard error:

Unknown option 'fuddle' for project target 'whatisafuddle'.

The message is human readable, accurate, clear, and independent of any that Pandoc might generate during normal use. It also includes contextual details relating to the particular invocation of Pandoc, but unknown directly to Pandoc during such invocation. Finally, it appears in any natural language supported by the wrapper, regardless of which the Pandoc installation is supporting.

Further, the wrapper requires no prior understanding that fuddle is not included in the set of valid options, and still, the effect for the user is an accurate result based on whichever options are understood by the currently-installed version of Pandoc. Similar effects are possible for missing options, ones with invalid values, and so on.

Naturally, if support is needed for the combination of multiple errors into a single response, the structured data model of YAML allows a solution that is easy and direct.
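A minimal sketch of the wrapper side of this idea, in Python. It assumes the report has already been parsed from the YAML above into plain dictionaries (with `statusok: no` read as a false boolean, as YAML 1.1 parsers do). The report layout is only the one proposed in this comment, not something pandoc emits, and `render_errors` and `TEMPLATES` are hypothetical names.

```python
from typing import Any

# Message templates keyed by the proposed error "type". A real wrapper could
# load a different table for each locale it supports.
TEMPLATES = {
    "unknownoption": "Unknown option '{optionname}' for project target '{target}'.",
}


def render_errors(report: dict[str, Any], target: str) -> list[str]:
    """Turn a parsed error report into messages for the wrapper's own stderr."""
    result = report.get("result", {})
    if result.get("statusok", False):
        return []
    messages = []
    for error in result.get("errors", []):
        template = TEMPLATES.get(error["type"])
        if template is None:
            # Error types the wrapper does not know are reported, not dropped.
            messages.append(
                f"Unrecognized error '{error['type']}' for project target '{target}'."
            )
            continue
        messages.append(template.format(target=target, **error.get("details", {})))
    return messages


# The report from the example above, as it might look after YAML parsing.
report = {
    "info": {"name": "pandoc", "version": [2, 8], "execpath": "/usr/bin/pandoc"},
    "result": {
        "statusok": False,
        "errors": [{"type": "unknownoption", "details": {"optionname": "fuddle"}}],
    },
}

for message in render_errors(report, target="whatisafuddle"):
    print(message)
```

The wrapper never needs its own list of valid options; it only maps error types to wording and adds the context (here, the project target) that Pandoc cannot know.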

@jgm
Owner

jgm commented Apr 9, 2020

Do you know about the --log option? pandoc --log will give you a JSON-readable file with all the errors, warnings, and info messages pandoc would normally stream to the command line.
Example:

[
    {
        "type": "NoLangSpecified",
        "verbosity": "INFO"
    },
    {
        "type": "NoTitleElement",
        "verbosity": "WARNING",
        "fallback": "-"
    }
]
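As a rough illustration of consuming such a log from a wrapper, the following Python sketch parses the example above and groups the entries by their `verbosity` field. The function name `group_by_verbosity` is made up for this sketch, and it assumes only the two fields visible in the example (`type` and `verbosity`) are always present.

```python
import json
from collections import defaultdict

# The example log shown above.
LOG_TEXT = """
[
    {"type": "NoLangSpecified", "verbosity": "INFO"},
    {"type": "NoTitleElement", "verbosity": "WARNING", "fallback": "-"}
]
"""


def group_by_verbosity(log_text: str) -> dict[str, list[dict]]:
    """Parse the contents of a --log file and group entries by verbosity."""
    grouped = defaultdict(list)
    for entry in json.loads(log_text):
        grouped[entry["verbosity"]].append(entry)
    return dict(grouped)


grouped = group_by_verbosity(LOG_TEXT)
for entry in grouped.get("WARNING", []):
    # Fields other than "type" and "verbosity" vary by message type.
    extras = {k: v for k, v in entry.items() if k not in ("type", "verbosity")}
    print(entry["type"], extras)
```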

@jgm
Owner

jgm commented Apr 9, 2020

And pandoc --log=/dev/stderr will send this log to stderr.

@jgm
Owner

jgm commented Apr 9, 2020

Using JSON for this rather than YAML makes sense, as JSON is easier to parse for machines.

@brainchild0
Author

brainchild0 commented Apr 10, 2020

Do you know about the --log option? pandoc --log will give you a JSON-readable file

Not as such, but I now recall seeing it when I originally reviewed the documentation. Of course third-party integration depends on detailed documentation. I have not found any documentation of this kind yet, but I would be happy to follow any reference.

And pandoc --log=/dev/stderr will send this log to stderr.

Good, as long as other output is suppressed, or at least other output to the same stream. A programmatic mode would be one in which standard error receives only the structured data and standard output receives only the output document if it was requested as the output target. It would also require that the exit code of the application is 0 if and only if the output was generated without any errors.

@brainchild0
Author

By the way, I personally find log a slightly misleading characterization of this feature, though I feel no need to push hard for changing it, especially in light of such a change causing a break in compatibility.

@jgm
Owner

jgm commented Apr 10, 2020

So far the only documentation is the source files themselves.
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Logging.hs
This gives you everything needed to put together documentation for these messages; perhaps you'd like to prepare a PR?

Note also that --log only includes warning and info messages. If pandoc fails with a hard error, this doesn't currently go to log.

@brainchild0
Author

brainchild0 commented Apr 10, 2020

If pandoc fails with a hard error, this doesn't currently go to log.

Reporting hard errors would be central to the objectives that prompted this discussion.

@jgm
Owner

jgm commented Apr 10, 2020

We could look into logging the errors as well. The only issue I see there is whether the errors should also be reported to stderr.

  • If they aren't, people might think the command has worked when it hasn't (but maybe we rely on people using --log to be smart enough to check the return status).

  • If they are, then in the event that the log file is set to /dev/stderr, you'd get crosstalk on that stream.

@brainchild0
Author

I'm not sure I identify a conflict.

Essentially there are two choices available:

  1. Whether the message output is a normally formatted error message suitable for reading by a human using the command prompt, or is a serialized representation of structured data, suitable for machine processing.
  2. Which file or stream is receiving the above.

A special case might be that both of the options in (1) are requested in separate streams. In all cases, however, the errors would be written somewhere.

@jgm
Owner

jgm commented Apr 11, 2020

The issue is that I want human-readable error output to go to stderr in every case.
If you also send a JSON-formatted log to stderr, you'll get garbled JSON if these interleave.
Personally I don't think it's so important to be able to send --log directly to stderr, though; you can always send it to a temp file and stream this to stderr in your wrapper.
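The temp-file approach could look roughly like this in a Python wrapper. This is only a sketch: `run_with_log` is a hypothetical helper, it assumes the command accepts a trailing `--log=FILE` argument, and it assumes the log file may be left empty when the run ends in a hard error.

```python
import json
import os
import subprocess
import tempfile


def run_with_log(command: list[str]) -> tuple[int, list[dict], str]:
    """Run `command` with --log pointed at a temporary file.

    Returns (exit status, parsed log entries, human-readable stderr text).
    """
    fd, log_path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        proc = subprocess.run(
            command + [f"--log={log_path}"],
            stderr=subprocess.PIPE,  # keep the human-readable messages separate
            text=True,
        )
        with open(log_path, encoding="utf-8") as handle:
            raw = handle.read()
        # Assumption: nothing is written to the log on a hard error.
        entries = json.loads(raw) if raw.strip() else []
        return proc.returncode, entries, proc.stderr
    finally:
        os.remove(log_path)
```

A call such as `run_with_log(["pandoc", "input.md", "-o", "output.html"])` (file names are placeholders) keeps the JSON log and the human-readable stderr text apart, so neither can garble the other. The wrapper still has to check the exit status itself.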

@brainchild0
Author

The issue is that I want human-readable error output to go to stderr in every case.

Why?

Personally I don't think it's so important to be able to send --log directly to stderr

I suppose it may not be critical, but since receiving such information is largely the purpose of stderr, how do you find this possibility not important?

@jgm
Owner

jgm commented Apr 11, 2020

Why? Because humans are going to be reading it. The --log option creates a machine-readable log; it shouldn't also force machine-readable output from the command line, in my opinion.

@brainchild0
Author

brainchild0 commented Apr 11, 2020

Because humans are going to be reading it.

Maybe, or maybe not.

it shouldn't also force machine-readable output from the command line

No, not force, just allow.

Your saying "from the command line" is begging the question. If an application is invoked by another within an automated workflow, then it is not being invoked from the command line.

The purpose of the idea is to facilitate an automated invocation, as by an editor or content manager. Obviously the objective to include such support is not also to eliminate interactive support. But as the possibilities may be general, requiring that all invocations create at least human-readable message output carries no greater promise than that they all create machine-readable message output. As earlier indicated, it would be a choice, of one, the other, or both.

See my earlier comment. Two orthogonal questions are considered: 1) How does the message output look, and 2) where is it sent?

@jgm
Owner

jgm commented Apr 12, 2020

Perhaps I read too much into your request -- it has nothing to do with warnings (already handled by --log), but only concerns error messages, and would be satisfied by a command line option that caused error output to be in machine-readable form.

@brainchild0
Author

brainchild0 commented Apr 12, 2020

Perhaps I read too much into your request -- it has nothing to do with warnings (already handled by --log), but only concerns error messages,

As far as I am concerned for this request, the value of certain warning messages is uncertain, not to be dismissed summarily, but the necessary inclusion of error messages is clear.

@brainchild0
Author

brainchild0 commented Apr 28, 2020

I would suggest that the information included in a message for the proposed functionality be partitioned into three categories: error, warning, and auxiliary information.

Because this precise distinction is already captured in the existing verbosity field, achieving the functionality might be feasible with very little or even no modification of the current structure, in which an overall message is represented as merely a list of component messages each of a particular type, where each type corresponds to one of the three verbosity levels. However, information about the Pandoc installation, such as version number, should be provided in some part of the overall message, when specifically requested if not by default. This information could easily be placed in a component message labelled info.

@brainchild0
Author

brainchild0 commented Apr 29, 2020

For example, under some usage the message might begin as follows:

[
    {
        "type": "ExecInfo",
        "verbosity": "INFO",
        "version": [
            2, 
            8
        ],
        "execpath": "/usr/bin/pandoc"
    },

brainchild0 changed the title from "error streams" to "enhance machine-readable logging, for programmatic invocation" on May 3, 2020

@tajmone
Contributor

tajmone commented May 23, 2020

JSON-RPC

I like the idea of pandoc embracing new features to allow full programmatic automation. I think that adopting JSON-RPC as the default protocol for automation could be a good idea, leaving room for a lot of potential development in the future.

Initially it might support just JSON-RPC over STD-I/O; later, support might also be added for HTTP and WebSocket, which would allow pandoc to be controlled seamlessly over the Internet, across intranet servers, etc., using the same protocol as for STD-I/O.

JSON-RPC is now being employed by many applications as the protocol of choice for inter-application communication, due to the universality of JSON and the widespread availability of JSON-RPC libraries, bindings, and wrappers for many languages.

For example, many editors are now adding support for LSP (Language Server Protocol) to provide syntax highlighting, IntelliSense, and refactoring for languages and syntaxes. Most editors therefore already support interfacing with third-party tools via JSON-RPC, so if pandoc supported the protocol it could be tightly integrated with editor and IDE functionality, and even be used by language servers to format documentation extracted from comments.

Various CLI and GUI apps are also shifting toward JSON-RPC to communicate with other apps, because the JSON-RPC 2.0 specs are completely platform-agnostic and allow end users to interface the application with almost anything (including smart household devices).

For example, the aria2 CLI downloader supports interfacing via JSON-RPC over HTTP and WebSocket, which allows the app to be controlled via a browser WebUI on localhost, or remotely from other machines.

IMO, pandoc could benefit from adopting JSON-RPC in multiple ways, programmatic automation being a notable example. In complex documentation builds, multiple pandoc instances could be controlled via JSON-RPC, allowing fine-grained control over the different stages of the build. If running pandoc instances could also be queried via JSON-RPC during execution (e.g. to obtain internal info about the current document), paused, and restarted, it would be possible to achieve powerful contextual automation.

Early adoption of JSON-RPC as the standard interface for automation would allow non-destructive future development; i.e., the leap from basic STDERR reporting to the complex scenario described above would be a backward-compatible extension of the protocol and automation system.

Since JSON is already the format of choice for this type of inter-app communication, adopting JSON-RPC would just require adding the protocol overlay to the plain STD-I/O stream.
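To give a rough idea of what that overlay amounts to, here is a small Python sketch of JSON-RPC 2.0 framing. The envelope members (`jsonrpc`, `method`, `params`, `id`, `error`) and the `-32602` "Invalid params" code come from the JSON-RPC 2.0 specification. Everything pandoc-specific is invented for illustration: the `convert` method, its parameters, the contents of `data`, and the one-message-per-line framing are assumptions, not an existing pandoc interface.

```python
import json

INVALID_PARAMS = -32602  # error code predefined by the JSON-RPC 2.0 specification


class RpcError(Exception):
    """Raised when a response carries an "error" member."""

    def __init__(self, code, message, data=None):
        super().__init__(message)
        self.code = code
        self.data = data


def encode_request(method, params, request_id):
    """Serialize a request as one line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def encode_error(request_id, code, message, data=None):
    """Serialize an error response; "data" is optional structured detail."""
    error = {"code": code, "message": message}
    if data is not None:
        error["data"] = data
    return json.dumps({"jsonrpc": "2.0", "error": error, "id": request_id})


def decode_response(line):
    """Return the "result" of a response line, or raise RpcError."""
    response = json.loads(line)
    if "error" in response:
        error = response["error"]
        raise RpcError(error["code"], error["message"], error.get("data"))
    return response["result"]


# Hypothetical exchange: "convert" and its parameters are invented names.
request = encode_request("convert", {"from": "markdown", "to": "html", "fuddle": True}, 1)
reply = encode_error(
    1, INVALID_PARAMS, "Invalid params",
    {"type": "unknownoption", "optionname": "fuddle"},
)

try:
    decode_response(reply)
except RpcError as exc:
    print(exc.code, exc.data)
```

The structured `data` member plays the same role as the machine-readable error details discussed earlier in this thread, so the two ideas are compatible.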

@brainchild0
Author

@tajmone: What is the advantage of using JSON-RPC for a standalone application, rather than in a distributed one? It seems you may be developing a network application. Perhaps these requirements are outside of the scope of the current issue, or the application generally. With basic support for structured input and output, you would be able to integrate with your own applications and servers.

@tajmone
Contributor

tajmone commented May 23, 2020

Perhaps these requirements are outside of the scope of the current issue, or the application generally.

Indeed they are, but I mentioned that taking into account these future considerations could influence these early choices, to avoid having to switch in the future. I personally think that there's great potential in the adoption of JSON-RPC because it would allow inter-server communications (just think of the benefits of cross-repository CI tests and builds).

What is the advantage of using JSON-RPC for a standalone application, rather than in a distributed one?

I provided Aria2 as an example of a standalone application which supports JSON-RPC for automation.

It seems you may be developing a network application.

When I integrate pandoc in a GitHub repository and use Travis CI to test, build and deploy the documentation, I am effectively using pandoc in the context of network applications. Hence my invitation to consider the potential to extend pandoc in that direction, a topic which is close to the automation error messaging you propose, if we're willing to broaden the view of what the future roadmap might eventually lead to.

Of course, these are just my personal ideas. Each pandoc user operates in his/her own context, so it's quite natural that everyone uses pandoc differently and has different views on how it could grow. Sharing such views is what ultimately allows us to build a general picture of the multiple ways pandoc is being used (often stretching its limits with third-party tools and hacks). Hopefully, these insights might contribute to solutions that do not preclude any of those uses, but pave the way for maximum usability.

@fakuivan

Any updates on this? I'd like to integrate error parsing in a VS Code task that uses pandoc. The way errors are currently printed to the console is quite hostile to regex parsing.

@mb21
Collaborator

mb21 commented Feb 25, 2022

@fakuivan the --log option doesn't work for your use-case?

pandoc --log=/dev/stderr will send this log to stderr.

@fakuivan

fakuivan commented Feb 25, 2022

@mb21 the --log option is completely ignored when there's a hard error, like a syntax error or a LaTeX error. I don't understand what the thought was when designing --log, but having it behave differently depending on the input completely defeats its usefulness.

jgm closed this as not planned on Apr 21, 2024