Add Option for Lossyness Reports #3392

Closed
tajmone opened this issue Jan 29, 2017 · 14 comments

@tajmone
Contributor

tajmone commented Jan 29, 2017

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. [...] While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Currently there is no way to know whether a given pandoc conversion is lossy. It would be nice to have an option to perform a dry-run conversion and display a report on element loss. Either:

  • The conversion from one format to another did not involve any loss of elements, and a standard message of losslessness is displayed (on STDERR); or
  • Conversion to a less expressive format resulted in one or more elements being left out, flattened, assimilated into similar elements, or removed. A standard message of lossiness is displayed (on STDERR), and (optionally) a summary of the lost elements and their context (on STDOUT).

This option could be helpful for testing formats before proceeding with the actual conversion. Sometimes we simply get confused about the multiple formats, and might forget that a given element won't render in another format. A lossiness warning would be a better solution than manually checking whether every element is present in the final output.

This would also be useful in big projects (especially script-automated ones, such as API documentation): it would allow users to check (and control) whether elements are lost during the pipeline, and to take countermeasures if they are. For example, a pre-conversion check might block a specific release if pandoc raises a lossiness warning, allowing maintainers to edit the source docs so that only elements that survive the conversion are used.

@jgm
Owner

jgm commented Jan 29, 2017

The new architecture we have in the typeclass branch (future pandoc 2.0) makes it much easier for all readers and writers to issue warnings and info messages. So this opens the way to add informative messages to all readers when there is lossiness. (Of course, adding these would be a nontrivial amount of work.) These info-level messages would be enabled by the --verbose option.

@jgm
Owner

jgm commented Jan 29, 2017

Of course, comments are welcome on the proposed system. What I have now is a logging mechanism with ERROR, WARNING, INFO, and DEBUG levels. The user will be able to select the level of verbosity. I also have a flag to treat warnings as errors; perhaps it would be worthwhile to have another option to treat info messages as errors? Or perhaps lossiness indications should be warnings? (There will likely be many of them.)
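To make the trade-off in this comment concrete, here is a minimal Python sketch of a four-level log with a configurable "fail on" threshold. It is only a model of the idea: pandoc itself is written in Haskell, and the names `Verbosity`, `visible`, and `exit_status` are hypothetical, not pandoc's API.

```python
from enum import IntEnum


class Verbosity(IntEnum):
    """Hypothetical severity levels, most severe first."""
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3


def visible(messages, verbosity):
    """Keep only the messages at or above the selected severity."""
    return [(level, text) for level, text in messages if level <= verbosity]


def exit_status(messages, fail_on=Verbosity.ERROR):
    """Return 1 if any message is at least as severe as `fail_on`, else 0.

    fail_on=WARNING models "treat warnings as errors";
    fail_on=INFO models the proposed "treat info messages as errors".
    """
    return 1 if any(level <= fail_on for level, _ in messages) else 0


messages = [
    (Verbosity.WARNING, "duplicate note reference"),
    (Verbosity.INFO, "raw block not rendered in target format"),
]
```

Under this model, reporting lossiness at INFO level keeps default output quiet, while a stricter `fail_on` lets a script turn the same messages into a failing exit status.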

@tajmone
Contributor Author

tajmone commented Jan 29, 2017

The ideal is to have a system that could please both humans and scripts: the former with readability in mind, the latter intended for parsable output.

Exit Codes

At its most basic, an exit code of 0 for lossless and >=1 for lossy should satisfy both humans and scripts. Lossiness could be represented by setting flags in the exit code. I don’t have a clear picture of all the possible losses an element can undergo in the various output formats, but I am assuming these could all be loss cases (with some approximate descriptors):

  • deletion: the whole element is lost during format translation. (e.g. footnotes, a table?)
  • flattening/normalization: the element’s style is discarded but the text is retained. (e.g. struck-through text becomes plain)
  • conversion/assimilation: an element’s style is rendered with an approximately similar style. (e.g. inline code as bold)

This kind of information is what the user might be looking for at the highest level: if pandoc reports an exit code >=1 for the conversion, we inspect the flags that make up the returned code to see which of the above types of loss occurred. Maybe in a given context any losses that don’t imply deletion of elements are OK and the conversion should go ahead.

So really, the difference between a warning and an error might be subjective, depending on usage context and expectations. But generally, I’d say that deletion of content is more critical than style changes or removals.
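The flag idea from this section can be sketched in a few lines of Python. Everything here is hypothetical: the `Loss` categories and their bit values are invented for illustration and do not correspond to any exit code pandoc actually emits.

```python
from enum import IntFlag


class Loss(IntFlag):
    """Hypothetical loss categories, one bit each (not pandoc's exit codes)."""
    DELETION = 1    # the whole element is dropped
    FLATTENING = 2  # styling discarded, text retained
    CONVERSION = 4  # rendered as an approximately similar element


def decode_exit_code(code: int) -> Loss:
    """Interpret an exit code as a combination of Loss flags."""
    return Loss(code)


def is_acceptable(code: int, tolerated: Loss = Loss.FLATTENING | Loss.CONVERSION) -> bool:
    """True when every loss bit set in `code` is within the tolerated set."""
    return (code & ~int(tolerated)) == 0
```

A caller that only cares about deleted content would proceed when `is_acceptable(code)` is true and stop otherwise. One design caveat: a real tool would also need exit codes for ordinary errors, so loss flags would have to share the code space with those.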

Custom Reader/Writers

From the upcoming 2.0 changes you’ve mentioned, I assume that custom readers and writers will also be able to employ this system. I’ve worked on a markdown to BBCode custom writer, and implemented a manual warning system along these lines: tables are lost completely, inline code is converted to bold, headers become bold text with different sizes, and so on. So, if this system is to be extended to custom readers and writers, it would need to consider all possible descriptors for lossiness cases.

Reports in JSON + Human Readable Format

As for the verbose report on the details of losses, JSON would be a good format for a scripted automation pipeline, and the same JSON structure could be printed out as a human-readable, markdown-formatted report on request.

The JSON report could group losses according to loss type, and for each loss provide a reference to the line in the original source, the original element, and maybe a string with the starting text that is affected (this is intended only for the human-readable version).

Eg: requesting human-readable report:

LOSSES REPORT:

- deletions (2)
- normalizations (4)
- conversions (11)

# DELETIONS

1.  ELEMENT DELETED: `table`
    LINE(S): 48-67.
    TEXT: "Table of Elements"

… just a speculative example, but it might illustrate the convenience of having some standard that handles both the JSON representation and a human-readable markdown report (which should also be easy to read in a terminal, as raw text).
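As a rough illustration of "one JSON structure, two views", the Python sketch below renders a report like the example above from JSON. The schema (loss type mapped to a list of entries with `element`, `lines`, and `text` keys) is an assumption made up for this sketch; nothing in the thread fixes the actual format.

```python
import json

# Hypothetical report schema: loss type -> list of affected elements.
SAMPLE = json.dumps({
    "deletions": [
        {"element": "table", "lines": [48, 67], "text": "Table of Elements"},
    ],
    "normalizations": [],
    "conversions": [],
})


def render_report(report_json: str) -> str:
    """Render the JSON losses report as markdown-style plain text."""
    report = json.loads(report_json)
    out = ["LOSSES REPORT:", ""]
    # Summary: one bullet per loss type with its count.
    for loss_type, entries in report.items():
        out.append("- {} ({})".format(loss_type, len(entries)))
    # Details: one numbered item per affected element.
    for loss_type, entries in report.items():
        if not entries:
            continue
        out += ["", "# " + loss_type.upper(), ""]
        for number, entry in enumerate(entries, start=1):
            first, last = entry["lines"]
            span = str(first) if first == last else "{}-{}".format(first, last)
            out.append("{}.  ELEMENT: `{}`".format(number, entry["element"]))
            out.append("    LINE(S): {}.".format(span))
            out.append('    TEXT: "{}"'.format(entry["text"]))
    return "\n".join(out)
```

Scripts would consume the JSON directly, while `print(render_report(SAMPLE))` gives the terminal-friendly view.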

@jgm
Owner

jgm commented Jan 29, 2017 via email

@jgm jgm added this to the pandoc 2.0 milestone Jan 29, 2017
jgm added a commit that referenced this issue Feb 10, 2017
jgm added a commit that referenced this issue Feb 10, 2017
This now contains the Verbosity definition previously
in Options, as well as a new LogMessage datatype that
will eventually be used instead of raw strings for
warnings.

This will enable us, among other things, to provide
machine-readable warnings if desired.

See #3392.
jgm added a commit that referenced this issue Feb 10, 2017
This gives us the possibility of both machine-readable
and human-readable output for log messages.

See #3392.
@jgm
Owner

jgm commented Feb 17, 2017

I've added the framework for this (much better warnings about omitted content + machine-readable warnings + an option to generate an error status code if there are warnings).

I've also added more warnings to readers and writers, so one now gets much fuller information (especially with --verbose). However, we're still pretty far from giving complete information about what is omitted/changed.
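For readers who want to script against this framework: in released pandoc 2.x these features are exposed, as far as I know, through `--fail-if-warnings` and `--log=FILE` (alongside `--verbose`). The Python sketch below builds such an invocation and tallies a log by verbosity. The helper names are hypothetical, and the assumed log shape (a JSON array of objects carrying a `verbosity` key) should be checked against the pandoc version in use.

```python
import json
from collections import Counter


def pandoc_command(source, target, log_path):
    """Build a pandoc invocation that fails on warnings and writes a JSON log."""
    return [
        "pandoc", source, "-o", target,
        "--fail-if-warnings",
        "--log={}".format(log_path),
    ]


def summarize_log(log_text):
    """Count log entries per verbosity level.

    Assumption: the log is a JSON array of objects with a "verbosity" key.
    """
    entries = json.loads(log_text)
    return Counter(entry.get("verbosity", "UNKNOWN") for entry in entries)
```

With pandoc installed, `subprocess.run(pandoc_command("in.md", "out.html", "log.json")).returncode` should be nonzero when any warning was issued, and `summarize_log(open("log.json").read())` gives a quick overview for a release gate.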

Eventually we should add warnings to all writers for raw blocks/inlines that are not rendered (because the formats don't match). Currently we've got this for the following writers:
docbook, docx, fb2, haddock, html, icml, latex, man, markdown, opendocument, rtf, texinfo.

To add to the other writers, we need to do a bit of replumbing so that the writers are in PandocMonad.

@jgm
Owner

jgm commented Feb 25, 2017

TODO:

Convert these writers to use PandocMonad:

  • asciidoc
  • commonmark
  • context
  • dokuwiki
  • epub
  • mediawiki
  • odt
  • opml
  • org
  • rst
  • tei
  • textile
  • zimwiki

Also:

  • Add warning report when skylighting returns an error.
  • Add a warning category for cases where we're not ignoring content, but interpreting it differently, e.g. underline as emphasis.
  • DocBook reader - warnings for unsupported elements

@nnmrts

nnmrts commented Jun 16, 2017

Hi! I hope this is related enough.

I'm currently working on an open source book, and it has a build script in its directory so users don't have to type in the pandoc commands directly. The source is structured like this: every chapter has its own markdown file, and the build script uses pandoc to combine them into one big markdown file. However, the footnotes in every chapter start at 1 rather than continuing from the last number of the previous chapter. I don't want to change this, because it's just easier to work with that way. So when someone executes the build script, pandoc throws a bunch of warnings about duplicate footnotes. The actual output is fine, because pandoc is smart enough to fix these footnotes.

But the warnings are still here. They could confuse users and I don't see an option to disable warnings, but in my opinion this is an important feature. At least for me. :D

So yeah, it would be cool, if you could add this feature to your todo-list. :)

@jgm
Owner

jgm commented Jun 17, 2017 via email

@nnmrts

nnmrts commented Jun 30, 2017

I have checked carefully, the output is totally fine. And all three chapters have identical footnotes. See, it works perfectly, just the warnings could confuse people.

https://github.com/nnmrts/dafern/tree/master/src - these are the source markdown files
https://github.com/nnmrts/dafern/tree/master/build - these are the built files (html, md and pdf)
https://github.com/nnmrts/dafern/blob/master/build.ps1 - this is the build script

The script is spaghetti code, I know. :D

But the command is basically:

pandoc metadata.md chapter1.md chapter2.md chapter3.md -o book.md

The only relevant settings are --atx-headers --wrap=none --preserve-tabs, but I don't think they make a difference.

And this already works. The footnotes are correct and then I just convert the book.md to html and pdf and I'm done.

@Wolf-SO

Wolf-SO commented Jun 30, 2017

@nnmrts As I see it, the footnotes are not fine. I checked the PDF version and clicked on the 1st footnote in the 1st chapter at

Ich würde mich trotzdem noch darüber beschweren[1] ("I would still complain about it anyway")

and was sent to the footnote of chapter 3

[1] Das war halt auch einfach nicht so geil. ("That just wasn't that great either.")

Maybe you should check this again.

(By the way: an exciting undertaking, your book.)

@jgm
Owner

jgm commented Jun 30, 2017 via email

@nnmrts

nnmrts commented Jun 30, 2017

@Wolf-at-SO So apparently only the PDF is broken. Okay, thanks, I really hadn't noticed that, because I had hardly ever clicked on the footnotes. All the more interesting that the Markdown file works. The HTML is broken too, I see now, although I could swear I had already gotten it to work with exactly this build process. That's weird. Oh well. (Thanks!)

@jgm Thank you very much, this will probably help me in the future. But as it seems, I need the warnings now even more than before, until I get my build script to work. :D

So yeah, sorry, I should have checked the other files more carefully. Thanks for the help anyway! :)

EDIT: So, locally my files are great, but on GitHub none of them are fine, not even the markdown file. The markdown file I have locally works, but I haven't changed it since my last commit, so...
I have some bigger issues here...

@nnmrts

nnmrts commented Jun 30, 2017

So I fixed it now, using a version-like notation, like [^1.1] in chapter one, or [^3.4] in chapter three. The output is as expected, with incremental rather than per-chapter footnote numbers. Awesome, I didn't know this could be so easy.
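For anyone who would rather keep per-chapter numbering in the source files, the same renaming could be applied automatically before the chapters are concatenated. The Python sketch below is one way to do that; `namespace_footnotes` is a hypothetical helper, and the simple regular expression does not skip code blocks, so it would also rewrite anything that merely looks like a footnote label there.

```python
import re

# Matches footnote labels such as [^1] or [^note], in both references
# and definitions ("[^1]: text"). Inline notes (^[...]) are not matched.
_FOOTNOTE = re.compile(r"\[\^([^\]\s]+)\]")


def namespace_footnotes(markdown: str, chapter: int) -> str:
    """Prefix every footnote label with the chapter number, e.g. [^1] -> [^3.1]."""
    return _FOOTNOTE.sub(lambda m: "[^{}.{}]".format(chapter, m.group(1)), markdown)
```

Running each chapter through `namespace_footnotes(text, n)` before passing the files to pandoc makes the labels unique across the combined document, which is what the duplicate-footnote warnings were pointing at.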

Thank you two again! 💓

@jgm
Owner

jgm commented Aug 9, 2017

Well, there are still lots more things we could warn about.
But I'm going to close this now, since we have a framework in place which can be incrementally improved.

@jgm jgm closed this as completed Aug 9, 2017