Add Option for Lossyness Reports #3392

tajmone · 2017-01-29T09:03:57Z

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. [...] While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Currently there is no way to know whether a given pandoc conversion is lossy. It would be nice to have an option to perform a dry-run conversion and display a report on element loss, covering two cases:

The conversion from one format to another did not involve any loss of elements, and a standard message of losslessness is displayed (on STDERR); or
The conversion to a less expressive format resulted in one or more elements being left out, flattened, assimilated into similar elements, or removed. A standard message of lossiness is displayed (on STDERR), and (optionally) a summary of the lost elements and their context (on STDOUT).

This option could be helpful for testing formats before proceeding with the actual conversion. Sometimes we simply get confused about the multiple formats, and might forget that a given element won't render in another format. The lossiness warning would be a better solution than manually checking whether every element is present in the final output.

This would also be useful in big projects, especially script-automated ones such as API documentation: it would allow users to check (and control) whether elements are lost during the pipeline, and to take countermeasures if they are. For example, a pre-conversion check might block a specific release if pandoc raises a lossiness warning, allowing maintainers to edit the source docs so that only elements that survive the conversion are used.

jgm · 2017-01-29T09:46:09Z

The new architecture we have in the typeclass branch (future pandoc 2.0) makes it much easier for all readers and writers to issue warnings and info messages. So this opens the way to add informative messages to all readers when there is lossiness. (Of course, adding these would be a nontrivial amount of work.) These info-level messages would be enabled by the --verbose option.

jgm · 2017-01-29T09:55:03Z

Of course comments welcome on the proposed system. What I have now is a logging mechanism with ERROR, WARNING, INFO, and DEBUG levels. The user will be able to select the level of verbosity. I also have a flag to treat warnings as errors; perhaps it would be worth while having another option to treat info messages as errors? Or perhaps lossiness indications should be warnings? (There will likely be many of them.)
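
The proposal above can be modelled with a minimal Python sketch. This is not pandoc's implementation (pandoc is written in Haskell); the names `Level`, `visible`, and `exit_status`, the two `*_as_errors` parameters, and the sample message texts are all hypothetical and only illustrate the four levels and the "treat as errors" switches under discussion.

```python
from enum import IntEnum


class Level(IntEnum):
    """Severity levels named in the comment above, most severe first."""
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3


def visible(messages, verbosity):
    """Keep only messages at least as severe as the chosen verbosity."""
    return [(lvl, text) for lvl, text in messages if lvl <= verbosity]


def exit_status(messages, warnings_as_errors=False, info_as_errors=False):
    """Return 0 unless some message counts as an error under the switches."""
    fatal = {Level.ERROR}
    if warnings_as_errors:
        fatal.add(Level.WARNING)
    if info_as_errors:
        fatal.add(Level.INFO)
    return 1 if any(lvl in fatal for lvl, _ in messages) else 0


# Hypothetical messages, for illustration only.
messages = [
    (Level.WARNING, "duplicate note reference"),
    (Level.INFO, "skipped raw block"),
]
shown = visible(messages, Level.WARNING)
default_status = exit_status(messages)
strict_status = exit_status(messages, warnings_as_errors=True)
```

With these inputs only the warning is shown at `WARNING` verbosity, and the exit status stays 0 unless the warnings-as-errors switch is set.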

tajmone · 2017-01-29T10:52:48Z

The ideal is to have a system that could please both humans and scripts: the former with readability in mind, the latter intended for parsable output.

Exit Codes

At its most basic, an exit code of 0 = lossless, >=1 = lossy should satisfy both humans and scripts. Lossiness could be represented by setting flags in the exit code. I don’t have a clear picture of all the possible losses an element can undergo in various output formats, but I assume these could all be loss cases (with some approximate descriptors):

deletion: the whole element is lost during format translation (eg: footnotes, a table?).
flattening/normalization: the element’s style is discarded but the text retained (eg: struck-through text becomes plain).
conversion/assimilation: an element’s style is rendered with an approximately similar style (eg: inline code as bold).

Similar information is what the user might be looking for at the highest level. For example, if pandoc reports a >=1 exit code for the conversion, we check the various flags that make up the returned code for the presence of the above types of loss. Maybe in a given context any losses that don’t imply deletion of elements are OK and the conversion should go ahead.

So really, the difference between a warning and an error might be subjective, depending on usage context and expectations. But generally, I’d say that deletion of content is more critical than style changes or removals.
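
The flag idea in this section can be sketched in Python as follows. It only illustrates the proposal as worded here, not anything pandoc does; the `Loss` flags, their bit values, and the `may_proceed` policy are hypothetical.

```python
from enum import IntFlag


class Loss(IntFlag):
    """Hypothetical bit flags, one per loss category listed above."""
    DELETION = 1
    FLATTENING = 2
    CONVERSION = 4


def exit_code(losses):
    """Combine the observed loss categories into a single exit code."""
    code = Loss(0)
    for loss in losses:
        code |= loss
    return int(code)


def may_proceed(code):
    """Example policy: tolerate style changes, block on deleted content."""
    return not (Loss(code) & Loss.DELETION)


style_only = exit_code([Loss.FLATTENING, Loss.CONVERSION])
with_deletion = exit_code([Loss.DELETION, Loss.CONVERSION])
```

A calling script would then test individual bits of the returned code, for example letting `style_only` through while stopping on `with_deletion`.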

Custom Reader/Writers

From the upcoming 2.0 changes you’ve mentioned, I assume that custom readers and writers will also be able to employ this system. I’ve worked on a markdown to BBCode custom writer, and implemented a manual warning system along these lines: tables are lost completely, inline code is converted to bold, headers become bold text with different sizes, and so on. So, if this system is to be extended to custom readers and writers, it would need to consider all possible descriptors for lossiness cases.

Reports in JSON + Human Readable Format

As for the verbose report on the details of losses, JSON would be a good format for a scripted automation pipeline, and the same JSON structure could be printed out as a human-readable, markdown-formatted report on request.

The JSON report could group losses according to loss-types, and for each loss provide a reference to the line in the original source, the original element, and maybe a string with the starting text that is affected (this is intended only for the human-readable version).

Eg: requesting human-readable report:

LOSSES REPORT:

- deletions (2)
- normalizations (4)
- conversions (11)

# DELETIONS

1.  ELEMENT DELETED: `table`
    LINE(S): 48-67.
    TEXT: "Table of Elements"

… just a speculative example, but it might show the convenience of having some standard to handle both the JSON representation and a human-readable markdown report (that should also be easy to read in a terminal, as raw text).
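
One way to picture "same structure, two renderings" is the Python sketch below. The report layout (keys `deletions`, `normalizations`, `conversions`, and the per-entry fields `element`, `lines`, `text`) is an assumption made up for this illustration, as are `LABELS` and `render`; it is not a format pandoc emits.

```python
import json

# Hypothetical report: losses grouped by loss type, each entry carrying
# the element name, the source line range, and the leading text.
report = {
    "deletions": [
        {"element": "table", "lines": [48, 67], "text": "Table of Elements"},
    ],
    "normalizations": [],
    "conversions": [],
}

LABELS = {
    "deletions": "DELETED",
    "normalizations": "NORMALIZED",
    "conversions": "CONVERTED",
}


def render(report):
    """Render the grouped report as markdown-style plain text."""
    lines = ["LOSSES REPORT:", ""]
    # Summary: one bullet per loss type with its count.
    for kind, items in report.items():
        lines.append(f"- {kind} ({len(items)})")
    # Details: one numbered entry per loss, skipping empty groups.
    for kind, items in report.items():
        if not items:
            continue
        lines += ["", f"# {kind.upper()}", ""]
        for n, item in enumerate(items, start=1):
            first, last = item["lines"]
            element, text = item["element"], item["text"]
            lines.append(f"{n}.  ELEMENT {LABELS[kind]}: `{element}`")
            lines.append(f"    LINE(S): {first}-{last}.")
            lines.append(f'    TEXT: "{text}"')
    return "\n".join(lines)


machine = json.dumps(report)  # for scripts
human = render(report)        # for people
```

Scripts would consume `machine`, while `human` reproduces the layout of the speculative report above from the same data.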

jgm · 2017-01-29T19:23:04Z

+++ Tristano Ajmone [Jan 29 17 02:52 ]:

> At its very basic, an exit error level 0=lossless, >=1=lossy should satisfy both humans and scripts.

No, because it's standard in unix for 0 to mean "exited without errors"; warnings shouldn't cause non-0 exit codes unless a special flag is used, as I suggested.

> to check the presence of the above type of losses. Maybe in a given context any losses that don’t imply deletion of elements are ok and the conversion should go ahead.

The way compilers usually handle this fine-grained discrimination is by allowing each type of warning to be selectively enabled or disabled by a command-line flag.

> From the upcoming 2.0 changes you’ve mentioned, I then assume that also custom readers and writers will be able to employ this system. I’ve

I haven't really thought about how to do this, but yes, I think it should be possible to expose these functions to lua.
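
The compiler-style approach mentioned in this reply, where each warning type is switched on or off from the command line, might look roughly like the Python sketch below. The `-W<type>` / `-Wno-<type>` spelling is borrowed from common compiler conventions, and `LOSS_TYPES` and `enabled_warnings` are hypothetical names; none of this corresponds to actual pandoc options.

```python
import argparse

# Hypothetical warning categories, taken from the loss types discussed earlier.
LOSS_TYPES = ("deletion", "flattening", "conversion")


def enabled_warnings(argv):
    """Parse -W<type> / -Wno-<type> flags and return the enabled set."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-W", dest="flags", action="append", default=[],
                        metavar="[no-]TYPE")
    args = parser.parse_args(argv)

    enabled = set(LOSS_TYPES)  # assumption: every warning is on by default
    for flag in args.flags:
        disable = flag.startswith("no-")
        name = flag[3:] if disable else flag
        if name not in LOSS_TYPES:
            parser.error(f"unknown warning type: {name}")
        if disable:
            enabled.discard(name)
        else:
            enabled.add(name)
    return enabled


only_deletions = enabled_warnings(["-Wno-flattening", "-Wno-conversion"])
```

Here a user who cares only about lost content silences the two style-related categories and keeps deletion warnings.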

This now contains the Verbosity definition previously in Options, as well as a new LogMessage datatype that will eventually be used instead of raw strings for warnings. This will enable us, among other things, to provide machine-readable warnings if desired. See #3392.

jgm · 2017-02-17T22:43:05Z

I've added the framework for this (much better warnings about omitted content + machine-readable warnings + an option to generate an error status code if there are warnings).

I've also added more warnings to readers and writers, so one now gets much fuller information (especially with --verbose). However, we're still pretty far from giving complete information about what is omitted/changed.

Eventually we should add warnings to all writers for raw blocks/inlines that are not rendered (because the formats don't match). Currently we've got this for the following writers:
docbook, docx, fb2, haddock, html, icml, latex, man, markdown, opendocument, rtf, texinfo.

To add to the other writers, we need to do a bit of replumbing so that the writers are in PandocMonad.
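
For a build pipeline, the pieces described in this comment could be combined along the lines of the Python sketch below. It assumes the `--log` option named in the commits referencing this issue and a `--fail-if-warnings` flag for the error-status behaviour; check `pandoc --help` for the options your version actually offers. The log shape (a JSON array of objects with a `verbosity` key) is likewise an assumption, and `sample` is hand-written, not captured pandoc output.

```python
import json
from collections import Counter


def pandoc_command(source, target, log_path):
    """Build a pandoc call that writes a JSON log and fails on warnings."""
    return ["pandoc", source, "-o", target,
            f"--log={log_path}", "--fail-if-warnings"]


def summarize(log_text):
    """Count log entries per verbosity level.

    Assumes the log is a JSON array of objects with a "verbosity" key;
    inspect a real log file before relying on this.
    """
    entries = json.loads(log_text)
    return Counter(entry.get("verbosity", "UNKNOWN") for entry in entries)


# Hand-written sample in the assumed shape, for illustration only.
sample = json.dumps([
    {"type": "SkippedContent", "verbosity": "INFO"},
    {"type": "DuplicateNoteReference", "verbosity": "WARNING"},
])
counts = summarize(sample)
command = pandoc_command("book.md", "book.html", "pandoc-log.json")
```

A release script could pass `command` to `subprocess.run`, block the release on a non-zero return code, and print `counts` so maintainers see how many messages of each level were logged.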

jgm · 2017-02-25T20:53:00Z

nnmrts · 2017-06-16T22:59:13Z

Hi! I hope this is related enough.

I'm currently working on an open source book and it has a build script in its directory, so users don't have to type in the pandoc commands directly. But the source code is structured like this: every chapter has its own markdown file and they get converted to one big markdown file with pandoc in this build script. However, the footnotes in every chapter start at 1 and not at the number from the chapter before + 1. I don't want to change this; it's just easier to work with that way. So when someone executes the build script, pandoc throws a bunch of warnings about duplicate footnotes. The actual output is fine, because pandoc is smart enough to fix these footnotes.

But the warnings are still here. They could confuse users and I don't see an option to disable warnings, but in my opinion this is an important feature. At least for me. :D

So yeah, it would be cool, if you could add this feature to your todo-list. :)

jgm · 2017-06-17T05:53:09Z

My guess is that the output is not fine; you'll be getting footnotes, but not the right footnotes, since pandoc will use only one of the footnotes labeled "1". Have you checked carefully? You could try using --file-scope (see the manual).

nnmrts · 2017-06-30T13:18:41Z

I have checked carefully, the output is totally fine. And all three chapters have identical footnotes. See, it works perfectly, just the warnings could confuse people.

https://github.com/nnmrts/dafern/tree/master/src - these are the source markdown files
https://github.com/nnmrts/dafern/tree/master/build - these are the built files (html, md and pdf)
https://github.com/nnmrts/dafern/blob/master/build.ps1 - this is the build script

The script is spaghetti code, I know. :D

But the command is basically:

pandoc metadata.md chapter1.md chapter2.md chapter3.md -o book.md

The only relevant settings are --atx-headers --wrap=none --preserve-tabs, but I don't think they make a difference.

And this already works. The footnotes are correct and then I just convert the book.md to html and pdf and I'm done.

Wolf-SO · 2017-06-30T13:58:19Z

@nnmrts As I see it, the footnotes are not fine. I checked the PDF version and clicked on the first footnote of the first chapter, at

Ich würde mich trotzdem noch darüber beschweren1 ("I would still complain about it anyway")

and was sent to the footnote of chapter 3

1Das war halt auch einfach nicht so geil. ("That just wasn't that great either.")

Maybe you should check this again.

(By the way: an exciting undertaking, your book.)

jgm · 2017-06-30T15:26:54Z

If you just want to turn off warnings, you can use `--quiet`.

nnmrts · 2017-06-30T16:36:51Z

@Wolf-at-SO So apparently only the PDF is broken. Okay, thanks, I really hadn't noticed that, because I had hardly ever clicked on the footnotes. ~~All the more interesting that the Markdown file works.~~ The HTML is broken too, I see now, although I could swear I had already gotten exactly this to work with the same build process. That's weird. Oh well. (Thanks!)

@jgm Thank you very much, this will probably help me in the future. But as it seems, I need the warnings now even more than before, until I get my build script to work. :D

So yeah, sorry, I should have checked the other files more carefully. Thanks for the help anyway! :)

EDIT: So, locally my files are great, on github they are all not fine, not even the markdown file. The markdown file I have locally is working, but I haven't changed it since my last commit, so...
I have some bigger issues here...

nnmrts · 2017-06-30T16:56:35Z

So I fixed it now, using a version-like notation, like [^1.1] in chapter one, or [^3.4] in chapter three. Output is as expected, with incremental rather than per-chapter footnotes. Awesome, I didn't know this could be so easy.

Thank you two again! 💓
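
For anyone who would rather not rename footnote labels by hand, the chapter-prefix fix described in this comment can be automated with a small Python sketch like the one below. `prefix_footnotes` and the sample chapter text are hypothetical, and the regular expression is deliberately simple: it does not skip code blocks or code spans, so review its output. The `--file-scope` option suggested earlier in the thread is an alternative that needs no renaming.

```python
import re

# Matches a markdown footnote label such as [^1], both where it is
# referenced in the text and where the note itself is defined.
FOOTNOTE = re.compile(r"\[\^([^\]\s]+)\]")


def prefix_footnotes(text, chapter):
    """Rewrite [^label] as [^<chapter>.label] so labels are unique per chapter."""
    return FOOTNOTE.sub(lambda m: f"[^{chapter}.{m.group(1)}]", text)


# Hypothetical chapter content, for illustration only.
chapter_one = "Some claim.[^1]\n\n[^1]: Source for the claim.\n"
renamed = prefix_footnotes(chapter_one, 1)
```

Running this over each chapter file with its own chapter number before concatenation gives every note a distinct label, which removes the duplicate-footnote warnings at their source.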

jgm · 2017-08-09T00:01:31Z

Well, there are still lots more things we could warn about.
But I'm going to close this now, since we have a framework in place which can be incrementally improved.

jgm added this to the pandoc 2.0 milestone Jan 29, 2017

jgm added a commit that referenced this issue Feb 9, 2017

LaTeX reader: Issue warnings when skipping unknown latex commands.

87507e1

See #3392.

jgm added a commit that referenced this issue Feb 10, 2017

HTML reader: Added warnings for ignored material.

a84a360

See #3392.

jgm added a commit that referenced this issue Feb 10, 2017

Logging: added ToJSON instance and showLogMessage.

8ad7e2c

This gives us the possibility of both machine-readable and human-readable output for log messages. See #3392.

jgm added a commit that referenced this issue Feb 11, 2017

Added --log option to save log messages in JSON format to a file.

a6c649c

See #3392.

hftf mentioned this issue Feb 24, 2017

Consistent underline for Readers #2270

Merged

jgm closed this as completed Aug 9, 2017
