fatal error when an ODT image cannot be fetched #4559

danse · 2018-04-17T09:48:15Z

when an ODT file contains an image that cannot be fetched and the --extract-media option is used, a runtime error prevents pandoc from writing any converted output. The following line is printed:

pandoc: media/.: openBinaryFile: inappropriate type (Is a directory)

danse · 2018-04-17T09:57:07Z

the file 4559.odt.zip causes the error with pandoc version 2.1.4. to replicate the error rename the file to .odt and run the following:

 $ pandoc 4559.odt --extract-media media
[WARNING] Could not fetch resource './ObjectReplacements/Object 1': replacing image with description
pandoc: media/.: openBinaryFile: inappropriate type (Is a directory)

The warning is already the topic of issue #4344. The main problem here is that no output file is produced, even though the error could be handled so that the available content is still written

jgm · 2018-04-19T18:43:44Z

Maybe @MarLinn can help here. I don't really understand the code in the Odt reader.

danse · 2018-04-23T15:19:28Z

the error is triggered when the ODT reader is used, but i think that it's triggered by extractMedia in Text.Pandoc.Class.

my current guess is that the ODT reader leaves an empty file path in the media bag when it fails to dereference some media contents, and this causes an inappropriate type error in the body of writeMedia
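
The guess above can be checked in isolation. This is a minimal sketch, assuming the target path is built by joining the extraction directory with the normalised entry path. The thread does not show the body of writeMedia, so `targetPath` is a hypothetical stand-in and not pandoc's code:

```haskell
import System.FilePath (normalise, (</>))

-- Hypothetical stand-in for the path computation (an assumption, not
-- pandoc's actual code): join the extraction directory with the
-- normalised path of the media entry.
targetPath :: FilePath -> FilePath -> FilePath
targetPath dir subpath = dir </> normalise subpath
```

On a POSIX system `targetPath "media" ""` evaluates to `media/.`, because `normalise` turns the empty string into `.`. That is the same path that appears in the error message, and it names a directory rather than a file.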

jgm · 2018-04-23T18:40:36Z

It's probably relevant that in this case "Object 1" is a math formula (mathml), not an image. The ODT reader may be assuming that anything that looks like this refers to an image?

danse · 2018-04-24T07:41:25Z

good point, that could be the cause of the error. anyway, the biggest problem here is that the error is fatal to the whole conversion, so a big document containing a single failing reference like this will be lost entirely. if the exception is really triggered within extractMedia, it might be worth making the function more robust by handling the case of empty paths, so that no reader can cause a similar fatal error in the future

jgm · 2018-04-24T17:38:12Z

It's probably worth putting a trace in extractMedia to inspect the contents of the mediabag when it's run with this odt. Then we'll understand better what is happening.

danse · 2018-04-24T17:46:17Z

i'm not used to debugging with haskell, thanks for the hint, it can be very helpful in the future.

through experiments i found out that catching exceptions on the last line of writeMedia would enable us to save the rest of the converted document in similar cases.

i also verified that the error is caused by an empty subpath argument passed to writeMedia. we don't want to store malformed paths in the media bag in the first place, and this could be checked in fillMediaBag.
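
A check of that kind could look roughly like the sketch below. It does not use pandoc's MediaBag API: `Entry`, `checkEntry` and `checkEntries` are hypothetical names, and a list of pairs stands in for the media bag. The point is only that malformed entries are rejected with a warning instead of being dropped silently:

```haskell
import Data.Either (partitionEithers)

-- Hypothetical stand-in for a media entry: its path inside the bag
-- and its contents (not pandoc's MediaBag type).
type Entry = (FilePath, String)

-- Accept an entry, or reject it with a warning if its path is empty.
checkEntry :: Entry -> Either String Entry
checkEntry entry@(path, _)
  | null path = Left "[WARNING] skipping media entry with an empty path"
  | otherwise = Right entry

-- Split candidates into warnings to report and entries safe to store.
checkEntries :: [Entry] -> ([String], [Entry])
checkEntries = partitionEithers . map checkEntry
```

The caller stays in charge of reporting the warnings, so the check itself remains pure.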

this case is more complex because the Odt reader uses insertMedia directly within Readers.Odt.ContentReader, and i think that the empty path is stored there. i think that the warning is shown by fillMediaBag and the media bag is not modified there, but it's modified independently in Readers.Odt.ContentReader.

so to summarise, catching exceptions at the end of writeMedia makes pandoc more robust, and i will propose a pull request doing that. i wouldn't know how to cleanly keep empty paths out of the media bag, since insertMedia is pure. silently dropping empty paths wouldn't be that helpful in the long run either: we want to show a warning when that happens.
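
The idea of catching the error at the point where the file is written can be sketched as follows. This is an illustration under stated assumptions, not the code of the pull request: `Entry`, `writeEntry` and `writeAll` are hypothetical names, plain `IO` is used instead of pandoc's own monad, and the path computation is assumed to join the directory with the normalised entry path:

```haskell
import Control.Exception (IOException, try)
import System.Directory (createDirectoryIfMissing)
import System.FilePath (normalise, takeDirectory, (</>))
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for a media entry: path inside the bag, contents.
type Entry = (FilePath, String)

-- Write one entry below dir. An IO error is returned as a Left value
-- instead of propagating and aborting the whole run.
writeEntry :: FilePath -> Entry -> IO (Either IOException ())
writeEntry dir (subpath, contents) = try $ do
  let fullpath = dir </> normalise subpath
  createDirectoryIfMissing True (takeDirectory fullpath)
  writeFile fullpath contents

-- Write all entries, print a warning for each failure, and return
-- the number of entries that could not be written.
writeAll :: FilePath -> [Entry] -> IO Int
writeAll dir entries = do
  results <- mapM (writeEntry dir) entries
  let failures = [e | Left e <- results]
  mapM_ (\e -> hPutStrLn stderr ("[WARNING] could not write media file: " ++ show e)) failures
  return (length failures)
```

With this shape, an entry with an empty path produces a warning while the remaining entries, and the rest of the conversion, still go through.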

i would close this issue once we manage to save the rest of the document, and open a new one dedicated to improving the conversion of this media element

if we do not catch these errors, any malformed entry in a media bag could cause the loss of a whole document output. an example of malformed entry is an entry with an empty file path

danse · 2018-05-02T19:13:53Z

in #4619 i propose a solution to recover the rest of the document when this error happens

mb21 added format:ODT reader labels Apr 19, 2018

danse mentioned this issue May 3, 2018

catch IO errors when writing media files, closes #4559 #4619

Merged

jgm closed this as completed in 59f0c1d May 4, 2018
