fatal error when an ODT image cannot be fetched #4559

Closed
danse opened this issue Apr 17, 2018 · 8 comments

Comments

@danse
Contributor

danse commented Apr 17, 2018

when an ODT file contains an image that cannot be fetched and the --extract-media option is used, a runtime error prevents pandoc from writing any converted output. The following line is printed:

pandoc: media/.: openBinaryFile: inappropriate type (Is a directory)
@danse
Contributor Author

danse commented Apr 17, 2018

the file 4559.odt.zip causes the error with pandoc version 2.1.4. to replicate the error rename the file to .odt and run the following:

 $ pandoc 4559.odt --extract-media media
[WARNING] Could not fetch resource './ObjectReplacements/Object 1': replacing image with description
pandoc: media/.: openBinaryFile: inappropriate type (Is a directory)

The warning is already the topic of issue #4344. The main problem here is that no output file is produced, even though the error could be handled so that the available content is still written.

@jgm
Owner

jgm commented Apr 19, 2018

Maybe @MarLinn can help here. I don't really understand the code in the Odt reader.

@danse
Contributor Author

danse commented Apr 23, 2018

the error shows up when the ODT reader is used, but i think it is actually raised by extractMedia in Text.Pandoc.Class.

my current guess is that the ODT reader leaves an empty file path in the media bag when it fails to dereference some media content, and this causes the inappropriate type error in the body of writeMedia
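
To illustrate the guess above, here is a minimal sketch of the failure mode. It is not pandoc's code: `tryWrite` is a hypothetical helper, and it assumes the subpath is normalised and then joined to the output directory, which the `media/.` in the error message seems to suggest.

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString as B
import System.Directory (createDirectoryIfMissing)
import System.FilePath (normalise, (</>))

-- | Hypothetical helper, not pandoc's writeMedia: write bytes to
-- @dir </> normalise subpath@ and return any IOException instead of throwing.
tryWrite :: FilePath -> FilePath -> B.ByteString -> IO (Either IOException ())
tryWrite dir subpath bytes = do
  createDirectoryIfMissing True dir
  -- With an empty subpath the target is the directory itself
  -- rather than a file inside it, so opening it for writing fails.
  try (B.writeFile (dir </> normalise subpath) bytes)
```

On Linux, `tryWrite "media" "" B.empty` should come back as a `Left` carrying an IOException of the same kind as the one reported above. The exact wording of the message depends on the platform and the library version.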

@jgm
Owner

jgm commented Apr 23, 2018

It's probably relevant that in this case "Object 1" is a math formula (mathml), not an image. The ODT reader may be assuming that anything that looks like this refers to an image?

@danse
Contributor Author

danse commented Apr 24, 2018

good point, that could be the cause of the error. anyway, the biggest problem here is that the error is fatal to the whole conversion, so a big document containing a single failing reference like this will be lost entirely. if the exception is really raised within extractMedia, it might be worth making the function more robust by handling the case of empty paths, so that no reader can cause a similar fatal error in the future

@jgm
Owner

jgm commented Apr 24, 2018 via email

@danse
Contributor Author

danse commented Apr 24, 2018

i'm not used to debugging with haskell, thanks for the hint, it can be very helpful in the future.

through experiments i found out that catching exceptions on the last line of writeMedia would enable us to save the rest of the converted document in similar cases.

i also verified that the error is caused by an empty subpath argument passed to writeMedia. we don't want to store malformed paths in the media bag in the first place, and this could be checked in fillMediaBag.

this case is more complex because the Odt reader uses insertMedia directly within Readers.Odt.ContentReader, and i think that the empty path is stored there. i think that the warning is shown by fillMediaBag and the media bag is not modified there, but it's modified independently in Readers.Odt.ContentReader.

so to summarise, catching exceptions at the end of writeMedia makes pandoc more robust, and i will propose a pull request doing that. i don't know how to cleanly keep empty paths out of the media bag, since insertMedia is pure. silently dropping empty paths wouldn't be that helpful in the long run either: we want to show a warning when that happens.
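
As a rough sketch of the idea of catching exceptions at the end of the write step (this is not the code from the pull request): `Entry`, `writeEntry` and `writeEntries` are hypothetical names, a plain list of pairs stands in for the media bag, and the warning goes to stderr instead of pandoc's own logging.

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString as B
import System.Directory (createDirectoryIfMissing)
import System.FilePath (normalise, takeDirectory, (</>))
import System.IO (hPutStrLn, stderr)

-- | Hypothetical stand-in for one media bag entry: a relative path and its bytes.
type Entry = (FilePath, B.ByteString)

-- | Write one entry under @dir@. An IOException is reported on stderr
-- instead of being rethrown. Returns True if the file was written.
writeEntry :: FilePath -> Entry -> IO Bool
writeEntry dir (subpath, bytes) = do
  let fullpath = dir </> normalise subpath
  result <- try $ do
    createDirectoryIfMissing True (takeDirectory fullpath)
    B.writeFile fullpath bytes
  case result of
    Left e -> do
      hPutStrLn stderr
        ("[WARNING] Could not write " ++ show fullpath ++ ": " ++ show (e :: IOException))
      return False
    Right () -> return True

-- | Write every entry and return how many were written successfully.
-- A malformed entry no longer aborts the remaining writes.
writeEntries :: FilePath -> [Entry] -> IO Int
writeEntries dir entries = length . filter id <$> mapM (writeEntry dir) entries
```

With this shape, an entry with an empty path produces a warning and is skipped, while the other entries (and whatever the caller does afterwards, such as writing the converted document) still go through.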

i would close this issue once we manage to save the rest of the document, and open a new one dedicated to improving the conversion of this media element

danse added a commit to italia/pandoc that referenced this issue May 2, 2018
if we do not catch these errors, any malformed entry in a media bag
could cause the loss of a whole document output. an example of
malformed entry is an entry with an empty file path
@danse
Contributor Author

danse commented May 2, 2018

in #4619 i propose a solution to recover the rest of the document when this error happens

danse added a commit to italia/pandoc that referenced this issue May 3, 2018
if we do not catch these errors, any malformed entry in a media bag
could cause the loss of a whole document output. an example of
malformed entry is an entry with an empty file path
danse added a commit to italia/pandoc that referenced this issue May 4, 2018
if we do not catch these errors, any malformed entry in a media bag
could cause the loss of a whole document output. an example of
malformed entry is an entry with an empty file path
@jgm jgm closed this as completed in 59f0c1d May 4, 2018