[Reference] fix image embedding in DOCX output #40

Closed
wants to merge 7 commits into from

vsmalladi
Collaborator

@vsmalladi vsmalladi commented Jul 22, 2017

Do not merge: See discussion below. Pull request is now for reference.

Fix export of SVG images to EPS images for the Word document. All image links currently have to be local.
manuscript.docx

import subprocess
import os
import sys
from pandocfilters import toJSONFilter, Para, Image
Member

How is pandocfilters getting installed? Should it be added to environment.yml?

Collaborator Author

It should be. I just forgot it. Will add it in.

except OSError:
    mtime = -1
if mtime < os.path.getmtime(src):
    cmd_line = ['inkscape', option[0], eps_name, file_name]
Member

Does this method depend on inkscape? That's a pretty heavy dependency. Can inkscape be installed via conda?

Collaborator Author

@dhimmel Yes, this method does depend on inkscape. There seems to have been a working conda install, but it appears to be broken now (see inkscape-feedstock).
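For reference, the staleness check in the excerpt above can be sketched roughly as follows. This is a minimal sketch rather than the PR's actual filter: `needs_rebuild` and `inkscape_command` are hypothetical helper names, and the command is only built, not executed, so inkscape does not need to be installed to try it.

```python
import os


def needs_rebuild(src, dst):
    """Return True when dst is missing or older than src."""
    try:
        dst_mtime = os.path.getmtime(dst)
    except OSError:
        # A missing output file is treated as older than any source.
        dst_mtime = -1
    return dst_mtime < os.path.getmtime(src)


def inkscape_command(option, dst, src):
    """Build (but do not run) the argument list, mirroring the excerpt."""
    return ["inkscape", option, dst, src]
```

A caller would pass the result of `inkscape_command` to `subprocess` only when `needs_rebuild` returns True, which avoids re-converting unchanged images on every build.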

@@ -5,4 +5,4 @@ The figures can be referenced in the text by using `@fig:label`.

Figure @fig:googletrends shows the interest for "Sci-Hub" and "LibGen" over time.

-![Google Trends Search interest for Sci-Hub and LibGen.](https://cdn.rawgit.com/greenelab/scihub/7891082161dbcfcd5eeb1d7b76ee99ab44b95064/explore/trends/google-trends.svg){#fig:googletrends}
+![Google Trends Search interest for Sci-Hub and LibGen.](images/google-trends.svg){#fig:googletrends}
Member

Does the method work with URL images?

Collaborator Author

@dhimmel Yes, it can work with URLs; the code just needs to be modified to check with a regex whether the image source is a URL.
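One way such a check might look is sketched below. The helper name `is_url` and the choice to treat only http(s) sources as remote are assumptions made for illustration; they are not taken from the PR.

```python
import re

# Assumption: only http(s) URLs need to be treated as remote images.
URL_PATTERN = re.compile(r"^https?://", re.IGNORECASE)


def is_url(src):
    """Return True when an image source looks like an http(s) URL."""
    return bool(URL_PATTERN.match(src))
```

A filter could then download the image to a local file when `is_url(src)` is true and fall through to the existing local-path handling otherwise.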

@@ -52,6 +52,8 @@ then
--to=docx \
--filter pandoc-fignos \
--filter pandoc-tablenos \
--filter pandoc-img-glob \
Member

What does this do?

Collaborator Author

--filter pandoc-img-glob

@dhimmel it rewrites the relative path to each image as an absolute path, resolved relative to the markdown, for the docx conversion. Without it I get these errors:

[pandoc warning] Could not find image "images/image2.png", skipping...
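The path rewriting described here can be illustrated with a small sketch. It is not the implementation of `pandoc-img-glob`; `resolve_image_path` and its `markdown_dir` parameter are hypothetical names used only to show the idea of resolving a relative image path against the directory that holds the markdown.

```python
import os


def resolve_image_path(src, markdown_dir):
    """Return an absolute path for src, resolved against markdown_dir."""
    if os.path.isabs(src):
        # Already absolute: leave it untouched.
        return src
    return os.path.normpath(os.path.join(os.path.abspath(markdown_dir), src))
```

With absolute paths, pandoc can locate the image regardless of the working directory the build script runs from.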

@dhimmel
Member

dhimmel commented Jul 22, 2017

Fix export of svg images to eps images for word document

Would it be possible to convert to PDF rather than EPS? I think EPS has some issues with support for transparency... See how grid lines are blocked behind rectangles in manuscript.docx?

[Image: eps-screenshot]

Worst case could always rasterize to PNG.

@dhimmel
Member

dhimmel commented Jul 23, 2017

@vsmalladi thanks a lot for helping us find a workaround for the SVG issues. Before you go further with this pull request, I should note that this may not be a feature we want to merge for the following reasons:

  1. it's a rather heavy workaround for upstream issues that may end up resolving over time
  2. it only corrects the SVG issues for DOCX output, which is currently optional, and deficient in other ways.
  3. the inkscape dependency is potentially problematic
  4. if broad compatibility in image export is required, authors could always switch to PNG images or generate the compatible formats themselves.

Correct me on any of these considerations if I'm wrong. Also I'd like @agitter's sense whether I'm being too picky here. I am really interested in resolving our image issues. But we have to balance the benefit-to-complexity ratio carefully.

At a minimum, we can leave this PR open and reference it in the README as a workaround for users where this functionality is essential. So while this is in no way a final decision at this point, I wanted to bring it up in case it changes your calculus on how much time to spend here.

@vsmalladi
Collaborator Author

@dhimmel

  1. it's a rather heavy workaround for upstream issues that may end up resolving over time

Agreed, it is heavy, and maybe a better way to resolve this will emerge later.

  2. it only corrects the SVG issues for DOCX output, which is currently optional, and deficient in other ways.
  4. if broad compatibility in image export is required, authors could always switch to PNG images or generate the compatible formats themselves.

Actually, this can also work for PDF output; I just wanted to try it with DOCX output first. Other implementations cover other use cases, so users could change the behavior to suit what they need:

fmt_to_option = {
    "latex": ("--export-pdf","pdf"),
    "beamer": ("--export-pdf","pdf"),
    #use PNG because EMF and WMF break transparency
    "docx": ("--export-png", "png"),
    #because of IE
    "html": ("--export-png", "png")
}

I will explore a little more. But I think it's at least a workaround that we can reference.
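To make the mapping above concrete, here is a minimal sketch of how a filter might use it to choose the export option and output file name for an SVG. The dictionary is copied from the comment; `converted_name` is a hypothetical helper, and detecting SVGs by file extension is a simplifying assumption.

```python
import os

# Copied from the comment above: pandoc output format -> (inkscape option, extension).
fmt_to_option = {
    "latex": ("--export-pdf", "pdf"),
    "beamer": ("--export-pdf", "pdf"),
    "docx": ("--export-png", "png"),
    "html": ("--export-png", "png"),
}


def converted_name(src, fmt):
    """Return (option, output_name) for an SVG, or None if no conversion applies."""
    entry = fmt_to_option.get(fmt)
    if entry is None or not src.lower().endswith(".svg"):
        return None
    option, extension = entry
    base, _ = os.path.splitext(src)
    return option, base + "." + extension
```

Formats that are missing from the dictionary are left alone, so changing the behavior for a given output format only means editing one entry.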

@vsmalladi
Collaborator Author

Updated Document using PDF for images, fixes transparency.
manuscript.docx

@agitter
Member

agitter commented Jul 23, 2017

I agree that the SVG issues are important, so thanks for working on this.

Updated Document using PDF for images, fixes transparency. manuscript.docx

@vsmalladi should I be able to see the figure in this latest version? Is the image not embedded in the docx file? In Word 2013 on Windows I get:

@dhimmel I don't have a firm opinion on merging this or the direction we want to go with images. I think it depends in part on whether we expect users will mostly include images using URLs pointing to third party resources or include the images in their manuscript's repository. If they are managing the images themselves in the repo, then we can state the current limitations and suggest suitable file formats and workarounds. If they are going to be incorporating third party images via URL, users might expect that the build system is robust enough to work for most image types. That would involve a heavier build system and perhaps inkscape or some graphics software. As long as we can still set up the environment completely within conda, I'm okay with more dependencies.

@vsmalladi
Collaborator Author

vsmalladi commented Jul 23, 2017

@agitter You should be able to see the image. I thought it was embedded in the docx file. I uploaded another version. I have had no issues opening it in Microsoft 365 on Mac.

manuscript.docx

@agitter
Member

agitter commented Jul 23, 2017

@vsmalladi that version of the docx works for me. The icons look like they didn't convert perfectly. They are warped and cropped a little on the right.
[Image]

@vsmalladi
Collaborator Author

@agitter I will look into that. Might be an issue with image formatting in word. Hopefully a quick fix.

Also working on seeing how to export to PDF without going to the HTML first.

@agitter
Member

agitter commented Jul 23, 2017

Also working on seeing how to export to PDF without going to the HTML first.

@vsmalladi before moving away from wkhtmltopdf, please see some of the PDF problems we had in #18. wkhtmltopdf isn't a perfect long-term solution, but we don't want to reintroduce those other issues, such as metadata problems.

@vsmalladi
Collaborator Author

we don't want to reintroduce those other issues, such as metadata problems.

@agitter agreed. I will experiment and see what problems it fixes and what new problems it might introduce.

@vsmalladi
Collaborator Author

@agitter Fixed the ratio of the images in the word document, but had to specify width in the markdown.

![ORCID icon](images/orcid.svg){height="13px" width="13px"}

manuscript.docx

@agitter
Member

agitter commented Jul 24, 2017

The images are broken for me again in this latest version.

@vsmalladi
Collaborator Author

@agitter I see what the issue is. Word for Mac saves the document in compatibility mode, which makes it a doc file compatible with Word 97-2003. If you convert it in Word, the document is saved in docx format and the images are zipped into the document. This is a little annoying but workable.

manuscript.docx

@agitter
Member

agitter commented Jul 24, 2017

@vsmalladi I'm not sure what's going on. If I open your latest docx file as a zip file, I see a word/media subdirectory with the four images as pdfs. All four look good. But they still don't appear in the document.

In document.xml I see <pic:cNvPr descr="/Users/Venkat/Project/manubot-rootstock/content/images/orcid.pdf" id="0" name="Picture" />. However, in document.xml.rels I see <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Id="rId22" Target="media/rId22.pdf" />. I won't have time to dig much deeper into this.

@vsmalladi
Collaborator Author

@agitter Thanks for looking into this. I will look closer at what is causing this issue.

@vsmalladi
Collaborator Author

@agitter What version of word are you using to open the document and what system?

@agitter
Member

agitter commented Jul 26, 2017

@vsmalladi I'm on Word 2013 on Windows 10.

I looked at the version that did work to see if anything jumped out in the xml. The figures had been converted to emf files in word/media instead of pdf. document.xml.rels referenced those, e.g.

<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.emf"/>

Those are referenced in document.xml, e.g. <a:blip r:embed="rId7"/> but document.xml also still had references to pdfs on your machine, e.g.

<pic:cNvPr id="0" name="Picture" descr="/Users/Venkat/Project/manubot-rootstock/content/images/orcid.pdf"/>
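The manual inspection described above (opening the docx as a zip and reading document.xml.rels) could be scripted along these lines. This is a sketch under the assumption that the relationships part lives at `word/_rels/document.xml.rels`, as in a standard DOCX package; `image_targets` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

RELS_PATH = "word/_rels/document.xml.rels"
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def image_targets(docx):
    """Map relationship Id -> Target for every image relationship in a DOCX.

    docx may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read(RELS_PATH))
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(RELS_NS + "Relationship")
        if rel.get("Type") == IMAGE_TYPE
    }
```

Running this on both a working and a broken file would show at a glance whether the embedded images are stored as emf or pdf, without unzipping by hand.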

@vsmalladi
Collaborator Author

@agitter I am still investigating, but decided to follow up on your emf vs pdf issue. I have made both versions. Let me know if the images can be opened in either one?
manuscript_emf.docx
manuscript_pdf.docx

@agitter
Member

agitter commented Jul 28, 2017

@vsmalladi The images in manuscript_emf.docx work for me. The images in manuscript_pdf.docx do not.

@dhimmel
Member

dhimmel commented Jul 28, 2017

With LibreOffice on Linux (Version: 5.3.1.2 Build ID: 1:5.3.1-0ubuntu2), the images display in both manuscript_emf.docx and manuscript_pdf.docx for me. Although, both images are horizontally squished.

@agitter
Member

agitter commented Jul 28, 2017

I tested manuscript_emf.docx and manuscript_pdf.docx in Google Drive and saw the same thing I had locally. Images in manuscript_pdf.docx were broken. Also, the table was broken in both versions.

These docx issues are making me less excited about supporting that as one of our build formats. I don't think we want to continue debugging multiple docx editors. @vsmalladi do you think we should work toward a specific editor or subset of editors? Or continue to recommend copy/paste from HTML?

@vsmalladi
Collaborator Author

@agitter Given how many issues we keep seeing, I am not sure we can support docx as a format right now. I think we can open an issue and post some examples of how this could be done, and let users decide on their own.

Thoughts?

@agitter
Member

agitter commented Jul 29, 2017

@vsmalladi I agree that general docx support is too much to take on based on your experiments. We can illustrate what is known to work - combinations of image types and docx editors - but not support all combinations.

@dhimmel Where should this information go? Would you like to leave this pull request open, create a new issue, or something else?

@dhimmel dhimmel changed the title from "Fix for image embedding in word document" to "[Reference] fix image embedding in DOCX output" Jul 29, 2017
@dhimmel
Member

dhimmel commented Jul 29, 2017

Would you like to leave this pull request open, create a new issue, or something else?

I updated the PR title and added a note to the first comment. Adding a note to the DOCX output documentation in the README that references this PR is what I'd recommend.

dhimmel pushed a commit that referenced this pull request Jul 30, 2017
dhimmel pushed a commit that referenced this pull request Jul 30, 2017
This build is based on
247fdbe.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/manubot-rootstock/builds/259134255
https://travis-ci.org/greenelab/manubot-rootstock/jobs/259134256

[ci skip]

The full commit message that triggered this build is copied below:

README: reference DOCX image embedding PR (#43)

Refs #40
[{{author.github}}](https://github.com/{{author.github}})
{%- endif %}
{%- if author.twitter is defined %}
-· ![Twitter icon](images/twitter.svg){height="13px"}
+· ![Twitter icon](images/twitter.svg){height="13px" width="13px"}
Member

@vsmalladi would specifying width="13px" for these SVGs be valuable independent of the rest of the changes in this PR? These SVGs export to DOCX currently, just they are misshapen?

If these changes are valuable alone, you should open a quick PR with just those changes.

Collaborator Author

@dhimmel I will test them out based on the updates and see if it works.

dhimmel pushed a commit that referenced this pull request Oct 5, 2017
Now width and height are both specified at 13 pixels, to constrain the aspect ratio of these SVGs as square. Previously, the icons appeared squished in DOCX exports. See #40
dhimmel pushed a commit that referenced this pull request Oct 5, 2017
This build is based on
aba5246.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/manubot-rootstock/builds/283792648
https://travis-ci.org/greenelab/manubot-rootstock/jobs/283792649

[ci skip]

The full commit message that triggered this build is copied below:

Specify width of front-matter SVGs (#79)

Now width and height are both specified at 13 pixels, to constrain the aspect ratio of these SVGs as square. Previously, the icons appeared squished in DOCX exports. See #40
dhimmel pushed a commit to greenelab/meta-review that referenced this pull request Oct 23, 2017
Now width and height are both specified at 13 pixels, to constrain the aspect ratio of these SVGs as square. Previously, the icons appeared squished in DOCX exports. See manubot/rootstock#40
@dhimmel
Member

dhimmel commented Dec 19, 2019

With Pandoc 2.9 (and possibly before), SVGs are converting to DOCX fine on my system. This probably depends on whether rsvg-convert is available on the system. Closing this PR since it seems that pandoc can now convert SVGs natively without this filter.
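A quick way to check whether rsvg-convert is on the PATH is sketched below. `find_converter` is a hypothetical helper; finding the executable only shows that it is installed and does not by itself guarantee that pandoc will embed SVGs correctly.

```python
import shutil


def find_converter(name="rsvg-convert"):
    """Return the full path of the converter executable, or None if it is not on PATH."""
    return shutil.which(name)
```

Calling `find_converter()` before a DOCX build makes it easy to report which environments lack the tool when comparing results across systems.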

@dhimmel dhimmel closed this Dec 19, 2019
@adebali
Contributor

adebali commented Jan 5, 2020

With Pandoc 2.9 (and possibly before), SVGs are converting to DOCX fine on my system. This probably depends on whether rsvg-convert is available on the system. Closing this PR since it seems that pandoc can now convert SVGs natively without this filter.

I cannot verify this on macOS. I have pandoc 2.9.1 installed. rsvg-convert is available on the system through librsvg. Still, no SVG image is displayed in the DOCX.

ploegieku added a commit to ploegieku/2023-functional-homology-paper that referenced this pull request Aug 6, 2024
Now width and height are both specified at 13 pixels, to constrain the aspect ratio of these SVGs as square. Previously, the icons appeared squished in DOCX exports. See manubot/rootstock#40