Editing reference.docx with Word for Mac 2011 means pandoc generates malformed docx #3322
What version of pandoc are you using?
+++ rbubley [Dec 20 16 09:38 ]:
… I created a reference.docx with
pandoc --print-default-data-file reference.docx > reference.docx.
Using this to generate a document with pandoc works fine.
I then opened the reference.docx file, dirtied it (by replacing the 'e'
in Hello World with another 'e') and resaved. ([1]reference.docx)
Now generating a document with pandoc produces a malformed docx
([2]test.docx): when opening in Word it says, "The Open-XML file
test.docx cannot be opened because there are problems with the contents
or the file name might contain invalid characters (for example, /).
Details: Microsoft Office cannot open this file because some parts are
missing or invalid."
References
1. https://github.com/jgm/pandoc/files/664355/reference.docx
2. https://github.com/jgm/pandoc/files/664359/test.docx
|
|
I have a variation on the same problem I think.
Using Microsoft Word for Mac, Version 15.32 (the latest). Opening the generated file ("attempt to recover"), nothing seems obviously amiss. I extracted the reference.docx contents before/after to compare. Unfortunately, all files essentially have changed (beyond whitespace), so it's not obvious to me what's going on in the contents. I did see that Word removed the file "footnotes.xml.rels", but added "endnotes.xml" instead. I've attached the modified reference.docx. |
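For anyone wanting to repeat the before/after comparison described above, here is a minimal sketch in Python. It relies only on the fact that a .docx file is a ZIP archive; the function name `compare_docx` is hypothetical, and it compares stored CRC values rather than doing a real XML diff, so whitespace-only changes will still show up as "changed".

```python
import zipfile


def compare_docx(before_path, after_path):
    """Compare the parts (ZIP members) of two .docx files.

    Returns three sorted lists: parts only in `before_path` (removed),
    parts only in `after_path` (added), and parts present in both whose
    stored CRC differs (changed).
    """
    with zipfile.ZipFile(before_path) as before, zipfile.ZipFile(after_path) as after:
        crc_before = {info.filename: info.CRC for info in before.infolist()}
        crc_after = {info.filename: info.CRC for info in after.infolist()}

    removed = sorted(set(crc_before) - set(crc_after))
    added = sorted(set(crc_after) - set(crc_before))
    changed = sorted(
        name
        for name in set(crc_before) & set(crc_after)
        if crc_before[name] != crc_after[name]
    )
    return removed, added, changed
```

Calling `compare_docx("reference-original.docx", "reference-resaved.docx")` (both file names are placeholders) should list a dropped footnotes relationship part under `removed` and a new endnotes part under `added`, if the two files differ in the way described in this comment.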
@richarddb You shouldn't use footnotes or anything complex in your reference.docx. |
@richarddb I tried creating a document using your attached reference.docx as a reference doc. The document pandoc produced opened fine with MS Word for Mac 15.31. |
I couldn't reproduce what @rbubley reports, either. |
Hmmmm... Mysterious. I've created a short video capture of the process I used, and done so on some isolated example files that may help you reproduce. I've put it all on Dropbox here: https://www.dropbox.com/sh/7yg4r5dqkhna047/AADFvmb4YkUcvwHzKB25YTipa?dl=0 Please give me a shout if there's anything you'd like me to do to debug further. |
I also tried opening the file in Word for Windows. That's at least more helpful in giving an error message. See the screenshot Windows Word Error Message.png -- but it's basically complaining about the "Endnotes 1". |
I have the same problem: if I edit reference.docx MSWord claims that the .docx file produced by pandoc is corrupt. It will then say that there is readable content in the file and I can open the file, rename it and save it and work on it. I want to edit reference.docx to change all the styles but a very simple edit - removing the period from "Hello World" produces the same effect. |
Version of MS Word is not critical, nor is changing the file. I get the same behaviour with MS Word 2011 (Version 14.7.2) and the latest MS Word (Version 15.32). I attach the two corrupt reference.docx files. reference.docx.Word14.7.2.docx |
@andrewderrington I can use your "corrupt" reference.docx to produce a docx using pandoc that Word opens without problems. Perhaps it matters what is in the file you're converting using this reference docx? (I tried it on the pandoc MANUAL.txt.) |
Wow! That sounds like a complicated problem.
I have tried some fairly simple files. Would you like to send me the one you converted successfully? Would you like me to send you one of my simple failures?
I was wondering if it has to do with Word preferences.
Andrew
|
@andrewderrington The file I converted was MANUAL.txt from the pandoc repository. |
I have uploaded them.
This command
pandoc -s -S --normalize -f markdown -t odt -o manual.odt manual.txt
produces the attached file, which is deemed to be corrupt by my version of Word.
… On 9 Apr 2017, at 20:10, John MacFarlane ***@***.***> wrote:
@andrewderrington <https://github.com/andrewderrington> The file I converted was MANUAL.txt from the pandoc repository.
If you'd like to link to or send the input file you used, the full pandoc command line, and the result, perhaps that would help to diagnose this.
|
Sorry! I sent you the wrong command:-
pandoc --filter pandoc-citeproc -s -S --normalize -f markdown -t docx -o zzz.docx zzz.txt
Produces this (corrupt) file
From this text file
Hope that helps.
Andrew
#zzzzzzzzz zzzzzzz zzzzzzzz zzz zzzz
##zzzzzzzzzzzz
zzzz zzzzzzzz zzzzz zzz zzzz zzz zzzzzzzzz zz zzzzzzz z zzzzzzzzzzzzz, zz-zzzz, zzzz-zzzzzzzz, zzz-zzzz, zzzzzzz zzzzzzzzzzz zzzzzzzz zzz zzzz. zzzz z zzzzzzzz zzzzz zzzz zzzzzz zzzzzzzzzzzzz zzz zzzzzzzz zz zzzzzzzzzz zzzzzzzz, zzzzzzz zzzzzzz zzz zzzzz zzzzzzzz zzz zz zzzzzzzz zzzzzzzzzz zz zzzzzzz zzz zzzz zzzzzzzzzzzzzz zz zzzz zz zzzzzzzzzzzz zzzzzzzz.
zzz zzzzz zz z zzzzzzz zzzzzzzz zzzzzzz zz zzz zzzzzzz, zzz zzzzzz zz zzzzzzzzzzz zzz zzzzz zzzzzzzzzzz zz zzzzzzzzz zzz zzz zzzzzzz zz zzz zzzzzzz zzzzzzzzzzz.
zzz zzzzzz zz zzzzzzzzzzz zz z zzzzzzz zzzzzzzz zz zzzzzzzzz zzzzzzz zzz zzzz zzzzz zz zzzzzzz zzzzzzzzzzz zz zzzz zz zzzzzz zz zz zzzzzzz zzzzz zzzzzzzzzzz, zzzzzzz zz zz zzzzzzzzzzzzzz zz z zzzzzzz, zzzzzzzzzzzzzz zz z zzzz, zz zzzzzzzzzzz zz z zzzzzzzzz zzz zzzzz zzzzzzzz. zzz zzzzz zz zzzz zzzzzzzzzzz zzzzzzz zz zzz zzzzzz zz zzzzzzzzzzz zzz zzzzz zzzzzzzz zzzzzzzzzzz zz zzzzzzzzz. zzzzzzzz zzzzzzzzzzz zzzzz zzzz zzzzzzzzzz zzz zzzz zzzzzz zz zzz zzzzzzzz zz zzzz zzzzz zzz zz zz zzzzz zzz zzzzzzz zzzzzzzz zzzz. zzz zzzzzzz zzzzzzzz zz zz zzzz zzz z zzzzzzzz zzzz zzzzz zzzzzzz zzzzzzzzzzz zz, zzz zzzz zzzzzzzzzzz zzzz, zzz zzzzzzzzzz zzzzzzzzzzzzz.
zzzzzzz zzzzzzzzzzz zzzzzz zz zzzzzzz zzzz z zzz zzzz zz zzzzzzz zzz zzzzzzzzzz, zz zzzz zzzzzz zzzzzzzz, 2.45 zzzzzzz zzzz zzzzz zz zzz zzzz zz zzz zzz. zzzzzzzz zzzzzzzz zzzzzzz zzzzzzzzzzzzz zzzz zzzzzz zz zzzzzzz zzz zzzz zzzzzzzz zz zzzzz, zzzz zzzzz zzzzzzzzzzzzzzz zzz zzzzzzzz zzz zzzzzzz zzzz zzzzzzzzz zzzzzzzz zz zz zzzz zzzzzzz zzzzzzzzz. zzz zzzzzzz zzzzzzz zzzzzzzzzzz zz zzzzzz zzzzzzz zzzz zzzzzzz zz zzzzz, zzzzzz zzzzzzzzzz, zzzz zz zz zzzzzzzzz zzzzzzzz. zzzz zzzzz zzzz zzzz zzzzzzz zzzzzzz zzzzz zzzzzzz zzzz zzz zzzzzzzzzzz zzz zzz zzzzzzzzzz zzzzzzzzz, zz zzzzz zzz zzzzzz zzzzzzz zz zzz zzzzzzz zzzzzzzzzzz zzzzzzzzz zzzzz zz zzzzzzzzzz, zz zz zzzzzz zzzz zzz zz zzzz zz zzzz zzzz zz zz zzz zzzz zzzzzzzzzz zz zzz zzzzzzz zzzzz zz zzzz zzzz zzzzzzz zzz zzzzzzzzz zz zzzzzzzz. zzzzzzzzz, zzzzzzzzzz zzz zzzzzz zz zzzzzzz zzzzz zz zzzz zzzzzz zzzz z zzzzz zzzzzzz zzzzzzz zzz zz zzzzzz, zzz zz zzzz zzzzzzzzz zzz zzzz zz zzzz zzzzzzzzzz.
##zzz-zzzz, zzzz-zzzzzzzz zzzzzzzzzz (zzzzz)
zz zzzz zzzzzzzzz z zzz-zzzz zzzzzzzzzz zzzzzzzzz, zzzzz, zzzz zzzzzzzz zzz zzzzzzzzz zzzzzzzzz zz zzz zzzzzz zz zzzzzzz zzzz zzzzzz zzzzzzzz zzzzzzzzzzz zz zzz zzzz. zzzzz zzzzzzzz zzzz zzzzzzzz zzzz zz zzzzzz zzzzzzzzzzz zz zzzzzz z zzzzzzzzz zzzzzzz, zzzzz zzzzzzzz zzz zzz zzzzzzzzzz zz zzz zzzzzzzzzz. zzzz zz zzzzzzzz zzzz zzzzzzzz zzzzzzz zzz zzzzz zzzzzzzzzzz zzzzz zzz zzzzzzz zzzzzzz zzzzzzzzzzz zz zzzzzzzz zzz zzzzzzzzzz zzzz zzzzz.
zzzz zzz zzzzzzzzzzzzzz zz zz zzzzz, zzzzz zzzz zz zzzz zzzzz. zz zzzzzzzz zzz zzz 'zzz zzz' zzzz zzzzz zz zzzzzzz zzzzzzzzzzzz zzzzz zz zz zzz zz £20. zzzzz zzzz zz zzzzzz-zzzzz. zzz zzzzzzzzzzzzzz zz zzz zzzzzzzzzz zz z zzzzzz zzzzzzzzzzzzz zzzzzzz, zz zzzzz zzz zzzzzzz zzzz zzz zzzz zz zzzzzzz zzzzzzzzzzz zzz zzzz zzzzz zzzz zz zzzzzzzzzzzz zzz zzzzzzzzzz zzzzzzz zzzzzzzzzzz zzzzzz zzz zzzzz zzzz. zzzz zzzzzzz zzzz zzz zzzzz zzzzzzz zz zzzzzzzzzzz zzzz zz zzzzzz zz zzzzzzzz zzz zzzzzzzzz zzzzzzz.
##zzzzzzzzz
- zz zzzzzzzz zzzz zzz zzzzzzzz zzzzz zz zzzz zz zzzzzzzzzz zzzzzzzz zz zzzzzzz zzzzz zzzzzzzzz zzz zz zzzzzz zzzz zzzzz zzzzzzz zzzzzzzz zzzz. zzzz zzzzz zzzzzz z zzzz zzzzzzzz zzz zzzzzzzzzzz zzzzzzz zzzzzzzzzz zz zzzzzz zzzzzzzz.
- zzz zzzzzzzz zzzzz zzzz zz zzzzzzzz zz zzzzzz zzzzzzzzzzz zzzzzzzz zzzzzzzzzz zzzzzzzzzz zz z zzz zzzz zzzzz zzzzzzzz zzz zzzzzzzz zzz zzzzz zzzzzzzzzzzzzz zzzzzzzz zzzzzzz zz zzz zzzzz zzzz zzzzz zz zzzzz zzzz zzz zzzzzzzz.
- zzz zzzzzzzz zzzzz zz z zzzzzzzz zzz zzzzz zzzzzzz zzzzzzz. zzzzz zz zzzz zzzzzzz zzzzzzz zzz zzz zzz zzzzz zzzzzz (zzzz zz zzzz zzzzz zzz zzzzz).
- zzzzz zzzz zz zzzz zz zzzzzzzzz zzzzz zzzzzz zzzzzzz zzzzzzzzz zzz zzz *zzzz zzzzz*. zz zzzzzzzz zzzz zzzz zzzzzzzz zzzzzz zzzzzz zzzzzzz zzzzzzzzz zzzzz zz zzzzzzzz zz zzzzz zzz zzzzzzzzz zzzzzzzzzzz zzzz zzzz zzz zzz zzzzzz zzzzzzzz zzzz zz zzzzz zzzzzzzzz zzz zzzz.
|
@andrewderrington when I try that command
on zzz.txt with the contents at the end of your post above, I get
using the "corrupt" reference.docx you uploaded earlier, I also got a docx which opened without problems in Word. I could not open the |
OK, I can reproduce that behaviour and my behaviour. With the reference.docx files I sent you, which have been saved but not edited, if I refer to them explicitly by including " --reference pathname" in the command, I get an openable .docx file, but if I rename them as reference.docx and place them in the pandoc directory I get errors.
It turns out that if the file is called reference.docx and is stored in the pandoc directory, I get an unreadable output file whether or not I refer to it explicitly. It's OK to call it reference.docx if it's stored elsewhere.
Here are the commands:-
These ones produce readable .docx files:-
And these ones produce unreadable .docx files.
zzz.14.7.2.noref.docx |
It took me a long time to work the above out because there seems to be a memory effect. Once I have produced an unreadable output file, commands and reference.docx files that had previously produced readable .docx files produce unreadable ones. It seems that I can restore the ability to produce readable .docx files by running pandoc without referring to a reference.docx file and without a file called reference.docx in the pandoc directory. |
That's very helpful, I think I finally see what is going on here. |
Further to this discussion and resolving this bug: if I place a modified docx template 'reference.docx' into the .pandoc directory, the resulting Word file will be corrupted. This corruption of the generated output occurs even if I concurrently place another docx template in another directory and explicitly direct Pandoc to use that template (reference-docx='/Users/MyName/.pandoc/templates/reference.docx'). My current workaround is not to place a 'reference.docx' file into the default directory at all. Rather, I define a 'reference.docx' in a different location. It is fine, for example, in a subdirectory named "Templates" within the .pandoc default directory, or in the same directory as the document being converted by Pandoc. The important thing is not to have a 'reference.docx' in the ~/.pandoc directory. I hope this is of help. |
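To check quickly whether a setup is in the state described in this workaround, something like the following sketch may help. It only tests whether a file named reference.docx sits directly in the user data directory, assumed here to be ~/.pandoc as in this thread. The function name `shadowing_reference` is hypothetical, and the snippet does not call pandoc at all.

```python
from pathlib import Path


def shadowing_reference(data_dir=None):
    """Return the path of a reference.docx placed directly in the pandoc
    user data directory, or None if no such file exists.

    `data_dir` defaults to ~/.pandoc, which is an assumption taken from
    this thread; pass another directory if yours differs.
    """
    base = Path(data_dir) if data_dir is not None else Path.home() / ".pandoc"
    candidate = base / "reference.docx"
    # Only a file directly in the data directory counts; a copy in a
    # subdirectory such as templates/ is reported as harmless above.
    return candidate if candidate.is_file() else None
```

If this returns a path, moving that file into a subdirectory (or next to the document being converted) and pointing pandoc at it explicitly matches the workaround reported here.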
@talazem This is probably fixed in pandoc 2.0, which is currently only available in the nightly-builds. |