Pandoc fails to extract image from docx #7881
Comments
I cannot reproduce this with the given example. If this is not what you meant then please provide the exact command used, as stated in the issue template. |
This is the command I am using: pandoc --toc --self-contained --extract-media ./Docs -s -o README.md -f docx -t gfm+gfm_auto_identifiers ./Docs/test.docx The command generates the md file without failing or giving any warning. |
This works for me as well. What is it that you expect to happen, and what is happening instead? |
Note that |
I ran a couple more tests. The first example file was downloaded from Word for the web. I created the same example in desktop Word, and when I run the same command it works as expected. Here is the file created by the desktop version |
Test <img src="./Docs//media/image.png" test Pandoc generates the above markdown file for the first example file. Please check the |
With your original test.docx, I just tried this with 2.17.1.1 and got:
The image file is present in ./Docs. Everything looks good. |
No, I'm not on Windows; this is likely a subtle issue about Windows paths. |
With your test2.docx, if I just do
then I get
and we find |
test2.docx already works correctly for me, with or without a subdirectory. I tried test.docx with the following command
No image is extracted and I get the following output:
|
Oh, sorry, I misunderstood. I understand now that your real issue is with test.docx. |
No worries. I am not a native speaker so it could be my writing :) Do you need any information from me to identify the problem? My guess is |
I need to get myself a Windows VM to debug this on. |
Previously, if the directory argument ended in a slash, we'd get a doubled slash in the path. This may help with #7881.
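To illustrate the trailing-slash case described in that commit message, here is a minimal Python sketch. It is not pandoc's code (pandoc is written in Haskell); `naive_join` and `safe_join` are hypothetical names used only for illustration, and the example paths are assumptions modeled on the command in this thread.

```python
import posixpath


def naive_join(directory, relative):
    # Hypothetical helper: plain concatenation, which doubles the slash
    # when the directory argument already ends in "/".
    return directory + "/" + relative


def safe_join(directory, relative):
    # Hypothetical helper: posixpath.join inserts a separator only when
    # the directory does not already end with one.
    return posixpath.join(directory, relative)


if __name__ == "__main__":
    print(naive_join("./Docs/", "media/image.png"))
    print(safe_join("./Docs/", "media/image.png"))
    print(safe_join("./Docs", "media/image.png"))
```

Note that this only covers a trailing slash on the directory argument. A leading slash on the second component is a separate case: `posixpath.join` would then discard the directory entirely.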
I just pushed a change that may fix the double |
I can test on my computer with nightly but your last commit failed |
I noticed something about test.docx and test2.docx.
Note the leading |
This answers the question why Windows pandoc isn't using the SHA1 hash.
|
That leading / was my initial thought as well. That is why I referred to this issue #7511
This is also very strange. As someone who develops on Windows, I am pretty sure that is not relative. |
Well I think I can see how to fix this now! |
OK, this still needs testing on your end, but I suspect it will fix the problem. |
It is working. As a side note, when I run You can see the behavior difference by testing test.docx and test2.docx. test.docx puts images under foo, test2.docx puts images under foo/media |
When we use a path based on a sha1 hash, as in this case, it won't have a subdirectory. You only get the subdirectory when we're preserving the original path in the docx container. So this is expected. Glad it's working! |
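The behavior described above (a flat hash-based name versus a preserved path with its subdirectory) can be sketched roughly as follows. This is an illustration in Python, not pandoc's implementation. The function name `extracted_name` is hypothetical, and the rule for when to fall back to the hash (a leading `/` or a `..` component) is an assumption based on the discussion of the leading slash in this thread.

```python
import hashlib
import posixpath


def extracted_name(original_path, data):
    # Hypothetical helper: decide where an extracted media file is written,
    # relative to the --extract-media directory.
    ext = posixpath.splitext(original_path)[1]
    parts = original_path.split("/")
    # Assumed rule: a path that is absolute or climbs out of the directory
    # is not preserved; a SHA1 hash of the contents is used instead.
    if original_path.startswith("/") or ".." in parts:
        return hashlib.sha1(data).hexdigest() + ext
    # Otherwise the original path inside the container is kept,
    # including any subdirectory such as "media/".
    return original_path


if __name__ == "__main__":
    payload = b"fake image bytes"
    print(extracted_name("media/image.png", payload))
    print(extracted_name("/media/image.png", payload))
```

Under these assumptions, the hash-based name has no directory component, which matches images landing directly under foo for test.docx and under foo/media for test2.docx.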
Explain the problem.
Include the exact command line you used and all inputs necessary to reproduce the issue. Please create as minimal an example as possible, to help the maintainers isolate the problem. Explain the output you received and how it differs from what you expected.
This issue persists
Example file
test.docx
Pandoc version?
What version of pandoc are you using, on what OS?
pandoc.exe 2.17.1.1
Windows 10