Convert DOCX to Markdown: Ordered List in Markdown keeps increasing across headings #10258
Comments
When one creates this sort of file with Word, it uses different … There may be something I'm not understanding about how these lists are supposed to be encoded, or maybe there's an issue with LibreOffice. See also #7895.
OK, I think I see what is happening in this case. The list with … I think that's why the number resets: Word interprets this as a new sublist of a higher-level list (the headings).
Word doesn't seem to use …
Anyway, the fix should involve modifying the code in the docx reader that tracks continuing lists and sets the start number accordingly, and making sure that the number is reset when a higher-level list item in the same numId series is encountered.
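The reset rule described above can be sketched as a small counter keyed by numId and level. This is a hypothetical Python illustration, not pandoc's Haskell docx reader: the class name `NumberingTracker`, the method `next_number`, and the assumption that headings and list items share one numId series (headings at a shallower level) are all made up for the example. It also assumes Word's default behavior, where a level restarts after any shallower level in the same series is used (a `w:lvlRestart` override could change that).

```python
from collections import defaultdict


class NumberingTracker:
    """Hypothetical sketch: per-(numId, level) counters for numbered paragraphs."""

    def __init__(self):
        # counters[num_id][level] -> last number emitted at that level
        self.counters = defaultdict(dict)

    def next_number(self, num_id, level):
        levels = self.counters[num_id]
        # An item at `level` closes every deeper sublist in the same numId
        # series, so those counters are dropped and will restart at 1.
        for deeper in [lvl for lvl in levels if lvl > level]:
            del levels[deeper]
        levels[level] = levels.get(level, 0) + 1
        return levels[level]


# Illustrative paragraphs: (kind, numId, level); headings share numId "1".
paragraphs = [
    ("heading", "1", 0),
    ("item", "1", 1),
    ("item", "1", 1),
    ("heading", "1", 0),
    ("item", "1", 1),
]

tracker = NumberingTracker()
print([tracker.next_number(n, l) for _, n, l in paragraphs])  # [1, 1, 2, 2, 1]

# If headings are never fed to the tracker, the list keeps counting: the bug.
items_only = NumberingTracker()
print([items_only.next_number(n, l) for k, n, l in paragraphs if k == "item"])  # [1, 2, 3]
```

The second call shows why the reader also needs numbering information for headings: without seeing the shallower-level heading, nothing tells the tracker to restart the sublist.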
@jgm Thank you for the quick response. I originally ran into this issue with Google Docs: I exported a Google Docs document to DOCX, converted it to Markdown with Pandoc, and got the issue. Then I used LibreOffice Writer (latest version) to remove the irrelevant parts of the DOCX and make a reproducible file for the bug. So I don't think this is a LibreOffice issue.
We do have code for list items that does what I describe above: … The problem is that we don't have anything similar for headings, which we don't parse with these fields. The heading element is created here: … Somewhere in the code path leading up to this, we need to modify …
+ Remove ListItem constructor from BodyPart.
+ Change numbered field of ParagraphStyle to a Maybe Number.
+ Add Number type to store numbering information.

This makes sense because headings can have numbering information, and we sometimes need to know what it is (#10258).
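To illustrate what "numbered as a Maybe Number" buys, here is a rough Python analog, assuming a paragraph's numbering can come either from its own properties or from its style (as with a numbered heading style). The names `Number`, `ParagraphStyle`, and `numbered` echo the commit message, but the fields `num_id` and `level` and the helper `paragraph_number` are hypothetical; this is not the actual Haskell definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Number:
    """Hypothetical analog of the new Number type: list series and level."""
    num_id: str
    level: int


@dataclass(frozen=True)
class ParagraphStyle:
    """Hypothetical analog of ParagraphStyle with an optional Number."""
    name: str
    numbered: Optional[Number] = None  # None plays the role of Haskell's Nothing


def paragraph_number(direct: Optional[Number],
                     style: Optional[ParagraphStyle]) -> Optional[Number]:
    """Prefer numbering set directly on the paragraph, else use the style's."""
    if direct is not None:
        return direct
    return style.numbered if style is not None else None


heading4 = ParagraphStyle("Heading4", Number("1", 3))
print(paragraph_number(None, heading4))                   # Number(num_id='1', level=3)
print(paragraph_number(None, ParagraphStyle("Normal")))   # None
```

Keeping the full Number rather than a boolean flag is what lets a heading report which numId series and level it belongs to, so list numbering in that series can be reset.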
Some ideas are in the issue10258 branch.
@jgm I hope this issue is simple and can be fixed in the next release of Pandoc! 😀 My original goal was to export a Google Docs document to Markdown (GitHub-flavored) in order to import it into Obsidian. Along the way, there were some minor issues, such as: without the flag …
I think it is now fixed (the fix will be in the next release). For the table issue you mention, try setting …
Explain the problem
In Microsoft Word (DOCX), start a new heading (in this case, Heading 4) and then create an ordered (numbered) list under it. As expected, the list numbering restarts under each new heading. But when converting from DOCX to Markdown with Pandoc, the numbering continues, i.e., it keeps increasing across headings. This is unexpected.
Expected behavior: when a new heading starts, the ordered list numbering in the Markdown output should restart, consistent with the DOCX source.
Reproducibility
pandoc --wrap=none --extract-media=./ -f docx -t gfm input.docx -o output.md
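To see how the test file encodes its numbering before running pandoc, one can list each paragraph's style, `w:numId`, and `w:ilvl` from `word/document.xml`. This is a minimal stdlib-only sketch; the function name `numbering_info` is hypothetical, and it only reports numbering set directly on a paragraph (`w:pPr/w:numPr`). Numbering inherited from a style such as a heading style is defined in `word/styles.xml` and is not shown here.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}name notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def numbering_info(docx):
    """Yield (style, numId, ilvl, text) for each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for p in root.iter(W + "p"):
        style = num_id = ilvl = None
        ppr = p.find(W + "pPr")
        if ppr is not None:
            pstyle = ppr.find(W + "pStyle")
            if pstyle is not None:
                style = pstyle.get(W + "val")
            numpr = ppr.find(W + "numPr")  # direct numbering only
            if numpr is not None:
                nid = numpr.find(W + "numId")
                lvl = numpr.find(W + "ilvl")
                num_id = nid.get(W + "val") if nid is not None else None
                ilvl = lvl.get(W + "val") if lvl is not None else None
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style, num_id, ilvl, text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python numbering_info.py input.docx
    for row in numbering_info(sys.argv[1]):
        print(row)
```

Paragraphs that share a numId but differ in ilvl belong to the same list series, which is the situation where a shallower-level paragraph should restart the deeper list.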