Word Document table conversion issue #20
Comments
Markdown tables do not support complex table features such as merged cells. Maybe complex tables could be converted to HTML tables instead, since LLMs and Markdown viewers can usually handle those too.
Pandoc's output is a bit better:
mammoth's HTML output is also correct, except that it doesn't detect header rows properly, but it's still usable (ref: mwilliamson/mammoth.js#126). The issue seems to be matthewwithanm/python-markdownify#121. We can use the workaround mentioned there; we already have pandas in deps. It will give output like this:
Thanks @brc-dd! Pandoc works perfectly in my trial.
Not to hijack this issue, but can you please explain the difference in quality and features (so excluding programming language, funding institution and community) between this tool and Pandoc? I discovered it recently and my first thought was indeed: what does it do better than what I already know and use, namely pandoc or soffice, and why are those not contributions to such existing FLOSS projects? Edit: it seems there is audio transcription; not sure what the use case is in this context though.
Markdown is not suitable for complex tables; maybe XML is a good choice.
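For readers wondering why merged cells are awkward here: Markdown pipe tables have no equivalent of `colspan` or `rowspan`, so a converter either falls back to HTML or repeats the merged value in every cell it covers. Below is a minimal stdlib-only sketch of the second option. It is not the workaround from the linked python-markdownify issue and not this project's code; the names `CellCollector`, `expand_spans` and `html_table_to_markdown` are hypothetical, and it assumes a single, non-nested, well-formed table with integer span attributes.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect (text, colspan, rowspan) for every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._spans = None  # (colspan, rowspan) while inside a <td>/<th>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._spans = (int(a.get("colspan") or 1), int(a.get("rowspan") or 1))
            self._text = []

    def handle_data(self, data):
        if self._spans is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._spans is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.rows[-1].append((text, *self._spans))
            self._spans = None


def expand_spans(rows):
    """Repeat each merged cell's text in every grid position it covers."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def html_table_to_markdown(html):
    """Render the table as a pipe table, treating the first row as the header."""
    parser = CellCollector()
    parser.feed(html)
    grid = expand_spans(parser.rows)
    if not grid:
        return ""

    def line(cells):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    out = [line(grid[0]), line(["---"] * len(grid[0]))]
    out += [line(row) for row in grid[1:]]
    return "\n".join(out)
```

Repeating the value keeps every row the same width, which Markdown renderers need, but it loses the information that the cells were merged. Keeping the table as HTML, as suggested earlier in the thread, is the alternative when that distinction matters.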
This library is great. It would be even more useful if table conversion were accurate for merged cells.
With the table inside this docx file
I got the parsing results as below:
After rendering in Markdown, it looks like this: