Render Word index entries when converting HTML to Word #10171
-
I do not understand the question.
-
Hi John, I think I understand this question, since my current task is similar. I'm trying to convert a very large Word document to a Markdown format (Quarto, JupyterBook, etc.) and would like to extract Word's index entries.
For example, the text "Alan Kay" shows in the Word doc as
Ideally, in the case of md or latex output, I would love to be able to generate a document which has the text
Here's a snapshot of the XML, with a couple of annotations:
<w:r w:rsidRPr="00B7562C">
<w:rPr>
<w14:ligatures w14:val="standard"/>
</w:rPr>
<!-- regular text parsed fine, the index entry is immediately following this. -->
<w:t>e have been extremely lucky in our mentors. Jens cut his teeth in the company of the Smalltalk pioneers: Alan Kay</w:t>
</w:r>
<!-- first block seems to be the special { to denote the start of the entry. -->
<w:r w:rsidR="003C10BE" w:rsidRPr="00B7562C">
<w:rPr>
<w14:ligatures w14:val="standard"/>
</w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="003C10BE" w:rsidRPr="00B7562C">
<w:rPr>
<w14:ligatures w14:val="standard"/>
</w:rPr>
<!-- As best I can tell, every index entry has the form `XE "(.*)"`, so that pattern extracts the content. -->
<w:instrText xml:space="preserve"> XE "Kay, Alan" </w:instrText>
</w:r>
<!-- final block seems to be the special } to denote the end of the entry. -->
<w:r w:rsidR="003C10BE" w:rsidRPr="00B7562C">
<w:rPr>
<w14:ligatures w14:val="standard"/>
</w:rPr>
<w:fldChar w:fldCharType="end"/>
</w:r>
Personally, I'd even be fine with outputting the text
Edit: I realize that the value of the index entry also lies in being able to generate the index with the appropriate links back to the source definition. I, personally, don't expect pandoc to handle things in a super generic way. Certainly in an HTML format, it would be relatively straightforward to generate named anchor tags and a list of anchors pointing to the sources. But I don't believe there is a universal index format. So my goal is primarily being able to extract and reformat the data for my own needs. I'd be happy to use a Lua filter as long as I can figure out exactly what I need to hook into.
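For anyone who only needs the raw entries, here is a minimal sketch of the `XE "(.*)"` extraction described above, done outside pandoc with Python's standard library. It is not pandoc's approach, just an illustration. The function name `extract_index_entries` and the `SAMPLE` fragment are made up for this example, and it assumes each `XE` instruction sits in a single `w:instrText` element, as in the snapshot above. Word may split an instruction across several runs, which this sketch does not handle.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Matches a field instruction such as: XE "Kay, Alan"
XE_PATTERN = re.compile(r'^\s*XE\s+"([^"]*)"')


def extract_index_entries(document_xml: str) -> List[str]:
    """Return the quoted text of every XE field found in the XML, in document order.

    Assumption: each XE instruction is contained in one w:instrText element.
    """
    root = ET.fromstring(document_xml)
    entries = []
    for instr in root.iter(f"{{{W_NS}}}instrText"):
        match = XE_PATTERN.match(instr.text or "")
        if match:
            entries.append(match.group(1))
    return entries


# Hypothetical, cut-down fragment modelled on the snapshot above.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>the Smalltalk pioneers: Alan Kay</w:t></w:r>
      <w:r><w:fldChar w:fldCharType="begin"/></w:r>
      <w:r><w:instrText xml:space="preserve"> XE "Kay, Alan" </w:instrText></w:r>
      <w:r><w:fldChar w:fldCharType="end"/></w:r>
    </w:p>
  </w:body>
</w:document>"""

if __name__ == "__main__":
    print(extract_index_entries(SAMPLE))
```

To run this against a real file, the XML can be read from the `.docx` archive with `zipfile.ZipFile(path).read("word/document.xml")` and decoded before being passed to the function. Turning the entries into anchors or Markdown markers would be a separate step on top of this.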
-
Render Word's default index entry when converting HTML to DOCX. Refer to the attached file.
commet.docx