Description
Context
For the adobe.com migration to Franklin, I received a request to bulk-modify a set of existing Word documents by adding a missing variant label to some blocks.
I first looked at existing libraries that perform grep/replace operations (such as https://github.com/nguyenthenguyen/docx), but I was quickly blocked by the "WordprocessingML fragmentation behaviour" (best explained at https://github.com/lukasjarosch/go-docx#overview).
Long story short: when a Word document is edited, there are many circumstances in which its text gets arbitrarily fragmented across several runs in the XML structure.
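The effect of fragmentation on search can be sketched in a few lines. This is a minimal Python sketch, assuming the text of each run (`<w:t>` content) in one paragraph has already been collected into a list; the function name `runs_spanned` and the sample runs are hypothetical, modelled on the reproduction steps below. It reports which runs the first occurrence of a search term overlaps. Any result with more than one index is a match that a run-by-run grep/replace cannot see.

```python
def runs_spanned(run_texts: list[str], term: str) -> list[int]:
    """Return the indexes of the runs overlapped by the first occurrence of `term`.

    Returns an empty list when `term` does not occur in the paragraph text.
    """
    joined = "".join(run_texts)  # the text the user actually sees
    start = joined.find(term)
    if start == -1:
        return []
    end = start + len(term)

    spanned = []
    offset = 0
    for index, text in enumerate(run_texts):
        run_start, run_end = offset, offset + len(text)
        # A non-empty run is spanned if its character range overlaps [start, end).
        if text and run_start < end and start < run_end:
            spanned.append(index)
        offset = run_end
    return spanned


# Hypothetical runs: the word "know" starts in run 1 and ends in run 2.
print(runs_spanned(["I now ", "k", "now …"], "know"))
```

With the unfragmented paragraph (a single run), the same call returns one index, which is the only case a per-run replace handles.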
Steps to reproduce
Example:
1. Create a new Word document
Type in "I now now ..."
Save it
Extract the XML:
[...]
<w:r>
<w:rPr>
<w:lang w:val="de-CH"/>
</w:rPr>
<w:t>I now now …</w:t>
</w:r>
[...]
All good!
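The "Extract the XML" step relies on the fact that a .docx file is a ZIP package whose main body lives in the `word/document.xml` part. A minimal sketch using only the Python standard library; the function name `read_document_xml` is hypothetical.

```python
import zipfile


def read_document_xml(docx_path: str) -> str:
    """Return the main document part (word/document.xml) of a .docx file as text."""
    # A .docx file is a ZIP package; the body text is stored in word/document.xml.
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml").decode("utf-8")
```

For example, `read_document_xml("example.docx")` (a hypothetical file name) returns the XML from which the snippets in these steps are taken.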
2. Edit the text
Modify the text to "I now know ..." (just add a "k")
Save it
Extract the XML:
[...]
<w:r>
<w:rPr>
<w:lang w:val="de-CH"/>
</w:rPr>
<w:t xml:space="preserve">I now </w:t>
</w:r>
<w:r w:rsidR="00D600D1">
<w:rPr>
<w:lang w:val="de-CH"/>
</w:rPr>
<w:t>k</w:t>
</w:r>
<w:r>
<w:rPr>
<w:lang w:val="de-CH"/>
</w:rPr>
<w:t>now …</w:t>
</w:r>
[...]
From that point on, a grep for "know" no longer matches, because the word is now split across two <w:r> runs ("k" and "now …").
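The failure can be reproduced without Word. This minimal Python sketch wraps the three runs from step 2 in a `<w:p>` element with the WordprocessingML namespace (the wrapping and the names `FRAGMENTED` and `paragraph_text` are assumptions for illustration). A substring search on the raw XML misses "know", while the same search on the paragraph's concatenated `<w:t>` text finds it. It ignores tabs, breaks, fields and other run content a real document may contain.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The fragmented runs from step 2, wrapped in a paragraph for parsing.
FRAGMENTED = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:rPr><w:lang w:val="de-CH"/></w:rPr><w:t xml:space="preserve">I now </w:t></w:r>
  <w:r w:rsidR="00D600D1"><w:rPr><w:lang w:val="de-CH"/></w:rPr><w:t>k</w:t></w:r>
  <w:r><w:rPr><w:lang w:val="de-CH"/></w:rPr><w:t>now …</w:t></w:r>
</w:p>"""


def paragraph_text(p_xml: str) -> str:
    """Concatenate the text of every <w:t> in the paragraph, in document order."""
    root = ET.fromstring(p_xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))


print("know" in FRAGMENTED)                  # False: "k" and "now" sit in separate runs
print("know" in paragraph_text(FRAGMENTED))  # True: the joined text is "I now know …"
```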
Solution?
I don't see any simple solution: I checked Word and could not find any command to simplify or remove such fragmentation.
I think we should at least communicate this limitation, as it can significantly impact bulk editing operations (you cannot reliably ensure that the results are accurate).
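One possible partial mitigation, done outside Word and not something Word itself offers, is to merge adjacent runs that carry identical run properties before searching. The Python sketch below is an assumption-laden illustration, not a complete fix: it only merges "simple" runs (an optional `<w:rPr>` plus exactly one `<w:t>`), compares formatting structurally, drops run-level attributes such as `w:rsidR` from merged runs, and cannot help when a match spans text with genuinely different formatting. The function names are hypothetical. It also only transforms the tree in memory; writing `document.xml` back with ElementTree may rename prefixes and drop namespace declarations Word expects, so a real tool would need more careful serialization.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def element_key(el):
    """Structural key of an element (tag, attributes, children), ignoring whitespace."""
    return (el.tag, tuple(sorted(el.attrib.items())), tuple(element_key(c) for c in el))


def simple_run_parts(run):
    """Return (formatting key, <w:t> element) for a run holding only rPr and one w:t, else None."""
    children = list(run)
    rpr = [c for c in children if c.tag == W + "rPr"]
    texts = [c for c in children if c.tag == W + "t"]
    if len(texts) != 1 or len(rpr) + len(texts) != len(children):
        return None
    key = element_key(rpr[0]) if rpr else None
    return key, texts[0]


def merge_adjacent_runs(paragraph):
    """Merge consecutive simple runs with identical formatting, modifying `paragraph` in place."""
    previous = None  # (key, <w:t>) of the last simple run kept
    for child in list(paragraph):
        parts = simple_run_parts(child) if child.tag == W + "r" else None
        if parts and previous and parts[0] == previous[0]:
            # Same formatting: append this run's text to the previous run, then drop it.
            previous[1].text = (previous[1].text or "") + (parts[1].text or "")
            previous[1].set(XML_SPACE, "preserve")
            paragraph.remove(child)
        else:
            # Anything else (different formatting, non-run element) starts a new group.
            previous = parts
```

Applied to the step 2 paragraph, the three runs share the same `<w:rPr>`, so they collapse into a single run whose text is "I now know …" and a plain search matches again.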