Context
For the adobe.com migration to Franklin, I received a request to bulk-modify a set of existing Word documents, adding a missing variant label to some blocks.
I first looked at existing libraries for grep/replace actions (such as https://github.com/nguyenthenguyen/docx), but I quickly got blocked by the "WordprocessingML fragmentation behaviour" (best explained here: https://github.com/lukasjarosch/go-docx#overview).
Long story short: when you edit a Word document, there are many circumstances in which the text gets arbitrarily fragmented across the XML structure.
Steps to reproduce
Example:
1. Create a new Word document
Type in "I now now ..."
Save it
Extract the XML:
[...]
<w:r>
<w:rPr>
<w:lang w:val="de-CH"/>
</w:rPr>
<w:t>I now now …</w:t>
</w:r>
[...]
All good!
2. Edit the text
Modify the text to "I now know ..." (just add a "k")
Save it
Extract the XML:
From that point on, a grep for "know" no longer finds it.
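The effect can be illustrated with a minimal Python sketch. It assumes a hypothetical split of the edited sentence into two runs (Word's actual split varies and is not reproduced here), and the helper names `read_document_xml` and `run_texts` are illustrative, not taken from any library mentioned in this issue. It only relies on the fact that a .docx file is a ZIP archive whose main body normally lives in `word/document.xml`.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace bound to the "w:" prefix in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def read_document_xml(docx_path: str) -> str:
    """Hypothetical helper: return the main document part of a .docx (a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml").decode("utf-8")


def run_texts(paragraph_xml: str) -> list[str]:
    """Return the text of every <w:t> element, in document order."""
    root = ET.fromstring(paragraph_xml)
    return [t.text or "" for t in root.iter(f"{W}t")]


# Hypothetical fragment: the split point is illustrative only; in real files it
# depends on things such as revision-tracking IDs or proofing marks.
FRAGMENTED = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:rPr><w:lang w:val="de-CH"/></w:rPr><w:t xml:space="preserve">I now kn</w:t></w:r>
  <w:r><w:rPr><w:lang w:val="de-CH"/></w:rPr><w:t>ow …</w:t></w:r>
</w:p>"""

texts = run_texts(FRAGMENTED)
print("know" in FRAGMENTED)             # False: a grep over the raw XML misses it
print(any("know" in t for t in texts))  # False: so does a search node by node
print("know" in "".join(texts))         # True: the visible text does contain it
```

The word only becomes searchable once the `<w:t>` texts of a paragraph are joined back together, which is why per-node grep/replace tools miss it.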
Solution?
I don't see any simple solution. I checked Word and could not find any command to simplify or remove such fragmentation.
I think we should at a minimum document this limitation, as it can seriously affect bulk editing operations: you cannot really guarantee that the results are accurate.
Yes, the tool works at the XML level, and sometimes words are split over multiple XML tags, in which case search and replace doesn't find them. I think this only affects replace, not replace-links.
So yes, if you use replace to change text, you need to double-check the result for now. I think it's possible to fix this by mapping the text to a single string for the search and then mapping the replacement back onto the original XML tags, but that's not trivial.
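The mapping idea above can be sketched roughly in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the scope is one `<w:p>` paragraph at a time, the function names `replace_across_runs` and `replace_in_paragraphs` are hypothetical, and the whole replacement is placed in the run that held the first matched character, which is only one possible strategy.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "w:" prefix in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # keep the "w:" prefix when serializing


def replace_across_runs(texts: list[str], old: str, new: str) -> list[str]:
    """Replace `old` in the concatenation of `texts`; return new per-fragment texts.

    Every character of the joined string remembers which fragment it came from.
    A match's replacement goes into the fragment holding the match's first
    character; the matched characters are dropped from whichever fragments held them.
    """
    if not old:
        return list(texts)
    joined = "".join(texts)
    owner = [i for i, t in enumerate(texts) for _ in t]  # fragment index per character
    out = [""] * len(texts)
    pos = 0
    while pos < len(joined):
        if joined.startswith(old, pos):  # left-to-right, non-overlapping, like str.replace
            out[owner[pos]] += new
            pos += len(old)
        else:
            out[owner[pos]] += joined[pos]
            pos += 1
    return out


def replace_in_paragraphs(root: ET.Element, old: str, new: str) -> None:
    """Apply replace_across_runs to the <w:t> nodes of each <w:p> under root.

    Simplification: assumes no paragraphs nested inside other paragraphs
    (e.g. text boxes).
    """
    for para in root.iter(f"{W}p"):
        nodes = list(para.iter(f"{W}t"))
        new_texts = replace_across_runs([n.text or "" for n in nodes], old, new)
        for node, text in zip(nodes, new_texts):
            node.text = text
            if text != text.strip():
                # Leading/trailing spaces in <w:t> need xml:space="preserve" to survive.
                node.set(XML_SPACE, "preserve")


# Hypothetical fragmented paragraph; the split point is illustrative.
FRAGMENTED = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">I now kn</w:t></w:r>
  <w:r><w:t>ow …</w:t></w:r>
</w:p>"""

root = ET.fromstring(FRAGMENTED)
replace_in_paragraphs(root, "know", "knew")
print([t.text for t in root.iter(f"{W}t")])  # ['I now knew', ' …']
```

Putting the replacement into the first matched run means it inherits that run's formatting; other strategies, such as spreading the replacement across the original tags as the commit message describes, are equally possible. Round-tripping a complete `document.xml` through ElementTree can also rewrite or drop namespace declarations, so a real tool would need more care there.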
Replace will now work across multiple tags in the .xml file that
contains the source of the .docx file. The replacement is spread across
those tags on output.
Additional unit tests added.
Fixes #2 and #11