
Limitation of the tool on modified Word documents #2

Closed
@catalan-adobe

Context

For the adobe.com migration to Franklin, I was asked to bulk-modify a set of existing Word documents, adding a missing variant label to some blocks.
I first looked at existing libraries for grep/replace actions (such as https://github.com/nguyenthenguyen/docx), but I was quickly blocked by the "WordprocessingML fragmentation behaviour" (best explained here: https://github.com/lukasjarosch/go-docx#overview).
Long story short, when a Word document is edited, there are many circumstances in which the text ends up arbitrarily fragmented across the XML structure.

Steps to reproduce

Example:

1. Create a new Word document

Type in I now now ...
Save it
Extract the XML:

[...]
	      <w:r>
	        <w:rPr>
	          <w:lang w:val="de-CH"/>
	        </w:rPr>
	        <w:t>I now now …</w:t>
	      </w:r>
[...]

All good!

2. Edit the text

Modify the text to I now know ... (just add a k)
Save it
Extract the XML:

[...]
	      <w:r>
	        <w:rPr>
	          <w:lang w:val="de-CH"/>
	        </w:rPr>
	        <w:t xml:space="preserve">I now </w:t>
	      </w:r>
	      <w:r w:rsidR="00D600D1">
	        <w:rPr>
	          <w:lang w:val="de-CH"/>
	        </w:rPr>
	        <w:t>k</w:t>
	      </w:r>
	      <w:r>
	        <w:rPr>
	          <w:lang w:val="de-CH"/>
	        </w:rPr>
	        <w:t>now …</w:t>
	      </w:r>
[...]

From that point on, a grep for "know" no longer matches, because the word is now split across separate runs ("k" and "now …").
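The effect can be reproduced outside Word. The following is a minimal Python sketch, not part of the tool: it assumes the fragment above is wrapped in a `<w:p>` element with the standard WordprocessingML namespace, drops the `<w:rPr>` blocks for brevity, and uses hypothetical names (`FRAGMENTED`, `paragraph_text`). It shows that a raw substring search misses the word, while concatenating the `<w:t>` contents in document order recovers it.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace (bound to the "w" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: the edited fragment from the issue, wrapped in a <w:p> so it
# parses on its own; run properties (<w:rPr>) are omitted for brevity.
FRAGMENTED = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">I now </w:t></w:r>
  <w:r w:rsidR="00D600D1"><w:t>k</w:t></w:r>
  <w:r><w:t>now …</w:t></w:r>
</w:p>
"""


def paragraph_text(paragraph):
    """Concatenate the text of every <w:t> element in document order."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))


paragraph = ET.fromstring(FRAGMENTED)

# Searching the raw XML misses the word: "k" and "now" sit in different runs.
print("know" in FRAGMENTED)

# Searching the joined run text finds it.
print("know" in paragraph_text(paragraph))
```

Joining the runs only helps with finding text. Writing a replacement back is harder, because the match has to be mapped onto several runs that may carry different formatting.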

Solution?

I don't see any simple solution. I checked Word and could not find any command to simplify or remove such fragmentation.
I think we should at minimum communicate this limitation, as it can significantly impact bulk editing operations: you cannot really guarantee that the results are accurate.
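One way to make the limitation visible during a bulk operation would be to report, per paragraph, whether a search term sits inside a single run or is split across several. The sketch below is only an illustration under stated assumptions: `classify_match` is a hypothetical helper, not an existing function of the tool, and it only considers `<w:t>` elements (tabs, line breaks, and field codes are ignored).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def classify_match(paragraph, needle):
    """Return 'single_run', 'split', or 'absent' for needle in one <w:p>."""
    pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
    if any(needle in piece for piece in pieces):
        # A plain grep/replace on the XML can see this occurrence.
        return "single_run"
    if needle in "".join(pieces):
        # Present in the visible text, but fragmented across runs.
        return "split"
    return "absent"


# Assumption: same three-run paragraph as in the reproduction steps.
sample = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">I now </w:t></w:r>'
    "<w:r><w:t>k</w:t></w:r>"
    "<w:r><w:t>now …</w:t></w:r>"
    "</w:p>"
)

for term in ("know", "now", "variant"):
    print(term, classify_match(sample, term))
```

Paragraphs reported as `split` are the ones where a grep-based bulk edit would silently do nothing, so listing them would at least tell the user which documents need manual review.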
