Issue with Placeholder Replacement in Runs: Text Gets Split, Leading to Formatting Issues #530

hq-zhonger · 2025-01-14T13:15:47Z

Description:
Hello Unioffice Team,

I have purchased a Unioffice license and I'm developing an automated report generation system based on placeholders ({{variable}}). However, I'm encountering an issue where Unioffice splits text into multiple runs, causing placeholders to be split into separate pieces and making it difficult to correctly replace them.

If I try to concatenate all runs into a single string, perform the replacements, and then write them back, it leads to formatting loss or broken text.

Reproduction Steps:
1. Create a Word template (template.docx) with placeholders such as {{application}}.
2. Open the document with Unioffice and read the paragraph runs.
3. Observe that {{application}} may be split across multiple runs (for example "{{app", "lic", "ation}}").
4. Attempt to replace the placeholder by merging the runs, modifying the text, and writing it back.
5. The formatting is lost, and sometimes the text becomes corrupted.
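
To make step 3 concrete, here is a minimal sketch of how a split placeholder could be located. It works on plain strings that stand in for the values returned by run.Text(); `locatePlaceholder` is a hypothetical helper name, not part of Unioffice.

```go
package main

import "strings"

// locatePlaceholder is a hypothetical helper. runTexts stands in for the
// text of each run in a paragraph, in order. It returns the index of the
// first and last run that contain any part of the first occurrence of
// placeholder, or ok == false if the placeholder is not present.
func locatePlaceholder(runTexts []string, placeholder string) (first, last int, ok bool) {
	full := strings.Join(runTexts, "")
	start := strings.Index(full, placeholder)
	if start < 0 {
		return 0, 0, false
	}
	end := start + len(placeholder) // byte offset just past the placeholder

	first, last = -1, -1
	offset := 0 // byte offset of the current run inside full
	for i, t := range runTexts {
		runStart, runEnd := offset, offset+len(t)
		offset = runEnd
		// A run is involved if its byte range intersects the placeholder's.
		if runStart < end && runEnd > start {
			if first == -1 {
				first = i
			}
			last = i
		}
	}
	return first, last, first != -1
}
```

With the run texts "Name: ", "{{app", "lic", "ation}}", " end" and the placeholder "{{application}}", this reports runs 1 through 3, which is the fragment pattern described in step 3.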

Current Code (Go)

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		// 1. Concatenate the runs one by one until a complete {{variable}} is found.
		var buffer string
		var runIndex []int // indexes of the runs that belong to the same {{variable}}
		runMap := make(map[int]string)

		for i, run := range runs {
			text := run.Text()
			buffer += text
			runIndex = append(runIndex, i)
			runMap[i] = text

			// Check whether the buffer now contains a complete {{variable}}.
			matches, _ := ExtractPlaceholders(buffer)
			if len(matches) > 0 {
				for _, placeholder := range matches {
					replacement := mbafc.getReplacement(placeholder)
					buffer = strings.ReplaceAll(buffer, "{{"+placeholder+"}}", replacement)
				}

				// 2. Write the text back using the original run structure.
				remainingText := buffer
				for _, idx := range runIndex {
					runs[idx].ClearContent()

					// Make sure we never slice out of bounds.
					if len(remainingText) > 0 {
						writeLen := min(len(runMap[idx]), len(remainingText))
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:]
					}
				}

				// 3. Clear the buffer and continue with the next placeholder.
				buffer = ""
				runIndex = []int{}
			}
		}
	}
}

Problem:
Unioffice splits text into multiple runs, which makes placeholder replacement difficult.
Merging runs to perform replacements leads to formatting loss and potential text corruption.

Expected Behavior (Similar to Python’s docxtpl)
In Python, docxtpl allows me to replace placeholders without breaking formatting:

from docxtpl import DocxTemplate

doc = DocxTemplate("template.docx")
context = {'application': 'My Application', 'version': 'V1.0'}
doc.render(context)
doc.save("output.docx")

Question:
How can I achieve similar behavior in Unioffice?
Is there a way to replace text inside runs without losing formatting, or prevent Unioffice from splitting text into multiple runs in the first place?
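
One possible approach to the question above, sketched on plain strings rather than on real Unioffice runs: find placeholders in the concatenated paragraph text, write each replacement into the run where its "{{" begins, and strip the remaining fragments from the following runs. Every run keeps its own surrounding text, so its formatting does not have to be touched. The names `span`, `placeholderRe`, and `replaceAcrossRuns` are hypothetical, and this is not a description of how docxtpl or Unioffice works internally.

```go
package main

import (
	"regexp"
	"strings"
)

// placeholderRe matches {{name}}; the name may not contain braces.
var placeholderRe = regexp.MustCompile(`\{\{([^{}]+)\}\}`)

// span is a placeholder's byte range in the joined text plus its new value.
type span struct {
	start, end int
	value      string
}

// replaceAcrossRuns is a hypothetical helper. runTexts stands in for the
// text of each run in a paragraph. It returns the new text for each run.
// Placeholders without an entry in values are left untouched.
func replaceAcrossRuns(runTexts []string, values map[string]string) []string {
	full := strings.Join(runTexts, "")

	var spans []span
	for _, m := range placeholderRe.FindAllStringSubmatchIndex(full, -1) {
		name := strings.TrimSpace(full[m[2]:m[3]])
		if v, ok := values[name]; ok {
			spans = append(spans, span{start: m[0], end: m[1], value: v})
		}
	}

	out := make([]string, len(runTexts))
	offset := 0 // byte offset of the current run inside full
	for i, t := range runTexts {
		runStart, runEnd := offset, offset+len(t)
		offset = runEnd

		var b strings.Builder
		pos := runStart
		for _, sp := range spans {
			if sp.end <= runStart {
				continue // placeholder ends before this run
			}
			if sp.start >= runEnd {
				break // placeholder starts after this run
			}
			if sp.start >= pos {
				// The placeholder begins in this run: keep the text in
				// front of it and emit the replacement here.
				b.WriteString(full[pos:sp.start])
				b.WriteString(sp.value)
			}
			// Skip the part of the placeholder that lies inside this run.
			pos = sp.end
			if pos > runEnd {
				pos = runEnd
			}
		}
		b.WriteString(full[pos:runEnd])
		out[i] = b.String()
	}
	return out
}
```

In a real template the input would be collected from run.Text() and each result written back with ClearContent() and AddText(), as the code in this issue already does. The replacement then takes on the formatting of the run that held the opening "{{", and runs that contained only placeholder fragments end up empty.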

Screenshots and Output Files

Thank you! 🚀


hq-zhonger · 2025-01-14T15:45:21Z

Update:
I have successfully resolved most of the issues, and the placeholder replacement now works correctly without losing formatting or causing duplicate content. However, a small number of paragraphs still experience encoding issues (garbled text) after replacement.

Possible causes:

Encoding issue: Some Run.Text() values may not be in UTF-8, or Unioffice might handle character encoding inconsistently.
Run splitting issue: Some paragraphs might have text spread across multiple Run elements, causing problems with text concatenation or splitting.

Any insights or suggestions on how to handle these remaining encoding issues would be greatly appreciated!

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		var buffer strings.Builder
		var runIndex []int
		inPlaceholder := false

		for i, run := range runs {
			text := run.Text()
			buffer.WriteString(text)
			runIndex = append(runIndex, i)

			// Detect the opening "{{".
			if strings.Contains(buffer.String(), "{{") {
				if !inPlaceholder {
					inPlaceholder = true
					runIndex = []int{i} // record where "{{" starts
				}
			}

			// Detect the closing "}}".
			if inPlaceholder && strings.Contains(buffer.String(), "}}") {
				fullText := buffer.String()

				// Extract the {{variable}}.
				start := strings.Index(fullText, "{{")
				end := strings.Index(fullText, "}}") + 2
				placeholder := fullText[start+2 : end-2] // the variable name

				// Look up the replacement value.
				replacement := mbafc.getReplacement(placeholder)
				fmt.Printf("Replacing placeholder: %s -> %s\n", placeholder, replacement)

				// Replace the {{variable}}.
				newText := strings.ReplaceAll(fullText, "{{"+placeholder+"}}", replacement)
				fmt.Printf("newText: %s\n", newText)

				// Clear the original runs and write the text back.
				remainingText := newText
				for _, idx := range runIndex {
					runs[idx].ClearContent()
					if len(remainingText) > 0 {
						writeLen := min(len(remainingText), 20) // write at most 20 characters per run to avoid accidental truncation
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:] // leave the rest for the next run
					}
				}

				// If any text remains, append it to the last run.
				if len(remainingText) > 0 && len(runIndex) > 0 {
					lastRun := runs[runIndex[len(runIndex)-1]]
					lastRun.AddText(remainingText)
				}

				// Check that the text was written correctly.
				fmt.Printf("Content after write-back: %#v\n", runs[runIndex[0]].Text())

				// Reset the state.
				buffer.Reset()
				runIndex = nil
				inPlaceholder = false
			}
		}
	}
}

// Added a min helper function.
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}
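
One possible explanation for the remaining garbled text, not confirmed anywhere in this thread: the write-back loop above cuts the string with remainingText[:writeLen], and in Go both len and slicing on a string count bytes, not characters. If a replacement value contains multi-byte UTF-8 characters (Chinese text, for example), a cut at byte 20 can land in the middle of a character, and each half is then written to a different run. The sketch below shows a check for that case and a rune-based alternative; `cutsRune` and `chunkByRunes` are hypothetical helper names.

```go
package main

import "unicode/utf8"

// cutsRune reports whether slicing s at byte offset i (s[:i] and s[i:])
// would split a multi-byte UTF-8 character in two.
func cutsRune(s string, i int) bool {
	return i > 0 && i < len(s) && !utf8.RuneStart(s[i])
}

// chunkByRunes splits s into pieces of at most n characters (runes), so a
// piece never ends in the middle of a multi-byte character.
func chunkByRunes(s string, n int) []string {
	if n <= 0 {
		return nil
	}
	var chunks []string
	runes := []rune(s)
	for len(runes) > 0 {
		k := n
		if len(runes) < k {
			k = len(runes)
		}
		chunks = append(chunks, string(runes[:k]))
		runes = runes[k:]
	}
	return chunks
}
```

For instance, in the string "报告ab" each Chinese character takes three bytes, so a cut at byte offset 1 splits the first character while a cut at offset 3 does not. chunkByRunes("报告ab", 3) yields "报告a" and "b". Whether this is the actual cause here depends on the replacement values, which the issue does not show.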
