Issue with Placeholder Replacement in Runs: Text Gets Split, Leading to Formatting Issues #530

Open
hq-zhonger opened this issue Jan 14, 2025 · 1 comment

@hq-zhonger

Description:
Hello Unioffice Team,

I have purchased a Unioffice license and I'm developing an automated report generation system based on placeholders ({{variable}}). However, I'm encountering an issue where Unioffice splits text into multiple runs, causing placeholders to be split into separate pieces and making it difficult to correctly replace them.

If I try to concatenate all runs into a single string, perform the replacements, and then write them back, it leads to formatting loss or broken text.

Reproduction Steps:
1. Create a Word template (template.docx) with placeholders like {{application}}.
2. Open the document with Unioffice and read the paragraph runs.
3. Observe that {{application}} may be split across several runs, for example "{{app", "lic", and "ation}}".
4. Attempt to replace the placeholder by merging the runs, modifying the text, and writing it back.
5. The formatting is lost, and sometimes the text becomes corrupted.

Current Code (Go)

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		// 1. Concatenate runs one by one until a complete `{{variable}}` is found.
		var buffer string
		var runIndex []int // indices of the runs that belong to the same `{{variable}}`
		runMap := make(map[int]string)

		for i, run := range runs {
			text := run.Text()
			buffer += text
			runIndex = append(runIndex, i)
			runMap[i] = text

			// Check whether the buffer now contains a complete `{{variable}}`.
			matches, _ := ExtractPlaceholders(buffer)
			if len(matches) > 0 {
				for _, placeholder := range matches {
					replacement := mbafc.getReplacement(placeholder)
					buffer = strings.ReplaceAll(buffer, "{{"+placeholder+"}}", replacement)
				}

				// 2. Write the text back following the original run structure.
				remainingText := buffer
				for _, idx := range runIndex {
					runs[idx].ClearContent()

					// Make sure the slice does not go out of bounds.
					if len(remainingText) > 0 {
						writeLen := min(len(runMap[idx]), len(remainingText))
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:]
					}
				}

				// 3. Clear the buffer and continue with the next match.
				buffer = ""
				runIndex = []int{}
			}
		}
	}
}

Problem:
Unioffice splits text into multiple runs, which makes placeholder replacement difficult.
Merging runs to perform replacements leads to formatting loss and potential text corruption.

Expected Behavior (Similar to Python’s docxtpl):
In Python, docxtpl allows me to replace placeholders without breaking formatting:

from docxtpl import DocxTemplate

doc = DocxTemplate("template.docx")
context = {'application': 'My Application', 'version': 'V1.0'}
doc.render(context)
doc.save("output.docx")

Question:
How can I achieve similar behavior in Unioffice?
Is there a way to replace text inside runs without losing formatting, or prevent Unioffice from splitting text into multiple runs in the first place?
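
One possible approach, sketched here without any Unioffice calls: keep every run in place, locate each placeholder in the concatenated text, and map its byte range back onto the runs it spans. The replacement goes into the run where the placeholder starts (so it takes that run's formatting), the runs in between are emptied, and the last run keeps whatever followed the closing braces. The function name `ReplaceAcrossRuns` and the plain `[]string` stand-in for run texts are assumptions made for illustration, not part of the Unioffice API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholderRe matches {{name}}; the name may not contain braces.
var placeholderRe = regexp.MustCompile(`\{\{([^{}]+)\}\}`)

// runAt returns the index of the run containing byte offset pos
// in the concatenated text, or -1 if there is none.
func runAt(starts []int, texts []string, pos int) int {
	for i := range texts {
		if pos >= starts[i] && pos < starts[i]+len(texts[i]) {
			return i
		}
	}
	return -1
}

// ReplaceAcrossRuns (hypothetical helper) takes the text of each run in a
// paragraph and returns the new text for each run, same length and order.
func ReplaceAcrossRuns(texts []string, values map[string]string) []string {
	out := append([]string(nil), texts...)
	starts := make([]int, len(texts))
	var sb strings.Builder
	for i, t := range texts {
		starts[i] = sb.Len()
		sb.WriteString(t)
	}
	joined := sb.String()

	matches := placeholderRe.FindAllStringSubmatchIndex(joined, -1)
	// Walk the matches right to left so that edits never shift the
	// offsets of matches that are still to be processed.
	for m := len(matches) - 1; m >= 0; m-- {
		loc := matches[m]
		name := strings.TrimSpace(joined[loc[2]:loc[3]])
		value, ok := values[name]
		if !ok {
			continue // unknown placeholder: leave it untouched
		}
		first := runAt(starts, texts, loc[0])
		last := runAt(starts, texts, loc[1]-1)
		head := out[first][:loc[0]-starts[first]] // text before "{{"
		tail := out[last][loc[1]-starts[last]:]   // text after "}}"
		if first == last {
			out[first] = head + value + tail
			continue
		}
		out[first] = head + value
		for i := first + 1; i < last; i++ {
			out[i] = ""
		}
		out[last] = tail
	}
	return out
}

func main() {
	runs := []string{"Name: {{app", "lic", "ation}} v", "{{version}}"}
	values := map[string]string{"application": "My Application", "version": "V1.0"}
	fmt.Printf("%q\n", ReplaceAcrossRuns(runs, values))
	// prints ["Name: My Application" "" " v" "V1.0"]
}
```

Each resulting string would then be written back to its own run with ClearContent and AddText, as in the code above, so no run ever receives text that originally belonged to a differently formatted run.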


Thank you! 🚀

@hq-zhonger
Author

Update:
I have successfully resolved most of the issues, and the placeholder replacement now works correctly without losing formatting or causing duplicate content. However, a small number of paragraphs still experience encoding issues (garbled text) after replacement.

Possible causes:

1. Encoding issue: Some Run.Text() values may not be in UTF-8, or Unioffice might handle character encoding inconsistently.
2. Run splitting issue: Some paragraphs might have text spread across multiple Run elements, causing problems with text concatenation or splitting.

Any insights or suggestions on how to handle these remaining encoding issues would be greatly appreciated!

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		var buffer strings.Builder
		var runIndex []int
		inPlaceholder := false

		for i, run := range runs {
			text := run.Text()
			buffer.WriteString(text)
			runIndex = append(runIndex, i)

			// Detect the opening `{{`.
			if strings.Contains(buffer.String(), "{{") {
				if !inPlaceholder {
					inPlaceholder = true
					runIndex = []int{i} // remember where `{{` starts
				}
			}

			// Detect the closing `}}`.
			if inPlaceholder && strings.Contains(buffer.String(), "}}") {
				fullText := buffer.String()

				// Extract the `{{variable}}`.
				start := strings.Index(fullText, "{{")
				end := strings.Index(fullText, "}}") + 2
				placeholder := fullText[start+2 : end-2] // the bare variable name

				// Look up the replacement value.
				replacement := mbafc.getReplacement(placeholder)
				fmt.Printf("replacing placeholder: %s -> %s\n", placeholder, replacement)

				// Replace the `{{variable}}`.
				newText := strings.ReplaceAll(fullText, "{{"+placeholder+"}}", replacement)
				fmt.Printf("newText: %s\n", newText)

				// Clear the original runs and write the text back.
				remainingText := newText
				for _, idx := range runIndex {
					runs[idx].ClearContent()
					if len(remainingText) > 0 {
						writeLen := min(len(remainingText), 20) // write at most 20 characters per run to avoid accidental truncation
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:] // the rest goes to the next run
					}
				}

				// If any text is left over, append it to the last run.
				if len(remainingText) > 0 && len(runIndex) > 0 {
					lastRun := runs[runIndex[len(runIndex)-1]]
					lastRun.AddText(remainingText)
				}

				// Check that the text was written correctly.
				fmt.Printf("content after write: %#v\n", runs[runIndex[0]].Text())

				// Reset the state.
				buffer.Reset()
				runIndex = nil
				inPlaceholder = false
			}
		}
	}
}

// Added: a min helper.
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}
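
A plausible cause of the garbled text, independent of Unioffice: Go strings are UTF-8 byte sequences, and `remainingText[:writeLen]` slices by byte, not by character. With a fixed cut at 20 bytes, any multi-byte character that straddles the cut (Chinese characters take 3 bytes each) is split into two invalid halves. The sketch below shows a cut that backs up to the nearest character boundary; `SplitAtRuneBoundary` is a hypothetical helper name, not an existing API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// SplitAtRuneBoundary returns the longest prefix of s that is at most
// maxBytes long and ends on a UTF-8 character boundary, plus the rest.
// If maxBytes is smaller than the first character, head is empty.
func SplitAtRuneBoundary(s string, maxBytes int) (head, rest string) {
	if maxBytes >= len(s) {
		return s, ""
	}
	if maxBytes < 0 {
		maxBytes = 0
	}
	cut := maxBytes
	// Back up while the byte at the cut is a continuation byte,
	// i.e. the cut would land in the middle of a character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut], s[cut:]
}

func main() {
	s := "报告版本号一二三" // 8 characters, 24 bytes

	naive := s[:20] // byte slice: cuts the 7th character in half
	fmt.Println(utf8.ValidString(naive)) // prints false

	head, rest := SplitAtRuneBoundary(s, 20)
	fmt.Println(utf8.ValidString(head), head, rest) // prints true 报告版本号一 二三
}
```

If this is the cause, it would also explain why only some paragraphs are affected: pure ASCII text never straddles a byte cut, while paragraphs containing non-ASCII characters can.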
