feat: Avoid stripping LLM reply split markers #50

Gardelll · 2026-01-01T11:56:40Z

Summary by Sourcery

Preserve LLM response splitter markers when protecting and cleaning chat content.

New Features:

Support treating LLM response splitter markers as special blocks to avoid re-protection and unintended cleanup.

Bug Fixes:

Avoid removing bracketed content such as LLM splitter markers when the response splitter is in LLM mode.

sourcery-ai · 2026-01-01T11:56:46Z


Reviewer's Guide

Adjusts special-block protection and system-format cleaning logic so that LLM-specific reply split markers like [SPLIT] are preserved when split_mode is 'llm', avoiding them being re-protected or stripped by generic cleanup.

Flow diagram for filter_system_format_content cleaning with LLM split_mode

flowchart TD
    A[Start filter_system_format_content] --> B[Input content]
    B --> C[Apply system-format-specific cleanup]
    C --> D{Is split_mode llm?}
    D -->|Yes| F["Skip generic [.*?] removal to preserve markers like [SPLIT]"]
    D -->|No| E["Remove all patterns matching [.*?] using regex"]
    E --> G["Remove @<...> patterns using regex"]
    F --> G
    G --> H[Return cleaned_content]
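The branch in the diagram can be sketched in Python as below. This is a minimal, hypothetical reconstruction of only the tail of `filter_system_format_content`: the `split_mode` argument stands in for `global_config.response_splitter.split_mode`, `"punctuation"` is a placeholder for any non-LLM mode, and the `@<...>` regex is an assumption because the PR does not show the exact pattern.

```python
import re

# Generic bracket pattern from the diff: non-greedy, so "[a][b]" is two matches.
BRACKET_RE = re.compile(r"\[.*?\]")
# Assumed pattern for mention tokens such as "@<name:id>"; the real one may differ.
MENTION_RE = re.compile(r"@<[^>]*>")


def clean_format_tail(content: str, split_mode: str) -> str:
    """Hypothetical tail of filter_system_format_content, following the diagram."""
    cleaned = content
    if split_mode != "llm":
        # Non-LLM modes: strip every [...] segment, including any [SPLIT].
        cleaned = BRACKET_RE.sub("", cleaned)
    # Both branches converge here: strip @<...> mentions.
    cleaned = MENTION_RE.sub("", cleaned)
    return cleaned


if __name__ == "__main__":
    text = "hi [at=1] @<Bob:123> [SPLIT] ok"
    print(repr(clean_format_tail(text, "llm")))
    print(repr(clean_format_tail(text, "punctuation")))
```

In `llm` mode this keeps both `[SPLIT]` and the unrelated `[at=1]`, which is the side effect raised in the review comment further down.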

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Preserve LLM reply split markers when protecting special blocks and cleaning formatted content, conditioned on the response splitter configuration. | Extend the guard in `protect_special_blocks` to skip matches that are either existing placeholders or the `[SPLIT]` marker when `split_mode` is `'llm'`, preventing them from being wrapped again.<br>Change the generic bracketed-text cleanup in `filter_system_format_content` to run only when `split_mode` is not `'llm'`, so that `[SPLIT]` and similar markers are not removed in LLM split mode. | `src/chat/utils/utils.py` |
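To make the first change concrete, here is a minimal sketch of the guard described for `protect_special_blocks`. The real function's block pattern, placeholder numbering, and return value are not shown in this PR; `SPECIAL_RE`, the `__SPECIAL_<n>__` keys, the optional `protected` mapping, and the `(text, mapping)` return shape are all assumptions for illustration.

```python
import re
from typing import Dict, Optional, Tuple

SPLIT_MARKER = "[SPLIT]"
# Hypothetical pattern: an existing placeholder OR any bracketed block.
SPECIAL_RE = re.compile(r"__SPECIAL_\d+__|\[[^\]]*\]")


def protect_special_blocks(
    text: str, split_mode: str, protected: Optional[Dict[str, str]] = None
) -> Tuple[str, Dict[str, str]]:
    """Swap special blocks for placeholders, skipping ones that must stay as-is."""
    # Passing the mapping from an earlier pass keeps placeholder numbers unique.
    protected = {} if protected is None else protected

    def _replace(match: "re.Match[str]") -> str:
        block = match.group(0)
        # Guard described in the PR: never re-wrap an existing placeholder,
        # and in LLM split mode leave the [SPLIT] marker untouched.
        if "__SPECIAL_" in block or (split_mode == "llm" and block == SPLIT_MARKER):
            return block
        key = f"__SPECIAL_{len(protected)}__"
        protected[key] = block
        return key

    return SPECIAL_RE.sub(_replace, text), protected
```

For example, `protect_special_blocks("a[img]b[SPLIT]c", "llm")` wraps only `[img]`, and running it again on the result with the same mapping leaves the text unchanged.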

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an issue from a review comment by replying to it. You can also reply to a review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull request title to generate a title at any time. You can also comment @sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in the pull request body to generate a PR summary at any time exactly where you want it. You can also comment @sourcery-ai summary on the pull request to (re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the pull request to resolve all Sourcery comments. Useful if you've already addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull request to dismiss all existing Sourcery reviews. Especially useful if you want to start fresh with a new review; don't forget to comment @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Consider extracting the `[SPLIT]` marker into a shared constant or into `response_splitter` config instead of hardcoding the string in `protect_special_blocks`, so the split token stays consistent across the codebase.
The condition `if "__SPECIAL_" in match or global_config.response_splitter.split_mode == "llm" and match == "[SPLIT]":` relies on operator precedence; adding parentheses or assigning intermediate booleans would improve readability and reduce the chance of future mistakes when editing this logic.
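Because `and` binds more tightly than `or` in Python, the existing condition already parses as `A or (B and C)`; the suggestion is only about making that grouping explicit. A minimal sketch of the refactor, where `SPLIT_MARKER` and `should_skip_block` are hypothetical names and `split_mode` stands in for `global_config.response_splitter.split_mode`:

```python
SPLIT_MARKER = "[SPLIT]"  # hypothetical shared constant instead of a string literal


def should_skip_block(match: str, split_mode: str) -> bool:
    """Return True when protect_special_blocks should leave `match` alone."""
    # Named intermediate booleans make the intended grouping obvious.
    is_placeholder = "__SPECIAL_" in match
    is_llm_split_marker = split_mode == "llm" and match == SPLIT_MARKER
    return is_placeholder or is_llm_split_marker


# Equivalent inline form with explicit parentheses:
# if "__SPECIAL_" in match or (split_mode == "llm" and match == SPLIT_MARKER):
```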

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider extracting the `[SPLIT]` marker into a shared constant or into `response_splitter` config instead of hardcoding the string in `protect_special_blocks`, so the split token stays consistent across the codebase.
- The condition `if "__SPECIAL_" in match or global_config.response_splitter.split_mode == "llm" and match == "[SPLIT]":` relies on operator precedence; adding parentheses or assigning intermediate booleans would improve readability and reduce the chance of future mistakes when editing this logic.

## Individual Comments

### Comment 1
<location> `src/chat/utils/utils.py:967` </location>
<code_context>
     # After the reply format has been handled, clean up the remaining simple formats
     # New: remove all remaining [...] formats, e.g. [at=...]
-    cleaned_content = re.sub(r"\[.*?\]", "", cleaned_content)
+    if global_config.response_splitter.split_mode != "llm":
+        cleaned_content = re.sub(r"\[.*?\]", "", cleaned_content)

     # Remove @ mentions: @<xxx>
</code_context>

<issue_to_address>
**question (bug_risk):** Guarding the bracket-stripping on `split_mode` is correct but may let unexpected formats through in LLM mode.

Because we now skip `"[...]"` cleanup when `split_mode == "llm"`, any other bracketed control sequences (not just the splitter markers) will appear in the final output. If you only need to retain the splitter markers, consider narrowing the condition or regex so that only those specific patterns are preserved, while other bracketed metadata is still stripped.
</issue_to_address>
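One way to act on this suggestion is to keep applying the original `\[.*?\]` cleanup in LLM mode, but only to the text between markers. The sketch below assumes `[SPLIT]` is the only bracketed token that must survive; `clean_brackets` and `SPLIT_MARKER` are hypothetical names, and splitting on the marker is a design choice for illustration, not what the PR implements.

```python
import re

SPLIT_MARKER = "[SPLIT]"  # assumed to be the only bracketed token to keep
BRACKET_RE = re.compile(r"\[.*?\]")  # same pattern as the existing cleanup


def clean_brackets(text: str, split_mode: str) -> str:
    """Strip [...] segments, keeping [SPLIT] markers only in LLM split mode."""
    if split_mode != "llm":
        return BRACKET_RE.sub("", text)
    # Clean each segment between markers separately, so a non-greedy match
    # can never start before a marker and swallow it.
    segments = text.split(SPLIT_MARKER)
    return SPLIT_MARKER.join(BRACKET_RE.sub("", seg) for seg in segments)
```

Splitting first keeps the regex identical to the current one, so non-LLM behaviour does not change. A lookahead such as `\[(?!SPLIT\]).*?\]` is shorter, but it can still consume a marker when an unmatched `[` precedes it (for example in `"[a [SPLIT] b"`).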


src/chat/utils/utils.py

LuisKlee

This probably won't be merged for now.

I suggest addressing this potential risk first.

sourcery-ai bot reviewed Jan 1, 2026


src/chat/utils/utils.py (outdated, resolved)

LuisKlee requested review from LuisKlee and tt-P607 January 2, 2026 05:07

LuisKlee requested changes Jan 2, 2026


LuisKlee self-requested a review January 2, 2026 14:04

This comment was marked as outdated.


LuisKlee requested review from LuisKlee and sunbiz1024 January 2, 2026 14:32

Gardelll force-pushed the fix-llm-spliter branch from 6786d4f to 1a8f7e2 on January 2, 2026 16:38

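Gardelll added 4 commits January 15, 2026 23:47

feat: Avoid stripping LLM reply split markers (e5f82d1)

fix: Fully disable bracket replacement in llm split mode (3c1ddc3)

fix: Enable the three-layer protection system only when splitting by punctuation (376c454)

fix: Exclude SPLIT when using LLM splitting (9bdcdfe)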

Gardelll force-pushed the fix-llm-spliter branch from 1a8f7e2 to 9bdcdfe on January 15, 2026 15:50
