Conversation

@ramarivera ramarivera commented Jan 4, 2026

Summary

This PR fixes a bug where GLM-4.7 occasionally emits tool call XML tags directly inside the reasoning_content thinking block instead of in the separate tool_calls field. The malformed output prevents those tool calls from being executed.

Problem Statement

GLM-4.7 with interleaved thinking mode sometimes "leaks" tool call syntax into the reasoning/thinking block:

Example Malformed Output

Thinking:
<invoke name="bash">
  <command>bun test</command>
  <description>Run tests</description>
</invoke>

Or with MCP tools:

Thinking:
<tool_call>pal_thinkdeep<arg_key>step</arg_key><arg_value>...</arg_value>...</tool_call>

Key Issue: These malformed tool calls in reasoning_content are stored as text and never executed. The client receives them as thinking text rather than executable tool calls.
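
For contrast, a well-formed response keeps the call out of the thinking text and in the structured tool-calls field. The sketch below follows the OpenAI-compatible chat-completions shape referenced later in this PR; the field values are illustrative, not taken from a real session:

{
  "reasoning_content": "I should run the test suite to verify the change.",
  "content": "",
  "tool_calls": [
    {
      "id": "call_example",
      "type": "function",
      "function": {
        "name": "bash",
        "arguments": "{\"command\": \"bun test\", \"description\": \"Run tests\"}"
      }
    }
  ]
}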

Root Cause Analysis

  1. Enhanced Thinking Mechanism: GLM-4.7 implements a "think before acting" mechanism that sometimes causes tool call syntax to leak into the thinking block

  2. Interleaved Thinking Complexity: When the model thinks before each tool call, there's a higher chance of thinking content "leaking" tool call syntax

  3. API Processing: The OpenAI-compatible SDK looks for function_call items in response output, but malformed tool calls in reasoning_content are just stored as text and never executed

  4. ProviderTransform Behavior: When interleaved.field === "reasoning_content", the transform moves reasoning text but doesn't extract tool calls from malformed reasoning

Session Evidence

Real-world example from a development session:

  • User correction: "you sent a tool call in a thinking block, try again"
  • Assistant response contained tool invocation XML in reasoning block
  • pal_thinkdeep is an MCP server tool, NOT a built-in reasoning mechanism
  • Key insight: Tool invocations should NEVER appear in thinking blocks—this indicates model output malformation requiring client-side sanitization

🚨 Alert: Sanitized Session Logs for GLM-4.7 Malformed Tool Calls

📢 Callout: These logs prove the issue where GLM-4.7 embeds malformed XML tool calls in reasoning blocks. The fix extracts and parses these into proper tool-call parts, as shown in the test cases. All sensitive data (paths, IDs) has been sanitized with placeholders.

🔍 View Sanitized Session Logs Hierarchy
ramarivera_glm4.7_interleaved_thinking_fix/
└── session_logs/
    └── <session-id>/
        ├── msg_example.json
        │   {
        │     "id": "msg_b4c2bb883001Yd3E3BGGMOn1tK",
        │     "sessionID": "<session-id>",
        │     "role": "assistant",
        │     "time": {
        │       "created": 1766509492355,
        │       "completed": 1766509558418
        │     },
        │     "parentID": "<msg-id>",
        │     "modelID": "glm-4.7",
        │     "providerID": "zai-coding-plan",
        │     "mode": "build",
        │     "agent": "build",
        │     "path": {
        │       "cwd": "<project-root>",
        │       "root": "<project-root>"
        │     },
        │     "cost": 0,
        │     "tokens": {
        │       "input": 32,
        │       "output": 506,
        │       "reasoning": 0,
        │       "cache": {
        │         "read": 163403,
        │         "write": 0
        │       }
        │     },
        │     "finish": "stop"
        │   }
        └── parts/
            └── msg_<msg-id>_1/
                └── part_example.json
                    {
                      "id": "<part-id>",
                      "sessionID": "<session-id>",
                      "messageID": "<msg-id>",
                      "type": "reasoning",
                      "text": "<tool_call>pal_thinkdeep<arg_key>step</arg_key><arg_value>Reviewing Phase 1.8: Workflow Inference implementation for architecture and code quality assessment.\n\nKey implementation files:\n- <project-root>/packages/portal/scripts/generate/workflow/inference.ts\n- <project-root>/packages/portal/scripts/generate/workflow/manifest.ts\n- <project-root>/packages/portal/scripts/generate/workflow/embeddings.ts\n- <project-root>/packages/portal/scripts/generate/workflow/orchestrator.ts\n- <project-root>/packages/portal/scripts/generate/workflow/types.ts\n- <project-root>/packages/portal/scripts/generate/workflow/workflow.test.ts\n\nOpenSpec context: Phase 1.8 is defined in openspec/changes/add-grimoire-portal/tasks.md\n\nCurrent implementation focus: LLM-driven workflow inference with manifest persistence and embeddings integration.\n\nKey concerns to assess:\n1. Service architecture and separation of concerns\n2. Mocking strategy and test quality\n3. Type safety and TypeScript practices\n4. Integration readiness with shared-embeddings package</arg_value><arg_key>step_number</arg_key><arg_value>1</arg_value><arg_key>total_steps</arg_key><arg_value>4</arg_value><arg_key>next_step_required</arg_key><arg_value>true</arg_value><arg_key>findings</arg_key><arg_value>I am reviewing the Phase 1.8: Workflow Inference implementation across multiple service files. The workflow inference service (inference.ts) appears to be the core orchestrator that calls LLM services, reads volume content, and produces workflow manifests. The manifest service (manifest.ts) handles JSON file persistence. The embeddings service (embeddings.ts) uses LanceEmbeddingStore from @grimoire/shared-embeddings to update workflow relationships. The orchestrator (orchestrator.ts) coordinates the full pipeline flow. Types are defined in workflow/types.ts and re-exported in services/types.ts.\n\nThe implementation seems conceptually sound - LLM inference → manifest write → embeddings update. However I nee...</arg_value><arg_key>step</arg_key><arg_value>Reviewing Phase 1.8 implementation</arg_value><arg_key>step_number</arg_key><arg_value>1</arg_value><arg_key>total_steps</arg_key><arg_value>4</arg_value><arg_key>next_step_required</arg_key><arg_value>true</arg_value>",
                      "time": {
                        "start": 1766509536091,
                        "end": 1766509558307
                      }
                    }

Solution Architecture

Location: packages/opencode/src/provider/transform.ts

Added GLM-specific normalization in ProviderTransform.normalizeMessages(), following the existing Claude and Mistral normalization patterns (an illustrative sketch of the approach appears after this list):

  1. Detect malformed tool calls in reasoning_content using regex pattern matching
  2. Extract and parse embedded tool call XML (<tool_call> and <invoke> tags)
  3. Remove tool call artifacts from reasoning text
  4. Add extracted calls as proper tool-call parts in the message
  5. Preserve clean reasoning text without tool call artifacts
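
A minimal sketch of the extraction idea, assuming the reasoning text arrives as a plain string. This is not the PR's actual implementation; the helper name and regexes below are illustrative only:

// Illustrative sketch: extract leaked tool-call XML from reasoning text.
// The real fix lives in ProviderTransform.normalizeMessages(); names here are made up.
interface ExtractedToolCall {
  name: string
  args: Record<string, string>
}

function extractToolCallsFromReasoning(reasoning: string): {
  cleaned: string
  calls: ExtractedToolCall[]
} {
  const calls: ExtractedToolCall[] = []

  // Matches <invoke name="bash"><command>bun test</command>...</invoke>
  const invokeRe = /<invoke name="([^"]+)">([\s\S]*?)<\/invoke>/g
  // Matches <tool_call>name<arg_key>k</arg_key><arg_value>v</arg_value>...</tool_call>
  const toolCallRe = /<tool_call>([A-Za-z0-9_]+)([\s\S]*?)<\/tool_call>/g

  let cleaned = reasoning.replace(invokeRe, (_, name: string, body: string) => {
    const args: Record<string, string> = {}
    // Direct child tags become named arguments.
    for (const m of body.matchAll(/<([A-Za-z0-9_]+)>([\s\S]*?)<\/\1>/g)) {
      args[m[1]] = m[2]
    }
    calls.push({ name, args })
    return ""
  })

  cleaned = cleaned.replace(toolCallRe, (_, name: string, body: string) => {
    const args: Record<string, string> = {}
    // <arg_key>/<arg_value> pairs arrive interleaved; zip them positionally.
    const keys = [...body.matchAll(/<arg_key>([\s\S]*?)<\/arg_key>/g)]
    const values = [...body.matchAll(/<arg_value>([\s\S]*?)<\/arg_value>/g)]
    keys.forEach((key, i) => {
      if (values[i]) args[key[1]] = values[i][1]
    })
    calls.push({ name, args })
    return ""
  })

  return { cleaned: cleaned.trim(), calls }
}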

Testing

Comprehensive test suite in packages/opencode/test/provider/test_glm47_thinking_fix.test.ts covering the following cases (a representative sketch follows the list):

  • Single tool call XML in reasoning - Extracts a properly formatted bash command from malformed reasoning
  • Multiple tool calls in reasoning - Extracts all tool invocations
  • MCP tools in reasoning - Handles pal_thinkdeep and other MCP server tools correctly
  • Properly formatted responses - Preserves existing structure without modification
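
A representative test in the bun:test style used by the suite, exercising the illustrative helper sketched above rather than the actual test file contents:

import { describe, expect, test } from "bun:test"

// Assumes the illustrative extractToolCallsFromReasoning helper sketched above
// is in scope (e.g. exported next to ProviderTransform).

describe("GLM-4.7 leaked tool calls in reasoning", () => {
  test("extracts a bash invoke from reasoning text", () => {
    const reasoning = [
      "I should run the tests.",
      '<invoke name="bash">',
      "  <command>bun test</command>",
      "  <description>Run tests</description>",
      "</invoke>",
    ].join("\n")

    const { cleaned, calls } = extractToolCallsFromReasoning(reasoning)

    expect(calls).toHaveLength(1)
    expect(calls[0].name).toBe("bash")
    expect(calls[0].args.command).toBe("bun test")
    expect(cleaned).not.toContain("<invoke")
  })

  test("leaves well-formed reasoning untouched", () => {
    const reasoning = "Plain thinking text with no tool call markup."
    const { cleaned, calls } = extractToolCallsFromReasoning(reasoning)
    expect(calls).toHaveLength(0)
    expect(cleaned).toBe(reasoning)
  })
})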

Test Coverage

  • Real-world malformed <invoke name="bash"> patterns
  • Complex <tool_call> with multiple arguments
  • Mixed reasoning content with and without tool calls
  • Regression tests for properly formatted responses

Risk Assessment

Low Risk:

  • Defensive sanitization - only modifies content containing malformed syntax
  • Non-breaking - preserves existing behavior for properly formatted outputs
  • Isolated - only affects GLM-4.7/GLM-4.6 via z.ai provider
  • Easy rollback - the change is conditional on provider/model detection

References

  • Provider: Z.AI (Anthropic-compatible endpoint)
  • Affected Models: GLM-4.7, GLM-4.6
  • Pattern: Follows existing Claude/Mistral normalization patterns
  • Related: OpenAI-compatible SDK response handling

…ing new tests and documentation, and update prompt input placeholders.
Add GLM-specific normalization in ProviderTransform to extract tool call XML
from reasoning_content and convert to proper tool-call parts. Supports both
<arg_key>/<arg_value> pairs and direct child tags.

Includes test cases covering single/multiple tool calls in reasoning and
properly formatted responses that should not be affected.
- Add typeof checks to narrow union type before filtering
- Remove any type casts for better type safety
- Ensures content is an array before calling array methods
- Sanitize sensitive information in investigation files (paths, usernames, session IDs)
- Remove PROPOSED_FIX.md (content migrated to PR description)
- Remove session logs and example files
- Investigation details now in PR description for better context
- Replace 'as any[]' and 'as any' with proper ModelMessage type from ai SDK
- Extract createModel() factory function to reduce boilerplate duplication
- Use Provider.Model type for proper type safety throughout tests
- Keep type narrowing for runtime safety checks

All 4 tests pass with full TypeScript type coverage.
- packages/opencode/src/provider/transform.ts
- packages/opencode/test/provider/test_glm47_thinking_fix.test.ts

Removed unnecessary explanatory comments, replaced let with chained const assignments, avoided any types by using casts, aligned style with project guidelines, and restored necessary test comments that prove the fix's behavior.
- Restored investigation files from a35c5bb with sanitized placeholders
- Removed verbatim versions with real paths and IDs
- Kept only example files with <placeholder> values for privacy

These files prove the GLM-4.7 malformed tool call issue and fix.
- Obliterated PROPOSED_FIX.md as requested
- Content migrated to PR description
@rekram1-node (Collaborator) commented:

I don't think this is the correct fix.

@ramarivera ramarivera closed this Jan 5, 2026
@ramarivera ramarivera deleted the fix/ramarivera_glm4.7_interleaved_thinking_fix branch January 5, 2026 01:38
@ramarivera (Author) commented:

Thanks for the input.
