
Conversation


Copilot AI commented Oct 21, 2025

Problem

When MCP tools return very large outputs (e.g., generated documentation, data exports, or analysis reports), the content can exceed token limits and become unwieldy for the agent to handle in a single response. Previously, all tool outputs were returned as-is regardless of size.

Solution

Implemented automatic detection and file writing for large tool outputs in the safe_outputs_mcp_server. When any string field in a tool response exceeds 16,000 tokens (~64,000 characters), the system now:

  1. Writes content to a file in /tmp/gh-aw/safe-outputs/ using SHA256 hash + .json extension as filename
  2. Always uses .json extension since MCP tools return JSON data
  3. Returns metadata to the agent in JSON format with a compact schema description: { "filename": "hash.json", "description": "[{keys}] (N items)" }
  4. Generates compact schema descriptions that jq and agents can understand (see the sketch after this list):
    • For JSON arrays: [{id, name, data}] (2000 items)
    • For JSON objects: {key1, key2, ...} or {key1, ..., keyN, ...} (N keys), truncated after 10 keys
    • For non-JSON content: text content
  5. Modifies the safe output by replacing the large field with a file reference for the audit trail
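
For illustration, a minimal sketch of how such a compact schema description could be produced, assuming a hypothetical generateCompactSchema helper (the actual function in safe_outputs_mcp_server may be named and organized differently):

function generateCompactSchema(content) {
  try {
    const parsed = JSON.parse(content);
    if (Array.isArray(parsed)) {
      // Arrays: describe the keys of the first element plus the item count.
      if (parsed.length > 0 && typeof parsed[0] === "object" && parsed[0] !== null) {
        return `[{${Object.keys(parsed[0]).join(", ")}}] (${parsed.length} items)`;
      }
      return `[...] (${parsed.length} items)`;
    }
    if (typeof parsed === "object" && parsed !== null) {
      // Objects: list keys, truncated after the first 10.
      const keys = Object.keys(parsed);
      const shown = keys.slice(0, 10).join(", ");
      return `{${shown}${keys.length > 10 ? ", ..." : ""}} (${keys.length} keys)`;
    }
  } catch {
    // Not valid JSON; fall through to the generic description.
  }
  return "text content";
}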

Example

Before (large content would be returned in full):

// Response for 70KB content
{ "result": "success" }
// Content embedded in safe output (problematic)

After (large content automatically handled):

// Response for JSON array with 2000 items
{
  "filename": "bb28168fe5604623b804546db0e8c90eaf9e8dcd0f418761787d5159198b4fd8.json",
  "description": "[{id, name, data}] (2000 items)"
}
// Content saved to: /tmp/gh-aw/safe-outputs/bb28168...d8.json
// Safe output contains: "[Content too large, saved to file: bb28168...d8.json]"

Additional examples:

// JSON object with many keys
{
  "filename": "07b666fe0cc9659d99a52e6ce702c0ebc711bdb0f37dba6fbdad86a311c2c881.json",
  "description": "{key0, key1, key2, ..., key9, ...} (100 keys)"
}

// Non-JSON content
{
  "filename": "180848db337c96a2ef319f810c8bae74337a5c57a29623e1e3f2968595dcafdd.json",
  "description": "text content"
}

Implementation Details

  • Token threshold: 16,000 tokens, using a 4-characters-per-token approximation (consistent with existing parsers); the write path is sketched after this list
  • File naming: SHA256 hash of content prevents collisions and enables content-based lookup
  • File extension: Always .json since MCP tools return JSON
  • Schema generation: Compact, human-readable schema that helps jq and AI agents understand the data structure
  • Location: Same /tmp/gh-aw/safe-outputs/ directory used for other artifacts
  • No breaking changes: Normal-sized content (<16,000 tokens) continues to work exactly as before
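
A minimal sketch of this write path, using hypothetical helper names (estimateTokens, writeLargeOutputToFile, maybeOffloadField) that take the schema description as an argument; the real implementation in safe_outputs_mcp_server may differ in detail:

const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

const SAFE_OUTPUTS_DIR = "/tmp/gh-aw/safe-outputs";
const MAX_TOKENS = 16000;

// Approximate token count with the 4-characters-per-token heuristic.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Content-addressed write: the SHA256 hash of the content becomes the
// filename, so identical content always maps to the same file.
function writeLargeOutputToFile(content, description) {
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  const filename = `${hash}.json`;
  fs.mkdirSync(SAFE_OUTPUTS_DIR, { recursive: true });
  fs.writeFileSync(path.join(SAFE_OUTPUTS_DIR, filename), content);
  return { filename, description };
}

// Applied to each string field of a tool response; small values pass through.
function maybeOffloadField(value, describe) {
  if (typeof value === "string" && estimateTokens(value) > MAX_TOKENS) {
    return JSON.stringify(writeLargeOutputToFile(value, describe(value)));
  }
  return value;
}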

Testing

Added comprehensive test coverage with 4 new test scenarios (one is sketched after the list):

  • ✅ Large text content (70KB) → writes to .json file with schema description
  • ✅ Large JSON array (187KB) → writes to .json file with array schema showing keys and count
  • ✅ Large JSON object → writes to .json file with object schema showing keys
  • ✅ Normal content → returns standard success response
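
For illustration, the large-array scenario could be expressed roughly as follows, assuming Node's built-in test runner and the hypothetical helpers sketched above (the actual suite and helper names may differ):

const test = require("node:test");
const assert = require("node:assert");
const fs = require("fs");
const path = require("path");

test("large JSON array is written to a .json file with an array schema", () => {
  // Payload well above the 16,000-token threshold: 2000 objects with id, name, data.
  const items = Array.from({ length: 2000 }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    data: "x".repeat(64),
  }));
  const content = JSON.stringify(items);

  const result = writeLargeOutputToFile(content, generateCompactSchema(content));

  assert.match(result.filename, /^[0-9a-f]{64}\.json$/);
  assert.strictEqual(result.description, "[{id, name, data}] (2000 items)");
  const saved = path.join("/tmp/gh-aw/safe-outputs", result.filename);
  assert.strictEqual(fs.readFileSync(saved, "utf8"), content);
});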

All 568 JavaScript tests and all Go unit tests passing.

Fixes the issue requesting automatic file handling for large tool outputs, with meaningful schema descriptions that agents can understand.

Original prompt

Update the mcp-server command to automatically write the tool output to a file if it is larger than 16000 tokens (approximated).

In that case, write the file in the log folder (use sha of content + file extension) and return a prompt to the agent specifying where the file is. Must be JSON format like { filename: "....", description: "generated content large!" }



Copilot AI self-assigned this Oct 21, 2025
Copilot AI and others added 3 commits October 21, 2025 02:12
…ture

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Implement token counting using 4 chars = 1 token estimate
- Detect content exceeding 16000 tokens in tool outputs
- Write large content to files in /tmp/gh-aw/safe-outputs/ with SHA256 hash filenames
- Automatically detect file extension (.json, .md, .txt) based on content
- Return JSON format: { filename: "hash.ext", description: "generated content large!" }
- Replace large field in safe output with file reference
- Add comprehensive test coverage for all scenarios

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
All tests passing and manual verification complete

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update mcp-server command for large output handling" to "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens" on Oct 21, 2025
Copilot AI requested a review from pelikhan October 21, 2025 02:24
@pelikhan

@copilot the MCP tools return JSON so the generated file extension should be .json. The description field should be replaced by a compact schema of the data that the "jq" / agent can understand.

…h compact schema

- Always use .json extension for all large content (MCP tools return JSON)
- Replace static "generated content large!" with compact schema description
- Schema describes structure: arrays show keys and item count, objects show keys
- For non-JSON content, description is "text content"
- Updated all tests to verify new behavior

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Oct 21, 2025

@copilot the MCP tools return JSON so the generated file extension should be .json. The description field should be replaced by a compact schema of the data that the "jq" / agent can understand.

Updated in commit b63c8f8:

  • File extension is now always .json (since MCP tools return JSON)
  • Description field now contains a compact schema instead of "generated content large!"
    • Arrays show structure and count: [{id, name, data}] (2000 items)
    • Objects show keys: {key1, key2, ...} (N keys)
    • Non-JSON content: text content

This makes it easier for jq and agents to understand the data structure.

Copilot AI changed the title from "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens" to "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens with compact schema descriptions" on Oct 21, 2025
pelikhan marked this pull request as ready for review October 21, 2025 03:23
@github-actions

Agentic Changeset Generator triggered by this pull request.

pelikhan merged commit 2ffd9c5 into main Oct 21, 2025
3 checks passed
pelikhan deleted the copilot/update-mcp-server-command branch October 21, 2025 03:26
github-actions bot added a commit that referenced this pull request Oct 21, 2025
Update CLI and MCP server documentation to reflect recent feature additions:

- Add --timeout option documentation for logs command with caching details
- Add --parse option documentation for audit command
- Add URL support documentation for audit command (cross-repo, GitHub Enterprise)
- Document continuation field in MCP server logs tool for pagination
- Document large output automatic file handling in MCP server (16K token threshold)

These changes document features from PRs #2066, #2064, #2060, #2058, #2052, and #2051.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>