
Conversation


Copilot AI commented Oct 21, 2025

Problem

When MCP tools return very large outputs (e.g., generated documentation, data exports, or analysis reports), the content can exceed token limits and become unwieldy for the agent to handle in a single response. Previously, all tool outputs were returned as-is regardless of size.

Solution

Implemented automatic detection and file writing for large tool outputs in the safe_outputs_mcp_server. When any string field in a tool response exceeds 16,000 tokens (~64,000 characters), the system now:

  1. Writes content to a file in /tmp/gh-aw/safe-outputs/ using SHA256 hash + .json extension as filename
  2. Always uses .json extension since MCP tools return JSON data
  3. Returns metadata to the agent in JSON format with a compact schema description: { "filename": "hash.json", "description": "[{keys}] (N items)" }
  4. Generates compact schema descriptions that jq and agents can understand (see the sketch after this list):
    • For JSON arrays: [{id, name, data}] (2000 items)
    • For JSON objects: {key1, key2, ...} or {key1, ..., keyN, ...} (N keys), truncated after 10 keys
    • For non-JSON content: text content
  5. Modifies the safe output by replacing the large field with a file reference for the audit trail
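
For illustration, a minimal sketch of how such a compact schema description could be produced, assuming a hypothetical generateCompactSchema helper (the actual function in safe_outputs_mcp_server may be named and organized differently):

function generateCompactSchema(content) {
  try {
    const parsed = JSON.parse(content);
    if (Array.isArray(parsed)) {
      // Arrays: describe the keys of the first element plus the item count.
      if (parsed.length > 0 && typeof parsed[0] === "object" && parsed[0] !== null) {
        return `[{${Object.keys(parsed[0]).join(", ")}}] (${parsed.length} items)`;
      }
      return `[...] (${parsed.length} items)`;
    }
    if (typeof parsed === "object" && parsed !== null) {
      // Objects: list keys, truncated after the first 10.
      const keys = Object.keys(parsed);
      const shown = keys.slice(0, 10).join(", ");
      return `{${shown}${keys.length > 10 ? ", ..." : ""}} (${keys.length} keys)`;
    }
  } catch {
    // Not valid JSON; fall through to the generic description.
  }
  return "text content";
}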

Example

Before (large content would be returned in full):

// Response for 70KB content
{ "result": "success" }
// Content embedded in safe output (problematic)

After (large content automatically handled):

// Response for JSON array with 2000 items
{
  "filename": "bb28168fe5604623b804546db0e8c90eaf9e8dcd0f418761787d5159198b4fd8.json",
  "description": "[{id, name, data}] (2000 items)"
}
// Content saved to: /tmp/gh-aw/safe-outputs/bb28168...d8.json
// Safe output contains: "[Content too large, saved to file: bb28168...d8.json]"

Additional examples:

// JSON object with many keys
{
  "filename": "07b666fe0cc9659d99a52e6ce702c0ebc711bdb0f37dba6fbdad86a311c2c881.json",
  "description": "{key0, key1, key2, ..., key9, ...} (100 keys)"
}

// Non-JSON content
{
  "filename": "180848db337c96a2ef319f810c8bae74337a5c57a29623e1e3f2968595dcafdd.json",
  "description": "text content"
}

Implementation Details

  • Token threshold: 16,000 tokens, using a 4-characters-per-token approximation (consistent with existing parsers); the write path is sketched after this list
  • File naming: SHA256 hash of content prevents collisions and enables content-based lookup
  • File extension: Always .json since MCP tools return JSON
  • Schema generation: Compact, human-readable schema that helps jq and AI agents understand the data structure
  • Location: Same /tmp/gh-aw/safe-outputs/ directory used for other artifacts
  • No breaking changes: Normal-sized content (<16,000 tokens) continues to work exactly as before
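
A minimal sketch of this write path, using hypothetical helper names (estimateTokens, writeLargeOutputToFile, maybeOffloadField) that take the schema description as an argument; the real implementation in safe_outputs_mcp_server may differ in detail:

const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

const SAFE_OUTPUTS_DIR = "/tmp/gh-aw/safe-outputs";
const MAX_TOKENS = 16000;

// Approximate token count with the 4-characters-per-token heuristic.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Content-addressed write: the SHA256 hash of the content becomes the
// filename, so identical content always maps to the same file.
function writeLargeOutputToFile(content, description) {
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  const filename = `${hash}.json`;
  fs.mkdirSync(SAFE_OUTPUTS_DIR, { recursive: true });
  fs.writeFileSync(path.join(SAFE_OUTPUTS_DIR, filename), content);
  return { filename, description };
}

// Applied to each string field of a tool response; small values pass through.
function maybeOffloadField(value, describe) {
  if (typeof value === "string" && estimateTokens(value) > MAX_TOKENS) {
    return JSON.stringify(writeLargeOutputToFile(value, describe(value)));
  }
  return value;
}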

Testing

Added comprehensive test coverage with 4 new test scenarios (one is sketched after the list):

  • ✅ Large text content (70KB) → writes to .json file with schema description
  • ✅ Large JSON array (187KB) → writes to .json file with array schema showing keys and count
  • ✅ Large JSON object → writes to .json file with object schema showing keys
  • ✅ Normal content → returns standard success response
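
For illustration, the large-array scenario could be expressed roughly as follows, assuming Node's built-in test runner and the hypothetical helpers sketched above (the actual suite and helper names may differ):

const test = require("node:test");
const assert = require("node:assert");
const fs = require("fs");
const path = require("path");

test("large JSON array is written to a .json file with an array schema", () => {
  // Payload well above the 16,000-token threshold: 2000 objects with id, name, data.
  const items = Array.from({ length: 2000 }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    data: "x".repeat(64),
  }));
  const content = JSON.stringify(items);

  const result = writeLargeOutputToFile(content, generateCompactSchema(content));

  assert.match(result.filename, /^[0-9a-f]{64}\.json$/);
  assert.strictEqual(result.description, "[{id, name, data}] (2000 items)");
  const saved = path.join("/tmp/gh-aw/safe-outputs", result.filename);
  assert.strictEqual(fs.readFileSync(saved, "utf8"), content);
});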

All 568 JavaScript tests and all Go unit tests passing.

Fixes the issue requesting automatic file handling for large tool outputs, with meaningful schema descriptions that agents can understand.

Original prompt

Update the mcp-server command to automatically write the tool output to a file if it is larger than 16000 tokens (approximated).

In that case, write the file in the log folder (use sha of content + file extension) and return a prompt to the agent specifying where the file is. Must be JSON format like { filename: "....", description: "generated content large!" }



Copilot AI self-assigned this Oct 21, 2025
Copilot AI and others added 3 commits October 21, 2025 02:12
…ture

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Implement token counting using 4 chars = 1 token estimate
- Detect content exceeding 16000 tokens in tool outputs
- Write large content to files in /tmp/gh-aw/safe-outputs/ with SHA256 hash filenames
- Automatically detect file extension (.json, .md, .txt) based on content
- Return JSON format: { filename: "hash.ext", description: "generated content large!" }
- Replace large field in safe output with file reference
- Add comprehensive test coverage for all scenarios

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
All tests passing and manual verification complete

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update mcp-server command for large output handling" to "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens" on Oct 21, 2025
Copilot AI requested a review from pelikhan October 21, 2025 02:24
@pelikhan

@copilot the MCP tools return JSON so the generated file extension should be .json. The description field should be replaced by a compact schema of the data that the "jq" / agent can understand.

…h compact schema

- Always use .json extension for all large content (MCP tools return JSON)
- Replace static "generated content large!" with compact schema description
- Schema describes structure: arrays show keys and item count, objects show keys
- For non-JSON content, description is "text content"
- Updated all tests to verify new behavior

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Oct 21, 2025

@copilot the MCP tools return JSON so the generated file extension should be .json. The description field should be replaced by a compact schema of the data that the "jq" / agent can understand.

Updated in commit b63c8f8:

  • File extension is now always .json (since MCP tools return JSON)
  • Description field now contains a compact schema instead of "generated content large!"
    • Arrays show structure and count: [{id, name, data}] (2000 items)
    • Objects show keys: {key1, key2, ...} (N keys)
    • Non-JSON content: text content

This makes it easier for jq and agents to understand the data structure.

Copilot AI changed the title from "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens" to "Add automatic file writing for large MCP tool outputs exceeding 16,000 tokens with compact schema descriptions" on Oct 21, 2025
pelikhan marked this pull request as ready for review October 21, 2025 03:23
@github-actions

Agentic Changeset Generator triggered by this pull request.

pelikhan merged commit 2ffd9c5 into main Oct 21, 2025
3 checks passed
pelikhan deleted the copilot/update-mcp-server-command branch October 21, 2025 03:26
github-actions bot added a commit that referenced this pull request Oct 21, 2025
Update CLI and MCP server documentation to reflect recent feature additions:

- Add --timeout option documentation for logs command with caching details
- Add --parse option documentation for audit command
- Add URL support documentation for audit command (cross-repo, GitHub Enterprise)
- Document continuation field in MCP server logs tool for pagination
- Document large output automatic file handling in MCP server (16K token threshold)

These changes document features from PRs #2066, #2064, #2060, #2058, #2052, and #2051.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>