[WIP] Optimize MCP tool response payloads to reduce token usage #11958

Closed
Copilot wants to merge 3 commits into main from copilot/optimize-mcp-tool-payloads

Conversation

Contributor

Copilot AI commented Jan 26, 2026

MCP Tool Response Payload Optimization

This PR optimizes MCP tool response payloads to reduce token usage for workflows using GitHub MCP tools.

Completed

  • Understand repository structure and MCP integration
  • Identify that gh-aw is a compiler, not a runtime interceptor
  • Determine that response filtering must be done via configuration, not code interception
  • Locate GitHubToolConfig structure in tools_types.go
  • Add Options field to GitHubToolConfig to pass configuration to GitHub MCP server
  • Add parseGitHubToolOptions function to parse options configuration
  • Create comprehensive token optimization documentation
  • Update GitHub MCP server skill with token usage warnings
  • Build and test changes successfully

Remaining

  • Run final validation (make agent-finish)

Implementation

Since gh-aw cannot intercept runtime responses from the GitHub MCP server (it only generates workflow configuration), this PR:

  1. Adds configuration support for future response-mode options that the upstream GitHub MCP server can implement
  2. Provides comprehensive documentation advising workflows on best practices to reduce token usage by 30-50%
  3. Updates skills/github-mcp-server with warnings about high-token-usage tools

Key Changes

Code Changes:

  • Added GitHubToolOptions struct with ResponseMode field (future-proofing)
  • Added parsing support in parseGitHubTool() function
  • Modified tools_types.go and tools_parser.go

Documentation Changes:

  • Created /docs/src/content/docs/guides/optimizing-token-usage.md with:
    • Best practices for reducing token usage
    • Specific guidance on high-token tools (list_code_scanning_alerts, list_pull_requests)
    • Workflow examples showing efficient patterns
    • Expected savings: 30-50% reduction
  • Updated skills/github-mcp-server/SKILL.md with token optimization warnings

Token Optimization Strategies Documented

  1. Use targeted queries instead of listing all results
  2. Limit result counts through pagination
  3. Request specific fields in prompts
  4. Avoid listing when direct access is possible
  5. Post-filter results to discard unnecessary metadata
  6. Consider if code_security toolset is needed at all

Original prompt

This section details the original issue you should resolve.

<issue_title>[Code Quality] Optimize MCP tool response payloads to reduce token usage</issue_title>
<issue_description>### Description

MCP structural analysis reveals that two GitHub MCP tools return bloated payloads, consuming excessive tokens and degrading performance. list_code_scanning_alerts returns 24K tokens (97KB) and list_pull_requests duplicates repository objects in every PR result.

Current State

Observed payload sizes (from MCP analysis):

  • list_code_scanning_alerts: 24,000 tokens (97KB) - largest payload
  • list_pull_requests: Heavy due to duplicated repository objects in each PR
  • Efficient tools for comparison: list_labels, list_branches, list_workflows (minimal payload bloat)

Impact

Token Cost:

  • Every call to these tools consumes 5-20x more tokens than necessary
  • Accumulated cost across hundreds of daily workflow runs
  • Particularly expensive for workflows that call these tools multiple times

Performance:

  • Larger context windows slow down AI agent processing
  • Increased network transfer time
  • Higher memory usage for payload processing

Efficiency Gap:

  • list_labels, list_branches, list_discussions remain highly efficient
  • Code security tools lag significantly behind in efficiency

Suggested Changes

Option 1: Return Selective Fields Only

For list_code_scanning_alerts:

// BEFORE: Return full alert object (97KB)
return alerts;

// AFTER: Return only essential fields
return alerts.map(alert => ({
  number: alert.number,
  state: alert.state,
  severity: alert.rule.severity,
  description: alert.rule.description,
  location: alert.most_recent_instance.location,
  // Omit: tool details, full rule objects, extensive metadata
}));

For list_pull_requests:

// BEFORE: Duplicate repo object in every PR
return prs;  // Each PR includes full repository object

// AFTER: Return repo once, reference in PRs
return {
  repository: { /* repo details once */ },
  pull_requests: prs.map(pr => ({
    number: pr.number,
    title: pr.title,
    state: pr.state,
    // Omit duplicated repo object
  }))
};

Option 2: Add Summary Modes

Add optional mode parameter:

  • mode: "summary" → Minimal fields (default)
  • mode: "full" → Complete objects (when needed)

tools:
  github:
    toolsets: [code_scanning]
    options:
      mode: summary  # NEW: Request lightweight responses

Option 3: Pagination with Field Selection

Implement field selection in pagination:

// Allow agents to specify which fields they need
GET /repos/{owner}/{repo}/code-scanning/alerts?fields=number,state,severity

Files Affected

This issue likely requires changes in the GitHub MCP server, not gh-aw directly:

  • Upstream: github-mcp-server repository (if maintained by GitHub)
  • Local: gh-aw's MCP integration layer if it can filter responses

Investigation needed: Determine if gh-aw can implement response filtering or if upstream changes are required.

Success Criteria

  • list_code_scanning_alerts payload reduced from 24K tokens to <10K tokens
  • list_pull_requests eliminates duplicated repository objects
  • ✅ Token usage for affected workflows reduced by 30-50%
  • ✅ All existing workflows continue to function (backward compatible)
  • ✅ Documentation updated with guidance on using summary vs. full modes

Alternative: Workflow Guidance

If upstream changes aren't feasible, add workflow documentation advising:

  • Avoid list_code_scanning_alerts unless essential
  • Use targeted queries instead of full listings
  • Filter results after retrieval to minimize context usage

Source

Extracted from DeepReport Intelligence Briefing discussion githubnext/gh-aw#11897

Relevant excerpt:

MCP structural analysis confirms list_code_scanning_alerts is the largest payload (24K tokens, 97KB) and list_pull_requests remains heavy due to duplicated repo objects.

Comparison:

  • ✅ Efficient: list_labels, list_branches, list_workflows, list_discussions
  • ❌ Bloated: list_code_scanning_alerts, list_pull_requests

Priority

High - Direct impact on token costs and performance. Token spend concentration analysis shows this affects high-frequency workflows.

Implementation Estimate

Effort: 2-3 days

  • Day 1: Investigate gh-aw vs. upstream GitHub MCP server responsibility
  • Day 2: Implement response filtering/summarization
  • Day 3: Test with real workflows, measure token reduction, document changes

AI generated by Discussion Task Miner - Code Quality Improvement Agent

  • expires on Feb 9, 2026, 9:07 PM UTC


Copilot AI and others added 2 commits January 26, 2026 22:02
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Add comprehensive token usage optimization guide
- Update GitHub MCP server skill with token usage warnings
- Add Options field to GitHubToolConfig for future response-mode support
- Add parseGitHubToolOptions function to parse options configuration

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan closed this Jan 26, 2026
Copilot AI requested a review from pelikhan January 26, 2026 22:11
Copilot stopped work on behalf of pelikhan due to an error January 26, 2026 22:11