
[Code Quality] Investigate and address January 2026 agent success rate decline #12495

@github-actions

Description

Analysis of 4,389 Copilot agent tasks shows a concerning decline in success rate during January 2026. The success rate dropped from 76.2% in November 2025 to 64.8% in January 2026, a decrease of 11.4 percentage points.

Trend Data

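| Month | Tasks | Success rate | Change vs. prior month |
|---|---|---|---|
| Oct 2025 | 397 | 73.3% | baseline |
| Nov 2025 | 884 | 76.2% | +2.9 pp ✅ |
| Dec 2025 | 1,348 | 71.6% | -4.6 pp ⚠️ |
| Jan 2026 | 1,760 | 64.8% | -6.8 pp 🚨 (-11.4 pp vs. Nov 2025) |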
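A minimal Python sketch that recomputes the month-over-month change and the change relative to the November 2025 peak from the success rates in the trend table, and confirms that the monthly task counts add up to the 4,389 total. All changes are in percentage points (pp); the function and variable names are illustrative only.

```python
# Monthly figures copied from the trend table (success rates in percent).
MONTHS = [
    ("Oct 2025", 397, 73.3),
    ("Nov 2025", 884, 76.2),
    ("Dec 2025", 1348, 71.6),
    ("Jan 2026", 1760, 64.8),
]


def changes(months, peak_label="Nov 2025"):
    """Return (label, pp change vs. prior month, pp change vs. peak) per month."""
    peak_rate = next(rate for label, _, rate in months if label == peak_label)
    rows = []
    prev_rate = None
    for label, _, rate in months:
        # Round to one decimal so float noise such as -6.7999 does not leak out.
        mom = None if prev_rate is None else round(rate - prev_rate, 1)
        rows.append((label, mom, round(rate - peak_rate, 1)))
        prev_rate = rate
    return rows


total_tasks = sum(tasks for _, tasks, _ in MONTHS)

if __name__ == "__main__":
    print(f"total tasks: {total_tasks}")
    for label, mom, vs_peak in changes(MONTHS):
        mom_text = "baseline" if mom is None else f"{mom:+.1f} pp"
        print(f"{label}: {mom_text} vs. prior month, {vs_peak:+.1f} pp vs. Nov 2025")
```

Reporting both columns keeps the month-over-month drop (-6.8 pp in January) distinct from the cumulative drop from the peak (-11.4 pp).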

Impact

  • ~200 more failed tasks in January than the November success rate would predict for the same volume (1,760 tasks × 11.4 pp ≈ 200)
  • Two consecutive monthly declines since the November peak
  • Affects all task types; not isolated to one category
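The ~200 figure can be reproduced by applying November's success rate to January's task volume. A minimal sketch; it assumes the task mix is comparable between the two months, which is one of the things the investigation should verify.

```python
def excess_failures(tasks, actual_rate, reference_rate):
    """Failures beyond what reference_rate would have produced on the same volume.

    Rates are fractions (0.648, not 64.8). Differences in task mix are ignored.
    """
    expected_failures = tasks * (1 - reference_rate)
    actual_failures = tasks * (1 - actual_rate)
    return actual_failures - expected_failures


if __name__ == "__main__":
    # January volume at January's rate vs. at November's rate.
    extra = excess_failures(tasks=1760, actual_rate=0.648, reference_rate=0.762)
    print(f"~{extra:.0f} additional failed tasks")  # 1,760 x 0.114 is about 200.6
```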

Investigation Areas

1. Recent Agent Instruction Changes

  • Review changes made to agent prompts between December and January
  • Check for new constraints or guidelines that may be too restrictive
  • Analyze prompt length and complexity trends
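To check prompt growth and spot new constraints that may be too restrictive, two snapshots of an agent instruction file (for example, the versions at a November and a January commit) can be compared. A minimal sketch using the standard library's `difflib.ndiff`; the keyword list used to flag constraint-like lines is a hypothetical heuristic, not an established rule, and how the snapshots are retrieved is left open.

```python
import difflib
import re

# Hypothetical heuristic: added lines containing these words are flagged as constraints.
CONSTRAINT_WORDS = re.compile(r"\b(must|never|do not|don't|always|only)\b", re.IGNORECASE)


def compare_prompts(old_text: str, new_text: str) -> dict:
    """Summarize how a prompt grew and list added lines that look like constraints."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # ndiff prefixes lines that exist only in new_lines with "+ ".
    added = [line[2:] for line in difflib.ndiff(old_lines, new_lines) if line.startswith("+ ")]
    return {
        "word_delta": len(new_text.split()) - len(old_text.split()),
        "line_delta": len(new_lines) - len(old_lines),
        "new_constraints": [line for line in added if CONSTRAINT_WORDS.search(line)],
    }


if __name__ == "__main__":
    old = "You are a coding agent.\nRun the tests before committing."
    new = old + "\nYou must never modify workflow files.\nKeep replies short."
    print(compare_prompts(old, new))
```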

2. Infrastructure Changes

  • Check for GitHub Actions runner updates
  • Review MCP server availability and performance
  • Analyze network/timeout issues
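If each task record carries a failure reason (such as a timeout, an MCP server error, or a test failure), infrastructure problems should show up as specific reasons growing between months. A minimal sketch; the record fields (`month`, `succeeded`, `failure_reason`) and reason names are hypothetical. Rates are computed per task rather than as a share of failures, so the per-reason changes add up to the overall change in failure rate.

```python
from collections import Counter


def failure_reason_rates(tasks, month):
    """Per-task failure rate for each reason in one month (rates sum to the failure rate)."""
    in_month = [t for t in tasks if t["month"] == month]
    if not in_month:
        return {}
    reasons = Counter(t["failure_reason"] for t in in_month if not t["succeeded"])
    return {reason: count / len(in_month) for reason, count in reasons.items()}


def biggest_increases(tasks, before, after):
    """Failure reasons sorted by how much their per-task rate grew between two months."""
    old = failure_reason_rates(tasks, before)
    new = failure_reason_rates(tasks, after)
    deltas = {r: new.get(r, 0.0) - old.get(r, 0.0) for r in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy records with hypothetical field names and reasons.
    sample = [
        {"month": "2025-11", "succeeded": True, "failure_reason": None},
        {"month": "2025-11", "succeeded": False, "failure_reason": "test_failure"},
        {"month": "2026-01", "succeeded": False, "failure_reason": "timeout"},
        {"month": "2026-01", "succeeded": False, "failure_reason": "test_failure"},
    ]
    for reason, delta in biggest_increases(sample, "2025-11", "2026-01"):
        print(f"{reason}: {delta:+.2f}")
```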

3. Task Complexity Increase

  • Compare average files changed per task over time
  • Analyze trends in commits per task
  • Review task description length and clarity
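Complexity trends can be compared month by month once each task has numeric complexity fields. A minimal sketch; the field names (`files_changed`, `commits`, `description_words`) are hypothetical stand-ins for whatever the task export records. Medians are reported alongside means because a handful of very large tasks can pull the mean up on their own.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical per-task fields; substitute whatever the task export actually records.
COMPLEXITY_FIELDS = ("files_changed", "commits", "description_words")


def complexity_by_month(tasks):
    """Mean and median of each complexity field per month, months in sorted order."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[task["month"]].append(task)
    summary = {}
    for month in sorted(grouped):
        rows = grouped[month]
        summary[month] = {
            field: {
                "mean": mean(row[field] for row in rows),
                "median": median(row[field] for row in rows),
            }
            for field in COMPLEXITY_FIELDS
        }
    return summary


if __name__ == "__main__":
    sample = [
        {"month": "2025-11", "files_changed": 2, "commits": 1, "description_words": 40},
        {"month": "2025-11", "files_changed": 4, "commits": 3, "description_words": 60},
        {"month": "2026-01", "files_changed": 9, "commits": 5, "description_words": 20},
    ]
    for month, stats in complexity_by_month(sample).items():
        print(month, stats)
```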

4. Tooling Changes

  • Check for linter/formatter updates
  • Review dependency version changes
  • Analyze tool availability issues
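Assuming the pinned versions of linters, formatters, and other dependencies can be exported as `name==version` lines (the format of a pip `requirements.txt`), diffing a November snapshot against a January one lists the tool updates that could line up with the decline. A minimal sketch; the file format is an assumption, and projects that use a different lockfile format would need a different parser.

```python
def parse_pins(text: str) -> dict:
    """Map package name -> version for pinned lines such as 'requests==2.32.3'."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(old_text: str, new_text: str) -> dict:
    """Report packages added, removed, or re-pinned between two snapshots."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {
            name: (old[name], new[name])
            for name in sorted(old.keys() & new.keys())
            if old[name] != new[name]
        },
    }
```

Unpinned lines (for example `pkg>=1.0`) are skipped, so the diff only covers exact pins.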

Proposed Investigation Steps

  1. Data Analysis (Day 1)

    • Pull detailed metrics for Nov 2025 vs Jan 2026
    • Compare task types, complexity, and failure patterns
    • Identify specific failure categories driving the decline
  2. Code Review (Day 2)

    • Review commits made between November and January that affect agent behavior
    • Check for prompt changes, validation changes, tool updates
    • Identify configuration changes
  3. Root Cause Analysis (Day 3)

    • Correlate changes with success rate drops
    • Test hypothesis with sample task replays
    • Document findings with evidence
  4. Remediation Plan (Day 3)

    • Propose specific fixes based on root cause
    • Prioritize by impact
    • Create follow-up issues for implementation
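For the Day 1 step of identifying what drives the decline, a shift-share decomposition is one standard way to separate a change in task mix from a genuine drop in per-category success. With w a category's share of tasks and r its success rate, the overall change equals a mix effect, sum of (w1 - w0) * r0, plus a within-category effect, sum of w1 * (r1 - r0). A minimal sketch, assuming tasks can be grouped into categories with task and success counts per period; the category names and counts in the usage are made up.

```python
def decompose(before, after):
    """Split an overall success-rate change into mix and within-category effects.

    `before` and `after` map category -> (tasks, successes). Both periods must
    contain the same categories, each with at least one task.
    """
    if set(before) != set(after):
        raise ValueError("both periods must cover the same categories")
    n_before = sum(tasks for tasks, _ in before.values())
    n_after = sum(tasks for tasks, _ in after.values())
    mix_effect = 0.0
    rate_effect = 0.0
    per_category = {}
    for category in before:
        t0, s0 = before[category]
        t1, s1 = after[category]
        w0, w1 = t0 / n_before, t1 / n_after  # category share of all tasks
        r0, r1 = s0 / t0, s1 / t1  # category success rate
        mix_effect += (w1 - w0) * r0
        per_category[category] = w1 * (r1 - r0)
        rate_effect += per_category[category]
    return {"mix": mix_effect, "within": rate_effect, "per_category": per_category}


if __name__ == "__main__":
    # Made-up counts purely to illustrate the arithmetic.
    nov = {"bugfix": (100, 80), "feature": (100, 60)}
    jan = {"bugfix": (50, 40), "feature": (150, 75)}
    print(decompose(nov, jan))
```

The two effects sum exactly to the overall change. A within-category effect spread across every category would be consistent with a decline that affects all task types; a dominant mix effect would instead point to a shift toward harder tasks.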

Success Criteria

  • Root cause(s) identified with supporting data
  • Correlation between changes and success rate decline established
  • Remediation plan with specific action items created
  • Follow-up issues created for each fix
  • Success rate recovers to above 70% within 2 weeks of the fixes landing
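Before attributing the drop to a specific change, it helps to confirm that the before/after difference is larger than random variation. A minimal sketch of a pooled two-proportion z-test using only the standard library; the success counts in the usage (674 of 884 for November, 1,140 of 1,760 for January) are reconstructed by rounding the reported rates, not taken from raw data. A significant result shows that the two rates differ; on its own it does not show which change caused the difference.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Undefined when every task in both groups succeeded (or every task failed).
    """
    rate_a, rate_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Success counts reconstructed by rounding the reported rates (not raw data).
    z, p = two_proportion_z(674, 884, 1140, 1760)
    print(f"z = {z:.2f}, p = {p:.2g}")
```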

Priority

Critical: the 11.4 percentage point decline translates to roughly 200 additional failed tasks in January alone and reduces project velocity

Source

Extracted from Copilot Agent Prompt Clustering Analysis - January 2026

Estimated Effort

3-5 days; requires a comprehensive investigation across multiple areas

Data Sources

Follow-up Required

This is an investigation task that will likely spawn multiple implementation tasks once root causes are identified.

AI generated by Discussion Task Miner - Code Quality Improvement Agent

  • expires on Feb 12, 2026, 9:15 AM UTC

Metadata


  • Assignees: No one assigned
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
