Copilot AI commented Jan 16, 2026

The Metrics Collector workflow was failing with MCP Gateway v0.0.59 schema validation errors (runs #20-26, Jan 9-15). The linked issue reported the problem as unresolved even though Issue #9898 had been closed.

Investigation Result

No code changes required. The fix was already deployed in commit ac0750f (Jan 16, 03:03 UTC), after the reported failures but before this investigation.

Current State

  • MCP Gateway: v0.0.59 → v0.0.60 ✅
  • Schema: Migrated from command/args to container/entrypoint/entrypointArgs
  • Lock file: Up-to-date, compiles without errors ✅

Current configuration (already correct):

```
"agentic_workflows": {
  "type": "stdio",
  "container": "alpine:latest",
  "entrypoint": "/opt/gh-aw/gh-aw",
  "entrypointArgs": ["mcp-server"],
  "mounts": ["/opt/gh-aw:/opt/gh-aw:ro"],
  "env": {"GITHUB_TOKEN": "\${GITHUB_TOKEN}"}
}
```
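For illustration, the move from `command`/`args` to `container`/`entrypoint`/`entrypointArgs` can be expressed as a small dictionary rewrite. This is a minimal Python sketch under stated assumptions, not the gateway's or gh-aw's actual migration code: the legacy entry shape, the helper names `migrate_stdio_server` and `uses_legacy_schema`, and the choice to have the caller supply the container image and mounts are all hypothetical.

```python
import json

# Hypothetical: keys from the legacy command/args schema that the
# newer gateway schema is assumed to reject.
LEGACY_KEYS = ("command", "args")


def migrate_stdio_server(old: dict, container: str, mounts: list) -> dict:
    """Rewrite a legacy stdio entry into the container-based shape.

    The container image and mounts cannot be derived from the legacy
    entry, so the caller supplies them (an assumption of this sketch).
    """
    # Keep every non-legacy field (e.g. "type", "env") unchanged.
    new = {k: v for k, v in old.items() if k not in LEGACY_KEYS}
    new.setdefault("type", "stdio")
    new["container"] = container
    # The old executable becomes the entrypoint inside the container.
    new["entrypoint"] = old["command"]
    new["entrypointArgs"] = list(old.get("args", []))
    new["mounts"] = list(mounts)
    return new


def uses_legacy_schema(entry: dict) -> bool:
    """Return True if the entry still carries a command/args key."""
    return any(key in entry for key in LEGACY_KEYS)


if __name__ == "__main__":
    # Hypothetical legacy entry; the real pre-migration config is not shown here.
    legacy = {
        "type": "stdio",
        "command": "/opt/gh-aw/gh-aw",
        "args": ["mcp-server"],
        "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"},
    }
    migrated = migrate_stdio_server(legacy, "alpine:latest", ["/opt/gh-aw:/opt/gh-aw:ro"])
    print(json.dumps(migrated, indent=2))
    print("legacy keys remaining:", uses_legacy_schema(migrated))
```

With these inputs the migrated entry carries the same `container`, `entrypoint`, `entrypointArgs`, and `mounts` values as the current configuration, and `uses_legacy_schema` returns `False` for it.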

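Timeline

  • Jan 9-15: Metrics Collector runs #20-26 fail with MCP Gateway v0.0.59 schema validation errors
  • Jan 16, 02:53 UTC: Workflow Health Manager generates the issue reporting the failures
  • Jan 16, 03:03 UTC: Fix deployed in commit ac0750f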

Closing as no action required. The reported failures occurred before the fix was deployed.
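The claim that the failures predate the fix can be checked directly from the timestamps in this thread. A minimal Python sketch: the fix time comes from commit ac0750f (given to the minute), the report time from the issue's `Generated` field, and reading "Jan 15" as ending at 23:59:59 UTC is an assumption, since the run dates carry no time zone.

```python
from datetime import datetime, timezone

# From this thread: fix deployed in commit ac0750f (minute precision).
FIX_DEPLOYED = datetime(2026, 1, 16, 3, 3, tzinfo=timezone.utc)
# From the issue's "Generated" field.
ISSUE_GENERATED = datetime(2026, 1, 16, 2, 53, 12, tzinfo=timezone.utc)
# Assumption: the last failing day (Jan 15) is interpreted in UTC.
LAST_FAILURE_DAY_END = datetime(2026, 1, 15, 23, 59, 59, tzinfo=timezone.utc)


def predates_fix(event: datetime, fix: datetime = FIX_DEPLOYED) -> bool:
    """True if the event happened strictly before the fix was deployed."""
    return event < fix


if __name__ == "__main__":
    print("failures before fix:", predates_fix(LAST_FAILURE_DAY_END))
    lead = FIX_DEPLOYED - ISSUE_GENERATED
    print("issue generated before fix:", predates_fix(ISSUE_GENERATED), f"({lead} earlier)")
```

Both checks return `True`: the issue was generated 9 minutes 48 seconds before the fix landed, which is consistent with it still describing the pre-fix state.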

Original prompt

This section details the original issue you should resolve

<issue_title>🚨 Metrics Collector - Infrastructure Failure (30% Success Rate)</issue_title>
<issue_description># Metrics Collector - Critical Infrastructure Failure

Status Summary

Priority: P1 - CRITICAL
Success Rate: 30% (3 successes / 7 failures in last 10 runs)
Health Score: 30/100 🚨
Impact: Historical metrics collection severely limited, trending data unavailable

Failure Pattern

Recent Runs (Mostly Failed)

Pattern: 7 consecutive recent failures, 70% failure rate
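Both figures in this section (a 30% success rate and 7 consecutive recent failures) can be derived from an ordered list of run outcomes. This Python sketch is illustrative only: the helper names are hypothetical, and the ordering of the ten runs (three older successes followed by seven failures) is inferred from the summary rather than stated by it.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of runs that succeeded (True = success)."""
    if not outcomes:
        raise ValueError("no runs to evaluate")
    return sum(outcomes) / len(outcomes)


def trailing_failures(outcomes: Sequence[bool]) -> int:
    """Count consecutive failures at the end of the list (most recent run last)."""
    count = 0
    for ok in reversed(outcomes):
        if ok:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Inferred ordering, oldest first: 3 successes, then 7 failures.
    last_ten = [True] * 3 + [False] * 7
    print(f"success rate: {success_rate(last_ten):.0%}")
    print(f"failure rate: {1 - success_rate(last_ten):.0%}")
    print("consecutive recent failures:", trailing_failures(last_ten))
```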

Previous Issue Status

Issue #9898: Closed, but the workflow continues to fail
Action Required: Verify if fix was deployed or issue needs reopening

Impact Assessment

Critical Impact

  • Historical Metrics: Daily metrics not being collected properly
  • Trend Analysis: Other meta-orchestrators cannot track workflow health trends
  • Data-Driven Decisions: Lack of metrics prevents optimization decisions
  • Shared Memory: Incomplete metrics at /tmp/gh-aw/repo-memory/default/metrics/

Affected Systems

  • Workflow Health Manager: Cannot calculate MTBF or track success rate trends
  • Agent Performance Analyzer: Missing workflow execution data for analysis
  • Campaign Manager: Cannot assess campaign health metrics
  • Dashboard: Stale metrics data (last successful collection: 2026-01-08)
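MTBF is named here but not defined in this thread. One common workflow-level reading is the mean interval between consecutive failed runs; the hypothetical helper below computes that in Python. The definition and the timestamps in the usage example are assumptions for illustration, not data from the Workflow Health Manager.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def mtbf(failure_times: Sequence[datetime]) -> timedelta:
    """Mean gap between consecutive failures; needs at least two failures."""
    if len(failure_times) < 2:
        raise ValueError("MTBF needs at least two failure timestamps")
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # sum() needs a timedelta start value; dividing a timedelta by an int is supported.
    return sum(gaps, timedelta()) / len(gaps)


if __name__ == "__main__":
    # Illustrative only: one failure per day from Jan 9 to Jan 15.
    failures = [datetime(2026, 1, day, 6, 0, tzinfo=timezone.utc) for day in range(9, 16)]
    print("MTBF:", mtbf(failures))
```

The helper raises when fewer than two failures are known rather than returning a misleading value.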

Configuration Analysis

Workflow: .github/workflows/metrics-collector.md

```yaml
engine: copilot
tools:
  agentic-workflows:
  github:
    toolsets: [default]
  repo-memory:
    branch-name: memory/meta-orchestrators
    file-glob: "metrics/**"
timeout-minutes: 15
```

Known Issues

Issue #9898 Context

  • Status: Closed (but workflow still failing)
  • Possible Causes:
    • Fix not deployed to production
    • Fix incomplete or didn't address the root cause
    • New regression introduced after fix
    • MCP Gateway breaking change

Historical Data Gap

  • Last successful metrics: 2026-01-08
  • Current metrics show "filesystem_analysis" only (no GitHub API data)
  • Limitations documented: "No GitHub API access - cannot retrieve workflow run data"
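The size of the data gap follows from two dates in this issue: the last successful metrics (2026-01-08) and the report time (2026-01-16T02:53:12Z). A minimal Python sketch; the midnight-UTC time for the last success and the one-day freshness threshold for daily metrics are assumptions, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Date-only in the issue; midnight UTC is an assumption.
LAST_SUCCESS = datetime(2026, 1, 8, tzinfo=timezone.utc)
# The issue's "Generated" timestamp.
REPORTED_AT = datetime(2026, 1, 16, 2, 53, 12, tzinfo=timezone.utc)


def days_stale(last_success: datetime, now: datetime) -> int:
    """Whole days elapsed since the last successful collection."""
    return (now - last_success).days


def is_stale(last_success: datetime, now: datetime,
             max_age: timedelta = timedelta(days=1)) -> bool:
    """Hypothetical rule: daily metrics are stale once older than max_age."""
    return now - last_success > max_age


if __name__ == "__main__":
    print("days without fresh metrics:", days_stale(LAST_SUCCESS, REPORTED_AT))
    print("stale:", is_stale(LAST_SUCCESS, REPORTED_AT))
```

Under these assumptions the gap is 8 whole days, well past a one-day freshness window.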

Investigation Required

Immediate Actions

  1. Review Issue #9898 ("[P1] Metrics Collector Failing - MCP Gateway Schema Validation Error") - Check what fix was applied
  2. Verify deployment - Ensure fix reached production
  3. Check MCP Gateway - Validate configuration and connectivity
  4. Test tool access - Verify agentic-workflows tool availability
  5. Analyze recent logs - Identify current failure pattern

Likely Root Causes

  • ✅ MCP Gateway configuration error
  • ✅ GitHub API authentication failure
  • ✅ Agentic-workflows tool not accessible
  • ✅ Repo-memory write permissions issue
  • ✅ Tool timeout or rate limiting
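To triage failures against the candidate causes listed above, one option is a keyword classifier over log lines. This Python sketch is hypothetical throughout: the category names and regular expressions are illustrative guesses, not the actual wording of MCP Gateway, GitHub API, or Actions logs, and would need tuning against real run logs.

```python
import re

# Hypothetical patterns, one per candidate cause; real log wording may differ.
ROOT_CAUSE_PATTERNS = {
    "mcp_gateway_config": re.compile(r"schema validation|invalid (mcp )?config", re.IGNORECASE),
    "github_auth": re.compile(r"\b401\b|bad credentials|authentication failed", re.IGNORECASE),
    "tool_unavailable": re.compile(r"tool \S+ not (found|available)", re.IGNORECASE),
    "repo_memory_permissions": re.compile(r"permission denied|\b403\b", re.IGNORECASE),
    "timeout_or_rate_limit": re.compile(r"timed out|timeout|rate limit", re.IGNORECASE),
}


def classify(log_line: str) -> list:
    """Return every candidate cause whose pattern matches the line."""
    return [name for name, pattern in ROOT_CAUSE_PATTERNS.items() if pattern.search(log_line)]


def tally(lines) -> dict:
    """Count matching lines per candidate cause."""
    counts = {name: 0 for name in ROOT_CAUSE_PATTERNS}
    for line in lines:
        for name in classify(line):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only, not taken from real run logs.
    sample = ["schema validation failed for agentic_workflows", "request timed out"]
    print(tally(sample))
```

A line can match several categories, so the tally is a prioritization aid rather than a diagnosis.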

Recommended Fix Approach

  1. Reopen Issue #9898 ("[P1] Metrics Collector Failing - MCP Gateway Schema Validation Error") if the fix wasn't effective
  2. Test metrics collection with minimal tool configuration
  3. Verify GitHub MCP setup and authentication
  4. Check repo-memory branch permissions and access
  5. Compare with working workflows using similar tools

Success Criteria

  • Workflow runs successfully with >80% success rate
  • Daily metrics collected and stored in /tmp/gh-aw/repo-memory/default/metrics/daily/
  • latest.json updated with actual workflow run data (not just filesystem analysis)
  • Other meta-orchestrators can access historical trends
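The first success criterion can be checked mechanically. A minimal Python sketch, assuming the rate is measured over the most recent 10 runs (the window the issue used for its 30% figure) and that ">80%" is a strict inequality; the function name is hypothetical.

```python
from typing import Sequence


def meets_success_target(outcomes: Sequence[bool], window: int = 10,
                         target: float = 0.80) -> bool:
    """True if the success rate over the last `window` runs is strictly above `target`.

    `outcomes` is ordered oldest first; True means the run succeeded.
    """
    recent = list(outcomes)[-window:]
    if not recent:
        return False  # no runs yet, so the criterion cannot be met
    return sum(recent) / len(recent) > target


if __name__ == "__main__":
    print(meets_success_target([True] * 3 + [False] * 7))  # 30%: fails
    print(meets_success_target([True] * 8 + [False] * 2))  # exactly 80% is not ">80%"
    print(meets_success_target([True] * 9 + [False]))      # 90%: passes
```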

Related Issue: #9898
Workflow Run: https://github.com/githubnext/gh-aw/actions/runs/21053929777
Generated: 2026-01-16T02:53:12Z
Generated by: Workflow Health Manager

AI generated by Workflow Health Manager - Meta-Orchestrator

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Fix metrics collector infrastructure failure with low success rate" to "Metrics Collector - No changes needed, already resolved" on Jan 16, 2026
Copilot AI requested a review from mnkiefer January 16, 2026 03:40
pelikhan closed this on Jan 16, 2026
Development

Successfully merging this pull request may close these issues.

🚨 Metrics Collector - Infrastructure Failure (30% Success Rate)
