🚨 Metrics Collector - Infrastructure Failure (30% Success Rate) #10191

@github-actions

Metrics Collector - Critical Infrastructure Failure

Status Summary

Priority: P1 - CRITICAL
Success Rate: 30% (3 success / 7 failures in last 10 runs)
Health Score: 30/100 🚨
Impact: Historical metrics collection severely limited, trending data unavailable
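The success rate above is a ratio of successful runs to total runs. A minimal sketch of that arithmetic follows. The function name and the list of conclusion strings are illustrative assumptions, not part of the Workflow Health Manager, and the sketch does not attempt to reproduce how the health score is derived.

```python
def success_rate(conclusions):
    """Return the percentage of runs whose conclusion is 'success'."""
    if not conclusions:
        raise ValueError("no runs to evaluate")
    successes = sum(1 for c in conclusions if c == "success")
    return 100.0 * successes / len(conclusions)


# Illustrative data matching the counts in this report: 3 successes, 7 failures.
last_ten = ["success"] * 3 + ["failure"] * 7
rate = success_rate(last_ten)
print(f"{rate:.0f}%")  # 30%
```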

Failure Pattern

Recent Runs (Mostly Failed)

Pattern: the 7 most recent runs failed consecutively, a 70% failure rate over the last 10 runs
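One way to detect this pattern is to count the unbroken run of failures at the end of the run history. The sketch below assumes the runs are ordered oldest to newest and that the three successes precede the seven failures, which is what the counts in this report imply. The function name is hypothetical.

```python
def trailing_failure_streak(conclusions):
    """Count consecutive non-success runs at the end of a list ordered oldest to newest."""
    streak = 0
    for conclusion in reversed(conclusions):
        if conclusion == "success":
            break
        streak += 1
    return streak


# Assumed ordering: 3 older successes followed by 7 recent failures.
runs_oldest_first = ["success"] * 3 + ["failure"] * 7
print(trailing_failure_streak(runs_oldest_first))  # 7
```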

Previous Issue Status

Issue #9898: Was closed but workflow continues to fail
Action Required: Verify if fix was deployed or issue needs reopening

Impact Assessment

Critical Impact

  • Historical Metrics: Daily metrics not being collected properly
  • Trend Analysis: Other meta-orchestrators cannot track workflow health trends
  • Data-Driven Decisions: Lack of metrics prevents optimization decisions
  • Shared Memory: Incomplete metrics at /tmp/gh-aw/repo-memory/default/metrics/

Affected Systems

  • Workflow Health Manager: Cannot calculate MTBF or track success rate trends
  • Agent Performance Analyzer: Missing workflow execution data for analysis
  • Campaign Manager: Cannot assess campaign health metrics
  • Dashboard: Stale metrics data (last successful collection: 2026-01-08)

Configuration Analysis

Workflow: .github/workflows/metrics-collector.md

engine: copilot
tools:
  agentic-workflows:
  github:
    toolsets: [default]
  repo-memory:
    branch-name: memory/meta-orchestrators
    file-glob: "metrics/**"
timeout-minutes: 15

Known Issues

Issue #9898 Context

  • Status: Closed (but workflow still failing)
  • Possible Causes:
    • Fix not deployed to production
    • Fix incomplete or didn't address root cause
    • New regression introduced after fix
    • MCP Gateway breaking change

Historical Data Gap

  • Last successful metrics: 2026-01-08
  • Current metrics show "filesystem_analysis" only (no GitHub API data)
  • Limitations documented: "No GitHub API access - cannot retrieve workflow run data"
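To size the gap, the last successful collection date can be compared with the time this report was generated (2026-01-16T02:53:12Z). A small sketch of that date arithmetic, with illustrative variable names:

```python
from datetime import date, datetime, timezone

# Dates taken from this report.
last_success = date(2026, 1, 8)
report_generated = datetime(2026, 1, 16, 2, 53, 12, tzinfo=timezone.utc)

# Whole calendar days without a successful metrics collection.
gap_days = (report_generated.date() - last_success).days
print(gap_days)  # 8
```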

Investigation Required

Immediate Actions

  1. Review Issue #9898 ([P1] Metrics Collector Failing - MCP Gateway Schema Validation Error) - Check what fix was applied
  2. Verify deployment - Ensure fix reached production
  3. Check MCP Gateway - Validate configuration and connectivity
  4. Test tool access - Verify agentic-workflows tool availability
  5. Analyze recent logs - Identify current failure pattern
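For the log analysis step, a rough sketch using the GitHub CLI is shown here. It assumes `gh` is installed and authenticated, and that the compiled workflow file is named `metrics-collector.lock.yml`. That file name is an assumption, so substitute the actual workflow name. The helper names and the set of conclusions treated as failures are also illustrative.

```python
import json
import subprocess

# Conclusions treated as failures in this sketch; adjust as needed.
FAILURE_CONCLUSIONS = {"failure", "timed_out", "startup_failure"}


def fetch_recent_runs(workflow, limit=10):
    """Call `gh run list` and return the parsed JSON list of recent runs."""
    result = subprocess.run(
        ["gh", "run", "list", "--workflow", workflow, "--limit", str(limit),
         "--json", "databaseId,conclusion,createdAt"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def failed_run_ids(runs):
    """Return the IDs of runs whose conclusion counts as a failure."""
    return [run["databaseId"] for run in runs
            if run.get("conclusion") in FAILURE_CONCLUSIONS]


# Example usage (requires gh; the workflow file name is an assumption):
#   runs = fetch_recent_runs("metrics-collector.lock.yml")
#   for run_id in failed_run_ids(runs):
#       print(run_id)
```

Each returned ID can then be inspected with `gh run view <id> --log-failed` to see only the failing steps.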

Likely Root Causes

  • ✅ MCP Gateway configuration error
  • ✅ GitHub API authentication failure
  • ✅ Agentic-workflows tool not accessible
  • ✅ Repo-memory write permissions issue
  • ✅ Tool timeout or rate limiting

Recommended Fix Approach

  1. Reopen Issue #9898 ([P1] Metrics Collector Failing - MCP Gateway Schema Validation Error) if fix wasn't effective
  2. Test metrics collection with minimal tool configuration
  3. Verify GitHub MCP setup and authentication
  4. Check repo-memory branch permissions and access
  5. Compare with working workflows using similar tools

Success Criteria

  • Workflow runs successfully with >80% success rate
  • Daily metrics collected and stored in /tmp/gh-aw/repo-memory/default/metrics/daily/
  • latest.json updated with actual workflow run data (not just filesystem analysis)
  • Other meta-orchestrators can access historical trends
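The first three criteria could be checked mechanically along these lines. This is a sketch under stated assumptions: `latest.json` is assumed to sit directly under the metrics directory, the check only confirms that the file parses as JSON and that `daily/` is non-empty, and it does not inspect the metrics schema, which this report does not describe. Both function names are hypothetical.

```python
import json
from pathlib import Path


def meets_success_rate(successes, total, threshold=80.0):
    """True when the success rate is strictly above the threshold percentage."""
    return total > 0 and 100.0 * successes / total > threshold


def metrics_files_present(metrics_dir):
    """True when daily/ holds at least one file and latest.json parses as JSON."""
    root = Path(metrics_dir)
    daily = root / "daily"
    has_daily = daily.is_dir() and any(p.is_file() for p in daily.iterdir())
    try:
        # Assumed location: latest.json directly under the metrics directory.
        json.loads((root / "latest.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    return has_daily


print(meets_success_rate(3, 10))  # False: the current 30% rate is below the target
```

Confirming that `latest.json` holds real workflow run data rather than filesystem analysis alone would need an additional check against the actual metrics schema.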

Related Issue: #9898
Workflow Run: https://github.com/githubnext/gh-aw/actions/runs/21053929777
Generated: 2026-01-16T02:53:12Z
Generated by: Workflow Health Manager

AI generated by Workflow Health Manager - Meta-Orchestrator
