
Add shared metrics collection infrastructure for agent performance analysis#7561

Merged
pelikhan merged 5 commits into main from copilot/create-shared-metrics-infrastructure on Dec 24, 2025

Conversation

Contributor

Copilot AI commented Dec 24, 2025

Meta-orchestrators (Agent Performance Analyzer, Campaign Manager, Workflow Health Manager) lacked centralized metrics for historical trend analysis, performance benchmarking, and early detection of degrading workflows.

Changes

New metrics-collector workflow (.github/workflows/metrics-collector.md)

  • Runs daily, using the agentic-workflows tool (status and logs) as the primary data source, to collect:
    • Per-workflow: safe outputs, run statistics (success/fail counts, rates, token usage, costs, duration), engagement (reactions, comments), quality indicators (PR merge rates, issue close times)
    • Ecosystem: total/active workflows, overall success rate, total tokens, total costs
  • Uses the GitHub MCP server (default toolset) as a secondary source for engagement metrics only (reactions, comments)
  • Stores a 30-day rolling window in repo-memory: memory/meta-orchestrators/metrics/{daily/YYYY-MM-DD.json, latest.json}
  • Automatically cleans up metrics older than 30 days (a retention sketch follows this list)
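
The retention step can be as simple as deleting daily files whose date-stamped names fall outside the window. A minimal Python sketch, assuming the daily/YYYY-MM-DD.json layout above; the helper name and its standalone form are illustrative, not part of the workflow:

from datetime import date, timedelta
from pathlib import Path

def prune_old_metrics(metrics_dir: Path, keep_days: int = 30) -> list[Path]:
    """Delete daily metric files older than the rolling window."""
    cutoff = date.today() - timedelta(days=keep_days)
    removed = []
    for f in sorted((metrics_dir / "daily").glob("*.json")):
        try:
            file_date = date.fromisoformat(f.stem)  # "2024-12-24" -> date
        except ValueError:
            continue  # ignore files that don't follow the YYYY-MM-DD scheme
        if file_date < cutoff:
            f.unlink()
            removed.append(f)
    return removed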

Updated meta-orchestrators to consume shared metrics:

  • agent-performance-analyzer.md: Load metrics for trend analysis and compare current vs. historical performance (see the sketch after this list)
  • campaign-manager.md: Use metrics for campaign health assessment, velocity calculations
  • workflow-health-manager.md: Leverage metrics for quality trend tracking, MTBF calculations
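
As an illustration of the consumption pattern, a hypothetical sketch of how agent-performance-analyzer.md could compare the latest snapshot against the 30-day history. The file layout and schema follow this PR; the 10-point degradation threshold is an assumption:

import json
from pathlib import Path
from statistics import mean

def degrading_workflows(metrics_dir: Path, drop: float = 0.10) -> list[str]:
    """Names of workflows whose latest success rate sits well below trend."""
    latest = json.loads((metrics_dir / "latest.json").read_text())
    history = [json.loads(p.read_text())
               for p in sorted((metrics_dir / "daily").glob("*.json"))]
    flagged = []
    for name, data in latest["workflows"].items():
        past = [day["workflows"][name]["workflow_runs"]["success_rate"]
                for day in history if name in day["workflows"]]
        if not past:
            continue  # new workflow, no baseline yet
        if data["workflow_runs"]["success_rate"] < mean(past) - drop:
            flagged.append(name)
    return flagged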

Metrics schema (JSON):

{
  "timestamp": "2024-12-24T00:00:00Z",
  "workflows": {
    "workflow-name": {
      "safe_outputs": { "issues_created": 5, "prs_created": 2, ... },
      "workflow_runs": { 
        "total": 7, 
        "successful": 6, 
        "failed": 1, 
        "success_rate": 0.857,
        "avg_duration_seconds": 180,
        "total_tokens": 45000,
        "total_cost_usd": 0.45
      },
      "engagement": { "issue_reactions": 12, "pr_comments": 8, ... },
      "quality_indicators": { "pr_merge_rate": 0.75, "avg_issue_close_time_hours": 48.5, ... }
    }
  },
  "ecosystem": {
    "total_workflows": 120,
    "active_workflows": 85,
    "overall_success_rate": 0.892,
    "total_tokens": 1250000,
    "total_cost_usd": 12.50
  }
}
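
To clarify how the ecosystem block relates to the per-workflow entries above, a hedged sketch of the rollup: totals are sums across workflows, and the overall success rate is weighted by run counts. Treating "active" as "at least one run in the window" is an assumption:

def ecosystem_rollup(workflows: dict) -> dict:
    """Aggregate the per-workflow entries into the ecosystem block."""
    runs = [w["workflow_runs"] for w in workflows.values()]
    total = sum(r["total"] for r in runs)
    successful = sum(r["successful"] for r in runs)
    return {
        "total_workflows": len(workflows),
        # assumption: "active" means at least one run in the window
        "active_workflows": sum(1 for r in runs if r["total"] > 0),
        # weighted by run counts, not a mean of per-workflow rates
        "overall_success_rate": round(successful / total, 3) if total else 0.0,
        "total_tokens": sum(r["total_tokens"] for r in runs),
        "total_cost_usd": round(sum(r["total_cost_usd"] for r in runs), 2),
    }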

Architecture Benefits:

  • Efficient: the agentic-workflows tool is optimized for workflow log retrieval and returns structured data
  • Complete: token usage and cost metrics come directly from execution logs
  • Accurate: data comes from authoritative workflow execution logs, with no manual footer parsing needed
  • Performant: minimal API calls and efficient data processing

This enables historical trend analysis, anomaly detection, and cost tracking, and it reduces redundant GitHub API queries across meta-orchestrators.

Original prompt

This section describes the original issue to resolve

<issue_title>Create Shared Metrics Collection Infrastructure for Agent Performance</issue_title>
<issue_description>## Problem

The agent ecosystem currently lacks centralized performance metrics collection and storage. This prevents:

  • Historical trend analysis
  • Performance benchmarking
  • Early detection of degrading agents
  • Data-driven optimization decisions
  • Objective performance comparisons

Meta-orchestrators (Agent Performance Analyzer, Campaign Manager, Workflow Health Manager) need shared metrics to coordinate effectively and make strategic decisions.

Solution

Create a shared metrics collection infrastructure using repo-memory for persistence.

Architecture

1. Metrics Schema (JSON format in repo-memory)

{
  "timestamp": "2024-12-24T12:00:00Z",
  "period": "daily",
  "workflows": {
    "workflow-name": {
      "safe_outputs": {
        "issues_created": 5,
        "prs_created": 2,
        "comments_added": 10,
        "discussions_created": 1
      },
      "workflow_runs": {
        "total": 7,
        "successful": 6,
        "failed": 1,
        "success_rate": 0.857
      },
      "engagement": {
        "issue_reactions": 12,
        "pr_comments": 8,
        "discussion_replies": 3
      },
      "quality_indicators": {
        "pr_merge_rate": 0.75,
        "avg_issue_close_time_hours": 48,
        "avg_pr_merge_time_hours": 72
      }
    }
  },
  "ecosystem": {
    "total_workflows": 120,
    "active_workflows": 85,
    "total_safe_outputs": 45,
    "overall_success_rate": 0.892
  }
}

2. Storage Location

/tmp/gh-aw/repo-memory-default/memory/meta-orchestrators/
├── metrics/
│   ├── daily/
│   │   ├── 2024-12-24.json
│   │   ├── 2024-12-25.json
│   │   └── ...
│   ├── weekly/
│   │   ├── 2024-W51.json
│   │   └── ...
│   └── latest.json (symlink or copy of most recent)
└── trends/
    ├── workflow-success-rates.json
    └── safe-output-volume.json
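
A sketch of the corresponding write path: persist today's snapshot under daily/ and refresh latest.json. The issue allows either a symlink or a copy for latest.json; a copy is assumed here for portability, and weekly aggregation is omitted:

import json
from datetime import date
from pathlib import Path

def store_snapshot(metrics_dir: Path, snapshot: dict) -> Path:
    """Write today's metrics under daily/ and refresh latest.json."""
    daily_dir = metrics_dir / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    daily_file = daily_dir / f"{date.today().isoformat()}.json"
    payload = json.dumps(snapshot, indent=2)
    daily_file.write_text(payload)
    (metrics_dir / "latest.json").write_text(payload)  # copy, not symlink
    return daily_file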

3. Collection Workflow

Create new workflow: .github/workflows/metrics-collector.md

---
description: Collects daily performance metrics for agent ecosystem
on:
  schedule:
    - cron: "0 0 * * *"  # Daily at midnight UTC
  workflow_dispatch:
permissions:
  contents: read
  issues: read
  pull-requests: read
  discussions: read
  actions: read
engine: copilot
tools:
  github:
    mode: remote
    toolsets: [default, actions]
  repo-memory:
    branch-name: memory/meta-orchestrators
    file-glob: "metrics/**/*"
timeout-minutes: 15
---

Responsibilities:

  • Query the GitHub API for the last 24 hours of activity
  • Calculate metrics per workflow (a calculation sketch follows this list)
  • Store them in a daily JSON file
  • Update rolling aggregates
  • Clean up old daily files (keep 30 days)
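
A minimal sketch of the per-workflow calculation, assuming each run record exposes a conclusion and a duration; those field names are guesses at what the execution logs provide, not a documented interface:

def workflow_run_stats(runs: list[dict]) -> dict:
    """Fold raw run records into the workflow_runs block of the schema."""
    total = len(runs)
    successful = sum(1 for r in runs if r["conclusion"] == "success")
    return {
        "total": total,
        "successful": successful,
        "failed": total - successful,
        "success_rate": round(successful / total, 3) if total else 0.0,
        "avg_duration_seconds": (
            round(sum(r["duration_seconds"] for r in runs) / total) if total else 0
        ),
    }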

4. Consumer Integration

Update meta-orchestrators to read metrics:

  • Agent Performance Analyzer: Read metrics for performance analysis
  • Campaign Manager: Use metrics for campaign health assessment
  • Workflow Health Manager: Incorporate metrics into health monitoring

Implementation Plan

Phase 1: Basic Collection (Week 1)

  1. Create metrics-collector workflow
  2. Implement basic safe output counting
  3. Store daily JSON files
  4. Test storage and retrieval

Phase 2: Enrichment (Week 2)

  1. Add workflow run statistics
  2. Calculate success rates
  3. Add engagement metrics (reactions, comments)
  4. Implement quality indicators

Phase 3: Integration (Week 3)

  1. Update Agent Performance Analyzer to consume metrics
  2. Update Campaign Manager to use metrics
  3. Update Workflow Health Manager integration
  4. Create shared query utilities

Phase 4: Visualization (Week 4)

  1. Add trend calculation
  2. Create summary dashboards
  3. Implement alerting for anomalies (sketched after this list)
  4. Document metrics usage
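
One plausible shape for the Phase 4 anomaly alerting: flag a workflow when its latest success rate falls well below its own trailing history. The z-score cutoff here is an illustrative choice, not part of the issue:

from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, z: float = 2.0) -> bool:
    """history: prior daily success rates; current: today's rate."""
    if len(history) < 3:
        return False  # too little data for a meaningful baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current < mu  # perfectly stable history: any drop stands out
    return (current - mu) / sigma < -z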

Expected Benefits

Enables:

  • Historical trend analysis (week-over-week, month-over-month)
  • Performance benchmarking (compare agents to ecosystem averages)
  • Anomaly detection (sudden drops in success rate)
  • Evidence-based prioritization
  • Objective performance rankings

Improves:

  • Meta-orchestrator coordination (shared data foundation)
  • Report accuracy and depth
  • Early problem detection
  • Strategic decision-making quality

Acceptance Criteria

  • Metrics collector workflow created and running daily
  • Metrics stored in repo-memory with defined schema
  • 7 days of historical data collected
  • Agent Performance Analyzer successfully reads metrics
  • Documentation for adding new metrics
  • No performance impact on existing workflows

Priority

High - Foundation for data-driven agent ecosystem management

Effort Estimate

Total: 12-16 hours across 4 weeks

  • Phase 1: 4-5 hours
  • Phase 2: 3-4 hours
  • Phase 3: 3-4 hours
  • Phase 4: 2-3 hours

Dependencies

  • Requires: GitHub API access for Agent Performance Analyzer (separate issue)
  • Blocks: Performance benchmarking, trend analysis, quality scoring

Related

  • Agent Performance...


Copilot AI and others added 3 commits December 24, 2025 22:07
…hared metrics

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@pelikhan
Contributor

@copilot agentic-workflows server (configured in tools) and logs tool to collect metrics

Copilot AI changed the title from "[WIP] Create shared metrics collection infrastructure for agent performance" to "Add shared metrics collection infrastructure for agent performance analysis" on Dec 24, 2025
Copilot AI requested a review from mnkiefer December 24, 2025 22:18
…ctions API

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 24, 2025

@copilot agentic-workflows server (configured in tools) and logs tool to collect metrics

Updated to use the agentic-workflows tool (status and logs) as the primary data source for metrics collection. The GitHub MCP server is now used only for engagement metrics. This provides more efficient log retrieval, includes token usage and cost data, and simplifies data processing. Changes in commit aa8f667.

Copilot AI requested a review from pelikhan December 24, 2025 22:28
@pelikhan pelikhan marked this pull request as ready for review December 24, 2025 23:01
@pelikhan pelikhan merged commit 8e8b7f8 into main Dec 24, 2025
4 checks passed
@pelikhan pelikhan deleted the copilot/create-shared-metrics-infrastructure branch December 24, 2025 23:01


Development

Successfully merging this pull request may close these issues.

Create Shared Metrics Collection Infrastructure for Agent Performance
