Workflow Health Dashboard - 2026-01-04
Overview
- Total Workflows: 128
- Healthy: Unable to determine (no metrics data) 🔴
- Warning: 10 (7.8%) - Outdated lock files ⚠️
- Critical: 2 issues identified 🚨
- Inactive: Unknown (no metrics data)
Critical Issues 🚨
Issue 1: Metrics Collection System Down
- Status: No execution metrics available
- Error: `/tmp/gh-aw/repo-memory-default/memory/default/metrics/latest.json` does not exist
- Impact: Cannot monitor workflow health, success rates, or failure patterns
- Root Cause: `metrics-collector.md` workflow has outdated lock file
- Action: Issue created for enabling metrics collection
- Priority: P0
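A health check can treat the missing file reported above as "no data" rather than a crash. The following is a minimal sketch, not the dashboard's actual implementation: the helper name `load_metrics` is hypothetical, the path is copied from the error message, and because the schema of `latest.json` is not described in this report, the sketch only requires a JSON object.

```python
import json
from pathlib import Path
from typing import Optional

# Path copied from the error message in this report.
METRICS_PATH = Path(
    "/tmp/gh-aw/repo-memory-default/memory/default/metrics/latest.json"
)


def load_metrics(path: Path = METRICS_PATH) -> Optional[dict]:
    """Return parsed metrics, or None if the file is missing or not valid JSON.

    Hypothetical helper: the real schema of latest.json is unknown here,
    so the only requirement is that the top-level value is a JSON object.
    """
    try:
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None
```

Returning `None` lets the caller report "Unable to determine (no metrics data)" explicitly instead of failing the whole run.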
Issue 2: 10 Workflows with Outdated Lock Files (7.8%)
- Status: Source `.md` files modified after `.lock.yml` compilation
- Impact: Runtime behavior may not match source code
- Affected Workflows:
  - `smoke-copilot-playwright.md`
  - `go-fan.md`
  - `stale-repo-identifier.md`
  - `duplicate-code-detector.md`
  - `copilot-pr-nlp-analysis.md`
  - `smoke-srt.md`
  - `github-mcp-structural-analysis.md`
  - `metrics-collector.md` ⚠️ Critical
  - `incident-response.md`
  - `layout-spec-maintainer.md`
- Action: Issue created for recompilation
- Priority: P0
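The staleness condition described above (a source `.md` file modified after its `.lock.yml` was compiled) can be sketched as a modification-time comparison. This is an illustration under assumptions: the function name `find_stale_locks` is hypothetical, and this report does not say how the check is actually performed. Modification times can be reset by a fresh checkout, so a real check might compare content hashes instead.

```python
from pathlib import Path


def find_stale_locks(workflow_dir: Path) -> list[str]:
    """Return names of .md sources that are newer than their .lock.yml file.

    Hypothetical helper. Sources without a lock file are skipped, since
    that is a coverage problem rather than a staleness problem.
    """
    stale = []
    for source in sorted(workflow_dir.glob("*.md")):
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists():
            continue
        # Stale when the source was modified after the lock was written.
        if source.stat().st_mtime > lock.stat().st_mtime:
            stale.append(source.name)
    return stale
```

Run against a workflow directory, the function returns a sorted list of source file names, which maps directly onto an "Affected Workflows" list like the one above.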
Structural Health Analysis ✅
Since execution metrics are unavailable, this assessment focuses on structural health:
Compilation Coverage
- ✅ 100% coverage: All 128 workflows have `.lock.yml` files
- ⚠️ 7.8% outdated: 10 workflows need recompilation
Engine Distribution
| Engine | Count | Percentage | Status |
|---|---|---|---|
| Copilot | 69 | 53.9% | ✅ Healthy diversity |
| Claude | 25 | 19.5% | ✅ Good alternative |
| Codex | 7 | 5.5% | ✅ Specialized use |
| Other | 27 | 21.1% | |
Analysis: Spreading workflows across several engines reduces the risk of a single point of failure. Copilot as the primary engine is appropriate for GitHub integration.
Workflow Categories
| Category | Count | Notes |
|---|---|---|
| Campaign Workflows | 2 | Campaign orchestration |
| Smoke Tests | 10 | Testing infrastructure |
| Daily Scheduled | 17 | Regular maintenance |
| Weekly Scheduled | 1 | Long-term analysis |
| Hourly Scheduled | 1 | High-frequency monitoring |
| Event-Triggered | ~97 | Majority of workflows |
Analysis: Good balance of scheduled vs. event-triggered workflows. Scheduling spread reduces resource contention.
Tool Usage
| Tool | Workflows | Coverage | Status |
|---|---|---|---|
| GitHub MCP | 94 | 73% | ✅ Excellent adoption |
| Playwright | 11 | 9% | ✅ Appropriate for UI testing |
| Fetch | 8 | 6% | ✅ Web content retrieval |
Analysis: Heavy GitHub MCP usage is expected and healthy for repository operations.
Systemic Patterns
Positive Indicators ✅
- Complete compilation coverage: All workflows have lock files
- Strong naming conventions: Clear categorization (daily-, smoke-, etc.)
- Engine diversity: Multiple engines prevent vendor lock-in
- Standardized tooling: Widespread GitHub MCP adoption
- No orphaned lock files: Clean 1:1 mapping between source and compiled files
Areas of Concern ⚠️
- Meta-monitoring gap: Metrics collector itself is outdated
- No execution visibility: Cannot assess runtime health
- Missing metrics infrastructure: Need 7 days of data for trends
- Safe outputs visibility: Frontmatter declarations appear missing
Data Limitations 🔴
Current Analysis Limited By:
- ❌ No workflow execution metrics
- ❌ No failure rate data
- ❌ No runtime performance data
- ❌ No error pattern analysis
- ❌ Cannot calculate MTBF
- ❌ Cannot identify failing workflows
Reason: The Metrics Collector workflow has an outdated lock file, so metrics storage has not been populated.
Impact: This assessment can only evaluate structural health (compilation, configuration, categorization). Runtime health monitoring requires metrics data.
Recommendations
Immediate Actions (P0)
- ✅ Recompile outdated workflows - Issue created
- ✅ Enable metrics collection - Issue created
- ⏳ Verify metrics collection - Pending workflow fix
- ⏳ Wait for baseline data - Need 7 days of metrics
High Priority (P1)
- Establish monitoring alerts - Set up notifications for workflow failures
- Document workflow dependencies - Map inter-workflow relationships
- Verify safe outputs usage - Deep dive into workflow bodies
Medium Priority (P2)
- Analyze execution patterns - Once metrics available
- Optimize scheduling - Prevent resource contention
- Review smoke test coverage - Ensure critical paths tested
Low Priority (P3)
- Standardize frontmatter - Consistent metadata across workflows
- Add workflow descriptions - Improve discoverability
- Document engine selection - Guidelines for choosing engines
Actions Taken This Run
- ✅ Scanned 128 executable workflows
- ✅ Verified 100% compilation coverage
- ✅ Identified 10 outdated lock files
- ✅ Created 2 P0 issues for critical problems
- ✅ Saved analysis to shared repo memory
- ✅ Created coordination alerts for other meta-orchestrators
Trends
- Overall health score: Unable to calculate (no metrics data)
- Compilation health: 92.2% (118/128 up-to-date)
- New failures this week: Unknown (no metrics)
- Fixed issues this week: Unknown (no metrics)
- Average success rate: Unknown (no metrics)
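The compilation health figure is the only trend that can be computed without execution metrics. As a worked check of the arithmetic, the sketch below reproduces it from the counts in this report; the helper name `percent` is illustrative.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Counts taken from this report.
total_workflows = 128
outdated_locks = 10
up_to_date = total_workflows - outdated_locks  # 118

compilation_health = percent(up_to_date, total_workflows)   # 92.2
outdated_share = percent(outdated_locks, total_workflows)   # 7.8
```

The same helper reproduces the engine distribution percentages, for example `percent(69, 128)` gives 53.9 for Copilot.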
Next Steps
- ⏳ Monitor recompilation issue resolution
- ⏳ Monitor metrics collection enablement
- ⏳ Wait 7 days for metrics baseline
- 🔄 Re-run comprehensive health analysis with execution data
- 🔄 Establish ongoing monitoring and alerting
Success Metrics Target
Once metrics are available, track:
- Overall workflow health score > 80/100
- Workflow success rate > 90%
- Mean time between failures (MTBF) > 7 days
- Outdated lock files < 5%
- Failed workflows detected within 24 hours
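Once data exists, the targets above could be evaluated mechanically. The sketch below is one possible shape, not the Workflow Health Manager's implementation: `HealthSnapshot`, `mtbf_days`, and `unmet_targets` are hypothetical names, and MTBF is assumed here to mean the observation window divided by the number of failures.

```python
import math
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    health_score: float       # 0-100 composite score (how it is computed is not defined here)
    success_rate: float       # percent of runs that succeeded
    mtbf_days: float          # mean time between failures, in days
    outdated_lock_pct: float  # percent of lock files older than their source
    detection_hours: float    # time taken to detect a failed workflow


def mtbf_days(window_days: float, failures: int) -> float:
    """Assumed definition: observation window divided by failure count."""
    return math.inf if failures == 0 else window_days / failures


def unmet_targets(s: HealthSnapshot) -> list[str]:
    """Return the names of targets from the list above that are not met."""
    checks = {
        "health score > 80": s.health_score > 80,
        "success rate > 90%": s.success_rate > 90,
        "MTBF > 7 days": s.mtbf_days > 7,
        "outdated lock files < 5%": s.outdated_lock_pct < 5,
        "failures detected within 24 hours": s.detection_hours <= 24,
    }
    return [name for name, ok in checks.items() if not ok]
```

With the current 7.8% outdated lock files, the lock file target would be reported as unmet regardless of the other values.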
Last updated: 2026-01-04T02:59:53Z
Next check: After metrics collection enabled (7 days minimum for baseline)
Dashboard maintained by: Workflow Health Manager
Shared memory: `/tmp/gh-aw/repo-memory/default/workflow-health-latest.md`
AI generated by Workflow Health Manager - Meta-Orchestrator