[refactoring] Extract Workflow Run Data Fetch into shared component #13818

Skill Overview

Create a reusable shared component for fetching GitHub Actions workflow run data with caching, following the successful pattern established by shared/issues-data-fetch.md and shared/copilot-session-data-fetch.md.

Multiple workflows need to fetch workflow run data from the GitHub Actions API for analysis, monitoring, and reporting. These workflows currently duplicate the logic for querying workflow runs, processing logs, and caching results. A shared component would reduce duplication by approximately 300 lines and standardize workflow run data access.

Current Usage

This skill pattern appears in at least eight workflows that fetch or analyze workflow run data:

  • ci-doctor.md - Investigates failed CI workflow runs (uses workflow_run trigger)
  • daily-cli-performance.md - Analyzes CLI compilation performance over time
  • workflow-health-manager.md - Monitors health of all agentic workflows
  • daily-firewall-report.md - Reports on firewall configuration changes
  • daily-observability-report.md - Generates observability metrics
  • metrics-collector.md - Collects workflow execution metrics
  • repo-audit-analyzer.md - Audits repository workflow configurations
  • dev-hawk.md - Monitors development workflow patterns

Proposed Shared Component

File: .github/workflows/shared/workflow-runs-data-fetch.md

Configuration:

---
tools:
  cache-memory:
    key: workflow-runs-data
  bash:
    - "gh api *"
    - "jq *"
    - "/tmp/gh-aw/jqschema.sh"
    - "mkdir *"
    - "date *"
    - "cp *"

steps:
  - name: Fetch workflow run data
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Create output directories
      mkdir -p /tmp/gh-aw/workflow-runs-data
      mkdir -p /tmp/gh-aw/cache-memory
      
      # Get today's date for cache identification
      TODAY=$(date '+%Y-%m-%d')
      CACHE_DIR="/tmp/gh-aw/cache-memory"
      
      # Check if cached data exists from today
      if [ -f "$CACHE_DIR/workflow-runs-${TODAY}.json" ] && [ -s "$CACHE_DIR/workflow-runs-${TODAY}.json" ]; then
        echo "✓ Found cached workflow run data from ${TODAY}"
        cp "$CACHE_DIR/workflow-runs-${TODAY}.json" /tmp/gh-aw/workflow-runs-data/runs.json
        
        # Regenerate schema if missing
        if [ ! -f "$CACHE_DIR/workflow-runs-${TODAY}-schema.json" ]; then
          /tmp/gh-aw/jqschema.sh < /tmp/gh-aw/workflow-runs-data/runs.json > "$CACHE_DIR/workflow-runs-${TODAY}-schema.json"
        fi
        cp "$CACHE_DIR/workflow-runs-${TODAY}-schema.json" /tmp/gh-aw/workflow-runs-data/runs-schema.json
        
        echo "Using cached data from ${TODAY}"
        echo "Total workflow runs in cache: $(jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json)"
      else
        echo "⬇ Downloading fresh workflow run data..."
        
        # Calculate date 30 days ago for filtering
        DATE_30_DAYS_AGO=$(date -d '30 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-30d '+%Y-%m-%d')
        
        # Fetch workflow runs from the last 30 days
        echo "Fetching workflow runs from the last 30 days..."
        gh api "repos/${{ github.repository }}/actions/runs" \
          --paginate \
          --jq ".workflow_runs[] | select(.created_at >= \"${DATE_30_DAYS_AGO}\") | {id, name, workflow_id, head_branch, head_sha, event, status, conclusion, created_at, updated_at, run_started_at, html_url, run_attempt, path}" \
          | jq -s '.' \
          > /tmp/gh-aw/workflow-runs-data/runs.json
        
        # Generate JSON schema
        /tmp/gh-aw/jqschema.sh < /tmp/gh-aw/workflow-runs-data/runs.json > /tmp/gh-aw/workflow-runs-data/runs-schema.json
        
        # Cache for future runs
        cp /tmp/gh-aw/workflow-runs-data/runs.json "$CACHE_DIR/workflow-runs-${TODAY}.json"
        cp /tmp/gh-aw/workflow-runs-data/runs-schema.json "$CACHE_DIR/workflow-runs-${TODAY}-schema.json"
        
        echo "Fetched $(jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json) workflow runs"
      fi
      
      echo "Workflow run data available at /tmp/gh-aw/workflow-runs-data/runs.json"
      echo "Schema available at /tmp/gh-aw/workflow-runs-data/runs-schema.json"
---

# Workflow Run Data

The workflow run data has been pre-fetched and is available at `/tmp/gh-aw/workflow-runs-data/runs.json`.

**Data Schema**: See `/tmp/gh-aw/workflow-runs-data/runs-schema.json` for the complete JSON schema.
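The exact schema is generated at runtime by `jqschema.sh`, but given the `--jq` projection in the fetch step above, each entry in `runs.json` should look roughly like this (all values illustrative, not real data):

``````json
{
  "id": 1234567890,
  "name": "CI",
  "workflow_id": 987654,
  "head_branch": "main",
  "head_sha": "abc123",
  "event": "push",
  "status": "completed",
  "conclusion": "failure",
  "created_at": "2026-02-01T10:00:00Z",
  "updated_at": "2026-02-01T10:05:00Z",
  "run_started_at": "2026-02-01T10:00:05Z",
  "html_url": "https://github.com/OWNER/REPO/actions/runs/1234567890",
  "run_attempt": 1,
  "path": ".github/workflows/ci.yml"
}
``````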

**Usage Example**:
``````bash
# Count total workflow runs
jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json

# Filter failed workflow runs
jq '[.[] | select(.conclusion == "failure")]' /tmp/gh-aw/workflow-runs-data/runs.json

# Filter by workflow name
jq '[.[] | select(.name == "CI")]' /tmp/gh-aw/workflow-runs-data/runs.json

# Get runs from the last 7 days
jq --arg date "$(date -d '7 days ago' -Iseconds)" \
  '[.[] | select(.created_at > $date)]' /tmp/gh-aw/workflow-runs-data/runs.json

# Count runs by conclusion status
jq 'group_by(.conclusion) | map({status: .[0].conclusion, count: length})' \
  /tmp/gh-aw/workflow-runs-data/runs.json
``````

**Import Example**:
``````yaml
imports:
  - shared/jqschema.md                 # Required dependency
  - shared/workflow-runs-data-fetch.md
``````

Impact

  • Lines saved: ~300 lines across 8+ workflows
  • Maintenance benefit: Centralizes workflow run data fetching, making it easier to add new fields or optimize caching
  • Consistency: Ensures all workflows use the same data structure for workflow runs
  • Performance: Reduces redundant API calls through shared caching strategy

Implementation Plan

  1. Create shared/workflow-runs-data-fetch.md based on the pattern from issues-data-fetch.md
  2. Add dependency on shared/jqschema.md for schema generation
  3. Test the shared component with a simple workflow
  4. Update ci-doctor.md to use the shared component (pilot migration)
  5. Update workflow-health-manager.md to use the shared component
  6. Update daily-cli-performance.md to use the shared component
  7. Progressively migrate remaining workflows (5+ more)
  8. Document advanced filtering patterns (by status, by date, by branch)
  9. Update AGENTS.md to reference the new shared component
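As a sketch of the pilot migration in step 4, a consumer such as ci-doctor.md would drop its inline fetch step and pull in the shared component via imports. This is a hypothetical fragment; the actual frontmatter fields depend on the consuming workflow:

``````yaml
---
on:
  workflow_run:
    types: [completed]

imports:
  - shared/jqschema.md                  # generates the JSON schema
  - shared/workflow-runs-data-fetch.md  # pre-fetches runs.json + runs-schema.json
---
``````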

Extension Opportunities

Once the base component is established, consider adding:

  • Job-level data: Fetch detailed job information for failed runs
  • Log fetching: Pre-fetch logs for failed jobs (following ci-doctor pattern)
  • Artifact data: Include artifact information in the cached data
  • Timing metrics: Calculate duration and performance statistics
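As an illustration of the timing-metrics extension, per-run durations can be derived from fields the component already caches, with no extra API calls. The sample below is hypothetical data mirroring the shape of `runs.json`; only `run_started_at` and `updated_at` are needed:

``````bash
# Hypothetical sample with the same fields the fetch step caches
cat > /tmp/sample-runs.json <<'EOF'
[
  {"id": 1, "name": "CI", "conclusion": "success",
   "run_started_at": "2026-02-01T10:00:00Z", "updated_at": "2026-02-01T10:05:00Z"},
  {"id": 2, "name": "CI", "conclusion": "failure",
   "run_started_at": "2026-02-01T11:00:00Z", "updated_at": "2026-02-01T11:12:00Z"}
]
EOF

# Duration in seconds per run (updated_at is an upper bound on completion time)
jq '[.[] | . + {duration_s:
      ((.updated_at | fromdateiso8601) - (.run_started_at | fromdateiso8601))}]' \
  /tmp/sample-runs.json > /tmp/sample-durations.json

jq '.[].duration_s' /tmp/sample-durations.json
``````

For real data, `updated_at` can lag the actual completion slightly, so treating the difference as an approximate duration is the honest interpretation.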

Related Analysis

This recommendation comes from a Workflow Skill Extractor analysis run on 2026-02-04, which covered 145 workflows and 58 existing shared components.

AI generated by Workflow Skill Extractor

  • expires on Feb 7, 2026, 12:06 AM UTC