[refactoring] Extract Workflow Run Data Fetch into shared component #13818

Skill Overview

Create a reusable shared component for fetching GitHub Actions workflow run data with caching, following the successful pattern established by shared/issues-data-fetch.md and shared/copilot-session-data-fetch.md.

Multiple workflows need to fetch workflow run data from the GitHub Actions API for analysis, monitoring, and reporting. These workflows currently duplicate the logic for querying workflow runs, processing logs, and caching results. A shared component would reduce duplication by approximately 300 lines and standardize workflow run data access.

Current Usage

This skill pattern appears in at least eight workflows that fetch or analyze workflow run data:

  • ci-doctor.md - Investigates failed CI workflow runs (uses workflow_run trigger)
  • daily-cli-performance.md - Analyzes CLI compilation performance over time
  • workflow-health-manager.md - Monitors health of all agentic workflows
  • daily-firewall-report.md - Reports on firewall configuration changes
  • daily-observability-report.md - Generates observability metrics
  • metrics-collector.md - Collects workflow execution metrics
  • repo-audit-analyzer.md - Audits repository workflow configurations
  • dev-hawk.md - Monitors development workflow patterns

Proposed Shared Component

File: .github/workflows/shared/workflow-runs-data-fetch.md

Configuration:

---
tools:
  cache-memory:
    key: workflow-runs-data
  bash:
    - "gh api *"
    - "jq *"
    - "/tmp/gh-aw/jqschema.sh"
    - "mkdir *"
    - "date *"
    - "cp *"

steps:
  - name: Fetch workflow run data
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Create output directories
      mkdir -p /tmp/gh-aw/workflow-runs-data
      mkdir -p /tmp/gh-aw/cache-memory
      
      # Get today's date for cache identification
      TODAY=$(date '+%Y-%m-%d')
      CACHE_DIR="/tmp/gh-aw/cache-memory"
      
      # Check if cached data exists from today
      if [ -f "$CACHE_DIR/workflow-runs-${TODAY}.json" ] && [ -s "$CACHE_DIR/workflow-runs-${TODAY}.json" ]; then
        echo "✓ Found cached workflow run data from ${TODAY}"
        cp "$CACHE_DIR/workflow-runs-${TODAY}.json" /tmp/gh-aw/workflow-runs-data/runs.json
        
        # Regenerate schema if missing
        if [ ! -f "$CACHE_DIR/workflow-runs-${TODAY}-schema.json" ]; then
          /tmp/gh-aw/jqschema.sh < /tmp/gh-aw/workflow-runs-data/runs.json > "$CACHE_DIR/workflow-runs-${TODAY}-schema.json"
        fi
        cp "$CACHE_DIR/workflow-runs-${TODAY}-schema.json" /tmp/gh-aw/workflow-runs-data/runs-schema.json
        
        echo "Using cached data from ${TODAY}"
        echo "Total workflow runs in cache: $(jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json)"
      else
        echo "⬇ Downloading fresh workflow run data..."
        
        # Calculate date 30 days ago for filtering
        DATE_30_DAYS_AGO=$(date -d '30 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-30d '+%Y-%m-%d')
        
        # Fetch workflow runs from the last 30 days
        echo "Fetching workflow runs from the last 30 days..."
        gh api "repos/${{ github.repository }}/actions/runs" \
          --paginate \
          --jq ".workflow_runs[] | select(.created_at >= \"${DATE_30_DAYS_AGO}\") | {id, name, workflow_id, head_branch, head_sha, event, status, conclusion, created_at, updated_at, run_started_at, html_url, run_attempt, path}" \
          | jq -s '.' \
          > /tmp/gh-aw/workflow-runs-data/runs.json
        
        # Generate JSON schema
        /tmp/gh-aw/jqschema.sh < /tmp/gh-aw/workflow-runs-data/runs.json > /tmp/gh-aw/workflow-runs-data/runs-schema.json
        
        # Cache for future runs
        cp /tmp/gh-aw/workflow-runs-data/runs.json "$CACHE_DIR/workflow-runs-${TODAY}.json"
        cp /tmp/gh-aw/workflow-runs-data/runs-schema.json "$CACHE_DIR/workflow-runs-${TODAY}-schema.json"
        
        echo "Fetched $(jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json) workflow runs"
      fi
      
      echo "Workflow run data available at /tmp/gh-aw/workflow-runs-data/runs.json"
      echo "Schema available at /tmp/gh-aw/workflow-runs-data/runs-schema.json"
---

# Workflow Run Data

The workflow run data has been pre-fetched and is available at `/tmp/gh-aw/workflow-runs-data/runs.json`.

**Data Schema**: See `/tmp/gh-aw/workflow-runs-data/runs-schema.json` for the complete JSON schema.
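The exact schema is generated at runtime by `jqschema.sh`, but given the `--jq` projection in the fetch step above, each entry in `runs.json` should look roughly like this (all values illustrative, not real data):

``````json
{
  "id": 1234567890,
  "name": "CI",
  "workflow_id": 987654,
  "head_branch": "main",
  "head_sha": "abc123",
  "event": "push",
  "status": "completed",
  "conclusion": "failure",
  "created_at": "2026-02-01T10:00:00Z",
  "updated_at": "2026-02-01T10:05:00Z",
  "run_started_at": "2026-02-01T10:00:05Z",
  "html_url": "https://github.com/OWNER/REPO/actions/runs/1234567890",
  "run_attempt": 1,
  "path": ".github/workflows/ci.yml"
}
``````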

**Usage Example**:
``````bash
# Count total workflow runs
jq 'length' /tmp/gh-aw/workflow-runs-data/runs.json

# Filter failed workflow runs
jq '[.[] | select(.conclusion == "failure")]' /tmp/gh-aw/workflow-runs-data/runs.json

# Filter by workflow name
jq '[.[] | select(.name == "CI")]' /tmp/gh-aw/workflow-runs-data/runs.json

# Get runs from the last 7 days
jq --arg date "$(date -d '7 days ago' -Iseconds)" \
  '[.[] | select(.created_at > $date)]' /tmp/gh-aw/workflow-runs-data/runs.json

# Count runs by conclusion status
jq 'group_by(.conclusion) | map({status: .[0].conclusion, count: length})' \
  /tmp/gh-aw/workflow-runs-data/runs.json
``````

**Import Example**:
``````yaml
imports:
  - shared/jqschema.md                 # Required dependency
  - shared/workflow-runs-data-fetch.md
``````

Impact

  • Lines saved: ~300 lines across 8+ workflows
  • Maintenance benefit: Centralizes workflow run data fetching, making it easier to add new fields or optimize caching
  • Consistency: Ensures all workflows use the same data structure for workflow runs
  • Performance: Reduces redundant API calls through shared caching strategy

Implementation Plan

  1. Create shared/workflow-runs-data-fetch.md based on the pattern from issues-data-fetch.md
  2. Add dependency on shared/jqschema.md for schema generation
  3. Test the shared component with a simple workflow
  4. Update ci-doctor.md to use the shared component (pilot migration)
  5. Update workflow-health-manager.md to use the shared component
  6. Update daily-cli-performance.md to use the shared component
  7. Progressively migrate remaining workflows (5+ more)
  8. Document advanced filtering patterns (by status, by date, by branch)
  9. Update AGENTS.md to reference the new shared component
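As a sketch of the pilot migration in step 4, a consumer such as ci-doctor.md would drop its inline fetch step and pull in the shared component via imports. This is a hypothetical fragment; the actual frontmatter fields depend on the consuming workflow:

``````yaml
---
on:
  workflow_run:
    types: [completed]

imports:
  - shared/jqschema.md                  # generates the JSON schema
  - shared/workflow-runs-data-fetch.md  # pre-fetches runs.json + runs-schema.json
---
``````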

Extension Opportunities

Once the base component is established, consider adding:

  • Job-level data: Fetch detailed job information for failed runs
  • Log fetching: Pre-fetch logs for failed jobs (following ci-doctor pattern)
  • Artifact data: Include artifact information in the cached data
  • Timing metrics: Calculate duration and performance statistics
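As an illustration of the timing-metrics extension, per-run durations can be derived from fields the component already caches, with no extra API calls. The sample below is hypothetical data mirroring the shape of `runs.json`; only `run_started_at` and `updated_at` are needed:

``````bash
# Hypothetical sample with the same fields the fetch step caches
cat > /tmp/sample-runs.json <<'EOF'
[
  {"id": 1, "name": "CI", "conclusion": "success",
   "run_started_at": "2026-02-01T10:00:00Z", "updated_at": "2026-02-01T10:05:00Z"},
  {"id": 2, "name": "CI", "conclusion": "failure",
   "run_started_at": "2026-02-01T11:00:00Z", "updated_at": "2026-02-01T11:12:00Z"}
]
EOF

# Duration in seconds per run (updated_at is an upper bound on completion time)
jq '[.[] | . + {duration_s:
      ((.updated_at | fromdateiso8601) - (.run_started_at | fromdateiso8601))}]' \
  /tmp/sample-runs.json > /tmp/sample-durations.json

jq '.[].duration_s' /tmp/sample-durations.json
``````

For real data, `updated_at` can lag the actual completion slightly, so treating the difference as an approximate duration is the honest interpretation.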

Related Analysis

This recommendation comes from a Workflow Skill Extractor analysis run on 2026-02-04, which covered 145 workflows and 58 existing shared components.

AI generated by Workflow Skill Extractor

  • expires on Feb 7, 2026, 12:06 AM UTC