
[refactoring] Extract GitHub Data Analysis Framework into shared component #10308

@github-actions

Description


Pattern Overview

The GitHub Data Analysis Framework pattern appears in 19+ workflows that fetch GitHub data (issues, PRs, workflow runs, metrics), store results in persistent memory (repo-memory or cache-memory), perform analysis with trend calculations, and generate reports. Extracting this into a shared component would eliminate ~2,850 lines of duplicated code.

Current Usage

This pattern appears in the following workflows:

  • audit-workflows.md (lines 14-18, 176-220)
  • workflow-health-manager.md (lines 15-17, 176-220)
  • daily-code-metrics.md (lines 11-17, 43-90)
  • daily-issues-report.md (lines 31-34, 59-68)
  • copilot-pr-merged-report.md (repo memory not used, but similar pattern)
  • daily-team-status.md
  • metrics-collector.md
  • daily-firewall-report.md
  • daily-performance-summary.md
  • org-health-report.md
  • daily-copilot-token-report.md
  • daily-secrets-analysis.md
  • daily-team-evolution-insights.md
  • agent-performance-analyzer.md
  • campaign-generator.md
  • copilot-session-insights.md
  • portfolio-analyst.md
  • repo-audit-analyzer.md
  • repository-quality-improver.md

Proposed Shared Component

File: .github/workflows/shared/github-data-analysis-framework.md

Configuration:
```yaml
tools:
  repo-memory:
    description: "Shared data analysis storage"
    file-glob: ["*.json", "*.jsonl", "*.csv", "*.md"]
    max-file-size: 102400
  bash:
    - "jq *"
    - "date *"
    - "mkdir *"
    - "cp *"
    - "cat *"
    - "bc *"

steps:
  - name: Setup analysis environment
    run: |
      # Create structured directories for data analysis
      mkdir -p /tmp/gh-aw/analysis/{data,historical,output}
      mkdir -p /tmp/gh-aw/repo-memory/default/metrics/daily
      echo "Analysis environment ready"
      echo "Current run: $(date -u +%Y-%m-%dT%H:%M:%SZ)"

  - name: Load historical context
    run: |
      # Load last 30 days of historical data for trend analysis
      HISTORY_DIR="/tmp/gh-aw/repo-memory/default/metrics/daily"
      if [ -d "$HISTORY_DIR" ]; then
        # Copy last 30 days of data
        find "$HISTORY_DIR" -name "*.json" -mtime -30 \
          -exec cp {} /tmp/gh-aw/analysis/historical/ \; 2>/dev/null || true

        HIST_COUNT=$(ls -1 /tmp/gh-aw/analysis/historical/ 2>/dev/null | wc -l)
        echo "Loaded $HIST_COUNT days of historical data"
      else
        echo "No historical data found (first run)"
      fi
```


GitHub Data Analysis Framework

This shared component provides a standardized framework for workflows that analyze GitHub data with persistent storage and trend tracking.

Features

  • Persistent storage with repo-memory for historical data
  • Structured directories for organized data management
  • Historical data loading (last 30 days automatically)
  • Trend calculation helpers for 7-day and 30-day comparisons
  • Standardized metrics storage format

Directory Structure

```
/tmp/gh-aw/analysis/
├── data/        # Current run data and inputs
├── historical/  # Last 30 days for comparison
└── output/      # Analysis results and reports

/tmp/gh-aw/repo-memory/default/
└── metrics/
    └── daily/   # Daily metrics stored as YYYY-MM-DD.json
```

Trend Calculation Helpers

```bash
# Calculate percentage change between two values
calculate_trend() {
  local current=$1
  local previous=$2

  if [ -z "$previous" ] || [ "$previous" = "0" ] || [ "$previous" = "null" ]; then
    echo "N/A"
    return
  fi

  local change=$(echo "scale=2; (($current - $previous) / $previous) * 100" | bc)
  printf "%.1f" "$change"
}

# Get trend indicator emoji
get_trend_indicator() {
  local change=$1

  if [ "$change" = "N/A" ]; then
    echo "➡️"
  elif (( $(echo "$change > 10" | bc -l) )); then
    echo "⬆️"
  elif (( $(echo "$change < -10" | bc -l) )); then
    echo "⬇️"
  else
    echo "➡️"
  fi
}

# Get value from N days ago
get_historical_value() {
  local metric_path=$1
  local days_ago=$2

  local target_date=$(date -d "$days_ago days ago" '+%Y-%m-%d' 2>/dev/null || \
    date -v-${days_ago}d '+%Y-%m-%d')
  local hist_file="/tmp/gh-aw/analysis/historical/${target_date}.json"

  if [ -f "$hist_file" ]; then
    jq -r "$metric_path // null" "$hist_file"
  else
    echo "null"
  fi
}
```
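A quick sanity check of the first two helpers with illustrative values (the function bodies are repeated here so the snippet runs on its own; the numbers are made up for the demo):

```shell
#!/usr/bin/env bash
# Standalone demo of calculate_trend / get_trend_indicator with sample values.

calculate_trend() {
  local current=$1 previous=$2
  if [ -z "$previous" ] || [ "$previous" = "0" ] || [ "$previous" = "null" ]; then
    echo "N/A"
    return
  fi
  local change
  change=$(echo "scale=2; (($current - $previous) / $previous) * 100" | bc)
  printf "%.1f\n" "$change"
}

get_trend_indicator() {
  local change=$1
  if [ "$change" = "N/A" ]; then echo "➡️"
  elif (( $(echo "$change > 10" | bc -l) )); then echo "⬆️"
  elif (( $(echo "$change < -10" | bc -l) )); then echo "⬇️"
  else echo "➡️"
  fi
}

calculate_trend 100 80                              # 25.0 (grew 25% vs. baseline)
get_trend_indicator "$(calculate_trend 100 80)"     # ⬆️ (change above +10%)
calculate_trend 100 null                            # N/A (no historical data yet)
get_trend_indicator "$(calculate_trend 100 null)"   # ➡️
```

Note that `N/A` propagates cleanly: `get_trend_indicator` treats it as "no trend" rather than failing inside `bc`.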

Storage Pattern

Store daily metrics in standardized format:

```bash
# Store current run metrics
TODAY=$(date +%Y-%m-%d)
METRICS_FILE="/tmp/gh-aw/repo-memory/default/metrics/daily/${TODAY}.json"

jq -n \
  --arg date "$TODAY" \
  --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --argjson metrics "$metrics_json" \
  '{
    date: $date,
    timestamp: $timestamp,
    metrics: $metrics
  }' > "$METRICS_FILE"

echo "Metrics stored: $METRICS_FILE"
```
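Reading a stored file back pairs naturally with jq's `//` alternative operator, which is what makes the null handling in `get_historical_value` work. A minimal round-trip sketch (the `total_count` field is hypothetical, used only for this demo):

```shell
#!/usr/bin/env bash
# Write a sample daily metrics file, then read values back null-safely.
TODAY=$(date +%Y-%m-%d)
METRICS_DIR="/tmp/gh-aw/repo-memory/default/metrics/daily"
mkdir -p "$METRICS_DIR"

# Stand-in for a real run's stored output (total_count is a demo field)
jq -n --arg date "$TODAY" --argjson total 42 \
  '{date: $date, metrics: {total_count: $total}}' \
  > "$METRICS_DIR/${TODAY}.json"

# Present field -> its value; absent field -> the "null" fallback
jq -r '.metrics.total_count // "null"' "$METRICS_DIR/${TODAY}.json"   # 42
jq -r '.metrics.missing // "null"' "$METRICS_DIR/${TODAY}.json"       # null
```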

Trend Analysis Pattern

```bash
# Calculate 7-day and 30-day trends
current_value=100
value_7d_ago=$(get_historical_value '.metrics.total_count' 7)
value_30d_ago=$(get_historical_value '.metrics.total_count' 30)

trend_7d=$(calculate_trend "$current_value" "$value_7d_ago")
trend_30d=$(calculate_trend "$current_value" "$value_30d_ago")

indicator_7d=$(get_trend_indicator "$trend_7d")
indicator_30d=$(get_trend_indicator "$trend_30d")

echo "Current: $current_value"
echo "7-day change: ${indicator_7d} ${trend_7d}%"
echo "30-day change: ${indicator_30d} ${trend_30d}%"
```

Usage Example

```yaml
imports:
  - shared/github-data-analysis-framework.md
  - shared/issues-data-fetch.md

# Your workflow-specific tools
tools:
  github:
    toolsets: [default]
```

Then in your workflow prompt:

```bash
# Analysis environment is already set up

# Load your current data
cp /tmp/gh-aw/issues-data/issues.json /tmp/gh-aw/analysis/data/

# Perform analysis
current_total=$(jq 'length' /tmp/gh-aw/analysis/data/issues.json)

# Calculate trends using helpers
value_7d_ago=$(get_historical_value '.metrics.total_issues' 7)
trend_7d=$(calculate_trend "$current_total" "$value_7d_ago")
indicator_7d=$(get_trend_indicator "$trend_7d")

# Store results
TODAY=$(date +%Y-%m-%d)
jq -n \
  --arg date "$TODAY" \
  --argjson total "$current_total" \
  --arg trend_7d "$trend_7d" \
  '{date: $date, metrics: {total_issues: $total, trend_7d: $trend_7d}}' \
  > "/tmp/gh-aw/repo-memory/default/metrics/daily/${TODAY}.json"
```

Best Practices

  1. Consistent metric names: Use the same field names across runs so trends remain comparable
  2. Date-based files: Always use the YYYY-MM-DD.json format for daily metrics
  3. Null handling: Check for null/missing historical data before calculating trends
  4. Cross-platform dates: Use compatible date commands (GNU date with a BSD fallback)
  5. Validation: Verify that historical files exist before loading them
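Best practice 4 can be captured in one small helper (a sketch: `date -d` is the GNU spelling, `date -v` the BSD/macOS one, same pattern as in `get_historical_value` above):

```shell
#!/usr/bin/env bash
# Portable "N days ago" as YYYY-MM-DD: try GNU date, fall back to BSD date.
days_ago() {
  local n=$1
  date -d "$n days ago" '+%Y-%m-%d' 2>/dev/null || date -v-"${n}"d '+%Y-%m-%d'
}

days_ago 0                          # prints today's date on either platform
echo "7 days ago: $(days_ago 7)"
```

On GNU systems the first branch succeeds; on BSD it fails silently (stderr discarded) and the `-v` form runs instead.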

Usage Example:
```yaml
# In a workflow
imports:
  - shared/github-data-analysis-framework.md
  - shared/issues-data-fetch.md
```

Impact

  • Workflows affected: 19 workflows
  • Lines saved: ~2,850 lines (150 per workflow)
  • Maintenance benefit: Single source of truth for data analysis patterns, easier to add features like new trend calculations or historical comparisons

Implementation Plan

  1. Create shared component at .github/workflows/shared/github-data-analysis-framework.md
  2. Pilot: Update audit-workflows.md to use the shared framework
  3. Pilot: Update workflow-health-manager.md to use the shared framework
  4. Validate pilots work correctly with historical data
  5. Update daily-code-metrics.md
  6. Update daily-issues-report.md
  7. Update remaining 15 workflows in batches of 3-4
  8. Test all updated workflows
  9. Update documentation with usage examples

Related Analysis

This recommendation comes from the Workflow Pattern Harvester analysis run on 2026-01-16.

AI generated by Workflow Pattern Harvester
