
[refactoring] Extract "GitHub GraphQL Data Fetching with Caching" into shared component #15037

@github-actions


## Skill Overview

Multiple workflows fetch GitHub data using gh api graphql with custom GraphQL queries for issues, pull requests, discussions, commits, and releases. This pattern is repeated across workflows with slight variations, leading to duplication and inconsistency.

Why this should be shared: The existing issues-data-fetch.md component only uses gh issue list (REST API), which is limited. Many workflows need the full power of GraphQL to fetch nested data (labels, comments, reviews) in a single query. A comprehensive shared component would provide standardized GraphQL queries with intelligent caching.

## Current Usage

This skill appears in the following workflows:

  • daily-news.md (lines 96-242) - Fetches issues, PRs, discussions, commits, releases using GraphQL
  • copilot-pr-merged-report.md - Fetches PR data with reviews and comments
  • weekly-issue-summary.md - Fetches recent issues with labels and comments
  • daily-team-status.md - Fetches team activity data
  • copilot-session-insights.md - Uses GraphQL for session data
  • github-mcp-tools-report.md - Fetches repository metadata
  • daily-observability-report.md - Fetches workflow run data
  • org-health-report.md - Fetches organization-level data
  • An additional 5-7 workflows use similar patterns

Total: ~12-15 workflows use GraphQL queries

## Proposed Shared Component

File: `.github/workflows/shared/github-graphql-data-fetch.md`

Configuration:

---
tools:
  cache-memory:
    key: github-graphql-data
  bash:
    - "gh api *"
    - "jq *"
    - "mkdir *"
    - "date *"
    - "cp *"

steps:
  - name: Setup data directories and cache
    id: setup
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      
      # Create directories
      mkdir -p /tmp/gh-aw/github-data
      mkdir -p /tmp/gh-aw/cache-memory/github-graphql-data
      
      # Check cache validity (< 24 hours)
      TODAY=$(date '+%Y-%m-%d')
      CACHE_VALID=false
      CACHE_TIMESTAMP_FILE="/tmp/gh-aw/cache-memory/github-graphql-data/.timestamp"
      
      if [ -f "$CACHE_TIMESTAMP_FILE" ]; then
        CACHE_AGE=$(($(date +%s) - $(cat "$CACHE_TIMESTAMP_FILE")))
        if [ $CACHE_AGE -lt 86400 ]; then
          echo "✓ Found valid cached data (age: ${CACHE_AGE}s)"
          CACHE_VALID=true
        fi
      fi
      
      echo "cache_valid=$CACHE_VALID" >> "$GITHUB_OUTPUT"

  - name: Fetch issues with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching issues data..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            openIssues: issues(first: 100, states: OPEN, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                author { login }
                labels(first: 10) { nodes { name color } }
                comments { totalCount }
                body
              }
            }
            closedIssues: issues(first: 100, states: CLOSED, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                closedAt
                author { login }
                labels(first: 10) { nodes { name color } }
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/issues.json
      echo "✓ Issues data fetched"

  - name: Fetch pull requests with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching pull requests..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            openPRs: pullRequests(first: 50, states: OPEN, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                author { login }
                additions
                deletions
                changedFiles
                reviews(first: 10) { totalCount }
                labels(first: 10) { nodes { name color } }
              }
            }
            mergedPRs: pullRequests(first: 50, states: MERGED, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                mergedAt
                author { login }
                additions
                deletions
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/pull_requests.json
      echo "✓ Pull requests data fetched"

  - name: Fetch discussions with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching discussions..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            discussions(first: 50, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                createdAt
                updatedAt
                author { login }
                category { name }
                comments { totalCount }
                url
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/discussions.json
      echo "✓ Discussions data fetched"

  - name: Cache fetched data
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Caching data for future runs..."
      cp -r /tmp/gh-aw/github-data/* /tmp/gh-aw/cache-memory/github-graphql-data/
      date +%s > "/tmp/gh-aw/cache-memory/github-graphql-data/.timestamp"
      echo "✓ Data cached"

  - name: Restore from cache
    if: steps.setup.outputs.cache_valid == 'true'
    run: |
      set -e
      echo "Restoring cached data..."
      cp -r /tmp/gh-aw/cache-memory/github-graphql-data/* /tmp/gh-aw/github-data/
      echo "✓ Cached data restored"
---

# GitHub GraphQL Data Fetch

Pre-fetched GitHub data is available at `/tmp/gh-aw/github-data/`:

- **`issues.json`**: Open and recently closed issues (last 100 each) with labels, comments, body
- **`pull_requests.json`**: Open and merged PRs (last 50 each) with reviews, labels, stats
- **`discussions.json`**: Recent discussions (last 50) with category, comments, URL

### Intelligent Caching

Data is cached for 24 hours to reduce API calls and improve performance. Multiple workflows running on the same day share the same cached data.
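The validity test in the component's setup step boils down to comparing a stored epoch timestamp against the current time. A minimal standalone sketch of that 24-hour check (using a temporary directory rather than the component's real cache path; the simulated timestamp is made up):

``````bash
#!/usr/bin/env bash
set -e

# Standalone demo of the 24-hour cache-validity check; a temp directory
# stands in for /tmp/gh-aw/cache-memory/github-graphql-data.
CACHE_DIR=$(mktemp -d)
TIMESTAMP_FILE="$CACHE_DIR/.timestamp"

# Simulate a cache written one hour ago
echo $(($(date +%s) - 3600)) > "$TIMESTAMP_FILE"

CACHE_VALID=false
if [ -f "$TIMESTAMP_FILE" ]; then
  CACHE_AGE=$(($(date +%s) - $(cat "$TIMESTAMP_FILE")))
  # 86400 seconds = 24 hours
  if [ "$CACHE_AGE" -lt 86400 ]; then
    CACHE_VALID=true
  fi
fi

echo "cache_valid=$CACHE_VALID"   # prints: cache_valid=true
rm -rf "$CACHE_DIR"
``````

In the shared component this result is written to `$GITHUB_OUTPUT` so the fetch steps can be skipped when the cache is fresh.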

### Usage Examples

``````bash
# Count open issues
jq '.repository.openIssues.nodes | length' /tmp/gh-aw/github-data/issues.json

# Get PRs merged today
TODAY=$(date '+%Y-%m-%d')
jq --arg date "$TODAY" '.repository.mergedPRs.nodes | map(select(.mergedAt | startswith($date)))' /tmp/gh-aw/github-data/pull_requests.json

# Get most active discussions
jq '.repository.discussions.nodes | sort_by(.comments.totalCount) | reverse | .[0:5]' /tmp/gh-aw/github-data/discussions.json
``````

**Import example**:

``````yaml
# In a workflow
imports:
  - shared/github-graphql-data-fetch.md

# Data is automatically available at /tmp/gh-aw/github-data/
# No need to write GraphQL queries or caching logic
``````
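The jq filters above assume the fetched files exist. As a self-contained illustration, the same patterns can be exercised against inline sample data in the shape of `issues.json` (all values below are made up):

``````bash
# Demo of the jq patterns using inline sample data shaped like issues.json
# (hypothetical issue numbers, titles, and labels).
SAMPLE='{
  "repository": {
    "openIssues": {
      "nodes": [
        {"number": 1, "title": "Bug A", "labels": {"nodes": [{"name": "bug"}]}, "comments": {"totalCount": 3}},
        {"number": 2, "title": "Feature B", "labels": {"nodes": [{"name": "enhancement"}]}, "comments": {"totalCount": 0}},
        {"number": 3, "title": "Bug C", "labels": {"nodes": [{"name": "bug"}]}, "comments": {"totalCount": 5}}
      ]
    }
  }
}'

# Count open issues
echo "$SAMPLE" | jq '.repository.openIssues.nodes | length'   # 3

# Count open issues per label
echo "$SAMPLE" | jq '[.repository.openIssues.nodes[].labels.nodes[].name]
  | group_by(.) | map({label: .[0], count: length})'
``````

Substituting `/tmp/gh-aw/github-data/issues.json` for the sample gives the real numbers once the shared component has run.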

## Impact

  • Workflows affected: 12-15 workflows
  • Lines saved: ~150-200 lines per workflow = 1,800-3,000 total lines
  • Maintenance benefit:
    • Single location to update GraphQL queries
    • Consistent data structure across all workflows
    • Intelligent caching reduces API rate limit usage
    • Easier to add new data types (commits, releases, workflow runs)
    • Reduces cognitive load - developers don't need to write GraphQL

## Implementation Plan

  1. Create the shared component at `.github/workflows/shared/github-graphql-data-fetch.md`
  2. Implement GraphQL queries for issues, PRs, discussions
  3. Add intelligent caching with 24-hour expiry
  4. Test with daily-news.md as proof-of-concept
  5. Add additional queries (commits, releases, workflow runs) based on needs
  6. Migrate 3-5 high-traffic workflows
  7. Document query customization patterns for special cases
  8. Gradually migrate remaining workflows
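For step 5, additional fetch steps could follow the same shape as the issue/PR/discussion steps. A hedged sketch of a commits step (the field names come from the public GitHub GraphQL schema; the step name and output path are placeholders, not part of the proposal):

``````yaml
  - name: Fetch recent commits with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching commits..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            defaultBranchRef {
              target {
                ... on Commit {
                  history(first: 50) {
                    nodes {
                      oid
                      messageHeadline
                      committedDate
                      author { user { login } }
                    }
                  }
                }
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/commits.json
      echo "✓ Commits data fetched"
``````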

## Example Before/After

**Before** (`daily-news.md`, lines 96-242):

``````yaml
steps:
  - name: Setup directories and check cache
    # ... 95 lines of caching logic ...

  - name: Fetch issues data
    # ... 35 lines of GraphQL query ...

  - name: Fetch pull requests data
    # ... 48 lines of GraphQL query ...

  - name: Fetch discussions data
    # ... 24 lines of GraphQL query ...

  - name: Cache downloaded data
    # ... 10 lines of caching logic ...
``````

**After**:

``````yaml
imports:
  - shared/github-graphql-data-fetch.md

# All data available at /tmp/gh-aw/github-data/
# 212 lines replaced with 1 import!
``````

## Related Analysis

This recommendation comes from the Workflow Skill Extractor analysis run on 2026-02-11. This is the highest impact opportunity identified, saving ~1,800-3,000 lines across 12-15 workflows.

## Additional Benefits

  1. Consistency: All workflows get the same data structure
  2. Performance: 24-hour cache reduces API calls dramatically
  3. Reliability: Centralized error handling and retry logic
  4. Extensibility: Easy to add new data types to the shared component
  5. Discovery: New workflow developers don't need to learn GraphQL

AI generated by Workflow Skill Extractor

  • expires on Feb 14, 2026, 12:05 AM UTC
