
[refactoring] Extract "GitHub GraphQL Data Fetching with Caching" into shared component #15037

@github-actions


## Skill Overview

Multiple workflows fetch GitHub data using gh api graphql with custom GraphQL queries for issues, pull requests, discussions, commits, and releases. This pattern is repeated across workflows with slight variations, leading to duplication and inconsistency.

Why this should be shared: The existing issues-data-fetch.md component only uses gh issue list (REST API), which is limited. Many workflows need the full power of GraphQL to fetch nested data (labels, comments, reviews) in a single query. A comprehensive shared component would provide standardized GraphQL queries with intelligent caching.

## Current Usage

This skill appears in the following workflows:

  • daily-news.md (lines 96-242) - Fetches issues, PRs, discussions, commits, releases using GraphQL
  • copilot-pr-merged-report.md - Fetches PR data with reviews and comments
  • weekly-issue-summary.md - Fetches recent issues with labels and comments
  • daily-team-status.md - Fetches team activity data
  • copilot-session-insights.md - Uses GraphQL for session data
  • github-mcp-tools-report.md - Fetches repository metadata
  • daily-observability-report.md - Fetches workflow run data
  • org-health-report.md - Fetches organization-level data
  • An additional 5-7 workflows use similar patterns

Total: ~12-15 workflows use GraphQL queries

## Proposed Shared Component

File: `.github/workflows/shared/github-graphql-data-fetch.md`

Configuration:

---
tools:
  cache-memory:
    key: github-graphql-data
  bash:
    - "gh api *"
    - "jq *"
    - "mkdir *"
    - "date *"
    - "cp *"

steps:
  - name: Setup data directories and cache
    id: setup
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      
      # Create directories
      mkdir -p /tmp/gh-aw/github-data
      mkdir -p /tmp/gh-aw/cache-memory/github-graphql-data
      
      # Check cache validity (< 24 hours)
      TODAY=$(date '+%Y-%m-%d')
      CACHE_VALID=false
      CACHE_TIMESTAMP_FILE="/tmp/gh-aw/cache-memory/github-graphql-data/.timestamp"
      
      if [ -f "$CACHE_TIMESTAMP_FILE" ]; then
        CACHE_AGE=$(($(date +%s) - $(cat "$CACHE_TIMESTAMP_FILE")))
        if [ $CACHE_AGE -lt 86400 ]; then
          echo "✓ Found valid cached data (age: ${CACHE_AGE}s)"
          CACHE_VALID=true
        fi
      fi
      
      echo "cache_valid=$CACHE_VALID" >> "$GITHUB_OUTPUT"

  - name: Fetch issues with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching issues data..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            openIssues: issues(first: 100, states: OPEN, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                author { login }
                labels(first: 10) { nodes { name color } }
                comments { totalCount }
                body
              }
            }
            closedIssues: issues(first: 100, states: CLOSED, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                closedAt
                author { login }
                labels(first: 10) { nodes { name color } }
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/issues.json
      echo "✓ Issues data fetched"

  - name: Fetch pull requests with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching pull requests..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            openPRs: pullRequests(first: 50, states: OPEN, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                author { login }
                additions
                deletions
                changedFiles
                reviews(first: 10) { totalCount }
                labels(first: 10) { nodes { name color } }
              }
            }
            mergedPRs: pullRequests(first: 50, states: MERGED, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                state
                createdAt
                updatedAt
                mergedAt
                author { login }
                additions
                deletions
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/pull_requests.json
      echo "✓ Pull requests data fetched"

  - name: Fetch discussions with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching discussions..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            discussions(first: 50, orderBy: {field: UPDATED_AT, direction: DESC}) {
              nodes {
                number
                title
                createdAt
                updatedAt
                author { login }
                category { name }
                comments { totalCount }
                url
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/discussions.json
      echo "✓ Discussions data fetched"

  - name: Cache fetched data
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Caching data for future runs..."
      cp -r /tmp/gh-aw/github-data/* /tmp/gh-aw/cache-memory/github-graphql-data/
      date +%s > "/tmp/gh-aw/cache-memory/github-graphql-data/.timestamp"
      echo "✓ Data cached"

  - name: Restore from cache
    if: steps.setup.outputs.cache_valid == 'true'
    run: |
      set -e
      echo "Restoring cached data..."
      cp -r /tmp/gh-aw/cache-memory/github-graphql-data/* /tmp/gh-aw/github-data/
      echo "✓ Cached data restored"
---

# GitHub GraphQL Data Fetch

Pre-fetched GitHub data is available at `/tmp/gh-aw/github-data/`:

- **`issues.json`**: Open and recently closed issues (last 100 each) with labels, comments, body
- **`pull_requests.json`**: Open and merged PRs (last 50 each) with reviews, labels, stats
- **`discussions.json`**: Recent discussions (last 50) with category, comments, URL

### Intelligent Caching

Data is cached for 24 hours to reduce API calls and improve performance. Multiple workflows running on the same day share the same cached data.
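The validity test in the component's setup step boils down to comparing a stored epoch timestamp against the current time. A minimal standalone sketch of that 24-hour check (using a temporary directory rather than the component's real cache path; the simulated timestamp is made up):

``````bash
#!/usr/bin/env bash
set -e

# Standalone demo of the 24-hour cache-validity check; a temp directory
# stands in for /tmp/gh-aw/cache-memory/github-graphql-data.
CACHE_DIR=$(mktemp -d)
TIMESTAMP_FILE="$CACHE_DIR/.timestamp"

# Simulate a cache written one hour ago
echo $(($(date +%s) - 3600)) > "$TIMESTAMP_FILE"

CACHE_VALID=false
if [ -f "$TIMESTAMP_FILE" ]; then
  CACHE_AGE=$(($(date +%s) - $(cat "$TIMESTAMP_FILE")))
  # 86400 seconds = 24 hours
  if [ "$CACHE_AGE" -lt 86400 ]; then
    CACHE_VALID=true
  fi
fi

echo "cache_valid=$CACHE_VALID"   # prints: cache_valid=true
rm -rf "$CACHE_DIR"
``````

In the shared component this result is written to `$GITHUB_OUTPUT` so the fetch steps can be skipped when the cache is fresh.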

### Usage Examples

``````bash
# Count open issues
jq '.repository.openIssues.nodes | length' /tmp/gh-aw/github-data/issues.json

# Get PRs merged today
TODAY=$(date '+%Y-%m-%d')
jq --arg date "$TODAY" '.repository.mergedPRs.nodes | map(select(.mergedAt | startswith($date)))' /tmp/gh-aw/github-data/pull_requests.json

# Get most active discussions
jq '.repository.discussions.nodes | sort_by(.comments.totalCount) | reverse | .[0:5]' /tmp/gh-aw/github-data/discussions.json
``````

**Import example**:

``````yaml
# In a workflow
imports:
  - shared/github-graphql-data-fetch.md

# Data is automatically available at /tmp/gh-aw/github-data/
# No need to write GraphQL queries or caching logic
``````
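The jq filters above assume the fetched files exist. As a self-contained illustration, the same patterns can be exercised against inline sample data in the shape of `issues.json` (all values below are made up):

``````bash
# Demo of the jq patterns using inline sample data shaped like issues.json
# (hypothetical issue numbers, titles, and labels).
SAMPLE='{
  "repository": {
    "openIssues": {
      "nodes": [
        {"number": 1, "title": "Bug A", "labels": {"nodes": [{"name": "bug"}]}, "comments": {"totalCount": 3}},
        {"number": 2, "title": "Feature B", "labels": {"nodes": [{"name": "enhancement"}]}, "comments": {"totalCount": 0}},
        {"number": 3, "title": "Bug C", "labels": {"nodes": [{"name": "bug"}]}, "comments": {"totalCount": 5}}
      ]
    }
  }
}'

# Count open issues
echo "$SAMPLE" | jq '.repository.openIssues.nodes | length'   # 3

# Count open issues per label
echo "$SAMPLE" | jq '[.repository.openIssues.nodes[].labels.nodes[].name]
  | group_by(.) | map({label: .[0], count: length})'
``````

Substituting `/tmp/gh-aw/github-data/issues.json` for the sample gives the real numbers once the shared component has run.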

## Impact

  • Workflows affected: 12-15 workflows
  • Lines saved: ~150-200 lines per workflow = 1,800-3,000 total lines
  • Maintenance benefit:
    • Single location to update GraphQL queries
    • Consistent data structure across all workflows
    • Intelligent caching reduces API rate limit usage
    • Easier to add new data types (commits, releases, workflow runs)
    • Reduces cognitive load - developers don't need to write GraphQL

## Implementation Plan

  1. Create the shared component at `.github/workflows/shared/github-graphql-data-fetch.md`
  2. Implement GraphQL queries for issues, PRs, discussions
  3. Add intelligent caching with 24-hour expiry
  4. Test with daily-news.md as proof-of-concept
  5. Add additional queries (commits, releases, workflow runs) based on needs
  6. Migrate 3-5 high-traffic workflows
  7. Document query customization patterns for special cases
  8. Gradually migrate remaining workflows
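For step 5, additional fetch steps could follow the same shape as the issue/PR/discussion steps. A hedged sketch of a commits step (the field names come from the public GitHub GraphQL schema; the step name and output path are placeholders, not part of the proposal):

``````yaml
  - name: Fetch recent commits with GraphQL
    if: steps.setup.outputs.cache_valid != 'true'
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      set -e
      echo "Fetching commits..."
      gh api graphql -f query="
        query(\$owner: String!, \$repo: String!) {
          repository(owner: \$owner, name: \$repo) {
            defaultBranchRef {
              target {
                ... on Commit {
                  history(first: 50) {
                    nodes {
                      oid
                      messageHeadline
                      committedDate
                      author { user { login } }
                    }
                  }
                }
              }
            }
          }
        }
      " -f owner="${GITHUB_REPOSITORY_OWNER}" -f repo="${GITHUB_REPOSITORY#*/}" > /tmp/gh-aw/github-data/commits.json
      echo "✓ Commits data fetched"
``````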

## Example Before/After

**Before** (`daily-news.md`, lines 96-242):

``````yaml
steps:
  - name: Setup directories and check cache
    # ... 95 lines of caching logic ...

  - name: Fetch issues data
    # ... 35 lines of GraphQL query ...

  - name: Fetch pull requests data
    # ... 48 lines of GraphQL query ...

  - name: Fetch discussions data
    # ... 24 lines of GraphQL query ...

  - name: Cache downloaded data
    # ... 10 lines of caching logic ...
``````

**After**:

``````yaml
imports:
  - shared/github-graphql-data-fetch.md

# All data available at /tmp/gh-aw/github-data/
# 212 lines replaced with 1 import!
``````

## Related Analysis

This recommendation comes from the Workflow Skill Extractor analysis run on 2026-02-11. This is the highest impact opportunity identified, saving ~1,800-3,000 lines across 12-15 workflows.

## Additional Benefits

  1. Consistency: All workflows get the same data structure
  2. Performance: 24-hour cache reduces API calls dramatically
  3. Reliability: Centralized error handling and retry logic
  4. Extensibility: Easy to add new data types to the shared component
  5. Discovery: New workflow developers don't need to learn GraphQL

AI generated by Workflow Skill Extractor

  • expires on Feb 14, 2026, 12:05 AM UTC
