diff --git a/.github/workflows/gpclean.lock.yml b/.github/workflows/gpclean.lock.yml
index d7df0053b3..3746fd4594 100644
--- a/.github/workflows/gpclean.lock.yml
+++ b/.github/workflows/gpclean.lock.yml
@@ -25,7 +25,7 @@
 # Imports:
 # - shared/mood.md
 #
-# frontmatter-hash: aafe3ec8d8d71f5ebd296fb924a841a88c013cfe6de4d129ca25854e409717d5
+# frontmatter-hash: 5bd33f57a833417278fc0221fb727e25d1c8c976e8ebf421df3bf6f7bf084fd6
 
 name: "GPL Dependency Cleaner (gpclean)"
 "on":
@@ -113,6 +113,12 @@ jobs:
           persist-credentials: false
       - name: Create gh-aw temp directory
         run: bash /opt/gh-aw/actions/create_gh_aw_tmp_dir.sh
+      - env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        name: Download SBOM from GitHub Dependency Graph API
+        run: "set -e\necho \"📦 Downloading SBOM from GitHub Dependency Graph API...\"\n\n# Download SBOM using gh CLI (requires contents: read permission)\ngh api \\\n -H \"Accept: application/vnd.github+json\" \\\n -H \"X-GitHub-Api-Version: 2022-11-28\" \\\n \"/repos/${{ github.repository }}/dependency-graph/sbom\" \\\n > /tmp/sbom.json\n\necho \"✅ SBOM downloaded successfully to /tmp/sbom.json\"\n\n# Show SBOM summary\nif command -v jq &> /dev/null; then\n PACKAGE_COUNT=$(jq '.sbom.packages | length' /tmp/sbom.json 2>/dev/null || echo \"unknown\")\n echo \"📊 SBOM contains ${PACKAGE_COUNT} packages\"\nfi"
+
       # Cache memory file share configuration from frontmatter processed below
       - name: Create cache-memory directory
         run: bash /opt/gh-aw/actions/create_cache_memory_dir.sh
diff --git a/.github/workflows/gpclean.md b/.github/workflows/gpclean.md
index 0b9353cf32..b0dd2a4615 100644
--- a/.github/workflows/gpclean.md
+++ b/.github/workflows/gpclean.md
@@ -38,6 +38,31 @@
 strict: true
 imports:
   - shared/mood.md
+
+# Pre-download SBOM to get accurate dependency information
+steps:
+  - name: Download SBOM from GitHub Dependency Graph API
+    env:
+      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    run: |
+      set -e
+      echo "📦 Downloading SBOM from GitHub Dependency Graph API..."
+
+      # Download SBOM using gh CLI (requires contents: read permission)
+      gh api \
+        -H "Accept: application/vnd.github+json" \
+        -H "X-GitHub-Api-Version: 2022-11-28" \
+        "/repos/${{ github.repository }}/dependency-graph/sbom" \
+        > /tmp/sbom.json
+
+      echo "✅ SBOM downloaded successfully to /tmp/sbom.json"
+
+      # Show SBOM summary
+      if command -v jq &> /dev/null; then
+        PACKAGE_COUNT=$(jq '.sbom.packages | length' /tmp/sbom.json 2>/dev/null || echo "unknown")
+        echo "📊 SBOM contains ${PACKAGE_COUNT} packages"
+      fi
 ---
 
 # GPL Dependency Cleaner (gpclean)
@@ -49,28 +74,45 @@ Systematically detect Go dependencies that introduce non-MIT friendly (GPL-type)
 ## Current Context
 - **Repository**: ${{ github.repository }}
 - **Go Module File**: `go.mod` in repository root
+- **SBOM Source**: GitHub Dependency Graph API (SPDX format)
 - **Cache Memory**: `/tmp/gh-aw/cache-memory/gpclean/` for round-robin module tracking
 
 ## Your Tasks
 
-### Phase 0: Round-Robin Module Selection
+### Phase 0: Download SBOM and Round-Robin Module Selection
+
+Use the repository's SBOM (Software Bill of Materials) to get accurate dependency information, then select one module to analyze in a round-robin fashion.
 
-Use cache-memory to ensure we focus on one dependency per run in a systematic round-robin fashion.
+1. **Download SBOM from GitHub**:
+   ```bash
+   # Download SBOM using gh CLI (requires contents: read permission)
+   gh api "repos/${{ github.repository }}/dependency-graph/sbom" \
+     -H "Accept: application/vnd.github+json" \
+     -H "X-GitHub-Api-Version: 2022-11-28" \
+     > /tmp/sbom.json
+   ```
+
+   **Note**: The workflow already has `contents: read` permission which is required to access the dependency graph SBOM API.
 
-1. **Load tracking state** from `/tmp/gh-aw/cache-memory/gpclean/state.json`:
+2. **Extract dependencies from SBOM**:
+   - Parse the SBOM JSON file (SPDX format)
+   - Look for packages in `sbom.packages[]` array
+   - Filter for Go packages (those with `purl` starting with `pkg:golang/`)
+   - Extract the package names (module paths) from the `purl` field
+   - Focus on direct dependencies (not dev dependencies or build tools)
+   - Save the list of dependencies to `/tmp/go-dependencies.txt`
+
+3. **Load tracking state** from `/tmp/gh-aw/cache-memory/gpclean/state.json`:
    - If file doesn't exist, create it with initial state: `{"last_checked_module": "", "checked_modules": []}`
    - State tracks which modules have been checked recently
 
-2. **Get all direct dependencies** from `go.mod`:
-   - Extract all `require` statements (excluding indirect dependencies for now)
-   - Create list of direct dependency module paths
-
-3. **Select next module to check**:
+4. **Select next module to check**:
+   - Use the dependencies list from SBOM (`/tmp/go-dependencies.txt`)
    - Find the first module NOT in `checked_modules` list
    - If all modules have been checked, reset `checked_modules` to empty array and start over
    - Update state with selected module and save to cache-memory
 
-4. **Focused analysis**: Analyze only the selected module and its transitive dependencies in this run
+5. **Focused analysis**: Analyze only the selected module and its transitive dependencies in this run
 
 ### Phase 1: License Detection for Selected Module
 
@@ -302,6 +344,16 @@ After creating the issue:
 
 ## Important Guidelines
 
+### SBOM Usage
+
+- **Download SBOM first** at the beginning of each run to get the latest dependency information
+- **Use `gh api`** to download SBOM - the workflow has `contents: read` permission which is required for the dependency graph API
+- SBOM is in SPDX format with packages listed in `sbom.packages[]` array
+- Go packages have `purl` (Package URL) in format: `pkg:golang/github.com/org/repo@version`
+- Parse the SBOM to extract all Go dependencies before license checking
+- SBOM provides a comprehensive view including transitive dependencies
+- If SBOM download fails, fall back to parsing `go.mod` directly
+
 ### Focus on One Dependency
 
 - **Only analyze ONE direct dependency per run** (round-robin via cache-memory)
@@ -352,6 +404,7 @@ After creating the issue:
 
 ## Error Handling
 
+- If SBOM download fails, fall back to parsing `go.mod` directly to extract dependencies
 - If `go mod graph` fails, report the error and exit
 - If license detection fails for a module, document it in the issue and recommend manual review
 - If no direct dependencies exist, exit successfully
@@ -359,10 +412,10 @@ After creating the issue:
 
 ## Example Module Selection Flow
 
-**Run 1**: Check `github.com/spf13/cobra` → No GPL found → Add to checked_modules
-**Run 2**: Check `github.com/spf13/viper` → No GPL found → Add to checked_modules
-**Run 3**: Check `github.com/cli/go-gh` → GPL found in transitive dep → Create issue, add to checked_modules
-**Run 4**: Check `gopkg.in/yaml.v3` → No GPL found → Add to checked_modules
-**Run 5**: All modules checked → Reset checked_modules, start from beginning
+**Run 1**: Download SBOM → Extract Go dependencies → Check `github.com/spf13/cobra` → No GPL found → Add to checked_modules
+**Run 2**: Check `github.com/spf13/viper` (from SBOM) → No GPL found → Add to checked_modules
+**Run 3**: Check `github.com/cli/go-gh` (from SBOM) → GPL found in transitive dep → Create issue, add to checked_modules
+**Run 4**: Check `gopkg.in/yaml.v3` (from SBOM) → No GPL found → Add to checked_modules
+**Run 5**: All modules from SBOM checked → Reset checked_modules, start from beginning
 
 This ensures systematic coverage without duplicate work.
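
Review note: the "Extract dependencies from SBOM" and "Select next module to check" steps in this change are described only in prose. The sketch below shows one way they might be carried out in shell; it is not part of the PR. The function names (`sbom_purls`, `purl_to_module`, `next_module`) are hypothetical, and the plain-text checked list is an assumption (it could be derived from `state.json` with `jq -r '.checked_modules[]'`). The sketch also assumes that purls in GitHub's SPDX export sit under `externalRefs[].referenceLocator` with `referenceType` set to `purl`, rather than in a top-level `purl` field, which is worth confirming against a real `/tmp/sbom.json`.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; none of these names exist in the workflow.

# Print every purl found in an SBOM file ($1).
# Assumption: purls live in packages[].externalRefs[] entries whose
# referenceType is "purl". Requires jq.
sbom_purls() {
  jq -r '.sbom.packages[]?.externalRefs[]?
         | select(.referenceType == "purl")
         | .referenceLocator' "$1"
}

# Read purls on stdin and print Go module paths, one per line.
# "pkg:golang/github.com/org/repo@v1.2.3" becomes "github.com/org/repo".
# Purls of other ecosystems are dropped.
purl_to_module() {
  local purl module
  while IFS= read -r purl; do
    case "$purl" in
      pkg:golang/*)
        module="${purl#pkg:golang/}"   # strip the purl type prefix
        module="${module%%[@?#]*}"     # strip version, qualifiers, subpath
        printf '%s\n' "$module"
        ;;
    esac
  done
}

# Print the first module in the dependency list ($1) that does not appear
# in the checked list ($2). Returns 1 and prints nothing when every module
# has been checked, so the caller can reset its state and start over.
next_module() {
  local deps_file="$1" checked_file="$2" module
  while IFS= read -r module; do
    [ -z "$module" ] && continue
    if ! grep -Fxq -- "$module" "$checked_file" 2>/dev/null; then
      printf '%s\n' "$module"
      return 0
    fi
  done < "$deps_file"
  return 1
}

# Possible usage inside the workflow (paths taken from the diff above):
#   sbom_purls /tmp/sbom.json | purl_to_module | sort -u > /tmp/go-dependencies.txt
#   next_module /tmp/go-dependencies.txt /tmp/checked-modules.txt
```

Keeping the purl parsing separate from the `jq` call means the fallback path (parsing `go.mod` when the SBOM download fails) can feed the same `next_module` helper with a differently produced list. Note that this sketch does not distinguish direct from transitive dependencies, which the prompt asks the agent to do.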