Add SBOM-based dependency discovery to gpclean workflow#15028
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot use gh to make REST calls, and make sure permissions are configured to query this value.
@copilot add this call to the steps in the front matter, and make sure `permissions: contents: read` is configured as well. (patch owner/repo)
... Added SBOM download as a frontmatter step. The step runs before the agent and downloads the SBOM to |
```bash
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "/repos/${{ github.repository }}/dependency-graph/sbom" \
  > /tmp/sbom.json
```
Pull request overview
This pull request adds SBOM (Software Bill of Materials) based dependency discovery to the gpclean workflow to improve accuracy and efficiency when detecting GPL-licensed Go dependencies. The SBOM is pre-downloaded via GitHub's Dependency Graph API in a frontmatter step before the agent executes, providing immediate access to comprehensive dependency information.
Changes:
- Added frontmatter `steps:` section to pre-download SBOM from GitHub Dependency Graph API to `/tmp/sbom.json`
- Updated agent instructions to extract Go dependencies from SBOM instead of parsing go.mod directly
- Added documentation for SBOM usage, permissions, and error handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| .github/workflows/gpclean.md | Added frontmatter step to pre-download SBOM; updated Phase 0 instructions to use SBOM for dependency discovery; added SBOM usage documentation and updated examples |
| .github/workflows/gpclean.lock.yml | Compiled workflow with updated frontmatter-hash and new SBOM download step |
Comments suppressed due to low confidence (3)
**.github/workflows/gpclean.md:93**
* The SBOM is already pre-downloaded in the frontmatter steps section (lines 43-65) and saved to `/tmp/sbom.json`. The agent instructions should tell the agent to use the pre-downloaded SBOM file, not download it again. Downloading it twice duplicates effort and wastes tokens.
Following the pattern from daily-news.md (lines 292-303), this section should instead instruct the agent: "The SBOM has been pre-downloaded to /tmp/sbom.json to avoid excessive API calls. Use this file instead of downloading the SBOM yourself."
Remove the download instructions here and replace them with instructions to parse the existing file at `/tmp/sbom.json`.
1. **Download SBOM from GitHub**:
```bash
# Download SBOM using gh CLI (requires contents: read permission)
gh api "repos/${{ github.repository }}/dependency-graph/sbom" \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  > /tmp/sbom.json
```
**.github/workflows/gpclean.md:349**
* This documentation contradicts the workflow design. The SBOM is already pre-downloaded in the frontmatter steps section (lines 43-65), so the agent should NOT download it again. This instruction should be updated to reflect that the SBOM is pre-fetched and available at `/tmp/sbom.json`.
Change to: "The SBOM has been pre-downloaded to `/tmp/sbom.json` - use this file to get the latest dependency information. If the file doesn't exist or is invalid, fall back to parsing `go.mod` directly."
- Download SBOM first at the beginning of each run to get the latest dependency information
**.github/workflows/gpclean.md:415**
* This example incorrectly includes "Download SBOM" as part of Run 1. The SBOM is pre-downloaded in the frontmatter steps before the agent runs, so it should not appear in the run examples. All runs should start with "Extract Go dependencies" or directly with module checking.
Change to: "**Run 1**: Extract Go dependencies from pre-downloaded SBOM → Check `github.com/spf13/cobra` → No GPL found → Add to checked_modules"
Run 1: Download SBOM → Extract Go dependencies → Check github.com/spf13/cobra → No GPL found → Add to checked_modules
</details>
---
```bash
# Download SBOM using gh CLI (requires contents: read permission)
gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "/repos/${{ github.repository }}/dependency-graph/sbom" \
  > /tmp/sbom.json

echo "✅ SBOM downloaded successfully to /tmp/sbom.json"

# Show SBOM summary
if command -v jq &> /dev/null; then
  PACKAGE_COUNT=$(jq '.sbom.packages | length' /tmp/sbom.json 2>/dev/null || echo "unknown")
  echo "📊 SBOM contains ${PACKAGE_COUNT} packages"
```
The step uses `set -e`, which will cause the entire workflow to fail if the SBOM download fails. However, the workflow documentation states "If SBOM download fails, fall back to parsing go.mod directly" (lines 355, 407).
To enable the fallback strategy, the step should handle errors gracefully. Either:
- Remove `set -e` and add explicit error checking with conditional output, OR
- Add error handling within the step to continue on failure (e.g., `gh api ... > /tmp/sbom.json || echo '{"sbom":{"packages":[]}}' > /tmp/sbom.json`)

The agent should then check if the SBOM is valid/non-empty before using it.
```diff
-# Download SBOM using gh CLI (requires contents: read permission)
-gh api \
-  -H "Accept: application/vnd.github+json" \
-  -H "X-GitHub-Api-Version: 2022-11-28" \
-  "/repos/${{ github.repository }}/dependency-graph/sbom" \
-  > /tmp/sbom.json
-echo "✅ SBOM downloaded successfully to /tmp/sbom.json"
-# Show SBOM summary
-if command -v jq &> /dev/null; then
-  PACKAGE_COUNT=$(jq '.sbom.packages | length' /tmp/sbom.json 2>/dev/null || echo "unknown")
-  echo "📊 SBOM contains ${PACKAGE_COUNT} packages"
+# Download SBOM using gh CLI (requires contents: read permission).
+# If download fails, write an empty SBOM and allow downstream fallback
+# (e.g., parsing go.mod directly) to be used instead.
+if ! gh api \
+  -H "Accept: application/vnd.github+json" \
+  -H "X-GitHub-Api-Version: 2022-11-28" \
+  "/repos/${{ github.repository }}/dependency-graph/sbom" \
+  > /tmp/sbom.json; then
+  echo "⚠️ Failed to download SBOM from GitHub Dependency Graph API; falling back to go.mod parsing."
+  echo '{"sbom":{"packages":[]}}' > /tmp/sbom.json
+else
+  echo "✅ SBOM downloaded successfully to /tmp/sbom.json"
+fi
+# Show SBOM summary if jq is available and the SBOM is valid/non-empty
+if command -v jq &> /dev/null; then
+  if jq -e '.sbom.packages' /tmp/sbom.json > /dev/null 2>&1; then
+    PACKAGE_COUNT=$(jq '.sbom.packages | length' /tmp/sbom.json 2>/dev/null || echo "unknown")
+    echo "📊 SBOM contains ${PACKAGE_COUNT} packages"
+  else
+    echo "⚠️ SBOM is invalid or empty; dependency analysis will rely on go.mod parsing if configured."
+  fi
```
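The agent-side check requested in this comment (confirm the SBOM is valid and non-empty before using it) could be sketched as below. This is a minimal sketch, not the workflow's actual code: the function names `sbom_is_usable` and `dependency_source` are hypothetical, and it assumes the `.sbom.packages` layout that the step above already queries with `jq`. If `jq` is missing, the SBOM is treated as unusable so the go.mod fallback applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the SBOM file exists, parses as JSON,
# and lists at least one package under .sbom.packages.
sbom_is_usable() {
  local sbom_file="${1:-/tmp/sbom.json}"
  # Missing or zero-byte file: nothing to parse.
  [ -s "$sbom_file" ] || return 1
  # Without jq we cannot validate, so report "not usable".
  command -v jq >/dev/null 2>&1 || return 1
  # jq -e exits non-zero when the expression yields false or null,
  # and also when the file is not valid JSON.
  jq -e '(.sbom.packages // []) | length > 0' "$sbom_file" >/dev/null 2>&1
}

# Hypothetical helper: print which dependency source the agent should use.
dependency_source() {
  if sbom_is_usable "${1:-/tmp/sbom.json}"; then
    echo "sbom"
  else
    echo "go.mod"
  fi
}
```

With the placeholder `{"sbom":{"packages":[]}}` written by the failure branch, `dependency_source` prints `go.mod`, so the empty SBOM doubles as the fallback signal.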
- Look for packages in `sbom.packages[]` array
- Filter for Go packages (those with `purl` starting with `pkg:golang/`)
- Extract the package names (module paths) from the `purl` field
- Focus on direct dependencies (not dev dependencies or build tools)
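The purl filtering and module-path extraction described in these instructions can be illustrated with plain shell parameter expansion. This is a sketch under assumptions: `purl_to_module` and `modules_from_purls` are hypothetical names, the input is one purl per line, and no percent-decoding is attempted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Go purl such as
#   pkg:golang/github.com/spf13/cobra@v1.8.0
# into its module path. Returns 1 for non-Go purls.
purl_to_module() {
  local purl="$1"
  case "$purl" in
    pkg:golang/*) ;;
    *) return 1 ;;
  esac
  local rest="${purl#pkg:golang/}"
  rest="${rest%%#*}"    # drop an optional #subpath
  rest="${rest%%\?*}"   # drop optional ?qualifiers
  rest="${rest%%@*}"    # drop the @version
  printf '%s\n' "$rest"
}

# Hypothetical helper: read purls on stdin, print unique Go module paths.
modules_from_purls() {
  local purl
  while IFS= read -r purl; do
    purl_to_module "$purl" || true   # silently skip non-Go packages
  done | sort -u
}
```

How the purls are pulled out of `/tmp/sbom.json` depends on the SBOM layout. If they sit in each package's `externalRefs[].referenceLocator` (an assumption about the SPDX export, not something this PR states), then `jq -r '.sbom.packages[].externalRefs[]?.referenceLocator' /tmp/sbom.json | modules_from_purls` would produce the list.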
The instruction to "Focus on direct dependencies (not dev dependencies or build tools)" may be misleading. The GitHub Dependency Graph SBOM includes all dependencies (both direct and transitive), and there's no standardized way to distinguish direct from transitive dependencies in the SPDX SBOM format using just the `purl` field.
The instruction should clarify how to identify direct dependencies. Options:
- Use the SBOM's relationship data if available (check `sbom.packages[].dependencies` or relationship fields)
- Cross-reference with `go.mod` to identify direct dependencies
- Remove this filtering requirement and process all Go dependencies from the SBOM

Additionally, Go doesn't have the concept of "dev dependencies" like npm does, so that part of the instruction should be removed or clarified.
```diff
-- Focus on direct dependencies (not dev dependencies or build tools)
+- Include all Go dependencies listed in the SBOM. The SBOM may contain both direct and transitive dependencies; if you need to restrict to direct dependencies, use SBOM relationship data (e.g., `dependencies` fields) or cross-reference with `go.mod`.
```
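The `go.mod` cross-reference option could look roughly like this. It is a sketch, assuming direct requirements are the `require` entries that do not carry the `// indirect` marker; `direct_modules_from_gomod` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print module paths of direct requirements in go.mod,
# i.e. require entries without a trailing "// indirect" comment.
direct_modules_from_gomod() {
  awk '
    /\/\/ indirect/         { next }                   # skip indirect requirements
    /^require[ \t]*\(/      { in_block = 1; next }     # start of a require ( ... ) block
    in_block && /^\)/       { in_block = 0; next }     # end of the block
    in_block && NF >= 2 && $1 !~ /^\/\// { print $1; next }  # "module version" inside block
    /^require[ \t]+[^(]/    { print $2 }               # single-line: require module version
  ' "${1:-go.mod}"
}
```

Saving this output to a file and intersecting it with the SBOM-derived list (for example with `grep -Fx -f direct.txt sbom-modules.txt`, both file names being placeholders) would keep only the direct dependencies.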
```diff
 ## Your Tasks

-### Phase 0: Round-Robin Module Selection
+### Phase 0: Download SBOM and Round-Robin Module Selection
```
The phase title "Download SBOM and Round-Robin Module Selection" is misleading because the SBOM is pre-downloaded in the frontmatter steps section (lines 43-65) before the agent runs. The title should be updated to reflect that the agent uses the pre-downloaded SBOM rather than downloading it.
Suggested title: "Phase 0: Extract Dependencies from SBOM and Round-Robin Module Selection" or "Phase 0: Parse Pre-Downloaded SBOM and Round-Robin Module Selection"
This issue also appears in the following locations of the same file:
- line 86
- line 349
- line 415
See below for a potential fix:
### Phase 0: Extract Dependencies from SBOM and Round-Robin Module Selection
Use the repository's SBOM (Software Bill of Materials) to get accurate dependency information, then select one module to analyze in a round-robin fashion.
1. **Use pre-downloaded SBOM**:
The workflow frontmatter has already downloaded the SBOM from the GitHub Dependency Graph API
and saved it to `/tmp/sbom.json` before this agent runs. Assume this file exists and use it as
the source of truth for dependency information; do not attempt to re-download the SBOM here.
**Note**: The pre-download step relies on the workflow's `contents: read` permission, which is
required to access the dependency graph SBOM API.
````markdown
1. **Download SBOM from GitHub**:
   ```bash
   # Download SBOM using gh CLI (requires contents: read permission)
   gh api "repos/${{ github.repository }}/dependency-graph/sbom" \
````
Inconsistency in the API endpoint path: the frontmatter step uses `/repos/${{ github.repository }}/dependency-graph/sbom` (line 56) with a leading slash, but the agent instructions use `repos/${{ github.repository }}/dependency-graph/sbom` (line 89) without one.
While both formats may work with `gh api`, they should be consistent. The leading slash format is more explicit and aligns with REST API conventions. Update line 89 to include the leading slash for consistency.
```diff
-gh api "repos/${{ github.repository }}/dependency-graph/sbom" \
+gh api "/repos/${{ github.repository }}/dependency-graph/sbom" \
```
The gpclean workflow now uses GitHub's Dependency Graph SBOM API as the primary source for discovering Go dependencies to analyze for GPL licenses.
Changes
- Downloads the SBOM with `gh api "/repos/{owner}/{repo}/dependency-graph/sbom"`
- Uses `gh api` instead of `curl`: follows GitHub workflow best practices for API calls with automatic authentication
- `contents: read` permission required for dependency graph access
- SBOM is saved to `/tmp/sbom.json` before agent execution for immediate availability
- Agent reads the `packages[]` array, filters for `pkg:golang/*` purls, extracts module paths

Benefits
- SBOM is already at `/tmp/sbom.json` when agent starts
- Uses `gh api` for cleaner, more maintainable workflow code

Implementation
Frontmatter Step (Pre-execution):
Agent Instructions (Phase 0):
- Keeps packages whose `purl` starts with `pkg:golang/`
- Writes the resulting list to `/tmp/go-dependencies.txt`

The workflow maintains round-robin module selection via cache-memory, now using the SBOM-derived dependency list. The `gh api` command automatically handles authentication via `GITHUB_TOKEN`, and the existing `contents: read` permission enables access to the dependency graph API.
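As an illustration of the round-robin selection mentioned above, the sketch below picks the first dependency not yet recorded as checked and starts over once every module has been visited. The PR does not describe how cache-memory stores `checked_modules`, so the plain-text file (one module path per line) and the names `next_module` and `mark_checked` are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the next module to analyze.
#   $1: dependency list, one module path per line (e.g. /tmp/go-dependencies.txt)
#   $2: file of already-checked modules, one per line (assumed cache format)
next_module() {
  local deps_file="$1" checked_file="$2" next=""
  touch "$checked_file"
  if [ -s "$checked_file" ]; then
    # First dependency that is not an exact match for any checked module.
    next="$(grep -Fxv -f "$checked_file" "$deps_file" | head -n 1)"
  else
    next="$(head -n 1 "$deps_file")"
  fi
  if [ -z "$next" ]; then
    # Every module has been checked: reset and start a new round.
    : > "$checked_file"
    next="$(head -n 1 "$deps_file")"
  fi
  [ -n "$next" ] && printf '%s\n' "$next"
}

# Hypothetical helper: record a module as checked after analysis.
mark_checked() {
  printf '%s\n' "$1" >> "$2"
}
```

A run would call `next_module`, analyze the printed module for GPL dependencies, then call `mark_checked` so the following run moves on to the next entry.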