Fix GitHub domain redaction in agent log ingestion #3145
- Add GITHUB_SERVER_URL and GITHUB_API_URL environment variables to sanitization step
- Update sanitize.cjs to extract and allow GitHub domains from environment
- Add extractDomainsFromUrl helper function to parse GitHub URLs
- Update tests to handle GitHub environment variables properly
- Ensure GitHub domains are always allowed in addition to configured domains

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot also support raw content domains
- Update extractDomainsFromUrl to include raw.githubusercontent.com for github.com
- Add raw.hostname variation for GitHub Enterprise domains
- Add wildcard *.githubusercontent.com support
- Add test for raw.githubusercontent.com support
- Update existing test to verify raw content domains are allowed

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
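Added raw content domain support in commit 0d86a5b.
For GitHub.com: raw.githubusercontent.com and the *.githubusercontent.com wildcard are allowed.
For GitHub Enterprise: the raw. variation of the server hostname is allowed.
Raw content URLs are now automatically allowed based on the GitHub environment.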
@copilot merge main
Pull Request Overview
This PR enhances the URL sanitization system to automatically allow GitHub domains from environment variables, enabling better support for GitHub Enterprise deployments with custom domains.
- Adds `extractDomainsFromUrl()` function to extract domain variations from GitHub server and API URLs
- Updates sanitization logic to dynamically include GitHub domains from `GITHUB_SERVER_URL` and `GITHUB_API_URL` environment variables
- Adds comprehensive test coverage for custom domain handling and environment variable cleanup
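A minimal sketch of what a helper like `extractDomainsFromUrl()` could look like, based only on the behavior described in this PR (github.com gains the API, raw content, and wildcard domains; other hosts gain `api.` and `raw.` variations). The body below is an assumption for illustration and may differ from the actual code in `sanitize.cjs`; in particular, skipping the variations for hostnames that already start with `api.` is a guess.

```javascript
// Hypothetical sketch; the real helper in sanitize.cjs may differ.
function extractDomainsFromUrl(url) {
  if (typeof url !== "string" || url.trim() === "") return [];

  let hostname;
  try {
    // The WHATWG URL parser lowercases the hostname of http(s) URLs.
    hostname = new URL(url).hostname;
  } catch {
    return []; // not a parseable absolute URL
  }

  const domains = [hostname];
  if (hostname === "github.com") {
    // Domains named in the PR description for GitHub.com.
    domains.push("api.github.com", "raw.githubusercontent.com", "*.githubusercontent.com");
  } else if (!hostname.startsWith("api.")) {
    // Assumed behavior for other hosts (e.g. GitHub Enterprise): add prefixed variations.
    domains.push("api." + hostname, "raw." + hostname);
  }
  return domains;
}

console.log(extractDomainsFromUrl("https://github.com"));
console.log(extractDomainsFromUrl("https://github.example.com"));
```

Returning an empty array for missing or malformed input lets the caller treat unset environment variables as "no extra domains" without special-casing.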
Reviewed Changes
Copilot reviewed 62 out of 62 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/workflow/js/sanitize.cjs | Adds domain extraction function and dynamic GitHub domain detection |
| pkg/workflow/js/sanitize_output.test.cjs | Adds tests for environment variable handling and domain extraction |
| pkg/workflow/js/collect_ndjson_output.test.cjs | Updates test to properly clean up environment variables |
| pkg/workflow/compiler_yaml.go | Adds GITHUB_SERVER_URL and GITHUB_API_URL to workflow environment |
| .github/workflows/*.lock.yml | Generated workflow files with updated sanitization logic |
…og-ingestion-phase
# Conflicts:
# .github/workflows/smoke-copilot.firewall.lock.yml
# .github/workflows/smoke-opencode.lock.yml
Fix GitHub Domain Redaction in Agent Log Ingestion ✅
Problem
GitHub domains were being redacted during the agent log ingestion phase. The sanitization logic in `sanitize.cjs` only allowed domains from a static allowlist, which didn't account for:
- API domains (`api.github.com` for GitHub.com, but different for GitHub Enterprise)
- Raw content domains (`raw.githubusercontent.com` and variations)

This caused legitimate GitHub URLs to be replaced with `(redacted)` in agent logs, making debugging and analysis difficult.
Solution ✅
- Updated `sanitize.cjs` to extract and allow domains from GitHub context URLs

Implementation Details
1. Compiler Changes (`pkg/workflow/compiler_yaml.go`)
Added two new environment variables, `GITHUB_SERVER_URL` and `GITHUB_API_URL`, to the "Ingest agent output" step.
These variables provide the current GitHub deployment's server and API URLs to the sanitization JavaScript code.
2. JavaScript Sanitization Updates (`pkg/workflow/js/sanitize.cjs`)
New `extractDomainsFromUrl()` function:
- For `github.com`, adds:
  - `api.github.com` (API endpoint)
  - `raw.githubusercontent.com` (raw content)
  - `*.githubusercontent.com` (wildcard for all githubusercontent subdomains)
- For other hosts (e.g., GitHub Enterprise), adds:
  - the `api.` prefix (e.g., `api.github.example.com`)
  - the `raw.` prefix (e.g., `raw.github.example.com`)

Updated `sanitizeContent()` function:
- Reads `GITHUB_SERVER_URL` and `GITHUB_API_URL` from environment
- Extracts domains with `extractDomainsFromUrl()`
- Uses a `Set` for efficient deduplication

3. Test Updates
Modified tests (to handle GitHub environment variables in test environment):
- `sanitize_output.test.cjs` - "should respect custom allowed domains from environment"
- `sanitize_output.test.cjs` - "should handle empty environment variable gracefully"
- `collect_ndjson_output.test.cjs` - "should handle custom allowed domains from environment"

New tests:
- `sanitize_output.test.cjs` - "should allow GitHub domains from environment variables"
- `sanitize_output.test.cjs` - "should allow raw.githubusercontent.com for github.com"
- `raw.githubusercontent.com` support

All tests properly save/restore environment variables to prevent cross-test interference.
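The save/restore pattern mentioned above can be sketched as a small helper. `withEnv` is a hypothetical name introduced here for illustration; the actual tests may instead save and restore variables inline or in `beforeEach`/`afterEach` hooks.

```javascript
// Hypothetical helper: run fn with temporary environment overrides, then restore.
// Only suitable for synchronous fn; an async version would need to await before restoring.
function withEnv(overrides, fn) {
  const saved = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = process.env[key];
    if (overrides[key] === undefined) {
      // Assigning undefined would store the string "undefined", so delete instead.
      delete process.env[key];
    } else {
      process.env[key] = overrides[key];
    }
  }
  try {
    return fn();
  } finally {
    // Restore even if fn throws, so a failing test cannot leak state into the next one.
    for (const key of Object.keys(saved)) {
      if (saved[key] === undefined) {
        delete process.env[key];
      } else {
        process.env[key] = saved[key];
      }
    }
  }
}

// Usage: simulate a GitHub Enterprise environment for a single assertion.
const seen = withEnv(
  { GITHUB_SERVER_URL: "https://github.example.com", GITHUB_API_URL: undefined },
  () => process.env.GITHUB_SERVER_URL
);
console.log(seen);
```

Deleting keys rather than assigning `undefined` matters because `process.env` coerces every assigned value to a string.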
How It Works
The sanitization process now follows these steps:
1. Start from `GH_AW_ALLOWED_DOMAINS` (from workflow network config)
2. Add domains derived from `GITHUB_SERVER_URL` (e.g., `github.com` → `[github.com, api.github.com, raw.githubusercontent.com, *.githubusercontent.com]`)
3. Add domains derived from `GITHUB_API_URL` (e.g., `api.github.com`)

Examples
GitHub.com Environment
GitHub Enterprise Environment
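As a rough illustration of the steps under How It Works, the sketch below merges a configured allowlist with GitHub-derived domains and redacts URLs whose host is not allowed, for both a GitHub.com and a GitHub Enterprise setup. The function names (`mergeAllowedDomains`, `isAllowedHost`, `redactUrls`), the URL regex, and the wildcard semantics are assumptions made for this example and are not taken from `sanitize.cjs`; `github.example.com` is a placeholder hostname.

```javascript
// Hypothetical sketch; names and matching rules are assumptions, not the code in sanitize.cjs.

// Merge the comma-separated configured allowlist with GitHub-derived domains.
// A Set removes duplicates while preserving insertion order.
function mergeAllowedDomains(configuredCsv, githubDomains) {
  const configured = (configuredCsv || "")
    .split(",")
    .map(d => d.trim().toLowerCase())
    .filter(Boolean);
  return Array.from(new Set([...configured, ...githubDomains]));
}

// Exact match, or subdomain match for entries of the form "*.example.com".
function isAllowedHost(hostname, allowed) {
  const host = hostname.toLowerCase();
  return allowed.some(entry => {
    const rule = entry.toLowerCase();
    if (rule.startsWith("*.")) {
      // ".example.com" suffix: matches subdomains only, not the bare domain.
      return host.endsWith(rule.slice(1));
    }
    return host === rule;
  });
}

// Replace every http(s) URL whose host is not allowed with "(redacted)".
function redactUrls(text, allowed) {
  return text.replace(/https?:\/\/[^\s)>"']+/g, url => {
    try {
      return isAllowedHost(new URL(url).hostname, allowed) ? url : "(redacted)";
    } catch {
      return "(redacted)"; // unparseable URLs are treated as not allowed
    }
  });
}

// GitHub.com environment: domains as listed for GITHUB_SERVER_URL = github.com.
const dotComAllowed = mergeAllowedDomains("", [
  "github.com", "api.github.com", "raw.githubusercontent.com", "*.githubusercontent.com",
]);

// GitHub Enterprise environment with a placeholder hostname.
const enterpriseAllowed = mergeAllowedDomains("example.org, github.example.com", [
  "github.example.com", "api.github.example.com", "raw.github.example.com",
]);

console.log(redactUrls("see https://raw.githubusercontent.com/o/r/main/a.txt and https://evil.example.net/x", dotComAllowed));
console.log(redactUrls("see https://raw.github.example.com/o/r/a.txt and https://github.com/o/r", enterpriseAllowed));
```

Matching the wildcard against the `.githubusercontent.com` suffix, rather than a plain substring, keeps look-alike hosts such as `evilgithubusercontent.com` from slipping through.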
Verification ✅
Impact
This fix ensures that GitHub URLs (including raw content URLs) are never redacted in agent logs, whether the workflow runs against GitHub.com or a GitHub Enterprise deployment.
GitHub domains are now always allowed in addition to any configured allowed domains, making log analysis and debugging significantly easier.